[Larceny-users] Performance Of SHA-256 Implementation

Discussion:

Ray Racine

2010-07-05 21:35:26 UTC

I coded up a somewhat naive literal translation of the the SHA-256 hash algo
along with the corresponding HMAC using Larceny (Funny In Head) ERR5RS. I'm
actually using a slightly modified build from the latest trunk.

http://github.com/RayRacine/rl3/blob/master/rl3/crypto/hash/sha256.sls

Here is the transcript of an example from Amazon's API doc which requires a
SHA-256 HMAC.

(import (rnrs)
(rl3 crypto hash sha256)
(rl3 crypto base64)
(primitives time))

(define msg "GET\nwebservices.amazon.com
\n/onca/xml\nAWSAccessKeyId=00000000000000000000&ItemId=0679722769&Operation=ItemLookup&ResponseGroup=ItemAttributes%2COffers%2CImages%2CReviews&Service=AWSECommerceService&Timestamp=2009-01-01T12%3A00%3A00Z&Version=2009-01-06")
(define key "1234567890")
(time (base64-encode (hmac-sha256 key msg)))

Words allocated: 4194032

Words reclaimed: 0
Elapsed time...: 998 ms (User: 981 ms; System: 1 ms)
Elapsed GC time: 0 ms (CPU: 2 in 4 collections.)
"Nace+U3Az4OhN7tISqgs1vdLBHBEijWcBeCqL5xN9xg="

The answer is correct, however, the 1 sec calculation time is a bit longer
than anticipated along with the 4 million allocated words. I understand why
bitwise manipulations are going to be slower in dynamically typed language
with type tag bits. Thought I'd toss it out in the hopes someone may spot a
low hanging fruit change in my implementation, recommend some compiler
options etc. At a minimum it might make a decent benchmark for grading
(rnrs arithmetic bitwise) implementations.

If nothing obvious is noted I'll go with an FFI solution.

Ray

P.S. Noticed the Larceny SVN and mail list has been a bit quiet. Anything
new and notable coming down on the Larceny front? 64Bit? Native threads?

--
The object of life is not to be on the side of the majority, but to escape
finding oneself in the ranks of the insane. - Marcus Aurelius

Felix S Klock II (Larceny-users list proxy)

2010-07-06 12:41:34 UTC

Permalink

Ray (cc'ing larceny-users)-

If you want some inspiration as to what changes one might make to
bit-twiddling code to make it go faster in Larceny, you could look at:

http://pnkfif.blogspot.com/2009/08/from-90sec-to-9sec.html

which documents what optimizations I tried applying to some code I was
playing with, and what effect each had, if any.

In particular, I noticed the following comment in the code:

;; - Replacing fxarithmetic-shift-left and fxarithmetic-shift-right
with fxlsh and fxrsh
;; was huge; brought time down to 40.5sec

(that was from 77.7sec). That one looks like it might be applicable
to you: I do not know if the compiler is smart about optimizing
bitwise-arithmetic-shift-* the way that it can with fx[lr]sh. (Of
course using fx[lr]sh might require significant refactoring of one's
code...)

The other big leap in my code was for me to get rid of uses of
quotient and remainder; you should try replacing your use of
fxdiv-and-mod with appropriate shifting operators, since you are only
dividing by 64 anyway. (There's also a comment in my code saying that
using fxdiv-and-mod made its performance worse.)

As for the svn and list being quiet, I've been pretty distracted by my
new job (working on Tamarin [1]; ), and most of my work on Larceny has
been on a private branch that I've been keeping in a local git
repository. I should sync it up with the main repository by the end
of summer.

-Felix

[1] Tamarin is Adobe's Actionscript Virtual Machine, used in the Flash
Player. See http://www.mozilla.org/projects/tamarin/

Post by Ray Racine
I coded up a somewhat naive literal translation of the the SHA-256 hash algo
along with the corresponding HMAC using Larceny (Funny In Head) ERR5RS. I'm
actually using a slightly modified build from the latest trunk.
http://github.com/RayRacine/rl3/blob/master/rl3/crypto/hash/sha256.sls
Here is the transcript of an example from Amazon's API doc which requires a
SHA-256 HMAC.
(import (rnrs)
(rl3 crypto hash sha256)
(rl3 crypto base64)
(primitives time))
(define msg
"GET\nwebservices.amazon.com\n/onca/xml\nAWSAccessKeyId=00000000000000000000&ItemId=0679722769&Operation=ItemLookup&ResponseGroup=ItemAttributes%2COffers%2CImages%2CReviews&Service=AWSECommerceService&Timestamp=2009-01-01T12%3A00%3A00Z&Version=2009-01-06")
(define key "1234567890")
(time (base64-encode (hmac-sha256 key msg)))

Words allocated: 4194032

Words reclaimed: 0
Elapsed time...: 998 ms (User: 981 ms; System: 1 ms)
Elapsed GC time: 0 ms (CPU: 2 in 4 collections.)
"Nace+U3Az4OhN7tISqgs1vdLBHBEijWcBeCqL5xN9xg="
The answer is correct, however, the 1 sec calculation time is a bit longer
than anticipated along with the 4 million allocated words. I understand why
bitwise manipulations are going to be slower in dynamically typed language
with type tag bits. Thought I'd toss it out in the hopes someone may spot a
low hanging fruit change in my implementation, recommend some compiler
options etc. At a minimum it might make a decent benchmark for grading
(rnrs arithmetic bitwise) implementations.
If nothing obvious is noted I'll go with an FFI solution.
Ray
P.S. Noticed the Larceny SVN and mail list has been a bit quiet. Anything
new and notable coming down on the Larceny front? 64Bit? Native threads?
--
The object of life is not to be on the side of the majority, but to escape
finding oneself in the ranks of the insane. - Marcus Aurelius
_______________________________________________
Larceny-users mailing list
https://lists.ccs.neu.edu/bin/listinfo/larceny-users

William D Clinger

2010-07-06 16:34:36 UTC

Permalink

Post by Ray Racine
The answer is correct, however, the 1 sec calculation time is a bit longer
than anticipated along with the 4 million allocated words. I understand why
bitwise manipulations are going to be slower in dynamically typed language
with type tag bits. Thought I'd toss it out in the hopes someone may spot a
low hanging fruit change in my implementation, recommend some compiler
options etc. At a minimum it might make a decent benchmark for grading
(rnrs arithmetic bitwise) implementations.

Thanks. Felix has already pointed you to the workarounds
appropriate for the current system. Some of the allocation
may come from bignums generated as intermediate values.

There is hope for improvement in future releases. Most of
the current inefficiency comes from implementing the R6RS
semantics without regard for performance. The great primop
cleanup began with v0.97 "Funny in the Head", but only a
few bitwise operations were improved, and almost all of the
improvements applied mainly to bignums, not fixnums. Most
of the changes in the next release are likely to consist of
minor bug fixes and performance improvements. The bitwise
and bytevector operations are among the lowest-hanging fruit
for that release.

Post by Ray Racine
P.S. Noticed the Larceny SVN and mail list has been a bit quiet. Anything
new and notable coming down on the Larceny front? 64Bit? Native threads?

Our main priority right now is Felix's thesis defense and
completing transfer of his maintenance responsibilities to
me. So far as new releases of Larceny go, integrating his
new garbage collector is the first priority. A 64-bit
Larceny is essential, but we'll do a 64-bit Petit Larceny
first because Sassy doesn't support the 64-bit instructions
we need, and coming up with the 64-bit runtime will be easier
if we don't have to change the compiler and assembler at the
same time. I think 64 bits is a higher priority than native
threads.