[Larceny-users] turning on optimizations

Discussion:

Marijn Schouten (hkBst)

2009-03-15 11:52:31 UTC

Hi,

I am trying to get more performance from my program by turning on optimizations.
However performance seems to be largely unaffected.

No extra optimization directives:
$ larceny -- -e "(require 'srfi-0)" ising.scm -e "(time (main 10 100))" -e "(quit)"
Larceny v0.97a4 (alpha test) (Mar 13 2009 15:30:01, precise:Linux:unified)
<snip />
Words allocated: 341821384
Words reclaimed: 0
Elapsed time...: 4782 ms (User: 4575 ms; System: 32 ms)
Elapsed GC time: 22 ms (CPU: 35 in 326 collections.)

All possible optimizations turned on:
$ larceny -- -e "(require 'srfi-0)(compiler-switches
'fast-unsafe)(benchmark-block-mode #t)" ising.scm -e "(time (main 10 100))" -e
"(quit)"
Larceny v0.97a4 (alpha test) (Mar 13 2009 15:30:01, precise:Linux:unified)
<snip />
Words allocated: 341821874
Words reclaimed: 0
Elapsed time...: 4714 ms (User: 4498 ms; System: 38 ms)
Elapsed GC time: 23 ms (CPU: 24 in 326 collections.)

Turning on similar optimizations in Gambit resulted in a twofold speed increase.

Then I tried to precompile my program with all optimizations:
Larceny v0.97a4 (alpha test) (Mar 13 2009 15:30:01, precise:Linux:unified)

Error: pass1-block must be modified to deal with the new regime for syntax
environments.
Entering debugger; type "?" for help.
debug>

Am I doing it wrong or is there something that is just not implemented yet here?

Thanks,

Marijn

PS My code is here
http://dynamo.iro.umontreal.ca/~gambit/wiki/images/9/90/Ising-20090315.scm

- --
Gods do not want you to think, lest they lose existence.
Religions do not want you to think, lest they lose power.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode

William D Clinger

2009-03-15 21:25:57 UTC

Permalink

Post by Marijn Schouten (hkBst)
Am I doing it wrong or is there something that is just not
implemented yet here?

Larceny's benchmark-block-mode doesn't work, and hasn't for
some time. The incremental compiler just ignores that switch,
but compile-file reports the error you encountered if that
switch is turned on. We should delete that switch until we
fix it again. I apologize for the time you wasted on it.

For a small program like the one you posted, you can get the
effect of block compilation by enclosing the definitions of
your program within a let, like this:

(define main
(let ()
(define (SPIN+) +1)
...
(define (main delta-t measurements)
...)
main))

For your program, in Larceny, that made little difference.
Gambit is probably performing some optimizations that Larceny
isn't. For example, Gambit's compiler may notice that (N)
always returns the same result; Larceny's compiler doesn't.

Post by Marijn Schouten (hkBst)
(require 'Debugger/profile)
(profile (main 10 100))

That profile told me your program was spending a quarter of
its time in half-delta-E-for-flip and another quarter in expt.

Changing (N) to compute (expt (L) (D)) only once made your
program run 20% faster.

I noticed some mixed-mode arithmetic, so I used fixnum-specific
and flonum-specific operations to detect and then to eliminate
(most of) the mixed-mode arithmetic. I also replaced the two
uses of the (apply + (map ... ...)) idiom with loops. Those
changes made your program run about 40% faster in Larceny, and
would probably improve its performance in Gambit as well.

After those changes, Larceny's profiler told me the program was
spending 90% of its time in the init-sample procedure, which
includes the time spent in random-real. That probably means
its performance in Larceny, following those changes, is limited
by the speed of Larceny's implementation of random-real.

Will

Marijn Schouten (hkBst)

2009-03-16 10:01:34 UTC

Permalink

Hi Will,

thanks for looking my program over and if you don't read anything else:
Please mail me the modifications you made.

Post by William D Clinger

Post by Marijn Schouten (hkBst)
Am I doing it wrong or is there something that is just not
implemented yet here?

It was my pleasure and a risk I accepted when I decided to use prerelease software.

Post by William D Clinger
For a small program like the one you posted, you can get the
effect of block compilation by enclosing the definitions of
(define main
(let ()
(define (SPIN+) +1)
...
(define (main delta-t measurements)
...)
main))
For your program, in Larceny, that made little difference.
Gambit is probably performing some optimizations that Larceny
isn't. For example, Gambit's compiler may notice that (N)
always returns the same result; Larceny's compiler doesn't.

I was wondering about that. I recently changed all the global constants to
functions in an attempt to help the compiler recognize that these things are
just constants. In C these would be macros, in C++ constant variables, but I
don't know what the best way is to express the same thing in Scheme. I know I
can use macros to do it, but I was trying to avoid that as it is supposedly not
good to use macros just for optimization. But in the end I am trying harder to
avoid using C++ ;P

Post by William D Clinger

Post by Marijn Schouten (hkBst)
(require 'Debugger/profile)
(profile (main 10 100))

That profile told me your program was spending a quarter of
its time in half-delta-E-for-flip and another quarter in expt.

All calls to expt are outside the main loop which does repeated calls to exp
which are hopefully constant folded and inlined such that (random-real) is
compared to a precomputed constant.

Post by William D Clinger
Changing (N) to compute (expt (L) (D)) only once made your
program run 20% faster.

What is the best way to do that? Macros, force/delay, some other method,
explicit set! maybe?

Post by William D Clinger
I noticed some mixed-mode arithmetic, so I used fixnum-specific
and flonum-specific operations to detect and then to eliminate
(most of) the mixed-mode arithmetic. I also replaced the two
uses of the (apply + (map ... ...)) idiom with loops. Those
changes made your program run about 40% faster in Larceny, and
would probably improve its performance in Gambit as well.
After those changes, Larceny's profiler told me the program was
spending 90% of its time in the init-sample procedure, which
includes the time spent in random-real. That probably means
its performance in Larceny, following those changes, is limited
by the speed of Larceny's implementation of random-real.

Do you really mean init-sample here? It is commented out in my code sample.
Maybe you meant init? In any case it is only run once. The only things that I
really care about are time-step which is called all the time and secondly
measure which is called at a designated interval.If those are fully optimized
the performance of longer runs and bigger samples will be acceptable.

Post by William D Clinger
Will

Please mail me the modifications you made.

Thanks a lot,

Marijn

- --
Gods do not want you to think, lest they lose existence.
Religions do not want you to think, lest they lose power.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode

William D Clinger

2009-03-16 14:32:43 UTC

Permalink

Post by Marijn Schouten (hkBst)
Do you really mean init-sample here?

My mistake! I was confused by a defect in the profiler.
My transformation of your code (to mimic block compilation)
turned your entire program into an anonymous closure, which
(being nameless) would never show up in the profiler! The
loop1 I was seeing in the profile was not the loop1 in your
code (after my renaming of loops), but a loop1 in Larceny,
probably in the REPL.

I'll look at the execution profile for your program some
more after I fix the profiler to be less confusing.

Post by Marijn Schouten (hkBst)
I was wondering about that. I recently changed all the global
constants to functions in an attempt to help the compiler
recognize that these things are just constants.

Compilers are more likely to recognize them as constants if
they're constants.

Post by Marijn Schouten (hkBst)
In C these would be macros, in C++ constant variables, but I
don't know what the best way is to express the same thing in
Scheme.

I think it's okay to use macros for this, but your reliance
on constant-folding was okay too.

The problem is that not all Scheme compilers fold constants.
In particular, Twobit (Larceny's compiler) folds some constants
(e.g. sums and differences of small exact integers) but is very
conservative about constant folding because Twobit is often used
as a cross-compiler, where the host system may be something other
than Larceny. Some host systems may not support bignums, or may
use floating-point precisions other than IEEE double. To avoid
those problems, Twobit never folds operations that involve
inexact constants, and never folds operations whose results
might be bignums. That's why Twobit never performs constant
folding for exp. Twobit probably should fold (expt 40 2), but
expt is not currently one of the operations that Twobit folds,
mainly because expt is likely to involve bignums, non-integral
rationals, floating point, or non-real numbers.

Post by Marijn Schouten (hkBst)
What is the best way to do that? Macros, force/delay, some other method,
explicit set! maybe?

I'd suggest you define procedures that compute the values of
computed constants from the input parameters, and use those
procedures to define constants. The rest of your program
would then use those defined constants instead of the procedures
that computed them.

Will