[Larceny-users] profiling code

Discussion:

Eduardo Cavazos

2009-02-14 12:55:07 UTC

Will,

I noticed your note on comp.lang.scheme from today where you say:

----------------------------------------------------------------------
In addition to the portability problems, I should
have mentioned that, after making the many changes
to the benchmark that were necessary to make it
run in Larceny, the benchmark spent about 95% of
its time in the random and expt procedures.
----------------------------------------------------------------------

I'm interested in finding out where *my* code is spending all it's time.
:-) Is there a profiler for Larceny? Available in ERR5RS mode?

Ed

William D Clinger

2009-02-14 18:37:39 UTC

Permalink

Post by Eduardo Cavazos
I'm interested in finding out where *my* code is spending all it's time.
:-) Is there a profiler for Larceny?

Yes, but it has very crude resolution. In R5RS mode:

--------

Post by Eduardo Cavazos
(require 'profile)

Post by Eduardo Cavazos
(run-with-profiling

(lambda ()
(let* ((x1 (vector->list (make-vector 1000000 13)))
(x2 (append x1 '()))
(y (vector->list (make-vector 1000000 x1)))
(z (vector->list (make-vector 1000000 x2))))
(equal? y z))))
% topmost named procedure
98 equal?
2 run-with-profiling

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
98 equal?
--------

Post by Eduardo Cavazos
Available in ERR5RS mode?

Yes. By the way, the following example generalizes
to work with any R5RS library that doesn't export
macros.

--------

Post by Eduardo Cavazos
(import (rnrs base)

(primitives r5rs:require run-with-profiling))

Post by Eduardo Cavazos
(r5rs:require 'profile)

Post by Eduardo Cavazos
(run-with-profiling

(lambda ()
(let* ((x1 (vector->list (make-vector 1000000 13)))
(x2 (append x1 '()))
(y (vector->list (make-vector 1000000 x1)))
(z (vector->list (make-vector 1000000 x2))))
(equal? y z))))
% topmost named procedure
97 equal?
2 run-with-profiling
1 append

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
97 equal?
1 append
--------

Will

Eduardo Cavazos

2009-02-14 21:07:48 UTC

Permalink

Post by Eduardo Cavazos
I'm interested in finding out where *my* code is spending all it's time.
:-) Is there a profiler for Larceny?

Thank you! This is a big help.

My suspicions have been confirmed. :-)

So what do the numbers mean? There are two tables which are output each
time.

Below are the results of running the profiler on some of my code 4
times. 'hsva' is pretty consistently low in the first table while high
in the second. 'get-modelview-matrix' is middle to high in both tables.

If anybody is curious to see the code being profiled, it's here:

http://proteus.freeshell.org/_profiling-spirales-a.scm

I normally use macros and a nice framework for running cfdg models. The
above is a manual expansion of the macro. Also, the work is actually
done in a procedure which is registered with OpenGL as a callback. So
instead of registering it, I run it explicitly and profile it.

Ed

----------------------------------------------------------------------
% topmost named procedure
28 run-with-profiling
9 \x1;get-modelview-matrix
8 \x1;double-vector
7 map
7 big-multiply-through-by
5 \x1;gl-flip
5 ffi/apply
4 list->vector
4 \x1;->
4 number-type
2 big*-
2 flonum-check
2 div-and-mod
2 big-divide-digits
2 fixnum->bignum
2 %flonum->bignum
2 big-limited-normalize!
2 expt
1 \x1;hsva
1 dynamic-wind
1 syscall
1 %bignum-length
1 bignum-alloc

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
22 \x1;hsva
20 %flonum->integer
20 %flonum->bignum
19 \x1;gl-flip
13 \x1;double-vector
13 bignum-quotient
13 big-divide-digits
11 \x1;get-modelview-matrix
8 map
7 big*-
7 big-multiply-through-by
5 ffi/apply
4 \x1;vec
4 list->vector
4 \x1;->
4 number-type
3 div-and-mod
3 fixnum->bignum
3 expt
2 flonum-check
2 big-limited-normalize!
1 dynamic-wind
1 \x1;gl-color
1 syscall
1 %bignum-length
1 bignum-alloc
----------------------------------------------------------------------
% topmost named procedure
37 run-with-profiling
9 map
7 big-multiply-through-by
6 \x1;->
5 ffi/apply
5 \x1;double-vector
4 \x1;get-modelview-matrix
4 flonum-check
4 big-divide-digits
3 dynamic-wind
2 \x1;gl-flip
2 big*-
2 list->vector
2 number-type
2 expt
2 \x1;hsva
1 bignum-quotient
1 big-normalize!
1 %bignum-length
1 div-and-mod
1 big-limited-normalize!
1 syscall
1 bignum->fixnum

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
21 \x1;hsva
19 %flonum->integer
19 %flonum->bignum
15 bignum-quotient
14 big-divide-digits
10 \x1;gl-flip
10 map
8 \x1;double-vector
7 \x1;get-modelview-matrix
7 big-multiply-through-by
6 \x1;->
5 ffi/apply
4 flonum-check
3 big*-
3 dynamic-wind
2 \x1;vec
2 list->vector
2 number-type
2 expt
1 big-normalize!
1 bignum-zero?
1 %bignum-length
1 div-and-mod
1 big-limited-normalize!
1 syscall
1 bignum->fixnum
----------------------------------------------------------------------
% topmost named procedure
26 run-with-profiling
9 map
7 \x1;get-modelview-matrix
7 ffi/apply
6 \x1;->
6 big-multiply-through-by
5 \x1;double-vector
5 \x1;gl-flip
4 big-divide-digits
3 flonum-check
3 list->vector
3 number-type
2 big*-
2 div-and-mod
2 fixnum->bignum
2 %bignum-length
2 big-limited-normalize!
2 expt
1 bignum-quotient
1 big-normalize!
1 \x1;hsva
1 syscall
1 bignum->fixnum
1 %flonum->bignum
1 bignum-alloc

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
25 \x1;hsva
24 %flonum->integer
24 %flonum->bignum
16 bignum-quotient
15 big-divide-digits
14 \x1;gl-flip
12 map
9 \x1;get-modelview-matrix
8 \x1;double-vector
7 ffi/apply
6 \x1;->
6 big-multiply-through-by
5 big*-
3 flonum-check
3 \x1;vec
3 list->vector
3 number-type
3 fixnum->bignum
3 expt
2 div-and-mod
2 %bignum-length
2 big-limited-normalize!
1 big-normalize!
1 bignum-zero?
1 \x1;gl-color
1 syscall
1 bignum->fixnum
1 bignum-alloc
----------------------------------------------------------------------
% topmost named procedure
32 run-with-profiling
8 map
7 \x1;double-vector
7 \x1;get-modelview-matrix
6 ffi/apply
6 big-multiply-through-by
5 \x1;gl-flip
4 dynamic-wind
3 number-type
3 list->vector
3 \x1;->
3 fixnum->bignum
3 big-divide-digits
2 big*-
2 %flonum->bignum
2 div-and-mod
1 collect-arguments
1 expt
1 big-limited-normalize!
1 syscall
1 flonum-check

% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
18 \x1;hsva
17 %flonum->integer
17 %flonum->bignum
17 \x1;gl-flip
12 bignum-quotient
12 big-divide-digits
11 \x1;double-vector
11 \x1;get-modelview-matrix
8 map
6 ffi/apply
6 big-multiply-through-by
5 big*-
4 dynamic-wind
3 number-type
3 div-and-mod
3 \x1;vec
3 list->vector
3 \x1;->
3 fixnum->bignum
2 \x1;gl-color
1 collect-arguments
1 expt
1 big-limited-normalize!
1 syscall
1 flonum-check
----------------------------------------------------------------------

Eduardo Cavazos

2009-02-16 07:57:10 UTC

Permalink

Post by William D Clinger

Post by Eduardo Cavazos
(require 'profile)

Post by Eduardo Cavazos
(run-with-profiling

(lambda ()
(let* ((x1 (vector->list (make-vector 1000000 13)))
(x2 (append x1 '()))
(y (vector->list (make-vector 1000000 x1)))
(z (vector->list (make-vector 1000000 x2))))
(equal? y z))))

It would be nice if 'run-with-profiling' returned the result of the
calling the thunk, so as to make it easier to slip into existing code.

It would also be nice to have a 'profile' macro which wrapped the
expression in a thunk for you.

With those two changes, 'profile' would act much like 'time'.

This works for me:

----------------------------------------------------------------------

(define (run-with-profiling* thunk)
(define val #f)
(if (not (procedure? thunk))
(assertion-violation 'run-with-profiling
"argument should be thunk"
thunk))
(reset-profiler!)
(start-profiler!)
(set! val (thunk))
(stop-profiler!)
(report-profiler!)

val)

(define-macro (profile expr)
`(run-with-profiling* (lambda () ,expr)))
----------------------------------------------------------------------

If they are sound, can we add them to 'profile.sch'?

Ed

William D Clinger

2009-02-16 01:10:42 UTC

Permalink

Post by Eduardo Cavazos
So what do the numbers mean? There are two tables
which are output each time.

Roughly speaking, the first table shows the percentage
of time each procedure is executing its own code. The
second table shows the percentage of time each procedure
is executing its own code or is suspended while waiting
for the result of some procedure it calls. Hence the
percentages for the second table will always be at least
as high as for the first table.

Your profiles show that about 20% of the time is being
spent in %flonum->bignum. I doubt whether your problem
really needs any bignum arithmetic at all, so I suspect
that you are using exact numbers for calculations that
were intended to use inexact numbers.

That, I think, is the most likely reason Larceny is
running slower than Ypsilon. Interpreted systems
often perform well on benchmarks that do a lot of
bignum arithmetic, because there is no interpretation
involved during the bignum calculations themselves.
Furthermore Larceny's bignum arithmetic is known to be
especially slow, so an interpreter can easily be faster
than Larceny on a bignum-intensive benchmark.

I'd suggest you go through your code and convert every
arithmetic operation that should be taking inexact
arguments into the corresponding flonum-specific
operation. For example, change + to fl+. Then you'll
get an exception every time an exact argument is passed
to one of the flonum-specific operations, and that will
make it easier for you to track down the exact numbers
that (in all likelihood) are making the program run so
slowly.

The flonum-specific operations should be significantly
faster than the generic operations, so the transformed
program (after you have debugged it) should run faster
in both Larceny and in Ypsilon.

Will

Eduardo Cavazos

2009-02-16 05:50:57 UTC

Permalink

Post by William D Clinger

Post by Eduardo Cavazos
So what do the numbers mean? There are two tables
which are output each time.

Your profiles show that about 20% of the time is being
spent in %flonum->bignum. I doubt whether your problem
really needs any bignum arithmetic at all

Yes, I was quite surprised to see bignum related procedures in the
profile reports.

Post by William D Clinger
, so I suspect
that you are using exact numbers for calculations that
were intended to use inexact numbers.

Eh... I know that it's good to avoid bignum arithmetic where possible.
But I didn't know that using exact numbers (i.e. integer literals) will
throw Larceny into "bignum mode". Isn't there a middle ground for fixnums?

So to be sure, are you saying that if I have a literal '1' in my code, I
should change it to '0.0'?

Ed

William D Clinger

2009-02-16 15:17:20 UTC

Permalink

Post by Eduardo Cavazos
It would be nice if 'run-with-profiling' returned the result of the
calling the thunk, so as to make it easier to slip into existing code.

I have logged this request as ticket #610.

Will

William D Clinger

2009-02-16 17:25:39 UTC

Permalink

Post by Eduardo Cavazos
Yes, I was quite surprised to see bignum related procedures in the
profile reports.

That was probably a performance bug in Larceny. (See below.)

Post by Eduardo Cavazos
Eh... I know that it's good to avoid bignum arithmetic where possible.
But I didn't know that using exact numbers (i.e. integer literals) will
throw Larceny into "bignum mode". Isn't there a middle ground for fixnums?

If x is an inexact real (flonum), then (exact x) is an exact
rational whose computation generally involves bignum arithmetic,
because IEEE double precision numbers have a 53-bit significand.

If x is an inexact integer in the fixnum range, however, then
Larceny could recognize that special case and avoid the bignum
arithmetic. Larceny wasn't recognizing that special case. As
of changeset:6062, it does. That special case is now over
twice as fast as the general case.

For your program, you have to convert the inexact results of
a computation into the exact coordinates of a pixel. If you
round the inexact results before you convert them to exact,
then you are exercising the special case.

Post by Eduardo Cavazos
So to be sure, are you saying that if I have a literal '1' in my
code, I should change it to '0.0'?

More likely to '1.0', but you have to be careful. If the
constant is an input to a calculation using inexact numbers,
then the constant should be inexact. If the constant is an
input to a calculation that should use exact numbers, then
the constant should be exact.

You had remarked earlier that "it's no fun to litter the
code with" exact->inexact, but you may need to do some of
that to avoid unnecessary mixed-mode arithmetic, which is
expensive. On the other hand, it may be that all of the
bignum arithmetic was coming from Larceny's failure to
recognize the special case; if so, then the change I just
made will eliminate the bignum arithmetic.

Will

Eduardo Cavazos

2009-02-16 18:13:03 UTC

Permalink

Post by William D Clinger
If x is an inexact integer in the fixnum range, however, then
Larceny could recognize that special case and avoid the bignum
arithmetic. Larceny wasn't recognizing that special case. As
of changeset:6062, it does. That special case is now over
twice as fast as the general case.

I updated from svn and rebuilt larceny. I also spent some time yesterday
implementing your suggestions regarding eliminating mixed
fixnum/flonum arithmetic.

There are three profile reports shown below for spirales. They are
drastically different from the original ones. No mention of bignum! :-)

Thanks for the fix!

Ed

----------------------------------------------------------------------
Words allocated: 41940382
Words reclaimed: 0
Elapsed time...: 11333 ms (User: 9521 ms; System: 1552 ms)
Elapsed GC time: 19 ms (CPU: 8 in 40 collections.)

% topmost named procedure
34 \x0;line~19CqCO~1267
18 \x1;get-modelview-matrix
16 map
9 \x1;->
7 ffi/apply
4 div-and-mod
4 flonum-check
3 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 syscall
1 \x1;gl-flip

% active procedures
100 r5rs-entry-point
100 repl
100 \x1;run-with-profiling*
100 run-with-stats
100 \x0;line~19CqCO~1267
22 \x1;get-modelview-matrix
16 map
9 \x1;->
7 ffi/apply
5 \x1;gl-flip
4 div-and-mod
4 flonum-check
3 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 syscall
----------------------------------------------------------------------
Words reclaimed: 0
Elapsed time...: 11238 ms (User: 9313 ms; System: 1732 ms)
Elapsed GC time: 19 ms (CPU: 16 in 40 collections.)

% topmost named procedure
44 \x0;line~19CqDy~1267
14 map
10 ffi/apply
8 \x1;get-modelview-matrix
7 \x1;->
6 \x1;gl-flip
4 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 expt
1 flonum-check
1 div-and-mod

% active procedures
100 r5rs-entry-point
100 repl
100 \x1;run-with-profiling*
100 run-with-stats
100 \x0;line~19CqDy~1267
14 map
14 \x1;get-modelview-matrix
10 ffi/apply
8 \x1;gl-flip
7 \x1;->
4 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 %flonum->integer
1 expt
1 flonum-check
1 div-and-mod
----------------------------------------------------------------------
Words allocated: 41940356
Words reclaimed: 0
Elapsed time...: 11592 ms (User: 9613 ms; System: 1796 ms)
Elapsed GC time: 21 ms (CPU: 16 in 40 collections.)
% topmost named procedure
43 \x0;line~19CqE4~1267
12 map
10 \x1;get-modelview-matrix
10 \x1;->
8 ffi/apply
6 \x1;gl-flip
4 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 expt
1 flonum-check
1 div-and-mod

% active procedures
100 r5rs-entry-point
100 repl
100 \x1;run-with-profiling*
100 run-with-stats
100 \x0;line~19CqE4~1267
15 \x1;get-modelview-matrix
12 map
10 \x1;->
8 ffi/apply
8 \x1;gl-flip
4 dynamic-wind
3 \x1;vector-of-floats->bytevector
1 %flonum->integer
1 expt
1 flonum-check
1 div-and-mod
----------------------------------------------------------------------