James E Keenan on 11 Jan 2004 05:03:43 -0000


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: What use is the Schwartzian Transform?


On a cold Saturday night in January, Mark Jason Dominus 
<mjd-perl-pm@plover.com> wrote:
>
> # Code A
> @sorted = sort { -M $b <=> -M $a } @filenames;
>
> This is the obvious, natural way to sort a list of filenames into
> order by last modification date.  For long lists, it can be slow,
> because the -M operation is done many times for each file.

Apparently a list must be _very_ long for the cost of the -M operations 
to add up.  See below.
>
> # Code B
> @sorted = map { $_->[0] }
> sort { $b->[1] <=> $a->[1] }
> map { [$_, -M $_] } @filenames;
>
> # Code C
> my %date;
> $date{$_} = -M $_ for @filenames;
> @sorted = sort { $date{$b} <=> $date{$a} } @filenames; undef %date;
>
> The obvious drawback of C compared with B is that it's fractionally
> slower; about 5%.  

I got a much larger differential than 5%; see below.

>
> # Code D
> @sorted = map  $_->[0],
> sort { $b->[1] <=> $a->[1] }
> map [$_, -M $_], @filenames;
>
>
> So here's the question:  why do we bother with B at all?  Why not
> always use C? -

I decided to apply Benchmark to these various approaches.  I first 
compiled a list of 9952 filenames, then sorted them 10**7 times using 
each of the 4 approaches:

timethese(10**7, {
	'CODE A' => '@sorted = sort { -M $b <=> -M $a } @filenames;',
	'CODE B' => '@sorted = map { $_->[0] } 
	                       sort {$b->[1] <=> $a->[1]} 
	                       map {[$_, -M $_]} @filenames;',
	'CODE C' => '$date{$_} = -M $_ for @filenames; 
	             @sorted = sort {$date{$b} <=> $date{$a} } @filenames; 
	             undef %date;',
	'CODE D' => '@sorted = map  $_->[0], 
	                       sort {$b->[1] <=> $a->[1]} 
	                       map [$_, -M $_], @filenames;',
	}
);

Results:
Benchmark: timing 10000000 iterations of CODE A, CODE B, CODE C, CODE 
D...
    CODE A: 39 wallclock secs 
        (38.50 usr +  0.00 sys = 38.50 CPU) @ 259740.26/s (n=10000000)
    CODE B: 42 wallclock secs 
        (42.57 usr +  0.00 sys = 42.57 CPU) @ 234907.21/s (n=10000000)
    CODE C: 93 wallclock secs 
        (91.94 usr +  0.00 sys = 91.94 CPU) @ 108766.59/s (n=10000000)
    CODE D: 43 wallclock secs 
        (42.13 usr +  0.00 sys = 42.13 CPU) @ 237360.55/s (n=10000000)

Note that, contrary to expectation, Code A was the fastest.  Perhaps if 
the list were an order of magnitude greater, A would no longer be the 
fastest.  But this result was consistent across several runs.

Note further that, as you predicted, Code D was fractionally faster than 
Code B.  However, contrary to your prediction, Code B was more than 
twice as fast as Code C -- not just 5% faster.

So this is a good reason to use the Schwartzian Transform -- but how big 
a list do we have to have before the Transform becomes faster than a 
simple sort?

Jim Keenan


-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**