|
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
|
Re: What use is the Schwartzian Transform?
|
On a cold Saturday night in January, Mark Jason Dominus
<mjd-perl-pm@plover.com> wrote:
>
> # Code A
> @sorted = sort { -M $b <=> -M $a } @filenames;
>
> This is the obvious, natural way to sort a list of filenames into
> order by last modification date. For long lists, it can be slow,
> because the -M operation is done many times for each file.
Apparently a list must be _very_ long for the cost of the -M operations
to add up. See below.
>
> # Code B
> @sorted = map { $_->[0] }
> sort { $b->[1] <=> $a->[1] }
> map { [$_, -M $_] } @filenames;
>
> # Code C
> my %date;
> $date{$_} = -M $_ for @filenames;
> @sorted = sort { $date{$b} <=> $date{$a} } @filenames; undef %date;
>
> The obvious drawback of C compared with B is that it's fractionally
> slower; about 5%.
I got a much larger differential than 5%; see below.
>
> # Code D
> @sorted = map $_->[0],
> sort { $b->[1] <=> $a->[1] }
> map [$_, -M $_], @filenames;
>
>
> So here's the question: why do we bother with B at all? Why not
> always use C? -
I decided to apply Benchmark to these various approaches. I first
compiled a list of 9952 filenames, then sorted them 10**7 times using
each of the 4 approaches:
timethese(10**7, {
'CODE A' => '@sorted = sort { -M $b <=> -M $a } @filenames;',
'CODE B' => '@sorted = map { $_->[0] }
sort {$b->[1] <=> $a->[1]}
map {[$_, -M $_]} @filenames;',
'CODE C' => '$date{$_} = -M $_ for @filenames;
@sorted = sort {$date{$b} <=> $date{$a} } @filenames;
undef %date;',
'CODE D' => '@sorted = map $_->[0],
sort {$b->[1] <=> $a->[1]}
map [$_, -M $_], @filenames;',
}
);
Results:
Benchmark: timing 10000000 iterations of CODE A, CODE B, CODE C, CODE
D...
CODE A: 39 wallclock secs
(38.50 usr + 0.00 sys = 38.50 CPU) @ 259740.26/s (n=10000000)
CODE B: 42 wallclock secs
(42.57 usr + 0.00 sys = 42.57 CPU) @ 234907.21/s (n=10000000)
CODE C: 93 wallclock secs
(91.94 usr + 0.00 sys = 91.94 CPU) @ 108766.59/s (n=10000000)
CODE D: 43 wallclock secs
(42.13 usr + 0.00 sys = 42.13 CPU) @ 237360.55/s (n=10000000)
Note that, contrary to expectation, Code A was the fastest. Perhaps if
the list were an order of magnitude greater, A would no longer be the
fastest. But this result was consistent across several runs.
Note further that, as you predicted, Code D was fractionally faster than
Code B. However, contrary to your prediction, Code B was more than
twice as fast as Code C -- not just 5% faster.
So this is a good reason to use the Schwartzian Transform -- but how big
a list do we have to have before the Transform becomes faster than a
simple sort?
Jim Keenan
-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**
|
|