James E Keenan on 11 Jan 2004 05:03:43 -0000
On a cold Saturday night in January, Mark Jason Dominus <mjd-perl-pm@plover.com> wrote:

> # Code A
> @sorted = sort { -M $b <=> -M $a } @filenames;
>
> This is the obvious, natural way to sort a list of filenames into
> order by last modification date. For long lists, it can be slow,
> because the -M operation is done many times for each file.

Apparently a list must be _very_ long for the cost of the -M operations
to add up. See below.

> # Code B
> @sorted = map { $_->[0] }
>           sort { $b->[1] <=> $a->[1] }
>           map { [$_, -M $_] } @filenames;
>
> # Code C
> my %date;
> $date{$_} = -M $_ for @filenames;
> @sorted = sort { $date{$b} <=> $date{$a} } @filenames;
> undef %date;
>
> The obvious drawback of C compared with B is that it's fractionally
> slower; about 5%.

I got a much larger differential than 5%; see below.

> # Code D
> @sorted = map $_->[0],
>           sort { $b->[1] <=> $a->[1] }
>           map [$_, -M $_], @filenames;
>
> So here's the question: why do we bother with B at all? Why not
> always use C?

I decided to apply Benchmark to these various approaches. I first
compiled a list of 9952 filenames, then sorted them 10**7 times using
each of the 4 approaches:

    timethese(10**7, {
        'CODE A' => '@sorted = sort { -M $b <=> -M $a } @filenames;',
        'CODE B' => '@sorted = map { $_->[0] }
                     sort {$b->[1] <=> $a->[1]}
                     map {[$_, -M $_]} @filenames;',
        'CODE C' => '$date{$_} = -M $_ for @filenames;
                     @sorted = sort {$date{$b} <=> $date{$a} } @filenames;
                     undef %date;',
        'CODE D' => '@sorted = map $_->[0],
                     sort {$b->[1] <=> $a->[1]}
                     map [$_, -M $_], @filenames;',
    } );

Results:

    Benchmark: timing 10000000 iterations of CODE A, CODE B, CODE C, CODE D...
    CODE A: 39 wallclock secs (38.50 usr + 0.00 sys = 38.50 CPU) @ 259740.26/s (n=10000000)
    CODE B: 42 wallclock secs (42.57 usr + 0.00 sys = 42.57 CPU) @ 234907.21/s (n=10000000)
    CODE C: 93 wallclock secs (91.94 usr + 0.00 sys = 91.94 CPU) @ 108766.59/s (n=10000000)
    CODE D: 43 wallclock secs (42.13 usr + 0.00 sys = 42.13 CPU) @ 237360.55/s (n=10000000)

Note that, contrary to expectation, Code A was the fastest. Perhaps if
the list were an order of magnitude greater, A would no longer be the
fastest. But this result was consistent across several runs.

Note further that, as you predicted, Code D was fractionally faster than
Code B. However, contrary to your prediction, Code B was more than twice
as fast as Code C -- not just 5% faster.

So this is a good reason to use the Schwartzian Transform -- but how big
a list do we have to have before the Transform becomes faster than a
simple sort?
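One way to probe that last question would be to rerun the A-vs-B
comparison over lists of increasing size. Below is a rough, untested
sketch along those lines; the directory path and the size steps are
only placeholders, and @filenames is deliberately a package variable so
that the string-eval'd benchmark code can see it.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # Placeholder: any directory holding a few thousand plain files will do.
    my $dir = '/usr/share/doc';

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @all = grep { -f $_ } map { "$dir/$_" } readdir $dh;
    closedir $dh;

    # Must be a package variable: the string-eval'd benchmark code
    # below refers to @filenames by name in package main.
    our @filenames;

    for my $size (100, 1_000, 10_000) {
        if ($size > @all) {
            print "Only ", scalar(@all), " files available; stopping.\n";
            last;
        }
        @filenames = @all[0 .. $size - 1];
        print "--- list size: $size ---\n";

        # A negative count runs each piece of code for at least
        # that many CPU seconds instead of a fixed iteration count.
        timethese(-5, {
            'CODE A' => '@sorted = sort { -M $b <=> -M $a } @filenames;',
            'CODE B' => '@sorted = map { $_->[0] }
                         sort { $b->[1] <=> $a->[1] }
                         map { [$_, -M $_] } @filenames;',
        });
    }

Using a negative count keeps the run time manageable as the per-sort
cost grows with the list, and the crossover point should show up as the
list size at which CODE B's rate overtakes CODE A's.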
Jim Keenan