mjd-perl-pm on 11 Jan 2004 00:06:40 -0000


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

What use is the Schwartzian Transform?


Consider the following code:

        # Code A        
        my @filenames = readdir D;
        @sorted = sort { -M $b <=> -M $a } @filenames;

This is the obvious, natural way to sort a list of filenames into
order by last modification date.  For long lists, it can be slow,
because the -M operation is done many times for each file.  

The Schwartzian Transform is a method for speeding this up, by
reducing the number of redundant -M operations:

        # Code B 
        my @filenames = readdir D;
        @sorted = map { $_->[0] }
                  sort { $b->[1] <=> $a->[1] } 
                  map { [$_, -M $_] } @filenames;
        
I was teaching this technique in a class last month, and one of the
students asked why one wouldn't simply use this:

        # Code C
        my @filenames = readdir D;
        my %date;
        $date{$_} = -M $_ for @filenames;
        @sorted = sort { $date{$b} <=> $date{$a} } @filenames;
        undef %date;

Compared with B, code C has the benefit of being much simpler and
easier to understand.  I think it's an obvious, natural way to solve
the problem, and that B isn't.

The obvious drawback of C compared with B is that it's fractionally
slower; about 5%.  I don't think the 5% is significant, and I don't
think other people think it's significant either.  As evidence of
this, I note that people don't usually suggest that B be replaced with
the following variation, which is somewhat faster than B:

        # Code D
        my @filenames = readdir D;
        @sorted = map  $_->[0],
                  sort { $b->[1] <=> $a->[1] } 
                  map [$_, -M $_], @filenames;
        


So here's the question:  why do we bother with B at all?  Why not
always use C?
-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**