Mark Dominus on 27 Sep 2005 14:03:06 -0000 |
A guy in comp.lang.perl.moderated has a pretty interesting problem. He has digits of pi in what is called "BCD" format. This is a format that encodes two decimal digits per character. The two digits $a and $b are packed into a character with $a in the high bits and $b in the low bits. For example, the character "R" = 0x52 represents the two decimal digits 5 and 2. To represent digits of pi, say the ten digits "1415926536", we pack each pair of digits into a character. The easy way to do this in Perl is pack "H*", "1415926536" and the result is the 5-character string "\x14\x15\x92\x65\x36". Now suppose we have a very large string, say one million characters long, which therefore represents two million digits of pi. And we want to know if the digit sequence "123456" appears anywhere. Obviously, one way to do this is to unpack the two million digits and search the resulting two-megabyte string: my $pos = index(unpack("H*", $pi), "123456"); But the guy in clp.moderated doesn't want to do this; he wants to search the string *without* unpacking it. Without intending any comment on whether this is a sensible thing to want to do, I wrote code to solve the problem. It came out very nice, and I thought other folks might be interested in the exercise. My code will show up in clp.moderated when the moderator gets around to approving it. In the meantime, you might want to try it yourself; I found it fun. What I learned from this project: it's a lot easier to write good code when I've gotten a full night's sleep. - **Majordomo list services provided by PANIX <URL:http://www.panix.com>** **To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**
|
|