Aaron J. Mackey on 2 Mar 2004 14:09:17 -0000



pattern finding problem



On the BioPerl mailing list we often get requests like the following:

Within a given biosequence of length X, find all substrings of minimum length A and maximum length B that contain the pattern P at least C times but no more than D times.

A more concrete example: Find all substrings 12 characters long (A = B = 12) that contain at least 7 (C = 7, D = 12 implicitly) 'I' or 'L' characters (P = [IL]).
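
To make the problem statement concrete, here is a minimal baseline sketch. It is written in Python rather than Perl purely for illustration, and it is not a proposed BioPerl implementation. The function name find_windows and its parameter names are hypothetical. It assumes that P matches exactly one character (as [IL] does), that D defaults to the window length when omitted, and that every overlapping window should be reported. It uses running prefix counts so that each window is checked in constant time.

```python
import re


def find_windows(seq, pattern, min_len, max_len, min_hits, max_hits=None):
    """Return (start, substring, hits) for every qualifying substring.

    Assumption: `pattern` matches exactly one character, e.g. "[IL]".
    `start` is a 0-based offset into `seq`.
    """
    is_hit = re.compile(pattern).fullmatch

    # prefix[i] holds the number of matching characters in seq[:i],
    # so the hit count of any window is a single subtraction.
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (1 if is_hit(ch) else 0))

    results = []
    for length in range(min_len, max_len + 1):
        # With no explicit upper bound, D is implicitly the window length.
        upper = length if max_hits is None else max_hits
        for start in range(len(seq) - length + 1):
            hits = prefix[start + length] - prefix[start]
            if min_hits <= hits <= upper:
                results.append((start, seq[start:start + length], hits))
    return results
```

For the example above, the call would be find_windows(seq, "[IL]", 12, 12, 7). The cost is proportional to X times (B - A + 1), which is the figure any cleverer approach has to beat.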

The naive approach is a "sliding window" method, but it seems to me that a pattern-matching approach would be more efficient. It also sounds like a great little challenge for the brilliant minds of FWP. The "best" version will find its way into a BioPerl module (with appropriate attribution, of course). Golfing is not the goal here (but golfed solutions are still welcome, if you must).

Enjoy,

-Aaron
