Aaron J. Mackey on 2 Mar 2004 14:09:17 -0000
Within a given biosequence of length X, find all substrings of minimum length A and maximum length B that contain the pattern P at least C times but no more than D times.

A more concrete example: find all substrings 12 characters long (A = B = 12) that contain at least 7 'I' or 'L' characters (P = [IL], C = 7; D = 12 implicitly, since a 12-character substring cannot contain more than 12 matches).

The naive approach is a "sliding window" method, but it seems to me that a pattern-matching approach would be more efficient. And it sounds like a great little challenge for the brilliant minds of FWP. The "best" version will find its way into a BioPerl module (with appropriate attribution, of course). Golfing is not the goal here (but golfed solutions are still welcome, if you must).

Enjoy,
-Aaron
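For reference, here is a minimal sketch of the naive baseline, written in Python rather than Perl purely for illustration. It is a prefix-count variant of the sliding window, not the regex-based approach asked for above. It assumes P matches exactly one character (as [IL] does). The function name `find_windows` and its parameters are hypothetical, mapping to X (len of seq), A, B, C and D from the problem statement.

```python
import re
from typing import Iterator, Tuple


def find_windows(seq: str, pattern: str, min_len: int, max_len: int,
                 min_hits: int, max_hits: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, hits) for every substring seq[start:end] whose
    length is in [min_len, max_len] (A..B) and that contains between
    min_hits and max_hits (C..D, inclusive) characters matching `pattern`.

    Assumption: `pattern` matches a single character, e.g. "[IL]".
    """
    char_re = re.compile(pattern)

    # prefix[k] = number of matching characters in seq[:k], so the count
    # for any substring seq[i:j] is prefix[j] - prefix[i] in O(1).
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (1 if char_re.fullmatch(ch) else 0))

    n = len(seq)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break  # longer windows from this start run off the end
            hits = prefix[end] - prefix[start]
            if min_hits <= hits <= max_hits:
                yield start, end, hits
```

For the concrete example, the call would be `find_windows(protein, "[IL]", 12, 12, 7, 12)`. The total work is proportional to X * (B - A + 1). Patterns that match several characters, or that can overlap, would need a different counting step than this per-character prefix sum.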