Kyle R . Burton on Wed, 7 Nov 2001 12:53:24 -0500 |
> this code looks very interesting. here's a thought off the > top of my head, dating from 1999 when I was doing coursework > in a related field. > > i wonder if it would be feasible to evolve your regular > expressions as genetic algorithms in the usual hill-climbing > way. instead of refining a single pattern, try keeping a > stable of, say, a hundred possible patterns that each may > match only a certain subset of the input data. refine each > pattern through random mutations, so that a pattern from a > parent generation produces more than one child. the > selection pressure would be based primary on success against > the input data space and secondarily on length of regexp. Actualy, I was thinking of doing something genetic, but I decided to try the approach in the linked to code because I was able to think it up and write it in about an hour's worth of time, and it works for most of the situations where I currently have a need for this type of thing. The genetic approaches I was considering would have taken alot more effort to write (partly because I have very little knowledge/experience with genetic programming). My current approach is extremly simplistic, goes from the left, character by character and selects what it think is the most specific candidate regex that matches all the examples. It performs less well when the examples are of varaible length, and makes no efforts to recognize stemming between the examples. I'd think a GP approach would better overall, but I don't know a good way to go about it. It would require somehow determining a set of starting regexes to use right? I guess generating the initial set shouldn't be too hard...survival critera should center around how many of the examples it matches, but somehow I'd want it to create general, but fairly specific regexes. For phone numbers, I'd rather have: \(?\d{3}\) \d{3}-\d{3} than: .+\s\d+-\d+ or, worse: .+ k -- ------------------------------------------------------------------------------ Just get rid of the false and you will automaticly realize the true. -- Ho-Shan mortis@voicenet.com http://www.voicenet.com/~mortis ------------------------------------------------------------------------------ **Majordomo list services provided by PANIX <URL:http://www.panix.com>** **To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**
|
|