Kyle R. Burton on Wed, 7 Nov 2001 12:09:08 -0500



generating regexes?


One of the things I've been reading up on is machine learning.  I've been
toying with generating a regex that matches a given set of example data.
My particular examples would be things like phone numbers or zip codes:
information that consists of single data elements.
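
To make the idea concrete, here is a minimal sketch of one naive way to do
it.  It is written in Python purely for illustration and is not the code
linked in this post; the names char_class, tokenize and generate_regex are
made up for the example.  It assumes ASCII input, collapses each example into
runs of digits, letters, or literal punctuation, and, when every example has
the same sequence of runs, emits each run with a {min,max} length range.
When the examples disagree on shape, it gives up and falls back to a plain
alternation of the literal examples.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map a character to a regex fragment: digit class, letter class, or escaped literal."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)


def tokenize(text):
    """Collapse a string into a list of (regex fragment, run length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(text, key=char_class)]


def generate_regex(examples):
    """Build an anchored regex that matches every string in `examples`."""
    if not examples:
        raise ValueError("need at least one example")
    token_lists = [tokenize(e) for e in examples]
    # The "shape" of an example is its sequence of fragments, ignoring run lengths.
    shapes = {tuple(cls for cls, _ in toks) for toks in token_lists}
    if len(shapes) != 1:
        # Inconsistent formats: fall back to an alternation of the literal examples.
        return "^(?:" + "|".join(re.escape(e) for e in sorted(set(examples))) + ")$"
    parts = []
    for column in zip(*token_lists):
        cls = column[0][0]
        lengths = [n for _, n in column]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = "{%d}" % lo
        else:
            quant = "{%d,%d}" % (lo, hi)
        parts.append(cls + quant)
    return "^" + "".join(parts) + "$"


if __name__ == "__main__":
    print(generate_regex(["19103", "08540"]))
    print(generate_regex(["215-555-1212", "610-555-0199"]))
```

The fallback branch shows the weakness of this approach: once the examples
stop sharing a shape, the result only memorizes the inputs and does not
generalize at all.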

I've looked on CPAN for any possible existing work, but haven't been able
to find anything.  Does anyone know of anything along the lines of what
I'm describing?  The Regexp package provides some common examples, but what
I really want is a tool I can use to generate regexes for data in a generic,
automated fashion.

I've tried writing some simplistic code, and it has some success with
data that has a consistent format - though it creates some horrible looking
regexes for less consistent data, and fails completely for inconsistent
data.  I'm almost embarrassed to offer this up, but if you're interested, the
code I wrote to try this out is available here:

 http://www.bgw.org/projects/perl/machine_learning/

Any advice or pointers would be great.


Thanks,
Kyle R. Burton

-- 

------------------------------------------------------------------------------
Just get rid of the false and you will
automatically realize the true.
        -- Ho-Shan
mortis@voicenet.com                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------