JP Vossen on 21 Oct 2009 12:50:09 -0700
Date: Wed, 21 Oct 2009 14:45:25 -0400
From: Walt Mankowski <waltman@pobox.com>

> The way you describe it, it sounds impossible.  As you said, every
> normal string is already a valid regex, and many regex sequences can
> occur normally within strings.  Short of writing your own regex
> parser, it seems like the best you can do is search for some common
> patterns that occur in regexes.

Yeah, the latter is what I'm trying to do now.  It'll probably work for
my data-set, but I was hoping to find a more elegant solution.  So far I
have this, which is NOT even close to universal and does NOT even work
on 100% of my data yet.

    m/\b\\[dDwWsShHvVRCpPbBAzZG]{1}/ ) {
    # The regex above is derived from the following:
    # http://perldoc.perl.org/perlreref.html#CHARACTER-CLASSES
    # http://perldoc.perl.org/perlreref.html#ANCHORS

I was hoping for a more subtle trick, such as the following.  There is
an ugly, ugly hack when grepping the process list to avoid your grep:

    ps auwx | grep 'foo' | grep -v 'grep'

The better way to do that is:

    ps auwx | grep '[f]oo'

The string '[f]oo' does not match the string 'foo', but the regex
'[f]oo' does match the string 'foo', so grep sees the "foo" process you
want but does not see itself.

I want some Perl trickery that does something like that.  So far I
haven't figured it out, so I'm falling back to brute force to get the
code written.  :-(

> But I'm not really clear why you need to do this in the first place.
> Since static strings are valid regular expressions, what's the harm in
> just treating everything as a regex?  Also, since the users entering
> these strings presumably know whether or not they're supposed to be
> regexes, isn't there any way you can get them to indicate it in the
> data somehow?

I have 300K+ valid regular expressions.  I am abstracting out one more
level on about 100K of them, and instead of being matched as regexps,
they will be matched as static strings in a hash table (in Java).
Something like "foo: \s+bar baz \!" cannot be represented accurately as
a static string, whereas "foo: bar baz \!" is fine.  So I need to be
able to tell the difference, convert the static ones into hash keys, and
ignore the ones that must remain regexps.

And the 100K source regexps are for matching Snort signatures, which in
part may include PCRE.  Are we having fun yet?  (Actually, yes.  Sick,
ain't it? :)

So in effect, *I'm* the user that needs to go into the 100K records and
flag them as regexp or string (hash).

> Finally, I'm confused by your use of "PCRE".  Do you mean "perl
> regular expressions", or the PCRE library for "Perl Compatible
> Regular Expressions"?

Yes.  My development work is in Perl (and the Regex Coach), but some of
the production parts run in the Java/Jakarta ORO PCRE lib.  Sigh.

Thanks,
JP

PS--Jeff, sure I've got lots more problems than these, some of which are
even treatable.  It's all about the context, man!

----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
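
For reference, here is a minimal sketch of the brute-force split described in the message above: walk each pattern token by token, and call it a static string only if nothing in it looks like regex syntax. This is not JP's code. The helper name literal_or_undef is hypothetical, and the sketch assumes the patterns are used without the /x flag and that any backslash followed by a word character (\s, \d, \b, \1 and so on) should count as regex syntax. It is deliberately conservative, so something like "\t" stays flagged as a regexp even though it matches a single fixed character.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post.
# Returns the unescaped literal if $pattern contains no regex syntax,
# or undef if the pattern has to remain a regexp.
sub literal_or_undef {
    my ($pattern) = @_;
    my $literal = '';

    # Walk the pattern one token at a time: "\" plus a char, or a single char.
    while ($pattern =~ /\G(?:\\(.)|(.))/gcs) {
        my ($escaped, $plain) = ($1, $2);
        if (defined $escaped) {
            # Assumption: backslash + word char (\s, \d, \b, \1 ...) is regex
            # syntax. Conservative: "\t" and "\n" are also left as regexps.
            return if $escaped =~ /\w/;
            $literal .= $escaped;    # "\!" or "\." is just an escaped literal
        }
        else {
            # Any bare metacharacter (or a trailing lone backslash) means regex.
            return if index('.^$|()[]{}*+?\\', $plain) >= 0;
            $literal .= $plain;
        }
    }
    return $literal;
}

# The two examples from the message, plus the grep trick pattern.
for my $pattern ('foo: \s+bar baz \!', 'foo: bar baz \!', '[f]oo') {
    my $key = literal_or_undef($pattern);
    printf "%-22s => %s\n", $pattern,
        defined $key ? "static, hash key '$key'" : 'keep as regexp';
}
```

Note that the hash key is the unescaped text, so "foo: bar baz \!" would be stored as "foo: bar baz !". Erring on the side of "keep as regexp" only costs some hash-table coverage, whereas wrongly flattening a real regexp would change what the signature matches.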