JP Vossen on 21 Oct 2009 22:00:54 -0700 |
> Date: Wed, 21 Oct 2009 23:20:29 -0400 > From: Walt Mankowski <waltman@pobox.com> > > On Wed, Oct 21, 2009 at 03:50:01PM -0400, JP Vossen wrote: >> I have 300K+ valid regular expressions. I am abstracting out one more >> level on about 100K of them, and instead of being matched as regexps, >> they will be matched as static strings in a hash table (in Java). > > Are you doing that because the hash table lookup is faster? It's not > obvious to me that it should be. Doing a regex comparison on a plain > string should be pretty close to just checking if the two strings are > equal, and that's going to be O(n). A hash lookup also has to be O(n) > since you've got to read the string to hash it. We've already read the string, now we're trying to figure out what it is. A hash lookup will be faster to fail than trying 300K+ regex matches and not hitting on any, and it will be pretty close to faster in the best case... But the more important aspect is RAM use, we're close to 1G RAM for 300K+ compiled regexps (Java/Jakarta ORA, not Perl). RAM upgrades for many hundreds of devices in the field is not an option. The hash solution takes almost no RAM when compared to the compiled regexps. So far I've now got this, which is "working" for my data-set, for some value of "working" that needs more testing. It is NOT a universal will-always-work solution, though. m/(?:^|[^\\])(?:\\[dDwWsShHvVRCpPbBAzZG]{1}|[[{])/; # Ether start-of-line or NOT \ # then \ followed by one of the meta-chars or anchors # or [ or { # # The regex above is derived from the following: # http://perldoc.perl.org/perlreref.html#CHARACTER-CLASSES # http://perldoc.perl.org/perlreref.html#ANCHORS # Started trying to use \b but that missed a lot of stuff that was # not at a "word" boundary. Thanks, JP ----------------------------|:::======|------------------------------- JP Vossen, CISSP |:::======| http://bashcookbook.com/ My Account, My Opinions |=========| http://www.jpsdomain.org/ ----------------------------|=========|------------------------------- "Microsoft Tax" = the additional hardware & yearly fees for the add-on software required to protect Windows from its own poorly designed and implemented self, while the overhead incidentally flattens Moore's Law. ___________________________________________________________________________ Philadelphia Linux Users Group -- http://www.phillylinux.org Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
|
|