JP Vossen on 10 Oct 2008 14:50:08 -0700
> Date: Fri, 10 Oct 2008 16:53:51 -0400
> From: "Michael Lazin" <microlaser@gmail.com>
>
> What I am doing is picking up the IP addresses, my problem seems to be
> finding a "||" immediately following the IP address.  What I tried
> seems to be picking up a | followed by a string and another | followed
> by a string.  What I am looking for is [IP address][||]

Other folks already mentioned the various problems with the IPA part of
the pattern, and suggested [[:digit:]] or [0-9] solutions, so I'll skip
that.  (OK, I lied, I won't.  You have been warned.)

If I understand, you've got a file with '|' as a delimiter and want to
find an IPA with a *single* '|' on each end?  So you have two problems:
a) how to find an IPA (kinda solved) and b) how not to find an IPA
ending with '||'.

If that's right, how about this:

$ cat grepme.txt
foo|bar|10.10.10.10|baz|match
abc|efg|10.10.10.20||No match

$ egrep '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' grepme.txt
foo|bar|10.10.10.10|baz|match

In PCRE (Perl Compatible Regular Expression) terms, you want a negative
look-ahead to find /|/ but not /||/, but I don't think that's
implemented in egrep.  However, you can fake it with a negated character
class, which is what I did.  So [^|] means any character that isn't a
pipe.  (The [!|] spelling is shell-glob syntax, not regex; inside a
regex bracket expression the ! is just a literal character.)  Note that
[^|] has to match an actual character, so an IPA whose closing pipe is
the last thing on the line will not match.

The definitive book for regular expressions is _Mastering Regular
Expressions 3_ (AKA MRE,
http://oreilly.com/catalog/9780596528126/index.html), which is an
incredible book.  But it's also incredibly dense; bring aspirin.  :-)

So, from my copy of MRE2, page 189, here is the regex that only matches
an IPA.  This may well be vast overkill for your need, since the
patterns above are usually Good Enough.  I throw it in here to
demonstrate how good and how dense MRE is.
In PCRE it is:

([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])

In egrep it is (unreadably):

$ egrep '([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\|[^|]' grepme.txt
foo|bar|10.10.10.10|baz|match

Still with me?  :-)  Be darn sure you comment the heck out of your code!

Just for fun, the GNU grep -o switch:

$ egrep -o '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' grepme.txt
|10.10.10.10|b

$ egrep -o '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' \
    grepme.txt | cut -d'|' -f2
10.10.10.10

Finally, on re-reading, I think maybe I got it wrong and you *want* only
the double pipe.  In that case:

$ egrep '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|\|' grepme.txt
abc|efg|10.10.10.20||No match

Bonus: if you want to sort the resulting list of IPAs, I cover that in
recipe 8.3 of the _bash Cookbook_:

... | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n

HTH,
JP

PS--No one talked about anchors, which can *vastly* speed up execution
time.  But I don't have enough info about your data file to tell if they
are useful or not.  Briefly, ^ is start of line (^foo), $ is end of line
(bar$) and there are lots of others.  Search the docs for 'anchor'.

PPS--I'm sorry if attempting to read this on a Friday evening makes
anyone's head explode.  Don't blame me, I didn't invent regular
expressions.  I just use 'em every day.  :-)

----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      jp{at}jpsdomain{dot}org
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
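
Addendum: the pieces from the message above (the loose IPA pattern, the negated character class, grep -o with cut, and the numeric sort) can be pulled together into a few small shell functions. This is only a sketch under assumptions: the function names are invented for illustration, grep -E is used as the modern spelling of egrep, and -o is assumed to be available (it is a GNU/BSD extension, not POSIX). The single-pipe pattern also accepts end of line after the closing pipe, which goes slightly beyond the original pattern.

```shell
# Loose IPv4 pattern from the message: 1-3 digits per octet, not range-checked.
ip='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Hypothetical helper: print each IP delimited by single pipes, i.e. |IP|
# followed by a non-pipe character or by end of line. File name is $1.
single_pipe_ips() {
    grep -E -o "\|$ip\|([^|]|\$)" "$1" | cut -d'|' -f2
}

# Hypothetical helper: print each IP that is followed by a double pipe, |IP||.
double_pipe_ips() {
    grep -E -o "\|$ip\|\|" "$1" | cut -d'|' -f2
}

# Sort dotted quads numerically, one key per octet (the sort from the message).
sort_ips() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}
```

Usage would look something like `single_pipe_ips grepme.txt | sort_ips`. Because [^|] consumes the character after the closing pipe, two IPA fields sitting directly next to each other on one line can cause the second to be missed; a real look-ahead would not have that limitation.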
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug