JP Vossen on 10 Oct 2008 14:50:08 -0700

Re: [PLUG] egrep help

 > Date: Fri, 10 Oct 2008 16:53:51 -0400
 > From: "Michael Lazin" <>
 > What I am doing is picking up the IP addresses, my problem seems to be
 > finding a "||" immediately following the IP address.  What I tried
 > seems to be picking up a | followed by a string and another | followed
 >  by a string. What I am looking for is [IP address][||]

Other folks already mentioned the various problems with the IPA part of 
the pattern, and suggested [[:digit:]] or [0-9] solutions, so I'll skip 
that.  (OK, I lied, I won't.  You have been warned.)

If I understand, you've got a file with '|' as a delimiter and want to 
find an IPA with a *single* '|' on each end?  So you have two problems, 
a) how to find an IPA (kinda solved) and b) how not to find an IPA 
ending with '||'.

If that's right, how about this:
	$ cat grepme.txt
	abc|efg|||No match

	$ egrep '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' grepme.txt

In PCRE (Perl Compatible Regular Expression) terms, you want a negative 
look-ahead to find /|/ but not /||/, but I don't think that's 
implemented in egrep.  However you can fake it with a negated character 
class, which is what I did.  So [^|] means any character that isn't a 
pipe.  (Don't use [!|] here: that is shell glob syntax, and in a regex 
it matches a literal '!' or '|'.)
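
Here is a small sketch of the negated class in action.  The sample lines 
and the variable names are made up for illustration (they are not the 
original grepme.txt), and I use grep -E, which is the same thing as 
egrep.  One catch worth seeing: [^|] has to consume a character, so a 
closing pipe at the very end of a line is missed unless you also allow 
end of line.

```shell
# Hypothetical sample lines; not the original grepme.txt.
sample='abc|192.0.2.1|single pipe
abc|192.0.2.2||double pipe
abc|192.0.2.3|'

# Loose IP pattern, same shape as the one used in this message.
ip='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Negated class: some non-pipe character must follow the closing pipe,
# so the line that ends right after the pipe is not matched.
negated=$(printf '%s\n' "$sample" | grep -E "\|$ip\|[^|]")

# Allowing end of line as an alternative also catches that last line.
with_eol=$(printf '%s\n' "$sample" | grep -E "\|$ip\|([^|]|\$)")

printf '%s\n' "$negated"
printf '%s\n' "$with_eol"
```

If your grep is GNU grep built with PCRE support, the -P switch should 
let you write the look-ahead directly, but I would not count on that 
being available everywhere.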

The definitive book for regular expressions is _Mastering Regular 
Expressions 3_ (AKA MRE), which is an incredible book.  But it's also 
incredibly dense; bring aspirin. :-)

So, from my copy of MRE2, page 189, here is the regex that only matches 
an IPA.  This may well be vast overkill for your need, since the 
patterns above are usually Good Enough.  I throw it in here to 
demonstrate how good and how dense MRE is.

In PCRE it is:

In egrep it is (unreadably):
$ egrep 

Still with me?  :-)  Be darn sure you comment the heck out of your code!
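
On the commenting point, one approach is to build the pattern from 
named, commented pieces.  To be clear, this is my own rough sketch of a 
range-checked IPA pattern, NOT the one from MRE, and the variable names 
(octet, ipv4) are just for illustration:

```shell
# One octet: 0-255, no leading zeros.
#   25[0-5]       250-255
#   2[0-4][0-9]   200-249
#   1[0-9][0-9]   100-199
#   [1-9]?[0-9]   0-99
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Four octets separated by literal dots.
ipv4="($octet\.){3}$octet"

# -x requires the whole line to match, so partial hits are rejected.
valid=$(printf '%s\n' 192.0.2.1 256.1.1.1 10.0.0.255 1.2.3 | grep -Ex "$ipv4")
printf '%s\n' "$valid"
```

Only the two well-formed addresses should come out; 256.1.1.1 fails the 
range check and 1.2.3 is an octet short.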

Just for fun, the GNU grep -o switch:

$ egrep -o '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' grepme.txt

$ egrep -o '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]' \
	grepme.txt | cut -d'|' -f2
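
To show what those two steps do, here is a sketch on one made-up line 
(the address and the variable names are assumptions for the example).  
The -o switch prints only the matched text, pipes included, and cut 
then pulls out the field between them:

```shell
# Hypothetical input line; not from the original data file.
line='abc|192.0.2.1|single pipe'

# -o prints just the matched part: both pipes plus one trailing character.
matched=$(printf '%s\n' "$line" |
  grep -Eo '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|[^|]')

# Field 1 is the empty string before the first pipe; field 2 is the IPA.
ip_only=$(printf '%s\n' "$matched" | cut -d'|' -f2)

printf '%s\n' "$matched"
printf '%s\n' "$ip_only"
```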

Finally, on re-reading, I think maybe I got it wrong and you *want* only 
the double pipe.  In that case:
$ egrep '\|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\|\|' grepme.txt
abc|efg|||No match

Bonus, if you want to sort the resulting list of IPAs, I cover that in 
recipe 8.3 of the _bash Cookbook_:
	... | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
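
A quick sketch of why the per-field numeric keys matter, using a few 
made-up addresses.  A plain sort compares the text character by 
character, so 10.0.0.10 lands before 10.0.0.2; sorting each 
dot-separated field numerically fixes that:

```shell
# Hypothetical addresses for illustration.
ips='10.0.0.2
192.0.2.10
10.0.0.10
192.0.2.9'

# -t . splits on dots; each -k N,Nn sorts field N numerically.
sorted=$(printf '%s\n' "$ips" | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n)
printf '%s\n' "$sorted"
```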


PS--No one talked about anchors, which can *vastly* speed up execution 
time.  But I don't have enough info about your data file to tell if they 
are useful or not.  Briefly, ^ is start of line (^foo), $ is end of line 
(bar$) and there are lots of others.  Search the docs for 'anchor'.
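
A tiny sketch of the two anchors mentioned above, on made-up lines 
(whether anchors help with your real file depends on its layout, which 
I am only guessing at):

```shell
# Hypothetical lines for illustration.
lines='foo at the start
this one ends with bar
a foo and a bar in the middle only'

# ^foo matches only where foo begins the line.
starts=$(printf '%s\n' "$lines" | grep -E '^foo')

# bar$ matches only where bar ends the line.
ends=$(printf '%s\n' "$lines" | grep -E 'bar$')

printf '%s\n' "$starts"
printf '%s\n' "$ends"
```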

PPS--I'm sorry if attempting to read this on a Friday evening makes 
anyone's head explode.  Don't blame me, I didn't invent regular 
expressions.  I just use 'em every day.  :-)
JP Vossen, CISSP            |:::======|        jp{at}jpsdomain{dot}org
My Account, My Opinions     |=========|
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
Philadelphia Linux Users Group