Adam Turoff on Mon, 26 Jun 2000 16:13:36 -0400 (EDT)



fwd: Re: regex question


----- Forwarded message -----

Date: Mon, 26 Jun 2000 16:10:54 -0400 (EDT)
From: "Schuyler D. Erle" <sderle@asi.alphaaccess.net>
To: phl@lists.pm.org
Subject: Re: regex question

> Notice that in the last three lines, <BR> tags appear embedded within 
> another HTML tag.  I need a regular expression that will read in each
> line, and if the <BR> is within a larger <...> string, the <BR> will be 
> removed.  Otherwise, the line will be left alone.
> 
> I can't quite pull this off (so far).  Does anyone have a suggestion?

How about: 

perl -pi -e 's/(<[^<>]+)<BR>([^<>]+>)/$1$2/gi' broken.html

The first capture group ($1) holds the start of the surrounding tag: a <
followed by one or more characters other than < or >. Then the literal
<BR> is matched. Finally, the second capture group ($2) holds the
remaining non-angle-bracket characters up to and including the closing >.
The substitution concatenates the two halves of the surrounding tag,
dropping the <BR>. A <BR> that stands on its own is not preceded by an
unclosed <, so it never matches and is left alone.

SDE





----- End forwarded message -----
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**