Eric on 3 Dec 2009 17:04:00 -0800



Re: [PLUG] sed newbie question


Michael:

There was a recent thread somewhere (I don't recall where) that concluded:
"Don't use regular expressions to parse HTML!"  REs are very powerful, but HTML
can be quite complex and even irregular, and REs are not the right tool for
building a parser.
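
To make that concrete, here is a minimal Python sketch of how a plausible-looking
pattern goes wrong on ordinary HTML. The pattern, the helper name naive_find, and
the sample strings are all made up for illustration; none of them come from the
thread.

```python
import re

# Hypothetical naive pattern: an opening iframe tag, anything, a closing tag.
NAIVE_IFRAME = re.compile(r"<iframe[^>]*>.*?</iframe>")


def naive_find(html_text):
    """Return every substring the naive pattern thinks is an iframe element."""
    return NAIVE_IFRAME.findall(html_text)


# Works on the easy case.
print(len(naive_find('<iframe src="a.html"></iframe>')))          # 1

# Misses an element split across lines ('.' does not match a newline).
print(len(naive_find('<iframe src="a.html">\n</iframe>')))        # 0

# Misses upper-case tags, which HTML allows.
print(len(naive_find('<IFRAME SRC="a.html"></IFRAME>')))          # 0

# Matches an iframe that is already commented out (a false positive).
print(len(naive_find('<!-- <iframe src="a.html"></iframe> -->'))) # 1
```

Flags such as re.DOTALL and re.IGNORECASE patch the second and third cases, but
the fourth needs knowledge of comment context, which is the kind of state a
parser tracks and a single pattern does not.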

While writing this I noticed that Sean suggested DOM manipulation with Python.
Excellent idea.  sed, bash, etc. just don't have what you're going to need to
create a reliable, effective solution.
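
As a rough sketch of the parser-based route, the following uses only the
standard library's html.parser (an event-style parser rather than a full DOM, so
this is a swapped-in technique, not necessarily what Sean had in mind). The
class name IframeCommenter and the function comment_out_iframes are hypothetical
names chosen for this example. It re-emits the page as it goes and wraps each
iframe element in <!-- -->. Known limits: end tags are re-emitted in lower case,
and a comment nested inside an iframe would end the wrapper early.

```python
from html.parser import HTMLParser


class IframeCommenter(HTMLParser):
    """Re-emit HTML, wrapping each <iframe>...</iframe> in a comment."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through untouched.
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of the rewritten document
        self.depth = 0   # how many iframe elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            if self.depth == 0:
                self.out.append("<!-- ")
            self.depth += 1
        # get_starttag_text() returns the tag exactly as written in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/> or <iframe ... />.
        text = self.get_starttag_text()
        if tag == "iframe" and self.depth == 0:
            text = "<!-- " + text + " -->"
        self.out.append(text)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag == "iframe" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.out.append(" -->")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def comment_out_iframes(html_text):
    """Return html_text with every iframe element commented out."""
    parser = IframeCommenter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p>hi</p>\n<iframe\n src="http://example.invalid/x"></iframe>\n<p>bye</p>'
    print(comment_out_iframes(sample))
```

Note that the sample iframe tag spans two lines, which a line-at-a-time sed
substitution would not see as a single tag.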

Good luck.

Eric

Michael Lazin wrote:
> Yeah, it's just a proof of concept, obviously this is gonna take some
> work.  Out of curiosity is there a way to insert with sed, so you could
> do something like inserting <!-- --> around the <iframe></iframe> tags? 
> This might be better than removing a whole line of code.
> 
>> On Dec 3, 2009 6:46 PM, "Douglas Muth" <doug.muth@gmail.com> wrote:
>>
>> On Thu, Dec 3, 2009 at 6:30 PM, Michael Lazin <microlaser@gmail.com> wrote:
>> > Hi, I am interested in...
>>
>> No idea, but I can tell you how I would do it:
>>
>> cat test.html | sed -e s/iframe//g
>>
>> Keep in mind that with that specific regexp, you'll be left with
>> broken HTML code.  I assume that's a proof of concept, though. :-)
>>
>> Hope that helps,
>>
>> -- Doug
>> ___________________________________________________________________________
>> Philadelphia Linux Users Group  --  http://www.phillylinux.org
>> Announcements  --  http://lists.phillylinux.org/mailman/listinfo/plug-announce
>> General Discussion  --  http://lists.phillylinux.org/mailman/listinfo/plug

-- 
#  Eric Lucas
#
#                "Oh, I have slipped the surly bonds of earth
#                 And danced the skies on laughter-silvered wings..."
#                                        -- John Gillespie Magee Jr