Eric on 3 Dec 2009 17:04:00 -0800
Michael:

There was a recent thread somewhere (I don't recall where) that concluded,
"Don't use regular expressions to parse HTML!" REs are very powerful, but
HTML can be quite complex and even irregular, and REs are not the right
tool for building a parser.

While writing this I noticed that Sean suggested DOM manipulation with
Python. Excellent idea. sed, bash, etc. just don't have what you're going
to need to create a reliable, effective solution.

Good luck.

Eric

Michael Lazin wrote:
> Yeah, it's just a proof of concept, obviously this is gonna take some
> work. Out of curiosity is there a way to insert with sed, so you could
> do something like inserting <!-- --> around the <iframe></iframe> tags?
> This might be better than removing a whole line of code.
>
>> On Dec 3, 2009 6:46 PM, "Douglas Muth" <doug.muth@gmail.com
>> <mailto:doug.muth@gmail.com>> wrote:
>>
>> On Thu, Dec 3, 2009 at 6:30 PM, Michael Lazin <microlaser@gmail.com
>> <mailto:microlaser@gmail.com>> wrote:
>> > Hi, I am interested in...
>>
>> No idea, but I can tell you how I would do it:
>>
>> cat test.html | sed -e s/iframe//g
>>
>> Keep in mind that with that specific regexp, you'll be left with
>> broken HTML code. I assume that's a proof of concept, though.
>> :-)
>>
>> Hope that helps,
>>
>> -- Doug

-- 
#  Eric Lucas
#
#      "Oh, I have slipped the surly bond of earth
#       And danced the skies on laughter-silvered wings...
#                                  -- John Gillespie Magee Jr
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
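
On Michael's question about putting <!-- --> around the <iframe></iframe>
tags: the parser-based approach recommended in this message might look
roughly like the sketch below. Nobody in the thread posted this code. It
assumes only html.parser from the Python standard library, and the names
IframeCommenter and comment_out_iframes are made up for illustration. It
re-emits the document as it parses and wraps each iframe in a comment
instead of deleting text.

```python
from html.parser import HTMLParser


class IframeCommenter(HTMLParser):
    """Re-emit HTML as parsed, wrapping each iframe element in <!-- -->.

    Hypothetical helper for illustration; not code from the thread.
    """

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # number of iframe elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            if self.depth == 0:
                self.out.append("<!-- ")
            self.depth += 1
        # get_starttag_text() returns the tag exactly as it appeared.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/> or <iframe ... />.
        text = self.get_starttag_text()
        if tag == "iframe" and self.depth == 0:
            text = "<!-- " + text + " -->"
        self.out.append(text)

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag == "iframe" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.out.append(" -->")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        # A comment nested inside the wrapper would end it early, so
        # comments found inside an iframe are dropped.
        if self.depth == 0:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def comment_out_iframes(html):
    """Return html with every iframe element wrapped in an HTML comment."""
    parser = IframeCommenter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    # test.html is the file name used in the sed example in the thread.
    with open("test.html", encoding="utf-8") as fh:
        print(comment_out_iframes(fh.read()), end="")
```

Known limits of the sketch: end tags come back in normalized form
(lowercase, no stray whitespace), and fallback text inside an iframe that
itself contains "-->" would still close the wrapper early. For anything
beyond a proof of concept, a full DOM library is probably the better bet.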