Doug Stewart on 28 Oct 2009 15:31:55 -0700
On 10/26/09, JP Vossen <jp@jpsdomain.org> wrote:
> To replace all "word\nword" (newlines) with "word word" (space), try:
>     perl -0777 -pe 's/(\w+)\n(\w+)/$1 $2/g' bad_file > good_file

JP:

Yours was the closest answer. The secret was in the multi-line /m flag. I passed the text through this:

    perl -0777 -pe 's/\n(\S+)/$1 $2/gm' bad_file > good_file

...and it resulted in a much, MUCH cleaner file.

Note that I removed the requirement for a \w word match to begin the expression and subbed in a \S for the second \w; with the multi-line flag and a subbing-out for non-whitespace instead of word characters (because a line could conceivably start with a number or a quote), I reached almost-data-processing-Nirvana.

There's still a little manual clean-up (and I'm going to want to trim leading whitespace off all lines -- a trivial task now), but by-and-large, after much experimentation (and cursing at http://regexr.com), I've got the file into a workable format.

Thanks much!

--
-Doug
http://literalbarrage.org/blog/
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug