Re: [PLUG] Looking for a poke in the right sed/awk/regex direction
On 10/26/09, JP Vossen <jp@jpsdomain.org> wrote:
> To replace all "word\nword" (newlines) with "word word" (space), try:
> perl -0777 -pe 's/(\w+)\n(\w+)/$1 $2/g' bad_file > good_file
JP:
Yours was the closest answer. The secret was in the multi-line /m flag.
I passed the text through this:
perl -0777 -pe 's/\n(\S+)/$1 $2/gm' bad_file > good_file
...and it resulted in a much, MUCH cleaner file. Note that I removed
the requirement for a \w word match to begin the expression and subbed
in a \S for the second \w; with the multi-line flag and a subbing-out
for non-whitespace instead of word characters (because a line could
conceivably start with a number or a quote), I reached
almost-data-processing-Nirvana.
There's still a little manual clean-up (and I'm going to want to trim
leading whitespace off all lines -- a trivial task now), but by and
large, after much experimentation (and cursing at
http://regexr.com), I've got the file into a workable format.
Thanks much!
--
-Doug
http://literalbarrage.org/blog/
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug