Doug Stewart on 28 Oct 2009 15:31:55 -0700

Re: [PLUG] Looking for a poke in the right sed/awk/regex direction

On 10/26/09, JP Vossen wrote:
> To replace all "word\nword" (newlines) with "word word" (space), try:
> 	perl -0777 -pe 's/(\w+)\n(\w+)/$1 $2/g' bad_file > good_file

Yours was the closest answer. The secret was in the multi-line /m flag.

I passed the text through this:

perl -0777 -pe 's/\n(\S+)/$1 $2/gm' bad_file > good_file

...and it resulted in a much, MUCH cleaner file.  Note that I removed
the requirement for a \w word match to begin the expression and subbed
in a \S for the second \w, so it matches non-whitespace instead of word
characters (because a line could conceivably start with a number or a
quote), and added the multi-line flag.

There's still a little manual clean-up (and I'm going to want to trim
leading whitespace off all lines -- a trivial task now), but
by-and-large, after much experimentation (and cursing), I've got the
file into a workable format.

Thanks much!

Philadelphia Linux Users Group