Doug Stewart on 25 Oct 2009 07:07:45 -0700
Howdy all,

I've got a flat text file that contains a lot of text that was copied from a PDF. Unfortunately, the copy process retained the formatted file's line breaks, so the flat text file has many unnecessary line breaks that mess with the formatting if you change the dimensions of the viewport.

So what I need is a little sed/awk/regex magic that will search the text file for all unnecessary line breaks and strip them out.

You can identify the unnecessary line breaks as follows:

1) Proper line breaks are followed by a space at the beginning of the next line, e.g.
" The quick brown fox"
2) Improper line breaks have no space at the beginning of the next line, e.g.
"jumps over the lazy dog"

So, I need to:

1) Detect all occurrences of lines with a leading space that
2) Are followed by a line with NO leading space and
3) Delete the line break between the two, essentially merging the two lines.

Any ideas?

--
-Doug
http://literalbarrage.org/blog/
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
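
The merge rule described in the post can be sketched with a single regular expression. This is a minimal sketch in Python rather than sed or awk, and it rests on several assumptions that the post does not state: the function name `join_wrapped_lines` is hypothetical, the file uses Unix newlines, "leading space" means a literal space character (not a tab), the removed break is replaced by one space so that the last word of one line is not fused to the first word of the next, and blank lines are left untouched. It also joins every line that lacks a leading space onto the line before it, so a paragraph wrapped across three or more lines collapses into one.

```python
import re

# Match a newline (plus at most one trailing space before it) when:
#   - the character before it is not a newline (so blank lines are kept), and
#   - the next line starts with something other than a space or a newline.
_IMPROPER_BREAK = re.compile(r"(?<=[^\n]) ?\n(?=[^ \n])")


def join_wrapped_lines(text: str) -> str:
    """Merge each line that has no leading space into the previous line.

    Assumption: the break is replaced with a single space rather than
    deleted outright, so words on either side stay separated.
    """
    return _IMPROPER_BREAK.sub(" ", text)


sample = (
    " The quick brown fox\n"
    "jumps over the lazy dog\n"
    " A second paragraph\n"
    "that wraps onto\n"
    "two more lines\n"
)
print(join_wrapped_lines(sample))
```

Newlines that are followed by a space survive, so each paragraph still begins on its own line with its leading space intact. If the input has Windows line endings, they would need to be normalized first.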