Walt Mankowski on 31 Aug 2010 11:46:00 -0700
On Tue, Aug 31, 2010 at 01:49:00PM -0400, Daniel.Roberts@sanofi-aventis.com wrote:
> Actually..
> Here is a small snippet of the huge file that I am trying to process
>
> Note that the filename is always in the fifth column of a tab delimited
> file...and the filename itself could contain periods....but it will
> always end in the ".CEL" filename extension which is what I am looking
> to remove...
> The filename has many different conventions, it may contain any
> combination of numbers and letters, but always ends in a .CEL file name
> extension...
> So If I could re-write the same file w/o the .CEL extensions that would
> be great!
> Dan
>
>
>
> 10 3EDD188D-91D3-4104-8992-E12D4B5F4785 3242
> AFFY_LIMS_DATA_OLD 012799Kas19KA85305_26.CEL
> \\DGMappafs01\archivedata\1999
> 11 3EDD188D-91D3-4104-8992-E12D4B5F4785 3243
> AFFY_LIMS_DATA_OLD 012799Kas19KA85305_33.CEL
> \\DGMappafs01\archivedata\1999

I'll assume those lines wrapped and what's above is really just two
lines, one beginning with 10 and one with 11.  If so, here's one way
to do it in perl:

    perl -ane '$F[4] =~ s/\.CEL$//; print join "\t", @F; print "\n"' oldfile.txt >newfile.txt

If you have perl 5.10 or later, you can shorten that to

    perl -anE '$F[4] =~ s/\.CEL$//; say join "\t", @F' oldfile.txt >newfile.txt

Walt

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug