Martin Dellwo on 12 Oct 2012 21:58:27 -0700



Re: [PLUG] perl -i question


I think in every case where you have 'while (<>) { some_code }', the file(s) are iterated over during the {} block.  Once you exit that block, the file handles are all closed (or at least the seek position is at the end) and output goes back to STDOUT.  Apparently the same thing holds for '$line{$_}++'.  You probably can't fix it by doing a 'select(ARGV)' because... what would that mean if you have more than one input file??
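For comparison, Python's fileinput module follows the same rule in inplace mode: sys.stdout points at the file being rewritten only while the loop is running. Here is a minimal sketch of the Python analogue of the one-liner that does work ('$line{$_}++ or print'). It assumes Python 3.2+ (for using FileInput in a with statement), and the function name dedupe_inplace_first is made up for illustration:

```python
import fileinput
import sys


def dedupe_inplace_first(paths, backup=".bak"):
    """Keep the first copy of each line, rewriting the files in place.

    Rough analogue of: perl -ni'.bak' -e '$line{$_}++ or print' FILES
    The write happens inside the loop, while fileinput still has
    sys.stdout redirected to the current output file.
    """
    seen = set()  # shared across all files, like Perl's %line
    with fileinput.FileInput(files=paths, inplace=True, backup=backup) as f:
        for line in f:
            if line not in seen:
                seen.add(line)
                sys.stdout.write(line)  # lands in the file being rewritten
```

Because each write happens while its file is still current, the output goes to the right place. As with Perl's %line, the seen set is shared, so a line already seen in an earlier file is dropped from later ones.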

Because I'm trying to learn Python, and because what you're doing with hashes looked a lot to me like Python dictionaries (key/value pairs, keys must be unique), I wrote a little test program to do the same thing in Python.  It's definitely not a one-liner, but it does the same thing, and using fileinput lets you process multiple files given as arguments.  It can also process compressed files (.gz or .bz2) thanks to the openhook argument.  What's interesting is that it suffers from the exact same problem: you can try to process in place by providing the arguments "inplace=1,backup='.bak'" (dropping openhook=, since fileinput raises a ValueError if you combine an open hook with inplace mode), but you get empty files on output because the print happens after the loop, when fileinput has already restored the original stdout (good thing for that backup flag).  Here's the code:
--------------------------------------------
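#!/usr/bin/env python
# Print each distinct line from the input files once, ordered by the
# line number of its LAST occurrence (like the Perl $line{$_} = $.).
from __future__ import print_function  # print() on Python 2.6+ and 3

import fileinput

A = {}  # line text -> most recent (cumulative) line number
for line in fileinput.input(openhook=fileinput.hook_compressed):
    A[line] = fileinput.lineno()
B = dict(zip(A.values(), A.keys()))  # invert: line number -> line text
for lineno in sorted(B):
    print(B[lineno], end='')
--------------------------------------------

Note the end='' in the print call: it suppresses the extra end-of-line, since each string already contains one.  By the way, this pulls out the dups across multiple files (fileinput.lineno() keeps counting from one file to the next).  If that's not what you wanted, i.e. you wanted each file de-duped separately, I think you'd want a dictionary of dictionaries keyed by fileinput.filename(), something like D.setdefault(fileinput.filename(), {}), and go from there.  (Not fileinput.fileno(): that returns the current file's OS-level file descriptor, not its position in the argument list.)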
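A rough sketch of that per-file variant, assuming Python 3.2+ and plain (uncompressed) text files. dedupe_per_file is a hypothetical helper name, and it returns the results instead of printing them:

```python
import fileinput


def dedupe_per_file(paths):
    """Return {filename: distinct lines}, de-duplicating each file separately.

    Within each file, lines are ordered by their last occurrence,
    like the combined-files script.
    """
    per_file = {}  # filename -> {line text: line number within that file}
    with fileinput.FileInput(files=paths) as f:
        for line in f:
            seen = per_file.setdefault(f.filename(), {})
            seen[line] = f.filelineno()  # restarts at 1 for each file
    return {
        name: sorted(seen, key=seen.get)
        for name, seen in per_file.items()
    }
```

Each file gets its own inner dictionary, and filelineno() restarts for every file, so the ordering is computed per file rather than across the whole argument list.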
Here's the documentation for the fileinput module: http://docs.python.org/library/fileinput.html
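And if you really do want to gather everything first and write it out afterward, one way around the problem is to skip the in-place machinery and reopen the file for writing yourself once reading is done. A minimal sketch, assuming Python 3, one plain-text file at a time, and a '.bak' copy made up front the way -i'.bak' would; dedupe_in_place is a hypothetical name:

```python
import shutil


def dedupe_in_place(path, backup=".bak"):
    """Rewrite `path` keeping one copy of each distinct line.

    Lines are ordered by their LAST occurrence, like the Perl
    `$line{$_} = $.` attempt.  The whole file is read before the
    output is opened, so printing "later" is no longer a problem.
    """
    if backup:
        shutil.copy2(path, path + backup)  # keep a copy, like -i'.bak'

    last_seen = {}  # line text -> line number of its most recent occurrence
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            last_seen[line] = lineno

    # Open for writing only now that everything has been read.
    with open(path, "w") as out:
        for line in sorted(last_seen, key=last_seen.get):
            out.write(line)
```

Call it as dedupe_in_place('/tmp/sample.hist'). The difference from the inplace attempt is that the output file is opened only after the input has been fully read, so nothing depends on when stdout gets switched back.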

Marty

On Oct 12, 2012, at 7:15 PM, Walt Mankowski <waltman@pobox.com> wrote:

> On Fri, Oct 12, 2012 at 06:05:28PM -0400, JP Vossen wrote:
>> See my previous "Perl one-liner to remove duplicates without
>> changing file order" email for background.
>> 
>> I can understand why this might not work (note: -i'.bak'):
>> 	$ perl -ni'.bak' -e '$line{$_} = $.; END { for (sort{$line{$a}<=>$line{$b}} keys %line) {print} }' /tmp/sample.hist
>> 
>> I'm *guessing* the END{} block executes *after* all the file handles
>> are closed, so you end up writing nothing to your file.  (Hope you
>> have a backup.)
>> 
>> But why doesn't this work:
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_} = $.} for (sort{$line{$a}<=>$line{$b}} keys %line) {print}' /tmp/sample.hist
>> 
>> These *do* work:
>> 	$ perl -ni'.bak' -e '$line{$_}++ or print' /tmp/sample.hist
>> 	$ perl -i'.bak' -e 'while (<>) { $line{$_}++ or print }' /tmp/sample.hist
>> 
>> But these do not either:
>> 	$ perl -ni'.bak' -e '$line{$_}++; END { for (sort{$line{$a}<=>$line{$b}} keys %line) {print} }' /tmp/sample.hist
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_}++} for (sort{$line{$a}<=>$line{$b}} keys %line) {print}' /tmp/sample.hist
>> 
>> So WTH???  I've Googled the heck out of this and read
>> http://perldoc.perl.org/perlrun.html#*-i*[_extension_] 10 times, but
>> nothing I try works, print always goes to STDOUT and my file always
>> ends up zero bytes.  I'm guessing the "select(STDOUT);" in line 21
>> of the listing in the URL is what is getting me, but I can't figure
>> out how to work around it in Perl.  (I can easily work around this
>> outside of Perl, I just don't wanna.)  I've tried all kinds of crazy
>> stuff like:
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_} = $.;} select(ARGV); for (sort{$line{$a}<=>$line{$b}} keys %line) {print}' /tmp/sample.hist
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_} = $.;} select(ARGVOUT); for (sort{$line{$a}<=>$line{$b}} keys %line) {print}' /tmp/sample.hist
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_} = $.;} for (sort{$line{$a}<=>$line{$b}} keys %line) {print ARGV}' /tmp/sample.hist
>> 	$ perl -i'.bak' -e 'while (<>) {$line{$_} = $.;} for (sort{$line{$a}<=>$line{$b}} keys %line) {print ARGVOUT}' /tmp/sample.hist
>> 
>> And anything else I can think of, but it all silently fails (and
>> nukes my file) on both Debian Perl 5.10.1 and XP ActiveState (810)
>> 5.8.4, so I'm pretty sure Perl is WaD and I'm just missing something
>> obvious.
>> 
>> Anyone know what I am doing wrong?
> 
> Yes, the "select(STDOUT)" is what's getting you.  Notice that it's
> outside of the "while (<>)" loop.  Once you fall out of that loop,
> print goes to STDOUT instead of the file.
> 
> At a higher level, what's really getting you is that the -i flag has
> special logic to handle multiple files on the command line.  Whenever
> it gets to EOF on one file, it closes that one and opens the next one.
> (That's what's going on when it checks $ARGV.)  Because of that,
> you've got to print the lines while you've still got the file open.
> You're circumventing that by gathering them all up and printing the
> lines later.
> 
> Walt
> ___________________________________________________________________________
> Philadelphia Linux Users Group         --        http://www.phillylinux.org
> Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
> General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

--
Martin Dellwo
cell: (484) 437-3662
martin.dellwo@gmail.com



