JP Vossen on 13 Oct 2012 14:16:00 -0700



Re: [PLUG] perl -i question


Date: Fri, 12 Oct 2012 19:15:19 -0400
From: Walt Mankowski <waltman@pobox.com>

On Fri, Oct 12, 2012 at 06:05:28PM -0400, JP Vossen wrote:
See my previous "Perl one-liner to remove duplicates without
changing file order" email for background.
[...]
Anyone know what I am doing wrong?

Yes, the "select(STDOUT)" is what's getting you.  Notice that it's
outside of the "while (<>)" loop.  Once you fall out of that loop,
print goes to STDOUT instead of the file.

At a higher level, what's really getting you is that the -i flag has
special logic to handle multiple files on the command line.  Whenever
it gets to EOF on one file, it closes that one and opens the next one.
(That's what's going on when it checks $ARGV.)  Because of that,
you've got to print the lines while you've still got the file open.
You're circumventing that by gathering them all up and printing the
lines later.
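Walt's point carries over to Python's fileinput module, which comes up later in the thread: with inplace=True, standard output is pointed at the file being rewritten only while that file is being read, so the printing has to happen inside the loop. A minimal Python 3 sketch, assuming the goal is to keep the first occurrence of each line and to de-dupe each file separately; dedupe_in_place is a hypothetical name:

```python
import fileinput
import sys


def dedupe_in_place(paths, backup=".bak"):
    """Drop repeated lines from each file in place, keeping first occurrences.

    With inplace=True, fileinput points sys.stdout at the file being
    rewritten, so every print() must happen inside the loop.
    """
    seen = set()
    with fileinput.input(files=paths, inplace=True, backup=backup) as lines:
        for line in lines:
            if lines.isfirstline():
                seen = set()  # new file: forget the previous file's lines
            if line not in seen:
                seen.add(line)
                print(line, end="")  # written into the current file


if __name__ == "__main__" and sys.argv[1:]:
    dedupe_in_place(sys.argv[1:])
```

Each file named on the command line is rewritten, and its original is kept with a .bak suffix. It is the same print-as-you-go pattern as the common Perl one-liner `perl -i.bak -ne 'print unless $seen{$_}++' file`, except that the Perl version does not reset %seen between files.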

OK, that's what I was afraid of, after re-reading the PerlRun stuff some more.

Any ideas how to work around it in Perl? Or am I just stuck with the usual shell 'cp file file.bak && perl -ne ... file.bak > file' method?
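The shell method can also live inside a single script: make the backup yourself, read everything, then write the result back under the original name. Because the script rather than the in-place machinery decides where output goes, it no longer matters that the writing happens after all input has been read. A minimal sketch in Python 3 (the thread does not settle on a Perl version), keeping the first occurrence of each line; dedupe_file and its return value are illustrative choices:

```python
import shutil


def dedupe_file(path, backup_suffix=".bak"):
    """Back up path, then rewrite it without repeated lines.

    Everything is read before anything is written, so the output can be
    built after the loop; nothing depends on where print() is pointing.
    Returns the number of lines removed.
    """
    backup = path + backup_suffix
    shutil.copy2(path, backup)          # like: cp file file.bak
    with open(backup) as src:
        lines = src.readlines()

    # dict keys are unique and (Python 3.7+) keep insertion order,
    # so this keeps the first occurrence of each line in file order.
    unique = list(dict.fromkeys(lines))

    with open(path, "w") as dst:        # like: ... file.bak > file
        dst.writelines(unique)
    return len(lines) - len(unique)
```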




Date: Sat, 13 Oct 2012 00:58:18 -0400
From: Martin Dellwo <martin.dellwo@gmail.com>

[...]
...Apparently the same thing holds for '$line{$_}++'.

I think it's the while(<>){} loop; once I'm outside of that I'm toast, as Walt noted. I think the implementation is basically doing an #INCLUDE of my code into the middle of that code block I referenced in the PerlRun URL...

> You probably can't fix it by doing a 'select(ARGV)' because...
> what would that mean if you have more than one input file??

That's part of my problem, Perl does update those for every input file in that outer loop, but that loop exits and resets back to STDOUT before either my END{} block or the rest of my code gets to execute. :-/

That's Perl's "try to do the right thing" getting me.  It happens.
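To see why printing from an END block (or any code after the loop) misses the file, here is a rough Python model of what Walt and JP describe: for each input file, move the original to a backup, open a fresh file under the original name, and send output there, switching back to standard output once the loop is done. It is a simplified illustration, not Perl's actual implementation; edit_in_place, per_line and at_end are hypothetical names:

```python
import contextlib
import os


def edit_in_place(paths, per_line, at_end=None, backup=".bak"):
    """Simplified model of -i style editing (hypothetical helper).

    For each file: move the original aside, open a new file under the
    original name, and send standard output there while per_line(line)
    runs. at_end() runs after the loop, when output is back on the real
    standard output.
    """
    for path in paths:
        os.replace(path, path + backup)                # keep the original
        with open(path + backup) as src, open(path, "w") as dst:
            with contextlib.redirect_stdout(dst):      # print() -> this file
                for line in src:
                    per_line(line)
        # leaving the with-blocks closes the file and restores stdout
    if at_end is not None:
        at_end()  # prints from here never reach any of the files
```

Passing a per_line function that only collects lines into a dictionary, and an at_end function that prints them, reproduces the empty-file result Martin reports: the files end up empty and the de-duplicated lines appear on the terminal instead. Printing from per_line is what lands the lines in the file.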


Because I'm trying to learn Python and because what you're doing with
hashes looked a lot to me like Python dictionaries (key/value pairs, keys
must be unique),

Yup.


I wrote a little test program to do the same in
Python.  It's definitely not a one-liner, but it does the same thing,
and using fileinput lets you process multiple files given as
arguments.  It can also process compressed files (.gz or .bz2) thanks
to the openhook argument.

Cool.  That's a bit less easy to do in Perl IIRC.

What's interesting is that it suffers from the exact same problem: you
can try to process in place by providing the arguments
"inplace=1,backup='.bak'" (dropping openhook=, since fileinput does not
allow an opening hook in inplace mode), but you get empty output files
because the print happens after the loop (good thing for that backup
flag).  Here's the code:

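--------------------------------------------
#!/usr/bin/env python
# Python 2 (print is a statement)

import fileinput

# A: line text -> number of the last line it appeared on.
# fileinput.lineno() keeps counting across files, so every number is unique.
A={}
for line in fileinput.input(openhook=fileinput.hook_compressed):
   A[line]=fileinput.lineno()
# B: the inverse mapping, line number -> line text.
B=dict(zip(A.values(),A.keys()))
# Print the surviving lines in line-number order.
for lineno in sorted(B):
   print B[lineno],
--------------------------------------------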

Note the comma at the end of the print line: it suppresses the newline
print would otherwise add, since each string already ends with one. By
the way, this pulls out the dups across multiple files. If that's not
what you wanted (i.e., you wanted each file de-duped separately), I
think you'd want one dictionary per file, something like
L[fileinput.filename()]={} (fileinput.fileno() returns the file
descriptor, not the file's position on the command line), and go from
there.
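The one-dictionary-per-file idea can be sketched by keying on fileinput's filename() and using filelineno() (the line number within the current file) instead of the running count from lineno(). Keying by name avoids fileno(), whose descriptor number is often reused from one file to the next. A Python 3 sketch that keeps Martin's last-occurrence ordering; dedupe_per_file is a hypothetical name:

```python
import fileinput


def dedupe_per_file(paths):
    """Return {filename: distinct lines}, de-duplicating each file separately.

    Like Martin's script, a repeated line is kept at its *last* position.
    """
    last_seen = {}  # filename -> {line text: line number within that file}
    with fileinput.input(files=paths) as lines:
        for line in lines:
            per_file = last_seen.setdefault(lines.filename(), {})
            per_file[line] = lines.filelineno()  # repeats overwrite earlier numbers
    # Order each file's lines by the line number they ended up with.
    return {name: sorted(per_file, key=per_file.get)
            for name, per_file in last_seen.items()}
```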

I did not think of multiple files and that is not my use-case. But it doesn't matter since I only call it on 1 file at a time. (Actually, my Perl code is embedded in a bash script.)

I was actually interested in implementations in other languages. Python is on my list to learn, but I'm so much faster in bash or Perl that it's hard to find the time. I can follow that code, but the "A" and "B" things are distracting. Names matter *a lot* to code readability, IMO, so perhaps a small tweak of those names could help. Also, I'm used to seeing {} delimit blocks like shell & Perl, and that white space thing is hard for me to "see" as a block.
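Picking up the point about names, here is Martin's script again with descriptive names, updated for Python 3 (print is a function there, so end="" stands in for the trailing comma). The names unique_lines, last_lineno_of and line_at are only suggestions; the sketch leaves out the openhook argument to stay short and keeps Martin's behavior of ordering lines by their last occurrence across all files:

```python
import fileinput
import sys


def unique_lines(paths=None):
    """Distinct lines across all input files, ordered as in Martin's script.

    paths=None means "the files named in sys.argv[1:]" (or standard input).
    """
    # line text -> running line number of its last occurrence; repeats
    # overwrite earlier entries, and lineno() keeps counting across files.
    last_lineno_of = {}
    with fileinput.input(files=paths) as lines:
        for line in lines:
            last_lineno_of[line] = lines.lineno()

    # Invert to line number -> line text (the numbers are unique).
    line_at = {lineno: line for line, lineno in last_lineno_of.items()}
    return [line_at[lineno] for lineno in sorted(line_at)]


if __name__ == "__main__" and sys.argv[1:]:
    for line in unique_lines():
        print(line, end="")  # end="" does the job of Martin's trailing comma
```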

Regardless, thanks to you both for thinking about it and for the feedback.

Thanks,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug