Re: [PLUG] finding duplicate files
Date: Sun, 23 Nov 2008 08:48:23 -0500
> From: "David A. Harding" <dave@dtrt.org>
>
> A common example of an incorrectly removed file:
<snip stuff with spaces>
> An unlikely but disastrous possibility:
<snip>
Both good points, which is why I stressed testing.
> I suggest you use GNU rm's -- option when removing arbitrary filenames.
> This option prevents rm from interpreting filenames as command line
> options. For example, imagine removing a file named "-rf" [*].
Good point. See:
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#How-do-I-remove-files-that-start-with-a-dash_003f
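To make the "-rf" case concrete, here is a minimal sketch you can run
safely. It assumes mktemp and a POSIX-style shell; the scratch directory
and the file name "keepme" are made up for the illustration.

```shell
# Work in a throwaway directory so nothing real is at risk.
scratch=$(mktemp -d)

remaining=$(
    cd "$scratch" || exit 1

    # Create a file literally named "-rf" plus one ordinary file.
    touch -- -rf keepme

    # "--" ends option parsing, so rm sees "-rf" as a file name.
    rm -- -rf

    # A leading ./ also works and does not rely on "--" support.
    touch ./-rf
    rm ./-rf

    # List what is left in the scratch directory.
    ls -A
)

echo "left over: $remaining"
rm -rf -- "$scratch"
```

Only "keepme" should survive, which shows that neither rm call treated
"-rf" as options.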
> I also suggest you use while-read loops for file names. Using the read
> builtin lets us work with whole lines. For example, a rewrite of J.P.'s
> code using a while-read loop follows:
>
> $ cut -d' ' -f1 /tmp/md5s | sort | uniq -d | while read hash ; do \
> grep "$hash" /tmp/md5s | cut -d' ' -f3- | tail -n+2 | while read duplicate_file ; \
> do rm -- "$duplicate_file" ; done ; done
Yeah, you got me there! And I can't think of a way to handle spaces
using my method, other than this. Good one. Though I'd still TEST,
TEST, TEST first, by replacing the 'rm' with 'echo'.
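For anyone wondering why the while-read form copes with spaces, here is a
small sketch comparing it with a plain for loop. The file names are
invented, and the IFS= and -r additions are my own habit (they keep
leading blanks and backslashes intact), not something from the code above.

```shell
# A sample list, one file name per line; the second name contains spaces.
list=$(printf '%s\n' 'plain.txt' 'my summer photo.jpg')

# An unquoted expansion in a for loop splits on every space.
for_count=0
for name in $list ; do
    for_count=$((for_count + 1))
done

# while-read takes one whole line per iteration.
# The here-document keeps the loop in the current shell, so the counter survives.
read_count=0
while IFS= read -r name ; do
    read_count=$((read_count + 1))
done <<EOF
$list
EOF

echo "for: $for_count  while-read: $read_count"
```

The for loop sees four words where there are only two names, which is
exactly how the wrong file ends up being removed.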
> Unless you plan on removing file names starting with a dash, I
> suggest you change the rm line to the following line:
>
> test -f "$duplicate_file" && rm -- "$duplicate_file"
>
> The test may catch something I didn't anticipate.
That's cool. I'd write it like this, but the two forms are equivalent,
so use whichever you like:
[ -f "$duplicate_file" ] && echo -- "$duplicate_file"
So you get:
$ cut -d' ' -f1 /tmp/md5s | sort | uniq -d | \
while read hash ; do grep "$hash" /tmp/md5s | cut -d' ' -f3- | tail -n+2 | \
while read duplicate_file ; \
do [ -f "$duplicate_file" ] && echo -- "$duplicate_file" ; done ; done
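If you want to try the whole pipeline without touching real data, the
following is a rough sketch against a scratch directory. It assumes GNU
md5sum (hash, two separator characters, then the file name), and the
sample files are invented. I also made a few changes of my own: read -r
with IFS=, anchoring the grep at the start of the line, and a "would
remove:" label on the echo. Treat it as a starting point, not a finished
tool.

```shell
# Build a scratch tree: two identical files (one with spaces) and one unique file.
scratch=$(mktemp -d)
mkdir "$scratch/files"
printf 'same\n'  > "$scratch/files/a.txt"
printf 'same\n'  > "$scratch/files/copy of a.txt"
printf 'other\n' > "$scratch/files/b.txt"

# Checksum list, kept outside the directory being scanned.
md5sum "$scratch"/files/* > "$scratch/md5s"

# Dry run: report every file after the first in each group of equal hashes.
would_remove=$(
    cut -d' ' -f1 "$scratch/md5s" | sort | uniq -d |
    while IFS= read -r hash ; do
        grep "^$hash" "$scratch/md5s" | cut -d' ' -f3- | tail -n +2 |
        while IFS= read -r duplicate_file ; do
            [ -f "$duplicate_file" ] && echo "would remove: $duplicate_file"
        done
    done
)

echo "$would_remove"
rm -rf -- "$scratch"
```

Once the dry-run output looks right, swapping the echo for
rm -- "$duplicate_file" turns it into the real thing.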
>>> * $() :sub-shell, legacy as backticks ``, but those are harder
>>> to read and not nestable. I've nested here.
>
> Technically, backticks are nestable (even in POSIX shell), but I'm
> pretty sure they can only be parsed by a computer. For example:
>
> $ echo `echo \`echo \\\`echo foo\\\` bar\` baz` quux
> foo bar baz quux
Damn, got me again! I'd have sworn they were not nestable (and have
said that in various presos), but you are correct. I can trace my
knowledge of that issue to the footnote on page 100 of _Learning the
bash Shell 3rd_, but I'd read or remembered it wrong. It says "less
conducive to nesting" but I'd remembered that as not possible.
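For comparison, here is the same three-level nesting written both ways.
The variable names are just for the demo; the backtick line is the
example quoted above, minus the trailing "quux".

```shell
# Backtick form: each extra nesting level needs more backslashes.
old_style=`echo \`echo \\\`echo foo\\\` bar\` baz`

# $() form: same nesting, no escaping at any depth.
new_style=$(echo $(echo $(echo foo) bar) baz)

echo "$old_style quux"
echo "$new_style quux"
```

Both lines print the same text, so the difference is purely one of
readability.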
Thanks for the great catches, hope I didn't goof Art up,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP |:::======| jp{at}jpsdomain{dot}org
My Account, My Opinions |=========| http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug