David A. Harding on 23 Nov 2008 16:02:44 -0800
On Sun, Nov 23, 2008 at 06:00:00PM -0500, Matthew Rosewarne wrote:
> Instead of hacking together some script, just use finddup from the
> "perforate" package.

I agree with Mr. Rosewarne: using an existing command is probably the
best solution. finddup is written in Perl, which saves it from most of
bash's filename quirks, and it uses the same basic method J.P. and I
used:

    1. Get a list of files
    2. Look at the file size (J.P. and I didn't do this)
    3. Compute MD5 checksum for files with the same file size
    4. Remove files that share both file size and checksum

Step two makes finddup run a lot faster on large files than J.P.'s code
or mine, because a file whose size is unique never has to be read and
checksummed. It also adds a statistically insignificant amount of extra
protection against accidental deletions: two files can have different
contents but share an MD5 checksum; if that happens, they probably won't
share the same file size, so finddup won't delete them.

But step two also means finddup won't find a duplicate file if the
original file is sparse and the duplicate is filled, or vice versa. I
find that deliciously ironic for a program in the perforate package. :)

A possible disadvantage of finddup is that its error messages are
written in German.

-Dave
-- 
David A. Harding
Website: http://dtrt.org/
1 (609) 997-0765
Email: dave@dtrt.org
Jabber/XMPP: dharding@jabber.org
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
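
For readers who want to see the size-then-checksum method from the
message above spelled out, here is a minimal Python sketch. It is not
finddup's code and not the scripts J.P. or Dave posted; the function
names md5_of and find_duplicates are made up for this illustration. It
only reports groups of matching files and deletes nothing, and it
compares apparent file sizes as returned by os.path.getsize.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root that share size and MD5.

    Hypothetical helper for illustration only; nothing is deleted.
    """
    # Steps 1 and 2: list regular files and group them by apparent size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 3: checksum only files whose size matches another file's size.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, so the file is never read
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[md5_of(path)].append(path)
        # Step 4 (reporting instead of removing): keep groups of 2 or more.
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_duplicates("/some/dir") returns a list of lists, one per
set of files with matching size and checksum. Deciding which copy in
each group to keep is left to the caller, which seems the safer default
for a sketch like this.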