David A. Harding on 23 Nov 2008 16:02:44 -0800


Re: [PLUG] finding duplicate files


On Sun, Nov 23, 2008 at 06:00:00PM -0500, Matthew Rosewarne wrote:
> Instead of hacking together some script, just use finddup from the
> "perforate" package. 

I agree with Mr. Rosewarne: using an existing command is probably the
best solution.  finddup is written in Perl, which saves it from most of
bash's filename quirks, and it uses the same basic method J.P. and I
used (sketched in shell below):

    1. Get a list of files
    2. Look at the file size (J.P. and I didn't do this)
    3. Compute MD5 checksums for files with the same file size
    4. Remove files with the same MD5 checksum
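
If you'd rather hack it together anyway, here's that sketch.  It
assumes GNU find, awk, coreutils, and xargs; it prints groups of
probable duplicates rather than deleting them; and filenames containing
newlines will confuse it, which is exactly the sort of quirk finddup's
Perl avoids:

    #!/bin/sh
    find . -type f -printf '%s %p\n' |  # 1. list files, prefixed by size
      sort -n |                         #    put equal sizes side by side
      awk '$1 == prev { if (saved != "") { print saved; saved = "" }; print }
           $1 != prev { prev = $1; saved = $0 }' |  # 2. keep repeated sizes
      cut -d' ' -f2- |                  #    drop the size column
      xargs -d '\n' md5sum |            # 3. checksum the candidates
      sort | uniq -w32 --all-repeated=separate  # 4. group equal checksums

The last stage only prints the duplicate groups, separated by blank
lines; deciding which copy to rm is left to the reader, which is one
more argument for just installing perforate.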

Step two makes finddup run a lot faster on large files than J.P.'s or
my code will, and it also adds a statistically insignificant amount of
extra protection against accidental deletions: two files can have
different contents but share an MD5 checksum; if that happens, they
probably won't share the same file size, so finddup won't delete them.

But step two also means finddup won't find a duplicate file if the
original file is sparse and the duplicate is filled, or vice versa. I
find that deliciously ironic for a program in the perforate package. :)
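
You can see the sparse-file case with a quick experiment (GNU dd, cp,
du, and md5sum; the names "sparse" and "filled" are made up, and I'm
assuming finddup's size test looks at allocated blocks, as du reports,
rather than byte length):

    dd if=/dev/null of=sparse bs=1M seek=10  # 10 MiB hole, no data blocks
    cp --sparse=never sparse filled          # same bytes, fully allocated
    ls -l sparse filled                      # identical byte lengths
    du sparse filled                         # very different disk usage
    md5sum sparse filled                     # identical checksums

Both files hold the same ten mebibytes of zeros, so md5sum agrees they
are duplicates, but a size test based on disk usage sees two very
different files and never compares them.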

A possible disadvantage of finddup is that its error messages are
written in German.

-Dave
-- 
David A. Harding	    Website:  http://dtrt.org/
1 (609) 997-0765	      Email:  dave@dtrt.org
			Jabber/XMPP:  dharding@jabber.org
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug