Stephen Gran on 22 Nov 2008 11:13:42 -0800



Re: [PLUG] finding duplicate files


On Sat, Nov 22, 2008 at 01:32:17PM -0500, Art Alexion said:
> I have a directory with a lot of files, a number of which are identical, 
> except for filename.  What is the most efficient way to find (and ultimately 
> delete) the duplicates?

Offhand, I'd say write a little script that creates a data structure
like:

my %files = (
  hash_of_some_sort_A => [ 'file1', 'file2', 'file3' ],
  hash_of_some_sort_B => [ 'file4', 'file5' ],
);

and so on, where hash_of_some_sort is an md5sum, sha1sum, or a mashup of
size/date-stamp/whatever you are calling 'identical' in this context.
Then iterate over the hash: for each group with more than one entry,
shift the first member off the array and set it aside, then delete what's
left.  It looks like 10 minutes' work in perl (rough sketch below), but
if you need help, give a shout.
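
For what it's worth, here's a minimal sketch of that approach in perl.
Assumptions on my part: 'identical' means identical content (MD5 of the
bytes), the files all live in one directory (no recursion), and the
directory comes in as the first argument.  It only prints what it would
delete; the unlink is commented out so you can eyeball the output first.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

my $dir = shift || '.';
opendir(my $dh, $dir) or die "can't open $dir: $!";

# Build the structure above: checksum => [ paths with that checksum ]
my %files;
for my $name (readdir $dh) {
    my $path = "$dir/$name";
    next unless -f $path;                  # skip . and .. and non-files
    open(my $fh, '<', $path) or die "can't read $path: $!";
    binmode $fh;
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @{ $files{$sum} }, $path;
}
closedir $dh;

# For each group of duplicates, keep the first and report the rest
for my $sum (sort keys %files) {
    my @group = @{ $files{$sum} };
    next unless @group > 1;                # unique content, nothing to do
    my $keep = shift @group;
    print "keeping $keep\n";
    print "  duplicate: $_\n" for @group;
    # unlink @group;                       # uncomment to actually delete
}

Swap Digest::MD5 for Digest::SHA, or compare file sizes before hashing,
if you are worried about speed or collisions.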
-- 
 --------------------------------------------------------------------------
|  Stephen Gran                  | Let's not complicate our relationship   |
|  steve@lobefin.net             | by trying to communicate with each      |
|  http://www.lobefin.net/~steve | other.                                  |
 --------------------------------------------------------------------------


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug