Stephen Gran on 22 Nov 2008 11:13:42 -0800
On Sat, Nov 22, 2008 at 01:32:17PM -0500, Art Alexion said:
> I have a directory with a lot of files, a number of which are identical,
> except for filename. What is the most efficient way to find (and
> ultimately delete) the duplicates?

Offhand, I'd say write a little script that builds a data structure like:

    files = {
        hash_of_some_sort_A => (file1, file2, file3),
        hash_of_some_sort_B => (file4, file5),
    }

and so on, where hash_of_some_sort is an md5sum, sha1sum, or a mashup of
size/date-stamp/whatever it is you are calling 'identical' in this
context. Then iterate over the structure: for each key, pop the first
member off the array and set it aside to keep, then delete what's left.

It looks like 10 minutes' work in perl, but if you need help, give a
shout.
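A rough, untested sketch of what I mean, assuming MD5 is close enough to
'identical' for your files and that the directory isn't nested. It only
prints the duplicates; nothing gets deleted until you uncomment the
unlink:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5;

    my $dir = shift || '.';
    opendir(my $dh, $dir) or die "can't open $dir: $!";

    my %files;    # digest => [ list of paths with that digest ]
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;    # skips . and .. and subdirectories
        open(my $fh, '<', $path) or die "can't read $path: $!";
        binmode($fh);
        push @{ $files{ Digest::MD5->new->addfile($fh)->hexdigest } }, $path;
        close($fh);
    }
    closedir($dh);

    for my $paths (values %files) {
        next if @$paths < 2;         # unique file, nothing to do
        my $keep = shift @$paths;    # first member of the group gets kept
        print "keeping $keep\n";
        print "  dup: $_\n" for @$paths;
        # unlink @$paths;            # uncomment to really delete
    }

Digest::SHA drops in the same way if you'd rather hash with SHA-1, and
comparing file sizes before hashing would save a lot of I/O on a big
directory, since only same-sized files can be identical.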
-- 
--------------------------------------------------------------------------
| Stephen Gran                   | Let's not complicate our relationship |
| steve@lobefin.net              | by trying to communicate with each    |
| http://www.lobefin.net/~steve  | other.                                |
--------------------------------------------------------------------------