JP Vossen on 23 Nov 2008 12:54:00 -0800
> Date: Sat, 22 Nov 2008 17:43:24 -0500
> From: "K.S. Bhaskar" <bhaskar@bhaskars.com>
>
> Let's see if we can do it in 1 line... How about something like:
>
> find . -type f -exec md5sum {} \; | sort | uniq -d -w 32

I'd skip the find [1], since this does the same unless you need to recurse into directories, which Art didn't say:

    md5sum * | sort | uniq -d -w 32

But this method does not work for me. At least, I don't think it does. My solution gives you a list of files to delete, skipping the first of the duplicates. This solution (either as written or my shorter one) only gives the first line of each set of duplicates. So you either miss some if you have more than 1 dup, or you have to keep running it until it doesn't find anything else to delete.

'uniq -D' helps with that, but then I don't see how to break out the different sets of dups so you can clean up the ones you don't want:

$ md5sum * | sort | uniq -D -w 32
484bade6c8b3c8147cc03728af90b096  dup1
484bade6c8b3c8147cc03728af90b096  orig1
c04e01c8718c20c983f6cbf6f07911f8  dup2.a
c04e01c8718c20c983f6cbf6f07911f8  dup2.b
c04e01c8718c20c983f6cbf6f07911f8  dup2.c
c04e01c8718c20c983f6cbf6f07911f8  dup2.d
c04e01c8718c20c983f6cbf6f07911f8  orig2

Here is the whole session:

$ md5sum *
484bade6c8b3c8147cc03728af90b096  dup1
c04e01c8718c20c983f6cbf6f07911f8  dup2.a
c04e01c8718c20c983f6cbf6f07911f8  dup2.b
c04e01c8718c20c983f6cbf6f07911f8  dup2.c
c04e01c8718c20c983f6cbf6f07911f8  dup2.d
484bade6c8b3c8147cc03728af90b096  orig1
c04e01c8718c20c983f6cbf6f07911f8  orig2
c7da4fb9f3d537c45b12d3431ed21864  single1
c101e03d872787713f0d6ae169f616cb  single2

$ md5sum * | sort | uniq -d -w 32
484bade6c8b3c8147cc03728af90b096  dup1
c04e01c8718c20c983f6cbf6f07911f8  dup2.a

$ md5sum * | sort | uniq -D -w 32
484bade6c8b3c8147cc03728af90b096  dup1
484bade6c8b3c8147cc03728af90b096  orig1
c04e01c8718c20c983f6cbf6f07911f8  dup2.a
c04e01c8718c20c983f6cbf6f07911f8  dup2.b
c04e01c8718c20c983f6cbf6f07911f8  dup2.c
c04e01c8718c20c983f6cbf6f07911f8  dup2.d
c04e01c8718c20c983f6cbf6f07911f8  orig2

I must admit I had either forgotten about or wasn't aware of the uniq -w option. That's very handy. And I noticed -D when re-reading the man page.

And I certainly tend to come up with complicated solutions. Though in this case I started simple and kept adding testable layers until I got a solution. :-)

Later,
JP
__________________________
[1] Note that 'find ... -exec cmd {} \;' starts a separate process for the command on every hit, which is very, very slow. To pick a clearer and more common example, never do this:

    find ... -exec chmod 0775 {} \;

do this:

    find ... -print0 | xargs -0 chmod 0775

The -print0 and -0 use the NUL character as the separator between names, which works around things like spaces in file/dir names.

----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      jp{at}jpsdomain{dot}org
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
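The message above asks for a list of files to delete that skips the first file of each duplicate set. One possible way to get that from the same 'md5sum * | sort' pipeline is to let awk count how often each checksum has been seen. This is only a sketch and not the solution referred to in the thread; the function name list_extra_dups is made up for illustration, and it assumes file names without backslashes or newlines (md5sum escapes those lines).

```shell
# Hypothetical helper: print every file after the first in each set of
# identical checksums. Reads md5sum output on stdin, where each line is
# 32 hex digits, two separator characters, then the file name.
list_extra_dups() {
    # sort groups equal checksums together and makes "first" predictable;
    # awk prints the name (column 35 onward) once a checksum has been seen.
    sort | awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }'
}

# Usage, from the directory being checked:
#   md5sum * | list_extra_dups
```

With the nine sample files from the session above, this lists orig1, dup2.b, dup2.c, dup2.d and orig2, leaving dup1 and dup2.a as the kept copies. Review the list before feeding it to rm.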
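For the recursive case, the footnote's 'find ... -print0 | xargs -0' advice can be combined with the checksum pipeline, and GNU uniq can put a blank line between sets of duplicates. The following is a rough sketch that assumes the GNU versions of find, xargs and uniq; show_dup_sets is a hypothetical name, not something from the thread.

```shell
# Hypothetical helper: checksum every regular file under a directory
# (default: the current one) and print each set of identical files,
# with a blank line between sets. Assumes GNU find, xargs and uniq.
show_dup_sets() {
    find "${1:-.}" -type f -print0 |
        xargs -0 -r md5sum |                  # NUL-separated names survive spaces
        sort |
        uniq -w 32 --all-repeated=separate    # like -D, plus blank lines between sets
}

# Usage:
#   show_dup_sets            # search the current directory tree
#   show_dup_sets ~/photos   # search some other tree
```

The xargs form runs md5sum on batches of files rather than once per file, which is the point of the footnote.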