Art Alexion on 22 Nov 2008 18:16:52 -0800



Re: [PLUG] finding duplicate files


On Sat, Nov 22, 2008 at 3:06 PM, JP Vossen <jp@jpsdomain.org> wrote:
>  > Date: Sat, 22 Nov 2008 13:32:17 -0500
>  > From: Art Alexion <art.alexion@gmail.com>
>  >
>  > I have a directory with a lot of files, a number of which are
>  > identical, except for filename.  What is the most efficient way to
>  > find (and ultimately delete) the duplicates?
>
> How about this?  TEST, TEST, TEST first!
>
> # Assumes a recent version of bash [for nested $()]
> # BACKUP, then capture md5 [1] hashes (don't put the output file in
> # your CWD or you may recurse!)

This is Kubuntu 8.04, which I understand uses dash instead of bash for
/bin/sh.  Should I be OK?
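
From what I can tell, only /bin/sh points at dash here; bash itself is
still installed and is the normal login shell.  To be safe I can force
bash explicitly.  A quick sanity check (the script name below is just a
placeholder, not anything JP named):

  echo $0            # shows which shell I'm actually typing into
  ls -l /bin/sh      # on 8.04 this is a symlink to dash
  bash ./dedup.sh    # run the snippet under bash explicitly, or give
                     # the script a #!/bin/bash first line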


>
>
> ~~~~~~~~~~~~~~~~~~~~~~
> Interesting commands:
>
> * cut:  -d' ' uses space as the delimiter, -f3- for fields 3 to the end
> * uniq: -d shows only duplicated lines (hashes)
> * tail: -n+2 starts at line 2 and goes to the end (i.e., skips line 1)
> * $():  command substitution; the legacy form is backticks ``, but
> those are harder to read and not nestable.  I've nested here.
> * for...done:  Takes each hash and greps for it, then gives you just
> the file part
>
>
> This is a good one, I'll add it to the second edition of the _bash
> Cookbook_, if/when.  Let me know how you make out.
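
For the archive, here's my reading of the approach from the notes
above.  The script body itself didn't survive the quoting, so the file
names and the exact pipeline are my guesses, not JP's actual commands:

  # Hash everything; keep the output OUTSIDE the CWD.  md5sum puts two
  # spaces between hash and file name, so with -d' ' the name starts
  # at field 3.
  cd ~/mydir                     # placeholder directory
  md5sum * > /tmp/md5s

  # For each hash that appears more than once, list every file that
  # carries it except the first (the one to keep):
  for h in $(cut -d' ' -f1 /tmp/md5s | sort | uniq -d); do
      grep "^$h" /tmp/md5s | cut -d' ' -f3- | tail -n +2
  done > /tmp/dupes
  # (JP's nested-$() version presumably folds this into one command.)

  # Review /tmp/dupes by hand before deleting, then something like:
  #   xargs -d '\n' rm -v < /tmp/dupes

Writing the list out and eyeballing it before any rm seems safer than
deleting in one shot.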

I'll give it a try.  If things go awry, the backup copy of the
directory that the first couple of steps create should keep me safe.
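
For my own notes, the backup step presumably amounts to something like
this (directory names are placeholders):

  cp -a ~/mydir ~/mydir.bak          # -a keeps permissions and times
  # or, as a compressed archive:
  tar czf ~/mydir-backup.tar.gz -C ~ mydir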


>
> Later,
> JP

Thanks.


--
artAlexion
sent unsigned from webmail interface
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug