On Tue, Nov 18, 2008 at 12:36:42PM +0000, Sam Mason wrote:
> On Mon, Nov 17, 2008 at 11:22:47AM -0800, Lothar Behrens wrote:
> > I have a problem to find as fast as possible files that are double or
> > in other words, identical.
> > Also identifying those files that are not identical.
>
> I'd probably just take a simple Unix command line approach, something
> like:
>
> find /base/dir -type f -exec md5sum {} \; | sort | uniq -Dw 32
You can save a bit of time by using

find /base/dir -type f -print0 | xargs -0 md5sum | sort | uniq -Dw 32

since xargs batches the file names and forks md5sum only a few times
instead of once per file; the -print0/-0 pair also keeps file names
containing spaces or newlines intact.
> this will give you a list of files whose contents are identical
> (according to an MD5 hash). An alternative would be to put the hashes
> into a database and run the matching up there.
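
For the database variant, a rough sketch could look like the following
(the table and column names are only illustrative, and text-format \copy
treats tabs and backslashes specially, so paths containing those would
need escaping first):

psql -c "create table md5_lines (line text)"

find /base/dir -type f -print0 | xargs -0 md5sum \
  | psql -c "\copy md5_lines from stdin"

psql -c "select substr(line, 35) as path, substr(line, 1, 32) as hash
         from md5_lines
         where substr(line, 1, 32) in (
             select substr(line, 1, 32) from md5_lines
             group by 1 having count(*) > 1)
         order by hash"

md5sum prints 32 hex characters followed by two spaces, so
substr(line, 1, 32) is the hash and substr(line, 35) the file name;
the last query lists every file whose hash occurs more than once.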
>
>
> Sam
Gerhard