Thread: Storing a file hash as primary key

Storing a file hash as primary key

From: Eduardo Pérez Ureta
I was wondering what the best way is to store a file hash (MD5 or SHA1)
and use it as an indexed primary key.
I have seen some people store the hex-encoded MD5 in a CHAR(32), but it
may be a better idea to use a CHAR(16) holding the raw, unencoded bytes,
although that may cause some problems.

What do you recommend?
Do you have any experiences storing file hashes in a database?
Do you know of any good open source software that stores file hashes in
a database (so I can take a look)?

Re: Storing a file hash as primary key

From: Greg Stark
Eduardo Pérez Ureta <eperez@it.uc3m.es> writes:

> I was wondering what the best way is to store a file hash (MD5 or SHA1)
> and use it as an indexed primary key.
> I have seen some people store the hex-encoded MD5 in a CHAR(32), but it
> may be a better idea to use a CHAR(16) holding the raw, unencoded bytes,
> although that may cause some problems.

I would say either char(32) or bytea(16). Not char(16) since you don't want to
treat the raw binary data using any specific character encoding or sort it
according to any locale specific rules etc.
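
To make the size and encoding point concrete, here is a minimal Python sketch (Python and the helper name is_valid_utf8 are illustrative choices, not part of the original discussion). It hashes empty input, whose MD5 is well known, and compares the raw 16-byte digest with its 32-character hex form. The raw digest is not valid UTF-8, which is the kind of trouble a character column can run into.

```python
import hashlib

# Empty input is used because its MD5 is a well-known constant.
raw = hashlib.md5(b"").digest()       # raw binary digest
hexed = hashlib.md5(b"").hexdigest()  # hex-encoded digest

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(raw), len(hexed))   # 16 32
print(hexed)                  # d41d8cd98f00b204e9800998ecf8427e
print(is_valid_utf8(raw))     # False: starts with 0xd4 0x1d, not a valid sequence
print(hexed.isascii())        # True: hex text is safe in any character column
```

The hex form costs twice the space but is plain ASCII, while the raw form is only safe in a binary type such as bytea.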

Personally I would have preferred bytea(16) but for some reason the PHP
drivers seem to just drop NULL there when I try to store raw binary md5 hashes.
So for now I just declared it bytea with no length specification and store the
hex encoded hash.

If anyone knows how to get Pear::DB to store binary data in a bytea column, by
all means.

--
greg

Re: Storing a file hash as primary key

From: Joe Conway
Greg Stark wrote:
> Personally I would have preferred bytea(16) but for some reason the PHP
> drivers seem to just drop NULL there when I try to store raw binary md5 hashes.
> So for now I just declared it bytea with no length specification and store the
> hex encoded hash.
>
> If anyone knows how to get Pear::DB to store binary data in a bytea column, by
> all means.

Did you try using pg_escape_bytea()?
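
For anyone unfamiliar with what bytea escaping involves, here is a rough Python sketch of the idea. It is not PHP's pg_escape_bytea() implementation, and the function name to_bytea_escape is made up for illustration. It assumes the bytea "escape" input format, in which any byte may be written as a backslash followed by three octal digits. The real function may leave printable bytes unescaped and also takes care of string-literal quoting, which this sketch does not.

```python
def to_bytea_escape(raw: bytes) -> str:
    """Hypothetical helper: write every byte as backslash + 3 octal digits."""
    return "".join("\\%03o" % b for b in raw)

# Four sample bytes: NUL, 'A', a backslash, and 0xff.
sample = bytes.fromhex("00415cff")
print(to_bytea_escape(sample))  # \000\101\134\377
```

Depending on how string literals are handled, the backslashes may need to be doubled again when the value is embedded directly in SQL text. Passing the bytes as a bound query parameter, where the driver supports it, avoids that step.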

Joe