Home > mailing lists

Re: Hash Function: MD5 or other? - Mailing list pgsql-general

From	Alex Stapleton
Subject	Re: Hash Function: MD5 or other?
Date	June 14, 2005 06:07:00
Msg-id	34DEB1AE-95EE-4462-8E51-9FEE07A9C565@advfn.com Whole thread Raw
In response to	Hash Function: MD5 or other? (Peter Fein <pfein@pobox.com>)
Responses	Re: Hash Function: MD5 or other?
List	pgsql-general

Tree view

On 13 Jun 2005, at 23:49, Peter Fein wrote:

> Hi-
>
> I wanted to use a partially unique index (dependent on a flag) on a
> TEXT
> column, but the index row size was too big for btrees.  See the thread
> "index row size 2728 exceeds btree maximum, 2713" from the
> beginning of
> this month for someone with a similar problem.  In it, someone
> suggests
> indexing on a hash of the text.  I'm fine with this, as the texts in
> question are similar enough to each other to make collisions unlikely
> and a collision won't really cause any serious problems.
>
> My question is: is the builtin MD5 appropriate for this use or
> should I
> be using a function from pl/something?  Figures on collision rates
> would
> be nice as well - the typical chunk of text is probably 1k-8k.
>
> Thanks!
>

As others have said MD5 isn't the fastest one out there. However no
cryptographically  secure hashes are really that fast. However you
can probably get away with using a CRC hash which is long enough to
reduce your chances of collision a lot. However, PostgreSQL doesn't
have a built in CRC function, which is a bit of a pain unless your
prepared to implement one, or use pl/* to do it, which sounds like
overkill. I suggest you run some benchmarks on MD5 and see if it's
fast enough to meet your current (and perhaps future) needs.

You could of course, just use a hash index on your text field! I
think that would probably cope with larger data sets OK. It has the
advantage of handling collisions for you as well :) However it means
you have to transfer large amounts of data around, so if network
speed ever becomes a limitation, MD5 hashing (or enabling compression
on your PgSQL connection) may help.

> --
> Peter Fein                 pfein@pobox.com
> 773-575-0694
>
> Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>
>
>

pgsql-general by date:

From: Ioannis Theoharis
Date: 14 June 2005, 06:06:16
Subject: Re: suse 9.2

From: Együd Csaba
Date: 14 June 2005, 08:36:22
Subject: Re: PG 8.0.1 is getting slow in 24 hours. Only daily VACUUM

Re: Hash Function: MD5 or other? - Mailing list pgsql-general

Previous

Next