Re: Hash Function: MD5 or other? - Mailing list pgsql-general

From Alex Stapleton
Subject Re: Hash Function: MD5 or other?
Date
Msg-id 34DEB1AE-95EE-4462-8E51-9FEE07A9C565@advfn.com
Whole thread Raw
In response to Hash Function: MD5 or other?  (Peter Fein <pfein@pobox.com>)
Responses Re: Hash Function: MD5 or other?
List pgsql-general
On 13 Jun 2005, at 23:49, Peter Fein wrote:


> Hi-
>
> I wanted to use a partially unique index (dependent on a flag) on a
> TEXT
> column, but the index row size was too big for btrees.  See the thread
> "index row size 2728 exceeds btree maximum, 2713" from the
> beginning of
> this month for someone with a similar problem.  In it, someone
> suggests
> indexing on a hash of the text.  I'm fine with this, as the texts in
> question are similar enough to each other to make collisions unlikely
> and a collision won't really cause any serious problems.
>
> My question is: is the builtin MD5 appropriate for this use or
> should I
> be using a function from pl/something?  Figures on collision rates
> would
> be nice as well - the typical chunk of text is probably 1k-8k.
>
> Thanks!
>

As others have said MD5 isn't the fastest one out there. However no
cryptographically  secure hashes are really that fast. However you
can probably get away with using a CRC hash which is long enough to
reduce your chances of collision a lot. However, PostgreSQL doesn't
have a built in CRC function, which is a bit of a pain unless your
prepared to implement one, or use pl/* to do it, which sounds like
overkill. I suggest you run some benchmarks on MD5 and see if it's
fast enough to meet your current (and perhaps future) needs.

You could of course, just use a hash index on your text field! I
think that would probably cope with larger data sets OK. It has the
advantage of handling collisions for you as well :) However it means
you have to transfer large amounts of data around, so if network
speed ever becomes a limitation, MD5 hashing (or enabling compression
on your PgSQL connection) may help.


> --
> Peter Fein                 pfein@pobox.com
> 773-575-0694
>
> Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>
>
>



pgsql-general by date:

Previous
From: Ioannis Theoharis
Date:
Subject: Re: suse 9.2
Next
From: Együd Csaba
Date:
Subject: Re: PG 8.0.1 is getting slow in 24 hours. Only daily VACUUM