Home > mailing lists

Re: Hash Function: MD5 or other? - Mailing list pgsql-general

From	Shelby Cain
Subject	Re: Hash Function: MD5 or other?
Date	June 13, 2005 21:55:40
Msg-id	20050614005520.35675.qmail@web50103.mail.yahoo.com Whole thread Raw
In response to	Hash Function: MD5 or other? (Peter Fein <pfein@pobox.com>)
Responses	Re: Hash Function: MD5 or other?
List	pgsql-general

Tree view

--- Peter Fein <pfein@pobox.com> wrote:

> Hi-
>
> I wanted to use a partially unique index (dependent on a flag) on a
> TEXT
> column, but the index row size was too big for btrees.  See the
> thread
> "index row size 2728 exceeds btree maximum, 2713" from the beginning
> of
> this month for someone with a similar problem.  In it, someone
> suggests
> indexing on a hash of the text.  I'm fine with this, as the texts in
> question are similar enough to each other to make collisions unlikely
> and a collision won't really cause any serious problems.
>
> My question is: is the builtin MD5 appropriate for this use or should
> I
> be using a function from pl/something?  Figures on collision rates
> would
> be nice as well - the typical chunk of text is probably 1k-8k.
>
> Thanks!
>
> --

I believe the odds of two arbitrary inputs yielding the same MD5 hash
would be 1 in 2^128.  Even though the odds of collision are small
you'll want to write your query such that use use the index on the hash
and then filter on the text field to guarantee you get the result you
are interested in.

Regards,

Shelby Cain

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

pgsql-general by date:

From: Tom Lane
Date: 13 June 2005, 20:58:39
Subject: Re: pg_dumpall not working?

From: stig erikson
Date: 13 June 2005, 22:25:18
Subject: postgresql rpms (7.4.2, 7.4.5, 7.4.8) for redhat 8.0

Re: Hash Function: MD5 or other? - Mailing list pgsql-general

Previous

Next