Home > mailing lists

Re: Hash index todo list item - Mailing list pgsql-hackers

From	Brian Hurt
Subject	Re: Hash index todo list item
Date	September 7, 2007 15:10:50
Msg-id	46E1695D.5000909@janestcapital.com Whole thread Raw
In response to	Re: Hash index todo list item (Kenneth Marshall <ktm@rice.edu>)
Responses	Re: Hash index todo list item
List	pgsql-hackers

Tree view

Kenneth Marshall wrote:<br /><blockquote cite="mid20070907145507.GJ19403@it.is.rice.edu" type="cite"><blockquote
type="cite"><blockquotetype="cite"><pre wrap="">      </pre></blockquote><pre wrap="">How likely is it that you will
geta hash collision, two strings that are 
 
different that will hash to the same value?  To avoid this requires a very 
large hash key (128 bits, minimum)- otherwise you get into birthday attack 
problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
strings in a collection of 100,000 will hash to the same value.  With a 
64-bit hash, the likelyhood is greater than 50% that two strings in a 
collection of 10 billion will has to same value.  10 billion is a large 
number, but not an unreasonable number, of strings to want to put into a 
hash table- and it's exactly this case where the O(1) cost of hashtables 
starts being a real win.

Brian
   </pre></blockquote><pre wrap="">Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.

 </pre></blockquote><br /> Ah, OK- I misunderstood you.  I thought you were saying that the hash values would need to
beunique, and you wouldn't check the original values at all.  My bad.<br /><br /> Brian<br /><br />

pgsql-hackers by date:

From: Markus Schiltknecht
Date: 07 September 2007, 15:02:33
Subject: terms for database replication: synchronous vs eager

From: Dave Page
Date: 07 September 2007, 15:17:01
Subject: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

Re: Hash index todo list item - Mailing list pgsql-hackers

Previous

Next