Re: Hash index todo list item - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: Hash index todo list item
Date
Msg-id 46E32897.8080106@mark.mielke.cc
Whole thread Raw
In response to Re: Hash index todo list item  (Kenneth Marshall <ktm@rice.edu>)
Responses Re: Hash index todo list item
Re: Hash index todo list item
List pgsql-hackers
Kenneth Marshall wrote:<br /><blockquote cite="mid:20070908214900.GB5679@it.is.rice.edu" type="cite"><pre wrap="">The
valueof storing the
 
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing.</pre></blockquote> I think that if the case of >1 entry per hash
becomescommon enough to be significant, and the key is stored in the hash, that a btree will perform equal or better,
andthere is no point in pursuing  such a hash index model. This is where we are today.<br /><br /><blockquote
cite="mid:20070908214900.GB5679@it.is.rice.edu"type="cite"><pre wrap="">It seems reasonable that
 
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency.</pre></blockquote> If the key is stored, all of these factors likely favor the btree
formatover the hash format. Again, this is where we are today.<br /><br /><blockquote
cite="mid:20070908214900.GB5679@it.is.rice.edu"type="cite"><pre wrap="">Along with the hypothetical performance
 
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories. </pre></blockquote> Space efficiency is provided by not storing the key, nor the header data
required(length prefix?).<br /><blockquote cite="mid:20070908214900.GB5679@it.is.rice.edu" type="cite"><pre
wrap="">Pleasekeep the ideas and comments coming. I am certain that a synthesis
 
of them will provide an implementation with the performance characteristics
that we are seeking. </pre></blockquote> The subject interests me. :-)<br /><br /> Cheers,<br /> mark<br /><br /><pre
class="moz-signature"cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

pgsql-hackers by date:

Previous
From: apoc9009
Date:
Subject: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Next
From: Alvaro Herrera
Date:
Subject: Re: Just-in-time Background Writer Patch+Test Results