[GSoC08]some detail plan of improving hash index - Mailing list pgsql-hackers

From Xiao Meng
Subject [GSoC08]some detail plan of improving hash index
Date
Msg-id ded849dd0805151942j5b66d480u3f06aceae2b464af@mail.gmail.com
Whole thread Raw
Responses Re: [GSoC08]some detail plan of improving hash index  (Kenneth Marshall <ktm@rice.edu>)
Re: [GSoC08]some detail plan of improving hash index  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Hi, hackers.<br /><br />I'm reading the source codes of hash and reviewing Neil's old patch of improving hash index.<br
/>Hereis some detail plan. I'm trying to adjust Neil's patch to the current version of PostgreSQL first. I'm not quite
familarwith the code yet, so please make some comment.<br /><br />* Phase 1. Just store hash value instead of hash
keys<br/><br />First, define a macro to make it optional.<br /><br />Second, add a new function _hash_form_item to
constructIndexTuple with hash code to replace index_form_tuple uaed in hash access method. It seems easy since we
did'ntneed to deal with TOAST.<br /><br />Third, modify _hash_checkqual. We can first compare the hash value; if it's
thesame, we compare the real key value.<br />Also, HashScanOpaqueData adds an element hashso_sk_hash to hold scan key's
hashvalue to support scan function.<br /><br />Finally, it seems the system catalog pg_amop also need to be
modified.<br/>In Neil's patch, he set the amopreqcheck to be True. <br />In the documents, it means index hit must be
recheckedin the document. But I'm not so clear. Does it just means we need to recheck the value when use 
_hash_chechqual? <br /><br /><br />* Phase 2. Sort the hash value when insert into the bucket and use binary search
whenscan<br />Add a function _hash_binsearch to support binary search in a bucket;<br />It involved in all functions
whenwe need to search, insert and delete. <br /><br />* Phase 3. When it's necessary, store the real value.<br />When
weinsert a new index tuple , for example tp , to a bucket, we can check whether there's the same hash code.<br />If
there'salready an index tuple with such a hash code, we store both the hash code and real key of tp.<br />
Alternatively,we add a flag to represent the tuple stores a real value or just hash code. It seems a little complex.<br
/><br/>Phase 1 seems extremely easy. I'm trying to do it first. <br />Additionally, I need a benchmark to test the
performance.It seems there's some tools list in <a
href="http://wiki.postgresql.org/wiki/Performances_QA_testing">http://wiki.postgresql.org/wiki/Performances_QA_testing</a>
.Any advice?<br clear="all" /><br />-- <br />Have a good day;-)<br />Best Regards,<br />Xiao Meng<br /><br
/>━━━━━━━━━━━━━━━━━━━━━━━<br/>Data and Knowledge Engineering Research Center<br />Harbin Institute of Technology,
China<br/>Gtalk: <a href="mailto:mx.cogito@gmail.com" target="_blank">mx.cogito@gmail.com</a><br /> MSN: <a
href="mailto:cnEnder@live.com"target="_blank">cnEnder@live.com</a><br /><a href="http://xiaomeng.yo2.cn"
target="_blank"></a>

pgsql-hackers by date:

Previous
From: Neil Conway
Date:
Subject: Re: What to do with inline warnings?
Next
From: "Pavan Deolasee"
Date:
Subject: Re: deadlock while doing VACUUM and DROP