Thread: [PATCH]-hash index improving
The patch stores the hash code only in the index tuple. It is based on Neil Conway's patch against an old version of PostgreSQL. It passes the regression tests but I haven't tested the performance yet. Anyone interested is welcome to run a performance test ;-) You can undefine the macro HASHVALUE_ONLY in hash.h to get the original implementation. It's a preliminary implementation and I'm looking for input here. Hope to hear from you. -- Best Regards, Xiao Meng DKERC, Harbin Institute of Technology, China Gtalk: mx.cogito@gmail.com MSN: cnEnder@live.com http://xiaomeng.yo2.cn
Attachment
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote: > The patch store hash code only in the index tuple. > It based on Neil Conway's patch with an old version of PostgreSQL. > It passes the regression test but I didn't test the performance yet. > Anyone interested can make a performance test;-) > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > original implementation. > It's a preliminary implementation and I'm looking for input here. > Hope to hear from you. Tom, Simon, Heikki, Greg, we need to make sure this SoC project succeeds and would appreciate your review and input. Based on some of Kenneth Marshall and Tom Raney's past hash index test cases, Xiao and I are going to work on some benchmarks for measuring the performance-related aspects of this project. Thanks! -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
Xiao Meng escribió: > The patch store hash code only in the index tuple. > It based on Neil Conway's patch with an old version of PostgreSQL. > It passes the regression test but I didn't test the performance yet. > Anyone interested can make a performance test;-) > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > original implementation. I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and putting the new code in place. (That's why we have revision control.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, Jul 17, 2008 at 12:42 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. I suggest just removing the old code > and putting the new code in place. (That's why we have revision > control.) Agreed. I'm also getting a crash on it with large data sets, so we'll make sure to get those fixed in the next version of the patch. -Jonah
On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > Xiao Meng escribió: > > The patch store hash code only in the index tuple. > > It based on Neil Conway's patch with an old version of PostgreSQL. > > It passes the regression test but I didn't test the performance yet. > > Anyone interested can make a performance test;-) > > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > > original implementation. > > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. I suggest just removing the old code > and putting the new code in place. (That's why we have revision > control.) > One thing it helps is building an old version and a new version for comparative testing. Otherwise, you could end up with an apples-to- oranges comparison. I certainly think that the final patch should not have it, but it is useful now for testing and comparisons. My two cents, Ken
On Thu, Jul 17, 2008 at 02:00:07PM -0400, Jonah H. Harris wrote: > On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote: > >> I think having the HASHVALUE_ONLY define is not a good idea -- it just > >> makes the patch harder to read. I suggest just removing the old code > >> and putting the new code in place. (That's why we have revision > >> control.) > >> > > One thing it helps is building an old version and a new version > > for comparative testing. Otherwise, you could end up with an apples-to- > > oranges comparison. I certainly think that the final patch should not > > have it, but it is useful now for testing and comparisons. > > Yes, that's why Xiao did it that way. However, we traditionally just > submit a patch with only the changes and it's up to the person testing > to have an identical build-tree without the patch for testing. > Another reason for it is that even if you build without the define, > the patch author may have mistakenly added something outside the ifdef > which could impact testing. > > I agree with Alvaro that we should submit it as a standard change patch. Okay, that makes sense. Ken
On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote: >> I think having the HASHVALUE_ONLY define is not a good idea -- it just >> makes the patch harder to read. I suggest just removing the old code >> and putting the new code in place. (That's why we have revision >> control.) >> > One thing it helps is building an old version and a new version > for comparative testing. Otherwise, you could end up with an apples-to- > oranges comparison. I certainly think that the final patch should not > have it, but it is useful now for testing and comparisons. Yes, that's why Xiao did it that way. However, we traditionally just submit a patch with only the changes and it's up to the person testing to have an identical build-tree without the patch for testing. Another reason for it is that even if you build without the define, the patch author may have mistakenly added something outside the ifdef which could impact testing. I agree with Alvaro that we should submit it as a standard change patch. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
Kenneth Marshall escribió: > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > > I think having the HASHVALUE_ONLY define is not a good idea -- it just > > makes the patch harder to read. I suggest just removing the old code > > and putting the new code in place. (That's why we have revision > > control.) > > > One thing it helps is building an old version and a new version > for comparative testing. Otherwise, you could end up with an apples-to- > oranges comparison. I certainly think that the final patch should not > have it, but it is useful now for testing and comparisons. For this purpose I think it would be easier to have a separate tree with the patch, and one without it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote: > The patch store hash code only in the index tuple. > It based on Neil Conway's patch with an old version of PostgreSQL. > It passes the regression test but I didn't test the performance yet. > Anyone interested can make a performance test;-) > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > original implementation. > It's a preliminary implementation and I'm looking for input here. > Hope to hear from you. I've spent some time today performing tests similar to those mentioned here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php) Using a word list of 2650024 unique words (maximum length is 118 bytes), build times are still high, but I'm not really seeing any performance improvements over b-tree. I haven't profiled it yet, but my test is as follows: - Created the dict table - Loaded the dict table - Counted the records in the dict table - Created the index - Shut down the database - Randomly selected 200 entries from the word list and built a file full of (SELECT * FROM dict WHERE word = '<word>') queries using them. - Cleared out the kernel cache - Started the database - Ran the query file The result of this is between 5-10ms improvement in the overall execution time of all 200 queries. The time-per-query is practically unnoticeable. As this is in the range of noise, methinks there's a larger problem with hash indexes. I haven't looked heavily into their implementation, but do any of you know of any major design flaws? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote: > On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote: > > The patch store hash code only in the index tuple. > > It based on Neil Conway's patch with an old version of PostgreSQL. > > It passes the regression test but I didn't test the performance yet. > > Anyone interested can make a performance test;-) > > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > > original implementation. > > It's a preliminary implementation and I'm looking for input here. > > Hope to hear from you. > > I've spent some time today performing tests similar to those mentioned > here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php) > > Using a word list of 2650024 unique words (maximum length is 118 > bytes), build times are still high, but I'm not really seeing any > performance improvements over b-tree. I haven't profiled it yet, but > my test is as follows: > > - Created the dict table > - Loaded the dict table > - Counted the records in the dict table > - Created the index > - Shutdown the database > - Randomly selected 200 entries from the word list and built a file > full of (SELECT * FROM dict WHERE word = '<word>') queries using them. > - Cleared out the kernel cache > - Started the database > - Ran the query file > > The result of this is between 5-10ms improvement in the overall > execution time of all 200 queries. The time-per-query is practically > unnoticeable. As this is in the range of noise, methinks there's a > larger problem with hash indexes. I haven't looked heavily into their > implementation, but do you any of you know of any major design flaws? > Jonah, Thank you for running these tests. I was trying to reproduce my initial tests on the original system to make it more apples to apples, but the latest release needs more resources semaphore-wise than the 8.2 and to fix it on Solaris 8 I will need a reboot. 
Would you mind posting the timings for the hash_only index versus the hash_value index versus the btree index for the same test? Also, what is the on-disk size of all three indexes? This will allow us to figure out the bucket/page load or fill-factor for each scenario. The basic implementation looked reasonable. I will take a look at the patch this evening. Regards, Ken
On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote: > I'm not really seeing any performance improvements over b-tree. I'd like to see a theoretical analysis on the benefit case before we spend too many teraflops on investigation. In which cases do we expect that hash indexes will beat btrees? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
-----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs > Sent: Thursday, July 17, 2008 4:10 PM > To: Jonah H. Harris > Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall > Subject: Re: [HACKERS] [PATCH]-hash index improving > > > On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote: > > I'm not really seeing any performance improvements over b-tree. > > I'd like to see a theoretical analysis on the benefit case before we > spend too many teraflops on investigation. > > In which cases do we expect that hash indexes will beat btrees? Large table unique index equality search should be very fast with hashed index (and the only place where any advantage will be seen). Hashed indexes are useless for any search besides equality and gain more and more when the levels of the b-tree index increase. Here is a hash index lookup that is frightfully fast: http://www.corpit.ru/mjt/tinycdb.html It's a constant database, but the file format might be worth examination. Here is a quickie definition of the CDB format: http://cr.yp.to/cdb/cdb.txt I think that the problem with hashed indexes tends to be the blocking of index pages on disk. For instance, the FastDB/GigaBase author was able to make FastDB memory based hash indexes that are faster than the memory based b-tree versions, but not for the disk based GigaBase database. The only way to get better performance from hash based indexes is to read fewer index pages than if a tree-based index were used. So I think that the scheme used to create the index pages is the focus to make them worthwhile.
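The cdb lookup that Dann points to is cheap partly because its hash is trivial and its two-level layout bounds the number of page reads. This is a sketch of just the hash function from the cdb.txt specification linked above (the function only, not the file layout):

```c
#include <stdint.h>
#include <stddef.h>

/* The hash from D. J. Bernstein's cdb.txt spec: start at 5381 and
 * fold each byte in with h = ((h << 5) + h) ^ c, all in 32-bit
 * unsigned arithmetic. */
uint32_t
cdb_hash(const unsigned char *key, size_t len)
{
    uint32_t    h = 5381;
    size_t      i;

    for (i = 0; i < len; i++)
        h = ((h << 5) + h) ^ key[i];
    return h;
}
```

cdb then uses the low 8 bits of the hash to pick one of 256 on-disk tables and the remaining bits to pick a slot within it, so a successful lookup normally costs two disk accesses and a miss only one --- exactly the "read fewer index pages" property Dann is arguing a disk-based hash index has to deliver.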
On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote: > Kenneth Marshall escribió: > > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > > > > I think having the HASHVALUE_ONLY define is not a good idea -- > > > it just makes the patch harder to read. I suggest just removing > > > the old code and putting the new code in place. (That's why we > > > have revision control.) > > > > > One thing it helps is building an old version and a new version > > for comparative testing. Otherwise, you could end up with an > > apples-to- oranges comparison. I certainly think that the final > > patch should not have it, but it is useful now for testing and > > comparisons. > > For this purpose I think it would be easier to have a separate tree > with the patch, and one without it. Here's one tree. Anyone can get an initial copy as follows: git clone http://git.postgresql.org/git/~davidfetter/hash/.git Xiao Meng, if you let me know where your git repo is, say by cloning onto a machine I can see from the internet and applying your patches to it, I can pull and let others see it :) Yes, I know it's a little cumbersome, but we'll get something slicker as we figure out what "slicker" really should mean. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, Jul 17, 2008 at 7:37 PM, Dann Corbit <DCorbit@connx.com> wrote: >> In which cases do we expect that hash indexes will beat btrees? > > Large table unique index equality search should be very fast with hashed > index (and the only place where any advantage will be seen). Yes, this is the exact use-case. Likewise, Dann has provided a good description regarding the primary implementation goals of a disk-based hash table. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
Alvaro Herrera <alvherre@commandprompt.com> writes: > Xiao Meng escribió: >> You can undefine the macro HASHVALUE_ONLY in hash.h to get the >> original implementation. > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. While we are griping about readability: failure to update the comments to match the code is NOT, NOT, NOT acceptable. I had barely started to read the patch before encountering this insult to the reader: /* Hash indexes are never lossy (at the moment anyway)*/ - scan->xs_recheck = false; +#ifdef HASHVALUE_ONLY + scan->xs_recheck = true; +#else + scan->xs_recheck = false; +#endif The fact that the patch doesn't touch backend/access/hash/README is already grounds for rejection, but can't you be bothered to fix a comment that is literally one line away from where you are making it wrong? regards, tom lane
On Fri, Jul 18, 2008 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While we are griping about readability: failure to update the comments > to match the code is NOT, NOT, NOT acceptable. I had barely started > to read the patch before encountering this insult to the reader: > ... As this is Xiao's first patch to the community, I'd appreciate it if you guys would chill a little. I'm fully aware of what needs to be done with it clean-up wise, but given that we're under some time constraints, I asked that he submit his preliminary patch for those who wanted to preview/test it. Again, this patch is nowhere near acceptance quality, nor was it intended to be. I'll make sure he gets the next one cleaned up prior to submitting it. The real question now has been listed above; why are hash index queries, including this patch, no better than b-tree even for straight equality comparisons? Because hash is lossy, we could easily be performing multiple I/Os for recheck, and that may be causing some of the performance problems. I haven't checked I/O for that yet, but was wondering if you (Tom) knew of any major design/implementation deficiency we should be taking into consideration regarding PG's hash index implementation? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > The real question now has been listed above; why are hash index > queries, including this patch, no better than b-tree even for straight > equality comparisons? Well, the theoretical advantage is that a hash probe is O(1) while a btree probe is O(log N) (ignoring a lot of possible nonoptimalities on each side of course). So you'd need a test case large enough for log N to be noticeably more than 1, which I think means in practice that the upper levels of the btree don't fit in memory. So small test cases aren't going to show any improvement. I think the historical problem with our hash implementation has been that it was terribly space-inefficient, because of the fixed (and large) bucket size. A dense btree index can probably beat a sparse hash up to exceedingly large values of N. It's not clear to me how much the patch at hand does to fix that. The patch might introduce a new problem, which is extra heap visits caused by the lossiness of the hash value. Again, that doesn't hurt much ideally, but maybe you chanced to use a test case where it does hurt. It would be worth sticking in a bit of debug instrumentation to see whether the number of failed heap visits is significant. In any case, the reported test seems to be dealing with index sizes in the tens-of-megabytes range, and it doesn't surprise me a whole lot that an O(log N) penalty isn't showing up there. Try a test case where the index size is significantly larger than your RAM. regards, tom lane
On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote: > Large table unique index equality search should be very fast with hashed > index (and the only place where any advantage will be seen). Hashed > indexes are useless for any search besides equality and gain more and > more when the levels of the b-tree index increase. I think a comparison with a btree using a functional index should be shown. > The only way to get better performance from hash based indexes is to > read fewer index pages than if a tree-based index were used. So I think > that the scheme used to create the index pages is the focus to make them > worthwhile. Agreed. Some math on that, plus a clear focus on making this faster than a btree is critical to this project. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote: > >> Large table unique index equality search should be very fast with hashed >> index (and the only place where any advantage will be seen). Hashed >> indexes are useless for any search besides equality and gain more and >> more when the levels of the b-tree index increase. > > I think a comparison with a btree using a functional index should be > shown. To do that you'll have to think about the use cases you think hash should win on. For cpu-bound databases with small indexes there might be a win if you can avoid the binary search of all the elements on a page. (Have we modified btree to do that or does it still scan sequentially on the leaf pages?) For i/o-bound databases with very large indexes there should be an opportunity where btree lookups are O(logn) and hash lookups can in theory be O(1). However to get O(1) hash lookups need to do extra work at insert time. If they don't expand the hash as necessary then they end up just being a linear speedup to whatever lookup algorithm is used to scan the buckets. That isn't going to win over btree which is already doing a binary search. The extra work on insert time is O(nlogn) amortized, but I'm not sure good amortized performance is good enough for Postgres. Users are unhappy when they're average performance is good but 1/1000 inserts thrashes their i/o rewriting the whole index... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Fri, 2008-07-18 at 11:07 +0100, Gregory Stark wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > hash lookups can in theory be O(1). I'm not sure whether that applies here? I'm interested in how *this* patch will work, not in more generic algorithm theory. To patch authors: Can we please see a table showing expected number of logical I/Os (i.e., block accesses) for btrees and hash indexes, e.g. for 100-byte rows...

rows    btree   hash
----    -----   ----
10^2
10^3
10^4
10^5
10^6
10^7
10^8

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Gregory Stark escribió: > For cpu-bound databases with small indexes there might be a win if you can > avoid the binary search of all the elements on a page. (Have we modified btree > to do that or does it still scan sequentially on the leaf pages?) Hmm? It has used binary search since as long as I can remember ... see _bt_first and _bt_binsrch. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Gregory Stark wrote: > For i/o-bound databases with very large indexes there should be an opportunity > where btree lookups are O(logn) and hash lookups can in theory be O(1). Ignoring the big-O complexity, if a hash index only stores a 32-bit hash code instead of the whole key, it could be a big win in storage size, and therefore in cache-efficiency and performance, when the keys are very long. Granted, it's not very common to use a 1K text field as a key column... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Gregory Stark wrote: >> For i/o-bound databases with very large indexes there should be an opportunity >> where btree lookups are O(logn) and hash lookups can in theory be O(1). > > Ignoring the big-O complexity, if a hash index only stores a 32-bit hash code > instead of the whole key, it could be a big win in storage size, and therefore > in cache-efficiency and performance, when the keys are very long. I think it has to show an improvement over an expression index over (hashany(col)) and not just an improvement over an index over "col" due to col being large. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Fri, Jul 18, 2008 at 10:44 AM, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Ignoring the big-O complexity, if a hash index only stores a 32-bit hash > code instead of the whole key, it could be a big win in storage size, and > therefore in cache-efficiency and performance, when the keys are very long. Agreed. My thinking is that there's either something inherently wrong with the implementation, or we're performing so many disk I/Os that it's nearly equivalent to b-tree. Tom has a couple suggestions which Xiao and I will explore. > Granted, it's not very common to use a 1K text field as a key column... Especially for direct equality comparison :) -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > Agreed. My thinking is that there's either something inherently wrong > with the implementation, or we're performing so many disk I/Os that > it's nearly equivalent to b-tree. Tom has a couple suggestions which > Xiao and I will explore. I finally got a chance to look through the patch in some detail. If I haven't missed something, there are just two improvement ideas embodied in it: 1. Store just the hash value, and not the index key value, in hash index entries. (This means that all hash indexscans become lossy and have to be rechecked at the heap.) 2. Keep the contents of each index page ordered by hash value, and use binary instead of linear search to find the matching item(s) during an indexscan. (This depends on #1 because recomputing the hash values during the binary search would be too expensive --- although you could also make it work by storing *both* the hash value and the original key.) I suppose that the main point of #1 is to reduce index size by allowing more tuples to fit in each bucket. However, the patch neglects to adjust the target-fillfactor calculation in _hash_metapinit, which means that the code won't actually put any more tuples per bucket (on average) than it did before. So the index isn't actually any smaller and you get no I/O savings --- you just have more unused space on a typical page. Fixing that might help. FWIW, I had always assumed that part of the solution to hash's woes would involve decoupling the bucket size from the page size, so that you could have multiple buckets per page. But maybe the binary-search idea makes that unnecessary. I'm not sure. A whole lot depends on how evenly the buckets get filled. You should probably investigate how many tuples actually end up in each bucket with and without the patch. 
In the realm of micro-optimizations that might be significant, I think you really need to get rid of all those _create_hash_desc calls, particularly the one in _hash_checkqual which is part of the inner loop of an indexscan. Not only are they slow but they probably represent a serious memory leak in a scan that returns many tuples. For reading the hash value out of an existing index tuple, I don't think you should be bothering with a tupdesc at all --- don't use index_getattr, just map a C struct onto the known layout of an index tuple with a single never-null int field. This would particularly make for a noticeable improvement in the speed of _hash_binsearch. The tupdescs used in storing an index entry are probably needed, but you could just use a single statically declared tupdesc (look at the ones in relcache.c for examples of building one as a constant). regards, tom lane
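Tom's two suggestions --- a struct overlay instead of index_getattr, plus binary search over hash-ordered page items --- can be sketched together. The HashEntry layout and names here are hypothetical stand-ins, not the actual IndexTuple format:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical entry layout: a heap pointer plus a single never-null
 * uint32 hash code.  Real index tuples carry a t_tid/t_info header;
 * this struct only illustrates reading the hash value directly,
 * without index_getattr() or a tupdesc. */
typedef struct HashEntry
{
    uint32_t    heap_ptr;   /* stand-in for the heap item pointer */
    uint32_t    hash_val;   /* the stored hash code */
} HashEntry;

/* Idea #2 above: page items kept sorted by hash_val and searched
 * with a lower-bound binary search.  Returns the first index whose
 * hash is >= target (nitems if none); duplicate hash values then sit
 * contiguously after it, and each match must still be rechecked
 * against the heap because the stored hash is lossy. */
size_t
hash_binsearch(const HashEntry *entries, size_t nitems, uint32_t target)
{
    size_t      lo = 0,
                hi = nitems;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (entries[mid].hash_val < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```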
I just ran my original 16M word test case against the patched version, and like Tom noted below, the tuples per bucket calculation is wrong which results in identical index sizes for both the original version and the hash-value-only version. On Fri, Jul 18, 2008 at 12:23:25PM -0400, Tom Lane wrote: > "Jonah H. Harris" <jonah.harris@gmail.com> writes: > > Agreed. My thinking is that there's either something inherently wrong > > with the implementation, or we're performing so many disk I/Os that > > it's nearly equivalent to b-tree. Tom has a couple suggestions which > > Xiao and I will explore. > > I finally got a chance to look through the patch in some detail. > If I haven't missed something, there are just two improvement ideas > embodied in it: > > 1. Store just the hash value, and not the index key value, in hash > index entries. (This means that all hash indexscans become lossy > and have to be rechecked at the heap.) > > 2. Keep the contents of each index page ordered by hash value, and use > binary instead of linear search to find the matching item(s) during an > indexscan. (This depends on #1 because recomputing the hash values > during the binary search would be too expensive --- although you could > also make it work by storing *both* the hash value and the original > key.) > > I suppose that the main point of #1 is to reduce index size by > allowing more tuples to fit in each bucket. However, the patch > neglects to adjust the target-fillfactor calculation in _hash_metapinit, > which means that the code won't actually put any more tuples per bucket > (on average) than it did before. So the index isn't actually any > smaller and you get no I/O savings --- you just have more unused > space on a typical page. Fixing that might help. > > FWIW, I had always assumed that part of the solution to hash's woes > would involve decoupling the bucket size from the page size, so > that you could have multiple buckets per page.
> But maybe the > binary-search idea makes that unnecessary. I'm not sure. A whole > lot depends on how evenly the buckets get filled. You should probably > investigate how many tuples actually end up in each bucket with and > without the patch. > I think that while the binary-search idea will improve the lookup over the original sequential scan of the bucket, it makes updates much more expensive particularly with buckets approaching 100% full. The idea that I have been mulling over tries to improve access times by breaking a bucket into mini virtual buckets within a page. We restrict the size of the mini-bucket to be pagesize/2^n. The sweet spot should be around n=7 or 8, which for an 8k pagesize yields a mini-bucket size of 64 or 32 bytes. Then the search for the value in a page is to read the virtual bucket corresponding to the n bits of the hash value. The second piece is to take advantage of the fact that the size of the mini-bucket is not an even multiple of the size of a hash index tuple and aggregate all the lost space for use as the "first" overflow page for all of a page's mini-buckets. This avoids the I/O needed to read a full overflow page from disk and accommodates the imperfections in the hash function distribution. The overflow pages, both the virtual "first" and subsequent real pages would benefit from the binary lookups. It may also be worth storing the high and low hash values specially to avoid the search in a page if its value would not be on the page.
> For reading the > hash value out of an existing index tuple, I don't think you should be > bothering with a tupdesc at all --- don't use index_getattr, just map a > C struct onto the known layout of a indextuple with a single never-null > int field. This would particularly make for a noticeable improvement in > the speed of _hash_binsearch. The tupdescs used in storing an index > entry are probably needed, but you could just use a single statically > declared tupdesc (look at the ones in relcache.c for examples of > building one as a constant). > +1 > regards, tom lane > I think that this sort of virtual bucket would allow us to take better advantage of the O(1) behavior. What do you all think? Regards, Ken
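Ken's mini-bucket scheme above can be sketched as simple address arithmetic. The choice of n = 6 (64 slots of 128 bytes on an 8k page) and the use of the high hash bits are illustrative assumptions, not measured tuning:

```c
#include <stdint.h>

/* Mini virtual buckets, sketched: carve a page into 2^n slots and
 * pick one from n bits of the hash value.  The high bits are used
 * because the low bits already selected the real bucket; reusing
 * them here would skew the within-page distribution. */
#define PAGE_SIZE   8192u
#define MINI_BITS   6
#define MINI_COUNT  (1u << MINI_BITS)           /* 64 mini-buckets */
#define MINI_SIZE   (PAGE_SIZE / MINI_COUNT)    /* 128 bytes each  */

unsigned
mini_bucket(uint32_t hashval)
{
    return hashval >> (32 - MINI_BITS);
}

/* Byte offset of a hash value's mini-bucket within its page; a probe
 * reads just this slice (plus the pooled "first" overflow area built
 * from the slack bytes) instead of scanning the whole bucket. */
unsigned
mini_bucket_offset(uint32_t hashval)
{
    return mini_bucket(hashval) * MINI_SIZE;
}
```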
FYI, I just patched the fill-factor calculation and re-ran my test. The index size dropped from 513M to 43M which is the same disk footprint as the corresponding btree index. Have a nice weekend. Ken On Fri, Jul 18, 2008 at 12:23:14PM -0500, Kenneth Marshall wrote: > I just ran my original 16M word test case against the patched > version, and like Tom noted below, the tuples per bucket > calculation is wrong which results in identical index sizes > for both the original version and the hash-value-only version. >
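The fill-factor fix Ken describes is, roughly, the following arithmetic; the item sizes and usable-page figure below are assumptions standing in for the real _hash_metapinit values, chosen only to show why hash-code-only entries should pack several times more tuples per bucket:

```c
/* Target tuples per bucket ~= usable page bytes * fillfactor / item
 * size.  If the patch shrinks each index entry from a full key to a
 * bare hash code but leaves this calculation alone, the target stays
 * the same and the index does not shrink -- which is the bug Tom
 * spotted and Ken's 513M -> 43M result confirms once fixed. */
unsigned
target_ffactor(unsigned usable_bytes, unsigned fillfactor_pct,
               unsigned item_bytes)
{
    return usable_bytes * fillfactor_pct / 100 / item_bytes;
}

/* Assumed, illustrative sizes: ~8100 usable bytes on an 8k page, 75%
 * fill, 48-byte entries carrying a short word key vs. 16-byte entries
 * carrying only a hash code: 126 vs. 379 tuples per bucket, roughly a
 * 3x packing win even with these conservative numbers. */
```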
I'm sorry for the delayed reply; I couldn't get access to the Internet for the last few days.

I apologize for my rough work and its poor readability. I posted it in a hurry and didn't mean to cause readers so much inconvenience. I'll never make such a mistake again.

I've now made some of the optimizations Tom advised and removed the macro HASHVALUE_ONLY, and I'm working on fixing the problem that it crashes on large data sets. I'll post a new patch later.

Thank you for all your advice and testing.

--
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx.cogito@gmail.com
MSN: cnEnder@live.com
http://xiaomeng.yo2.cn
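One of the optimizations Tom suggested earlier, reading the hash code at a fixed offset instead of building a tupdesc and calling index_getattr, can be sketched like this. The 8-byte header and the exact offsets are an invented layout, purely for illustration:

```python
import struct

# Invented layout, for illustration only: a fixed-size index tuple
# header followed immediately by the never-null 4-byte hash code.
TUPLE_HEADER_LEN = 8
HASH_FMT = "<I"  # one unsigned 32-bit integer, little-endian

def hash_from_tuple(raw):
    """Read the hash code at a known fixed offset, skipping the
    generic per-attribute access machinery (the analogue of
    avoiding index_getattr and a per-call tupdesc)."""
    (hashval,) = struct.unpack_from(HASH_FMT, raw, TUPLE_HEADER_LEN)
    return hashval

tup = b"\x00" * TUPLE_HEADER_LEN + struct.pack(HASH_FMT, 0xCAFEBABE)
assert hash_from_tuple(tup) == 0xCAFEBABE
```

Because the hash attribute is known to be a single fixed-width, never-null field, no per-tuple descriptor construction (and hence no per-scan memory leak) is needed on the read path.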
Well, I'll do it after I finish my second patch. A hash index should be more efficient than a btree when N is big enough. It seems worthwhile to find out how big N is experimentally.

On Fri, Jul 18, 2008 at 6:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Fri, 2008-07-18 at 11:07 +0100, Gregory Stark wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> hash lookups can in theory be O(1).
>
> I'm not sure whether that applies here? I'm interested in how *this*
> patch will work, not in more generic algorithm theory.
>
> To patch authors: Can we please see a table showing expected number of
> logical I/Os (i.e., block accesses) for btrees and hash indexes
>
> e.g. for 100-byte rows...
>
> rows btree hash
> ---- ----- ----
> 10^2
> 10^3
> 10^4
> 10^5
> 10^6
> 10^7
> 10^8
>
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Training, Services and Support

--
Best Regards,
Xiao Meng
DKERC, Harbin Institute of Technology, China
Gtalk: mx.cogito@gmail.com
MSN: cnEnder@live.com
http://xiaomeng.yo2.cn
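A back-of-envelope version of the table Simon asks for. The btree fanout of 256 and the constant-cost hash probe are assumptions for illustration, not measurements of either implementation:

```python
import math

def btree_ios(nrows, fanout=256):
    """Logical page reads for a btree descent: one per tree level,
    with level count roughly log_fanout(nrows)."""
    return max(1, math.ceil(math.log(nrows, fanout)))

def hash_ios(nrows, overflow_pages=0):
    """Logical page reads for a hash probe: bucket lookup via the
    metapage plus the bucket page plus any overflow chain;
    independent of nrows."""
    return 2 + overflow_pages

# Print the requested comparison table for 10^2 .. 10^8 rows.
for e in range(2, 9):
    n = 10 ** e
    print("10^%d rows: btree=%d  hash=%d" % (e, btree_ios(n), hash_ios(n)))
```

Under these assumptions the hash index only pulls ahead once the btree needs three or more levels; for tiny tables the btree's single-page root actually wins, which is consistent with the "how big is N" question.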
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Xiao Meng
> Sent: Tuesday, July 22, 2008 7:57 PM
> To: Simon Riggs
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PATCH]-hash index improving
>
> Well, I'll do it after I finish my second patch.
> A hash index should be more efficient than a btree when N is big enough.
> It seems worthwhile to find out how big N is experimentally.

The savings will depend on many factors. Another thing (besides volume) which is important is the sort of key data being indexed.

Consider a unique key on six varchar(40) fields:
1. Country
2. State/Province
3. City
4. Division
5. Branch
6. Office

Typically, a key such as this will have lots of clumps of similar data, only being differentiated by the final segment. This sort of index is often used for reporting purposes. To determine a unique entry, it is not unlikely that more than 200 characters will be traversed. A hash index gets a special boost here because a much smaller data signature is stored; even a 64-bit hash occupies only 8 bytes. On the other hand, consider an index on a field consisting of a single character. Here, the pages of the b-tree will hold a huge volume of entries per page, requiring fewer pages to search, while the hash index is many times larger, so more pages will have to be loaded.

These things make a hash index desirable:
1. Unique index
2. Long keys
3. Huge data cardinality
4. Equality search

These things make a hash index undesirable:
1. Non-unique index
2. Short keys
3. Small data sets

These things render a hash index worthless (except in COMBINATION with a b-tree type index):
1. Need for range searches like BETWEEN
2. Need for ORDER BY on the column(s)

As an aside: I guess it would also be nice if you could CLUSTER both index and data values on the hash. It may need a different algorithm than a b-tree clustering concept.
I know that Rdb uses different kinds of storage areas for hashed indexes versus b-tree indexes.

This effort to create hashed indexes is very valuable. Because it becomes more and more dominant as the data scales up, right at the point when things get tougher is when it becomes the most helpful. If you have a tiny table, it does not even matter if you index it, because (for instance) 10 rows will probably always stay in memory and iteration will find what is needed instantly. But if you have hundreds of millions of rows or billions of rows, now is when performance really matters. So when the data scales to preposterous size (which it has an uncanny ability to do) the boost of performance becomes even more valuable.
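Dann's contrast between long, clumpy composite keys and a fixed-width hash signature can be put in rough numbers. The byte counts below are illustrative assumptions, not measurements:

```python
def cmp_bytes_btree(shared_prefix):
    """A key comparison in the btree must walk past the common
    prefix before it reaches a differing byte."""
    return shared_prefix + 1

def cmp_bytes_hash(hash_width=8):
    """Comparing hash codes touches only the fixed-width signature,
    e.g. 8 bytes for a 64-bit hash."""
    return hash_width

# Six clumpy varchar(40) fields with ~200 identical leading
# characters: each btree comparison is ~25x more expensive than
# comparing an 8-byte hash signature.
assert cmp_bytes_btree(200) > 25 * cmp_bytes_hash()

# A one-character key reverses the picture: the btree comparison
# is cheaper than even the smallest hash signature.
assert cmp_bytes_btree(0) < cmp_bytes_hash()
```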
On Wed, 2008-07-23 at 10:57 +0800, Xiao Meng wrote:
> Well, I'll do it after I finish my second patch.
> A hash index should be more efficient than a btree when N is big enough.
> It seems worthwhile to find out how big N is experimentally.

Agreed.

We should also examine the basic thinking of the index. My understanding is that it dynamically resizes the hash as the index grows. If we already believe the only benefit comes when the index is large, having special handling for small tables seems like a waste of time, because we will never use it in those contexts. So starting the hash at a fairly large size makes more sense than it might otherwise seem to.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
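Simon's suggestion of starting the hash at a fairly large size might look roughly like this: pick the initial bucket count from an estimated row count instead of growing up from a handful of buckets. The tuples-per-bucket and minimum-bucket constants are invented for illustration:

```python
def initial_buckets(est_rows, tuples_per_bucket=300, min_buckets=1024):
    """Pick a starting bucket count so a large index never has to
    grow through many doubling/split cycles; round up to a power
    of two because the bucket count doubles on each split cycle."""
    need = max(min_buckets, -(-est_rows // tuples_per_bucket))
    buckets = 1
    while buckets < need:
        buckets <<= 1
    return buckets

# 10M estimated rows start at 65536 buckets rather than splitting
# their way up; small tables still start at the (large) floor.
assert initial_buckets(10**7) == 2**16
assert initial_buckets(100) == 1024
```

The design trade-off is up-front disk footprint for small tables versus avoiding the long sequence of bucket splits (and the associated tuple movement) while a big index is being built.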
On Tue, Jul 22, 2008 at 08:36:34PM -0700, Dann Corbit wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Xiao Meng
> > Sent: Tuesday, July 22, 2008 7:57 PM
> > To: Simon Riggs
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] [PATCH]-hash index improving
> >
> > Well, I'll do it after I finish my second patch.
> > A hash index should be more efficient than a btree when N is big enough.
> > It seems worthwhile to find out how big N is experimentally.
>
> The savings will depend on many factors. Another thing (besides volume) which is important is the sort of key data being indexed.
>
> Consider a unique key on six varchar(40) fields:
> 1. Country
> 2. State/Province
> 3. City
> 4. Division
> 5. Branch
> 6. Office
>
> Typically, a key such as this will have lots of clumps of similar data, only being differentiated by the final segment. This sort of index is often used for reporting purposes. To determine a unique entry, it is not unlikely that more than 200 characters will be traversed. A hash index gets a special boost here because a much smaller data signature is stored; even a 64-bit hash occupies only 8 bytes. On the other hand, consider an index on a field consisting of a single character. Here, the pages of the b-tree will hold a huge volume of entries per page, requiring fewer pages to search, while the hash index is many times larger, so more pages will have to be loaded.
>
> These things make a hash index desirable:
> 1. Unique index
> 2. Long keys
> 3. Huge data cardinality
> 4. Equality search
>
> These things make a hash index undesirable:
> 1. Non-unique index
> 2. Short keys
> 3. Small data sets
>
In a previous e-mail, I mentioned adding some additional informational settings that, like fillfactor, can be used to improve the layout and performance of a hash index.
They are roughly:
- number of elements
- maximum number of elements
- multiplicity, an estimate of element repetition in a non-unique index

Knowing the number of elements in advance allows the index to be pre-created in the optimal layout and disk footprint. For every factor of 256 in capacity, you can reduce the space needed by the stored hash value by 8 bits; for large indexes you can store a 64-bit hash in the same space a 32-bit hash takes in a small index. This allows the use of larger hash values, which results in better data distribution between the buckets and more O(1)-like behavior.

> These things render a hash index worthless (except in COMBINATION with a b-tree type index):
> 1. Need for range searches like BETWEEN
> 2. Need for ORDER BY on the column(s)
>
> As an aside: I guess it would also be nice if you could CLUSTER both index and data values on the hash. It may need a different algorithm than a b-tree clustering concept. I know that Rdb uses different kinds of storage areas for hashed indexes versus b-tree indexes.

Clustering a hash index will allow for much smaller indexes through prefix compression of the common heap tuple IDs. Now an entry in the hash index needs sizeof(hash) + sizeof(heap tuple id), which is 4 + 6 = 10 bytes, before clustering. After clustering, and for large indexes, this could drop to 4 bytes per entry plus a constant.

> This effort to create hashed indexes is very valuable. Because it becomes more and more dominant as the data scales up, right at the point when things get tougher is when it becomes the most helpful. If you have a tiny table, it does not even matter if you index it, because (for instance) 10 rows will probably always stay in memory and iteration will find what is needed instantly. But if you have hundreds of millions of rows or billions of rows, now is when performance really matters.
> So when the data scales to preposterous size (which it has an uncanny ability to do) the boost of performance becomes even more valuable.

Although there is a clear theoretical benefit from the O(1) lookup for large indexes, I think the cross-over point between btree and hash indexes may come at smaller index sizes than might be expected, given the possibly smaller memory footprint of the hash index. Of course, this will all need to be tested.

Regards,
Ken
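The two space estimates in Ken's message above, shrinking the stored hash by 8 bits for every factor of 256 in capacity, and 10 bytes per entry dropping to roughly 4 plus a shared constant under clustering, can be sketched as follows. All constants are illustrative:

```python
def stored_hash_bits(max_elements, full_bits=64):
    """Shave 8 bits off the stored hash for every factor of 256 by
    which the expected element count falls short of what full_bits
    of hash can usefully spread; never go below 32 bits."""
    bits = full_bits
    capacity = 1 << full_bits
    while bits > 32 and max_elements * 256 <= capacity:
        bits -= 8
        capacity >>= 8
    return bits

# Per-entry cost: 4-byte hash + 6-byte heap tuple id = 10 bytes
# uncompressed; with entries clustered by hash value, the tuple id
# can be prefix-compressed to roughly the hash alone plus a
# constant shared across each run of entries.
UNCOMPRESSED_ENTRY = 4 + 6
COMPRESSED_ENTRY = 4  # plus a shared per-run constant

assert stored_hash_bits(10**6) == 32    # small index: 32-bit hash suffices
assert stored_hash_bits(2**62) == 64    # huge index: keep the full 64 bits
assert COMPRESSED_ENTRY < UNCOMPRESSED_ENTRY
```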