Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Next Steps with Hash Indexes |
| Date | |
| Msg-id | CA+Tgmoa9kff2365gEW7Bhn2_twXcZMkCp5K8wfVE9iUVaaVYZg@mail.gmail.com |
| In response to | Re: Next Steps with Hash Indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On Wed, Aug 11, 2021 at 11:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, agreed. The whole buckets-are-integral-numbers-of-pages scheme
> is pretty well designed to ensure bloat, but trying to ameliorate that
> by reducing the number of buckets creates its own problems (since, as
> you mention, we have no scheme whatever for searching within a bucket).
> I'm quite unimpressed with Simon's upthread proposal to turn off bucket
> splitting without doing anything about the latter issue.

Maybe. I don't think that it should be a huge problem to decide that an occupied bucket has to consume an entire page; if that's a big issue, you should just have fewer buckets. I do think it's a problem that a bucket containing no tuples at all still consumes an entire page, because the data is often going to be skewed so that many buckets are entirely empty. I also think it's a problem that expanding the directory from 2^{N} buckets to 2^{N+1} buckets requires 4 allocations of 2^{N-2} *consecutive* pages. That's a lot of consecutive pages for even fairly modest values of N.

Imagine a design where we have a single-page directory with 1024 slots, each corresponding to one bucket. Each slot stores a block number, which might be InvalidBlockNumber if there are no tuples in that bucket. Given a tuple with hash value H, check slot H%1024 and then go to that page to look further. If there are more tuples in that bucket than can fit on the page, then it can link to another page. If we assume for the sake of argument that 1024 is the right number of buckets, this is going to use about as much space as the current system when the data distribution is uniform, but considerably less when it's skewed. The larger you make the number of buckets, the better this kind of thing looks on skewed data.

Now, you can't just always have 1024 buckets, so we'd actually have to do something a bit more clever, probably involving multiple levels of directories. For example, suppose a directory page contains only 32 slots. That will leave a lot of empty space in the page, which can be used to store tuples. An index search has to scan all tuples that are stored directly in the page, and also use the first 5 bits of the hash key to search the appropriate bucket. But the bucket is itself a directory: it can have some tuples stored directly in the page, and then it has 32 more slots and you use the next 5 bits of the hash key to decide which one of those to search.

Then it becomes possible to incrementally expand the hash index: when the space available in a directory page fills up, you can either create a sub-directory and move as many tuples as you can into that page, or add an overflow page that contains only tuples. It's important to be able to do either one, because sometimes a bucket fills up with tuples that have identical hash values, and sometimes a bucket fills up with tuples that have a variety of hash values.

The current implementation tends to massively increase the number of buckets even when it does very little to spread the index entries out. ("Hey, I doubled the number of buckets and the keys are still almost all in one bucket ... let me double the number of buckets again and see if it works better this time!") If we were going to create a replacement, we'd want the index to respond differently to a bunch of dups vs. a bunch of non-dups.
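To make the single-page-directory idea above a bit more concrete, here is a toy, self-contained C sketch of the lookup and insert paths. It is only an in-memory simulation under assumed sizes (1024 slots, a tiny per-page tuple capacity); the struct and function names are invented, and a NULL pointer stands in for InvalidBlockNumber.

```c
/*
 * Toy in-memory model of the single-page directory scheme described
 * above: 1024 slots, one per bucket, each pointing at a chain of bucket
 * pages.  All names and sizes are invented for illustration; NULL plays
 * the role of InvalidBlockNumber for an empty bucket.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS        1024
#define TUPLES_PER_PAGE 4       /* deliberately tiny, to force overflow pages */

typedef struct BucketPage
{
    uint32_t    hashes[TUPLES_PER_PAGE];
    int         ntuples;
    struct BucketPage *next;    /* overflow page, if the bucket spills */
} BucketPage;

typedef struct
{
    BucketPage *slots[NBUCKETS];    /* NULL = bucket holds no tuples at all */
} DirectoryPage;

/* Insert a hash value: pick the bucket, walk its chain, extend if full. */
static void
hash_insert(DirectoryPage *dir, uint32_t hashval)
{
    BucketPage **pagep = &dir->slots[hashval % NBUCKETS];

    while (*pagep != NULL && (*pagep)->ntuples == TUPLES_PER_PAGE)
        pagep = &(*pagep)->next;
    if (*pagep == NULL)
        *pagep = calloc(1, sizeof(BucketPage));
    (*pagep)->hashes[(*pagep)->ntuples++] = hashval;
}

/* Search: an empty slot means no match; otherwise scan the bucket chain. */
static int
hash_lookup(const DirectoryPage *dir, uint32_t hashval)
{
    for (const BucketPage *page = dir->slots[hashval % NBUCKETS];
         page != NULL; page = page->next)
        for (int i = 0; i < page->ntuples; i++)
            if (page->hashes[i] == hashval)
                return 1;
    return 0;
}

int
main(void)
{
    static DirectoryPage dir;   /* zero-initialized: every bucket starts empty */

    /* Skewed input: everything lands in one bucket, the other 1023 stay free. */
    for (uint32_t i = 0; i < 10; i++)
        hash_insert(&dir, 7 + i * NBUCKETS);

    printf("7 present: %d, 8 present: %d\n",
           hash_lookup(&dir, 7), hash_lookup(&dir, 8));
    return 0;
}
```

With skewed input, the empty buckets cost only one slot in the directory page rather than a whole page each, which is the space saving described above.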
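A similarly rough sketch of the multi-level variant, where each directory page holds some tuples directly plus 32 child slots and each level consumes 5 more bits of the hash. The fan-out, page capacity, and names are assumptions made up for illustration; a real design would also need the overflow-page alternative described above for runs of identical hash values.

```c
/*
 * Toy model of the multi-level directory: each "page" stores a few
 * tuples directly plus 32 child slots, and each level consumes 5 more
 * bits of the hash.  Fan-out, capacity, and names are all assumptions.
 */
#include <stdint.h>
#include <stdlib.h>

#define FANOUT        32        /* 2^5 child slots per directory page */
#define PAGE_CAPACITY 8         /* tuples stored directly in the page */

typedef struct DirPage
{
    uint32_t    hashes[PAGE_CAPACITY];
    int         ntuples;
    struct DirPage *child[FANOUT];  /* NULL = no sub-directory yet */
} DirPage;

/* Which of the 32 child slots to follow at a given depth. */
static int
dir_slot(uint32_t hashval, int level)
{
    int         shift = 5 * level;

    /* once the 32 hash bits are used up there is nothing left to split on */
    return (shift < 32) ? (int) ((hashval >> shift) & (FANOUT - 1)) : 0;
}

/* Search: scan tuples held in this page, then recurse into one child. */
static int
dir_lookup(const DirPage *page, uint32_t hashval, int level)
{
    for (int i = 0; i < page->ntuples; i++)
        if (page->hashes[i] == hashval)
            return 1;

    const DirPage *child = page->child[dir_slot(hashval, level)];

    return child ? dir_lookup(child, hashval, level + 1) : 0;
}

/*
 * Insert: keep the tuple in this page if there is room, otherwise descend,
 * creating a sub-directory on demand.  A real design would also need the
 * overflow-page alternative for tuples with identical hash values, which
 * no amount of extra directory levels can spread out.
 */
static void
dir_insert(DirPage *page, uint32_t hashval, int level)
{
    if (page->ntuples < PAGE_CAPACITY)
    {
        page->hashes[page->ntuples++] = hashval;
        return;
    }

    DirPage   **childp = &page->child[dir_slot(hashval, level)];

    if (*childp == NULL)
        *childp = calloc(1, sizeof(DirPage));
    dir_insert(*childp, hashval, level + 1);
}
```

Level 0 consumes the low 5 bits, level 1 the next 5, and so on, so the structure only deepens along the paths that actually receive tuples, which is the incremental-expansion property described above.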
> As far as the specific point at hand is concerned, I think storing
> a hash value per index column, while using only the first column's
> hash for bucket selection, is what to do for multicol indexes.
> We still couldn't set amoptionalkey=true for hash indexes, because
> without a hash for the first column we don't know which bucket to
> look in. But storing hashes for the additional columns would allow
> us to check additional conditions in the index, and usually save
> trips to the heap on queries that provide additional column
> conditions. You could also imagine sorting the contents of a bucket
> on all the hashes, which would ease uniqueness checks.

That sounds reasonable I guess, but I don't know how hard it is to implement.

--
Robert Haas
EDB: http://www.enterprisedb.com
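As an illustration of the multi-column scheme quoted above (a sketch under assumed names and layout, not actual PostgreSQL code): the first column's hash chooses the bucket, and the stored hashes for the remaining columns are compared in the index so that non-matching entries never cause a heap visit. The heap tuple must still be rechecked, since equal hashes do not guarantee equal values.

```c
/*
 * Toy illustration of the multi-column idea quoted above.  The index
 * entry stores one hash per column; only the first column's hash picks
 * the bucket, and the other stored hashes filter entries in the index
 * before any heap visit.  Names and layout are invented for illustration.
 */
#include <stdbool.h>
#include <stdint.h>

#define MAX_INDEX_COLS 4

typedef struct
{
    int         ncols;                      /* number of indexed columns */
    uint32_t    colhash[MAX_INDEX_COLS];    /* hash of each indexed column */
    uint64_t    heap_tid;                   /* where the real tuple lives */
} IndexEntry;

/* Bucket selection uses only the first column's hash. */
static uint32_t
choose_bucket(const uint32_t *query_hashes, uint32_t nbuckets)
{
    return query_hashes[0] % nbuckets;
}

/*
 * Within the chosen bucket, compare the stored hash for every column the
 * query supplied; only entries matching all of them are worth a heap
 * visit (which must still recheck, since different values can hash alike).
 */
static bool
entry_might_match(const IndexEntry *entry,
                  const uint32_t *query_hashes, int nquerycols)
{
    for (int i = 0; i < nquerycols && i < entry->ncols; i++)
        if (entry->colhash[i] != query_hashes[i])
            return false;
    return true;
}
```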