Re: Next Steps with Hash Indexes

From Amit Kapila
Subject Re: Next Steps with Hash Indexes
Msg-id CAA4eK1KcEuh4_+qrAjB7mpGXhwu+Chuh5Hh6XYYb_HRXUvp6+g@mail.gmail.com
In response to Re: Next Steps with Hash Indexes  (Simon Riggs <simon.riggs@enterprisedb.com>)
List pgsql-hackers
On Wed, Oct 13, 2021 at 4:13 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Tue, 5 Oct 2021 at 20:06, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > >>> I have presented a simple, almost trivial, patch to allow multi-col
> > >>> hash indexes. It hashes the first column only, which can be a downside
> > >>> in *some* cases. If that is clearly documented, it would not cause
> > >>> many issues, IMHO. However, it does not have any optimization issues
> > >>> or complexities, which is surely a very good thing.
> > >>>
> > >>> Trying to involve *all* columns in the hash index is a secondary
> > >>> optimization. It requires subtle changes in optimizer code, as Tom
> > >>> points out. It also needs fine tuning to make the all-column approach
> > >>> beneficial for the additional cases without losing against what the
> > >>> "first column" approach gives.
> > >>>
> > >>> I did consider both approaches and after this discussion I am still in
> > >>> favour of committing the very simple "first column" approach to
> > >>> multi-col hash indexes now.
> > >>
> > >> But what about the other approach suggested by Tom: we hash only
> > >> the first column to identify the bucket, but we also store the
> > >> hash values of the other columns?  With that, we don't need
> > >> changes in the optimizer and we can also avoid a lot of disk
> > >> fetches, because after finding the bucket we can match the
> > >> secondary columns before fetching the heap tuple.  I agree that
> > >> one downside of this approach is that it increases the index size.
> > >
> > > Identifying the bucket is the main part of a hash index's work, so
> > > that part would be identical.
> > >
> > > Once we have identified the bucket, we sort the bucket page by hash,
> > > so having an all-col hash would help de-duplicate multi-col hash
> > > collisions, but not enough to be worth it, IMHO, given that storing an
> > > extra 4 bytes per index tuple is a significant size increase which
> > > would cause extra overflow pages etc. The same thought applies to
> > > 8-byte hashes.
> > >
> >
> > IMO it'd be nice to show some numbers to support the claims that storing
> > the extra hashes and/or 8B hashes is not worth it ...
>
> Using an 8-byte hash is possible, but it only becomes effective when
> 4-byte hash collisions get hard to manage. An 8-byte hash also makes
> the index 20% bigger, so it is not a good default.
>
> Let's look at the distribution of values:
>
> In a table with 100 million rows, with consecutive monotonic values,
> starting at 1
> No Collisions - 98.8%
> 1 Collision   - 1.15%
> 2+ Collisions - 0.009% (8979 values colliding)
> Max=4
>
> In a table with 1 billion rows (2^30), with consecutive monotonic
> values, starting at 1
> No Collisions - 89.3%
> 1 Collision   - 9.8%
> 2 Collisions - 0.837%
> 3+ Collisions - 0.0573% (615523 values colliding)
> Max=9
>
> At 100 million rows, the collisions from a 4-byte hash are not
> painful, but by a billion rows they are starting to become a problem,
> and by 2 billion rows we have a noticeable issue (>18% collisions).
>
> Clearly, 8-byte hash values would be appropriate for tables larger
> than this. However, we expect users to know about and to use
> partitioning, with reasonable limits somewhere in the 100 million row
> (with 100-byte rows, 10GB) to 1 billion row (with 100-byte rows,
> 100GB) range.
>
> The change from 4 to 8 byte hashes seems simple, so I am not against
> it for that reason. IMHO there is no use case for 8-byte hashes since
> reasonable users would not have tables big enough to care.
>
> That is my reasoning, YMMV.
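
For anyone who wants to reproduce this kind of distribution, here is a
rough, self-contained sketch. It hashes N consecutive keys with a
generic 32-bit mix (a stand-in for the hash functions PostgreSQL
actually uses, so the exact percentages will differ from the figures
above), sorts the hashes, and reports how many keys share their hash
value. It needs about 4 bytes of memory per key.

/*
 * collisions.c (hypothetical file name)
 *
 *   cc -O2 collisions.c -o collisions && ./collisions 100000000
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint32_t
hash32(uint64_t x)
{
    /* splitmix64 finalizer, truncated to 32 bits */
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return (uint32_t) (x ^ (x >> 31));
}

static int
cmp_uint32(const void *a, const void *b)
{
    uint32_t    ua = *(const uint32_t *) a;
    uint32_t    ub = *(const uint32_t *) b;

    return (ua > ub) - (ua < ub);
}

int
main(int argc, char **argv)
{
    uint64_t    n = (argc > 1) ? strtoull(argv[1], NULL, 10) : 100000000ULL;
    uint32_t   *h = malloc(n * sizeof(uint32_t));
    uint64_t    counts[16] = {0};   /* counts[k]: keys in groups of size
                                     * k+1; last slot aggregates the rest */
    uint64_t    i,
                run = 1;

    if (h == NULL)
        return 1;

    for (i = 0; i < n; i++)
        h[i] = hash32(i + 1);       /* consecutive keys starting at 1 */

    qsort(h, n, sizeof(uint32_t), cmp_uint32);

    /* count run lengths of equal hash values */
    for (i = 1; i <= n; i++)
    {
        if (i < n && h[i] == h[i - 1])
            run++;
        else
        {
            counts[run > 16 ? 15 : run - 1] += run;
            run = 1;
        }
    }

    for (i = 0; i < 16; i++)
        if (counts[i] > 0)
            printf("keys sharing their hash with %llu other key(s): %llu (%.4f%%)\n",
                   (unsigned long long) i, (unsigned long long) counts[i],
                   100.0 * counts[i] / n);

    free(h);
    return 0;
}
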
>
> > I'm sure there are cases where it'd be a net loss, but imagine for
> > example a case when the first column has a lot of duplicate values.
> > Which is not all that unlikely - duplicates seem like one of the natural
> > reasons why people want multi-column hash indexes. And those duplicates
> > are quite expensive, due to having to access the heap. Being able to
> > eliminate those extra accesses cheaply might be a clear win, even if it
> > makes the index a bit larger (shorter hashes might be enough).
>
> I agree, eliminating duplicates would be a good thing, if that is possible.
>
> However, hashing on multiple columns doesn't eliminate duplicates; we
> can still get them from different combinations of values.
>
> With a first-column hash, (1,1) and (1,2) collide.
> But with an all-column hash, (1,2) and (2,1) can collide.
> So we can still end up with collisions, and whether we do depends on
> the data values and types.
>

I don't think this will happen if we use the first column's hash as the
bucket identifier and store the hashes of the other columns as payload.
A probe for (1,2) only looks in the bucket for hash(1), where an entry
for (2,1) will never appear, and an entry for (1,1) in that bucket will
be rejected by the stored hash of the second column.

> We can all come up with pessimal use cases.
>
> Perhaps it would be possible to specify a parameter that says how many
> columns in the index are part of the hash? Not sure how easy that is.
>
> If you have a situation with lots of duplicates, then use btrees
> instead. We shouldn't have to make hash indexes work well for *all*
> cases before we allow multiple columns for some cases. The user will
> already get to compare btree-vs-hash before they use them in a
> particular use case. The perfect should not be the enemy of the good.
>
> Storing multiple hashes uses more space and is more complex.
>

I agree that storing the hashes of the trailing columns (all but the
first) as payload uses more space, but it will save heap fetches in
many cases. Apart from searches, even for unique-key insertion we
currently need heap fetches, because we can only verify the other
column values after fetching the row from the heap.

Now, the question here is whether we want to save random heap I/O or
save extra space in the hash index. I think both approaches have pros
and cons, but saving heap I/O seems more important, especially for
unique-index checks where we also need to hold the bucket lock.
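
To make the idea concrete, here is a minimal standalone sketch (not
PostgreSQL code; the structures, names and hash function are invented
for illustration) of bucketing on the first column's hash while keeping
the trailing column's hash as payload, so that non-matching entries can
be rejected before any heap fetch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct IndexEntry
{
    uint32_t    payload_hash;   /* hash of the trailing column */
    uint32_t    heap_tid;       /* stand-in for the heap pointer */
} IndexEntry;

/* stand-in 32-bit hash, not one of PostgreSQL's hash functions */
static uint32_t
hash_u32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* would this entry require a heap fetch for a probe on (col1, col2)? */
static bool
needs_heap_fetch(const IndexEntry *entry, uint32_t probe_col2)
{
    if (entry->payload_hash != hash_u32(probe_col2))
        return false;           /* rejected on the payload hash alone */
    return true;                /* only now fetch heap_tid and recheck */
}

int
main(void)
{
    /* index entry for a row (1,1): same bucket as a probe for (1,2) */
    IndexEntry  row_1_1 = {hash_u32(1), 43};

    /* the sketch uses the whole first-column hash as the bucket id */
    uint32_t    probe_bucket = hash_u32(1);

    /* a row (2,1) goes into the bucket for hash(2), not the probed one */
    printf("probe bucket=%u, bucket of a (2,1) entry=%u\n",
           probe_bucket, hash_u32(2));

    /* (1,1) shares the probed bucket but fails the payload-hash check */
    printf("heap fetch needed for the (1,1) entry? %s\n",
           needs_heap_fetch(&row_1_1, 2) ? "yes" : "no");

    return 0;
}

With something like this, an entry for (2,1) is not even in the probed
bucket, and an entry for (1,1) is filtered out by the stored hash of
the second column, so the heap is only visited for genuine candidates
(modulo payload-hash collisions, which still require a recheck).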

-- 
With Regards,
Amit Kapila.


