Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Next Steps with Hash Indexes
Date
Msg-id CAH2-WzkMTgytnQtJ4NKchJrLRNx44951OxtrhoMpjbak+oNbUA@mail.gmail.com
Whole thread Raw
In response to Re: Next Steps with Hash Indexes  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Wed, Aug 11, 2021 at 8:51 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
> (Standard disclaimer that I'm not qualified to design index AMs) I've seen one mention in the literature about the
possibilityof simply having a btree index over the hash values. That would require faster search within pages, in
particularusing abbreviated keys in the ItemId array of internal pages [1] and interpolated search rather than pure
binarysearch (which should work reliably with high-entropy keys like hash values), but doing that would speed up all
btreeindexes, so that much is worth doing regardless of how hash indexes are implemented. In that scheme, the hash
indexAM would just be around for backward compatibility. 

I think that it's possible (though hard) to do that without involving
hashing, even for datatypes like text. Having some kind of prefix
compression that makes the final abbreviated keys have high entropy
would be essential, though. I agree that it would probably be
significantly easier when you knew you were dealing with hash values,
but even there you need some kind of prefix compression.

In any case I suspect that it would make sense to reimplement hash
indexes as a translation layer between hash index opclasses and
nbtree. Robert said "Likewise, the fact that keys are stored in hash
value order within pages but that the bucket as a whole is not kept in
order seems like it's bad for search performance". Obviously we've
already done a lot of work on an index AM that deals with a fully
ordered keyspace very well. This includes dealing with large groups of
duplicates gracefully, since in a certain sense there are no duplicate
B-Tree index tuples -- the heap TID tiebreaker ensures that. And it
ensures that you have heap-wise locality within these large groups,
which is a key enabler of things like opportunistic index deletion.

When hash indexes have been used in database systems, it tends to be
in-memory database systems where the recovery path doesn't recover
indexes -- they're just rebuilt from scratch instead. If that's
already a baked-in assumption then hash indexes make more sense. To me
it seems like the problem with true hash indexes is that they're
constructed in a top-down fashion, which is approximately the opposite
of the bottom-up, incremental approach used by B-Tree indexing. This
seems to be where all the skew problems arise from. This approach
cannot be robust to changes in the keyspace over time, really.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Greg Sabino Mullane
Date:
Subject: Re: Add option --drop-cascade for pg_dump/restore
Next
From: Yugo NAGATA
Date:
Subject: Re: Avoid stuck of pbgench due to skipped transactions