Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Next Steps with Hash Indexes
Date
Msg-id CA+TgmoYVAxE0PGdO9aDBj=pWNdkXbJHr5Udw5RHO+9j3e1=eDQ@mail.gmail.com
In response to Re: Next Steps with Hash Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Next Steps with Hash Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Next Steps with Hash Indexes  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Wed, Aug 11, 2021 at 10:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I suspect it would be hard to store multiple hash values, one per
> > column. It seems to me that what we ought to do is combine the hash
> > values for the individual columns using hash_combine(64) and store the
> > combined value. I can't really imagine why we would NOT do that.
>
> That would make it impossible to use the index except with queries
> that provide equality conditions on all the index columns.  Maybe
> that's fine, but it seems less flexible than other possible definitions.
> It really makes me wonder why anyone would bother with a multicol
> hash index.

Hmm. That is a point I hadn't considered.
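
To make the trade-off concrete, here is a minimal sketch of the "combine and store one value" approach. The mixing step is modeled on PostgreSQL's hash_combine64(); the exact constant and shifts are an assumption here, and any reasonable mixer would illustrate the same point. combine_column_hashes() is a hypothetical helper, not something from the tree or from the patch under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mix one more 64-bit hash into an accumulator.  Modeled on
 * hash_combine64(); the constant is assumed, not authoritative.
 */
static inline uint64_t
combine64(uint64_t a, uint64_t b)
{
    a ^= b + UINT64_C(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
    return a;
}

/*
 * Hypothetical helper: fold the per-column hash values of one index
 * tuple into the single value that would be stored in the index.
 */
static uint64_t
combine_column_hashes(const uint64_t *col_hashes, size_t ncols)
{
    uint64_t h = 0;

    for (size_t i = 0; i < ncols; i++)
        h = combine64(h, col_hashes[i]);
    return h;
}
```

Because only the folded value is stored, a probe has to supply a hash for every column to reproduce it. A query with an equality condition on just the first column cannot compute the stored value, which is the limitation Tom describes.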

I have to admit that after working with Amit on all the work to make
hash indexes WAL-logged a few years ago, I was somewhat disillusioned
with the whole AM. It seems like a cool idea to me but it's just not
that well-implemented. For example, the strategy of just doubling the
number of buckets in one shot seems pretty terrible for large indexes,
and ea69a0dead5128c421140dc53fac165ba4af8520 will buy only a limited
amount of relief. Likewise, the fact that keys are stored in hash
value order within pages but that the bucket as a whole is not kept in
order seems like it's bad for search performance and really bad for
implementing unique indexes with reasonable amounts of locking. (I
don't know how the present patch tries to solve that problem.) It's
tempting to conclude that we should create something altogether new
instead of hacking on the existing implementation, but that's a lot of
work and I'm not sure what specific design would be best.
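
The ordering problem can be sketched as follows, assuming a simplified in-memory stand-in for a bucket chain. BucketPage, page_contains() and bucket_contains() are hypothetical names for illustration and do not correspond to the real on-disk hash page layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one page in a bucket's chain. */
typedef struct BucketPage
{
    const uint32_t *hashes;         /* sorted ascending, this page only */
    size_t          nhashes;
    const struct BucketPage *next;  /* overflow page, or NULL */
} BucketPage;

/* Binary search works inside a page because the page is sorted. */
static bool
page_contains(const BucketPage *page, uint32_t hash)
{
    size_t lo = 0;
    size_t hi = page->nhashes;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (page->hashes[mid] < hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < page->nhashes && page->hashes[lo] == hash;
}

/*
 * The chain as a whole is unordered, so a miss on one page says
 * nothing about the next; every page may have to be visited.
 */
static bool
bucket_contains(const BucketPage *head, uint32_t hash, size_t *pages_visited)
{
    *pages_visited = 0;
    for (const BucketPage *p = head; p != NULL; p = p->next)
    {
        (*pages_visited)++;
        if (page_contains(p, hash))
            return true;
    }
    return false;
}
```

In this sketch, proving that a hash value is absent always walks the whole chain. A uniqueness check needs exactly that proof before inserting, which suggests why doing it with reasonable amounts of locking is hard.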

-- 
Robert Haas
EDB: http://www.enterprisedb.com


