Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Next Steps with Hash Indexes
Date	August 11, 2021 14:54:09
Msg-id	CA+TgmoYVAxE0PGdO9aDBj=pWNdkXbJHr5Udw5RHO+9j3e1=eDQ@mail.gmail.com
In response to	Re: Next Steps with Hash Indexes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Next Steps with Hash Indexes Re: Next Steps with Hash Indexes
List	pgsql-hackers

On Wed, Aug 11, 2021 at 10:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I suspect it would be hard to store multiple hash values, one per
> > column. It seems to me that what we ought to do is combine the hash
> > values for the individual columns using hash_combine(64) and store the
> > combined value. I can't really imagine why we would NOT do that.
>
> That would make it impossible to use the index except with queries
> that provide equality conditions on all the index columns.  Maybe
> that's fine, but it seems less flexible than other possible definitions.
> It really makes me wonder why anyone would bother with a multicol
> hash index.

Hmm. That is a point I hadn't considered.
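
To make the trade-off concrete, here is a minimal sketch of the
combined-hash idea in Python. It is a toy model, not PostgreSQL code:
column_hash() is a hypothetical stand-in for the per-type hash support
functions, CombinedHashIndex is an invented name, and the mixing step
is only modeled on the usual shift-and-add style of hash combiner, with
an illustrative constant.

```python
import hashlib

MASK64 = (1 << 64) - 1


def hash_combine64(a: int, b: int) -> int:
    """Mix b into a with 64-bit wraparound (illustrative combiner)."""
    a ^= (b + 0x49A0F4DD15E5A8E3 + (a << 54) + (a >> 7)) & MASK64
    return a & MASK64


def column_hash(value) -> int:
    """Hypothetical per-column hash: a stable 64-bit digest of the value."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def combined_key(row) -> int:
    """Fold the hashes of ALL columns into one stored 64-bit value."""
    h = 0
    for value in row:
        h = hash_combine64(h, column_hash(value))
    return h


class CombinedHashIndex:
    """Toy multi-column hash index that stores only the combined hash."""

    def __init__(self):
        self.buckets = {}

    def insert(self, row, tid):
        self.buckets.setdefault(combined_key(row), []).append((row, tid))

    def lookup(self, row):
        # Needs a value for every column; the row is rechecked because
        # different rows can share a combined hash.
        candidates = self.buckets.get(combined_key(row), [])
        return [tid for r, tid in candidates if r == row]


idx = CombinedHashIndex()
idx.insert(("alice", 1), "tid-1")
idx.insert(("alice", 2), "tid-2")
full_match = idx.lookup(("alice", 1))  # equality on both columns works
```

A query that constrains only the first column has nothing it can hash
to reach the stored key: column_hash("alice") alone does not identify
any bucket, so such a query could only be answered by scanning every
entry. That is the loss of flexibility described in the quoted text.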

I have to admit that after working with Amit on all the work to make
hash indexes WAL-logged a few years ago, I was somewhat disillusioned
with the whole AM. It seems like a cool idea to me but it's just not
that well-implemented. For example, the strategy of just doubling the
number of buckets in one shot seems pretty terrible for large indexes,
and ea69a0dead5128c421140dc53fac165ba4af8520 will buy only a limited
amount of relief. Likewise, the fact that keys are stored in hash
value order within pages but that the bucket as a whole is not kept in
order seems like it's bad for search performance and really bad for
implementing unique indexes with reasonable amounts of locking. (I
don't know how the present patch tries to solve that problem.) It's
tempting to think that we should create something altogether new
instead of hacking on the existing implementation, but that's a lot
of work and I'm not sure what specific design would be best.
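
As a rough illustration of the ordering point above, the following toy
model treats a bucket as a chain of pages, each sorted by hash value
while the chain as a whole is not. It is a sketch under that assumption
only; the names are invented and it does not reflect the real on-disk
page layout or locking.

```python
from bisect import bisect_left


def page_contains(page, h):
    """Binary search within one page, which is sorted by hash value."""
    i = bisect_left(page, h)
    return i < len(page) and page[i] == h


def bucket_lookup(bucket_pages, h):
    """Return (found, pages_visited) for a search through the chain."""
    visited = 0
    for page in bucket_pages:
        visited += 1
        if page_contains(page, h):
            return True, visited
    return False, visited


def count_matches(bucket_pages, h):
    """A uniqueness check cannot stop early: every page may hold h."""
    return sum(page.count(h) for page in bucket_pages)


# Each page is sorted, but the pages are not ordered relative to each other.
bucket = [[3, 17, 42], [5, 9, 40], [1, 17, 99]]

hit_late = bucket_lookup(bucket, 99)   # (True, 3): found only on the last page
miss = bucket_lookup(bucket, 8)        # (False, 3): a miss reads every page
dupes = count_matches(bucket, 17)      # 2: duplicates sit on different pages
```

Because a given hash value can land on any page in the chain, a miss
costs a visit to every page, and a uniqueness check has to consider the
whole bucket rather than one sorted position, which is where the locking
concern comes from.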

-- 
Robert Haas
EDB: http://www.enterprisedb.com
