Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Next Steps with Hash Indexes
Date
Msg-id CANbhV-GhTaDedzG-+zBDhMgX1eLSJeyP+XrySoPuHKMp1p9Fow@mail.gmail.com
Whole thread Raw
In response to Re: Next Steps with Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Next Steps with Hash Indexes  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Mon, 27 Sept 2021 at 06:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
> > >
> > > And to get the multi-column hash index selected, we may set
> > > enable_hashjoin =off, to avoid any condition become join condition,
> > > saw similar behaviors in other DBs as well...
> >
> > This may be related to Tom's point that, if some of the quals are
> > removed due to optimization or converted to join quals, then now, even
> > if the user has given qual on all the key columns the index scan will
> > not be selected because we will be forcing that the hash index can
> > only be selected if it has quals on all the key attributes?
> >
> > I don't think suggesting enable_hashjoin =off is a solution,
> >
>
> Yeah, this doesn't sound like a good idea. How about instead try to
> explore the idea where the hash (bucket assignment and search) will be
> based on the first index key and the other columns will be stored as
> payload? I think this might pose some difficulty in the consecutive
> patch to enable a unique index because it will increase the chance of
> traversing more buckets for uniqueness checks. If we see such
> problems, then I have another idea to minimize the number of buckets
> that we need to lock during uniqueness check which is by lock chaining
> as is used during hash bucket clean up where at a time we don't need
> to lock more than two buckets at a time.

I have presented a simple, almost trivial, patch to allow multi-col
hash indexes. It hashes the first column only, which can be a downside
in *some* cases. If that is clearly documented, it would not cause
many issues, IMHO. However, it does not have any optimization issues
or complexities, which is surely a very good thing.

Trying to involve *all* columns in the hash index is a secondary
optimization. It requires subtle changes in optimizer code, as Tom
points out. It also needs fine tuning to make the all-column approach
beneficial for the additional cases without losing against what the
"first column" approach gives.

I did consider both approaches and after this discussion I am still in
favour of committing the very simple "first column" approach to
multi-col hash indexes now.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Fix pg_log_backend_memory_contexts() 's delay
Next
From: Amul Sul
Date:
Subject: Re: [Patch] ALTER SYSTEM READ ONLY