Re: Hash index build performance tweak from sorting - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Hash index build performance tweak from sorting
Date
Msg-id CAA4eK1JiCB85JPe-HT6hA9qEZe4HO1FDAJxJSAyVcptTg0wRHA@mail.gmail.com
In response to Re: Hash index build performance tweak from sorting  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: Hash index build performance tweak from sorting
List pgsql-hackers
On Mon, May 2, 2022 at 9:28 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > >
> > > Hash index pages are stored in sorted order, but we don't prepare the
> > > data correctly.
> > >
> > > We sort the data as the first step of a hash index build, but we
> > > forget to sort the data by hash as well as by hash bucket.
> > >
> >
> > I was looking into the nearby comments (Fetch hash keys and mask off
> > bits we don't want to sort by.) and it sounds like we purposefully
> > don't want to sort by the hash key. I see that this comment was
> > originally introduced in the below commit:
> >
> > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
> > Author: Tom Lane <tgl@sss.pgh.pa.us>
> > Date:   Mon Sep 15 18:43:41 2008 +0000
> >
> >     Change hash indexes to store only the hash code rather than the
> >     whole indexed value.
> >
> > But even before that, we seem to mask off the bits before comparison.
> > Is it that we are doing so because we want to keep the order of hash
> > keys in a particular bucket so such masking was required?
>
> We need to sort by both hash bucket and hash value.
>
> Hash bucket id so we can identify the correct hash bucket to insert into.
>
> But then on each bucket/overflow page we store it sorted by hash value
> to make lookup faster, so inserts go faster if they are also sorted.
>

I think so too. So, we should go with this unless someone else sees
a flaw here.
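
To make the argument above concrete, here is a small sketch in Python
(not PostgreSQL code). It assumes a bucket page can be modelled as a
list kept sorted by hash value, and it counts how many existing entries
must be shifted on each insert. The bucket mapping is modelled on the
mask-based logic discussed in the thread; the function names, the mask
values, and the sample hash keys are hypothetical and chosen only for
illustration.

```python
from bisect import bisect_right


def hashkey_to_bucket(hashkey, max_bucket, high_mask, low_mask):
    """Map a hash value to a bucket number by masking off high bits."""
    bucket = hashkey & high_mask
    if bucket > max_bucket:
        # That bucket does not exist yet, so fall back to the lower mask.
        bucket &= low_mask
    return bucket


def build_pages(hashkeys, max_bucket, high_mask, low_mask, by_hash_too):
    """Sort the keys, then insert them into per-bucket sorted pages.

    Returns (pages, shifted), where `shifted` is the total number of
    existing entries that had to move to keep each page sorted.
    """
    def sort_key(h):
        b = hashkey_to_bucket(h, max_bucket, high_mask, low_mask)
        # Bucket only: ties keep input order (sorted() is stable).
        # Bucket and hash: ties within a bucket are ordered by hash value.
        return (b, h) if by_hash_too else (b,)

    pages = {}
    shifted = 0
    for h in sorted(hashkeys, key=sort_key):
        b = hashkey_to_bucket(h, max_bucket, high_mask, low_mask)
        page = pages.setdefault(b, [])
        pos = bisect_right(page, h)   # where h belongs on the sorted page
        shifted += len(page) - pos    # entries after pos must be moved
        page.insert(pos, h)
    return pages, shifted


# Hypothetical sample: 4 buckets (0..3), so high_mask = 7 and low_mask = 3.
keys = [0x1F, 0x07, 0x13, 0x0B, 0x04, 0x0C, 0x08]
pages_bucket_only, moved_bucket_only = build_pages(keys, 3, 7, 3, False)
pages_bucket_hash, moved_bucket_hash = build_pages(keys, 3, 7, 3, True)
```

Both orderings end up with the same page contents, but when the input
is sorted by bucket and then by hash value, every insert lands at the
end of its page and nothing has to be shifted.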

-- 
With Regards,
Amit Kapila.


