Re: Hash index build performance tweak from sorting - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Hash index build performance tweak from sorting
Msg-id: CANbhV-EFzF-VF7ALfoJ4Kgc30f5oJ7qFF_AhrXNjH7OPBfgX_g@mail.gmail.com
In response to: Re: Hash index build performance tweak from sorting (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
On Wed, 21 Sept 2022 at 02:32, David Rowley <dgrowleyml@gmail.com> wrote:
>

> I took this patch for a spin and saw a 2.5% performance increase using
> the random INT test that Tom posted. The index took an average of
> 7227.47 milliseconds on master and 7045.05 with the patch applied.

Thanks for the review, and apologies for the delay in acting on your comments.

My tests show the sorted and random tests are BOTH 4.6% faster with
the v3 changes, using a 5-test average. You'll be pleased to know your
kit is about 15.5% faster than mine, comparing absolute execution
times.

> On making a pass of the changes, I noted down a few things.

> 2. There are quite a few casts that are not required. e.g:
>
> _hash_doinsert(rel, itup, heapRel, (HashInsertState) &istate);
> buildstate.spool = _h_spoolinit(heap, index, num_buckets,
> (HashInsertState) &insertstate);
> buildstate.istate = (HashInsertState) &insertstate;

Removed
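
For the record, with the casts dropped those calls become simply:

_hash_doinsert(rel, itup, heapRel, &istate);
buildstate.spool = _h_spoolinit(heap, index, num_buckets, &insertstate);
buildstate.istate = &insertstate;

(&istate and &insertstate are already of type HashInsertStateData *,
which is exactly what the HashInsertState pointer typedef expands to.)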

> 3. Just a minor nitpick. Line wraps at 80 chars. You're doing this
> sometimes but not others. This seems just to be due to the additional
> function parameters that have been added.

Done

> 4. I added the following Assert to _hash_pgaddtup() as I expected the
> itup_off to be set to the same thing before and after this change. I
> see the Assert is failing in the regression tests.
>
> Assert(PageGetMaxOffsetNumber(page) + 1 ==
>    _hash_binsearch(page, _hash_get_indextuple_hashkey(itup)));
>
> I think this is because _hash_binsearch() returns the offset with the
> first tuple with the given hashkey, so if there are duplicate hashkey
> values then it looks like PageAddItemExtended() will set needshuffle
> and memmove() the existing item(s) up one slot.  I don't know this
> hash index building code very well, but I wonder if it's worth having
> another version of _hash_binsearch() that can be used to make
> _hash_pgaddtup() put any duplicate hashkeys after the existing ones
> rather than before and shuffle the others up? It sounds like that
> might speed up normal insertions when there are many duplicate values
> to hash.

Sounds reasonable.

I tried changing src/backend/access/hash/hashinsert.c, line 307 (in
the patched file) from

-   itup_off = _hash_binsearch(page, hashkey);

to

+   itup_off = _hash_binsearch_last(page, hashkey) + 1;

since exactly such a function already exists in the code.

But this seems to cause a consistent ~1% regression in performance,
which surprises me.
The test was the random INSERT SELECT with 10E6 rows, run after the
CREATE INDEX.

Not sure what to suggest, but the above change is not included in v3.
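
For clarity, this is roughly what the experiment looked like in
_hash_pgaddtup() (a sketch, not something v3 ships). _hash_binsearch()
returns the offset of the first tuple with the given hashkey, so equal
keys get inserted in front of existing duplicates and
PageAddItemExtended() has to memmove() them up; _hash_binsearch_last()
returns the offset of the last tuple with hashkey <= the search key
(or InvalidOffsetNumber if there is none), so inserting at +1 appends
after any duplicates instead:

    uint32          hashkey = _hash_get_indextuple_hashkey(itup);
    OffsetNumber    itup_off;

    /* Experiment only: append after equal hashkeys rather than
     * inserting before them, avoiding the shuffle in
     * PageAddItemExtended(). */
    itup_off = _hash_binsearch_last(page, hashkey) + 1;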

> I wonder if this might be the reason the random INT test didn't come
> out as good as your original test which had unique values. The unique
> values test would do less shuffling during PageAddItemExtended(). If
> so, that implies that skipping the binary search is only part of the
> gains here and that not shuffling tuples accounts for quite a bit of
> the gain you're seeing. If so, then it would be good to not have to
> shuffle duplicate hashkey tuples up in the page during normal
> insertions as well as when building the index.

There is still a 1.4% lead for the sorted test over the random one, in my tests.

> In any case, it would be nice to have some way to assert that we don't
> accidentally pass sorted==true to _hash_pgaddtup() when there's an
> existing item on the page with a higher hash value. Maybe we could
> just look at the hash value of the last tuple on the page and ensure
> it's <= to the current one?

Done
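
For reference, a sketch of what that check can look like, assuming it
sits in _hash_pgaddtup() just before the item is added (the exact form
in v3 may differ):

#ifdef USE_ASSERT_CHECKING
    /* In a sorted build, the incoming tuple must not hash lower than
     * the last tuple already on the page. */
    if (sorted && PageGetMaxOffsetNumber(page) > 0)
    {
        IndexTuple  lasttup;

        lasttup = (IndexTuple) PageGetItem(page,
                                           PageGetItemId(page,
                                                         PageGetMaxOffsetNumber(page)));
        Assert(_hash_get_indextuple_hashkey(lasttup) <=
               _hash_get_indextuple_hashkey(itup));
    }
#endif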

> 5. I think it would be nicer to move the insertstate.sorted = false;
> into the else branch in hashbuild(). However, you might have to do
> that anyway if you were to do what I mentioned in #1.

Done
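
So the flag is now set in the same branch that decides whether a
spooled (sorted) build is used; roughly, with variable names assumed
from the patch:

    if (num_buckets >= (uint32) NBuffers)
    {
        /* sorted build: spool tuples and insert them in hash order */
        buildstate.spool = _h_spoolinit(heap, index, num_buckets,
                                        &insertstate);
        insertstate.sorted = true;
    }
    else
    {
        buildstate.spool = NULL;
        insertstate.sorted = false;
    }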

> 1. In _h_spoolinit() the HSpool is allocated with palloc and then
> you're setting the istate field to a pointer to the HashInsertState
> which is allocated on the stack by the only calling function
> (hashbuild()).  Looking at hashbuild(), it looks like the return value
> of _h_spoolinit is never put anywhere to make it available outside of
> the function, so it does not seem like there is an actual bug there.
> However, it just seems like a bug waiting to happen. If _h_spoolinit()
> is pallocing memory, then we really shouldn't be setting pointer
> fields in that memory to point to something on the stack.  It might be
> nicer if the istate field in HSpool was a HashInsertStateData and
> _h_spoolinit() just memcpy'd the contents of that parameter.  That
> would make HSpool 4 bytes smaller and save additional pointer
> dereferences in _hash_doinsert().

> This is just my opinion, but I don't really see the value in having a
> typedef for a pointer to HashInsertStateData. I can understand that if
> the struct was local to a .c file, but you've got the struct and
> pointer typedef in the same header. I understand we often do this in
> the code, but I feel like we do it less often in newer code. e.g we do
> it in aset.c but not generation.c (which is much newer than aset.c).
> My personal preference would be just to name the struct
> HashInsertState and have no extra pointer typedefs.

Not done, but I'm not disagreeing either; I'm just not very
comfortable actually making those changes.
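
For anyone following along, this is roughly the shape being suggested,
with the HSpool fields abbreviated and the insert-state contents left
hypothetical (i.e. not what v3 does):

/* struct named directly, no pointer typedef */
typedef struct HashInsertState
{
    bool        sorted;         /* hypothetical contents */
} HashInsertState;

typedef struct HSpool
{
    Tuplesortstate *sortstate;  /* state data for tuplesort.c */
    Relation    index;
    HashInsertState istate;     /* held by value, not a stack pointer */
    /* mask fields etc. elided */
} HSpool;

HSpool *
_h_spoolinit(Relation heap, Relation index, uint32 num_buckets,
             HashInsertState *istate)
{
    HSpool     *hspool = (HSpool *) palloc0(sizeof(HSpool));

    hspool->index = index;
    memcpy(&hspool->istate, istate, sizeof(HashInsertState));
    /* ... tuplesort setup elided ... */

    return hspool;
}

That makes HSpool smaller and saves the extra pointer dereferences in
_hash_doinsert(), as David notes.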

--
Simon Riggs                http://www.EnterpriseDB.com/

