Thread: Division in dynahash.c due to HASH_FFACTOR

Division in dynahash.c due to HASH_FFACTOR

From
Jakub Wartak
Date:
Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as the
potential gain is relatively small in my tests, but still I would like to
point out that the HASH_FFACTOR functionality in dynahash.c could be
removed or optimized (the default fill factor is always 1; there's not a
single place inside the PostgreSQL repository that uses a custom fill
factor other than DEF_FFACTOR=1). Because the functionality is present,
there seems to be a division for every buffer access [BufTableLookup()]
or every smgropen() call (every call to hash_search() is affected,
provided it's not ShmemInitHash/HASH_PARTITION). This division is
especially visible via perf on a single-process StartupXLOG WAL recovery
process on a standby under heavy-duty 100% CPU conditions, as the top-1
entry is inside hash_search:
   0x0000000000888751 <+449>:   idiv   r8
   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338] <<-- perf annotate shows this as 30-40%, even with the
default -O2; probably a CPU pipeline stall caused by the idiv above

I've made a PoC test to skip that division assuming ffactor would be gone:
               if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&

For a stream of 3.7GB of WAL I'm getting a consistent improvement of ~4%
(yes, I know it's small; that's why I have mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there

After removing HASH_FFACTOR, PostgreSQL still compiles...  Would removing
it break some external API/extensions?  I saw several optimizations for
the "idiv" instruction, e.g. https://github.com/ridiculousfish/libdivide
Or maybe there is some other idea to expose the bottlenecks of
BufTableLookup()?  I also saw the codepath
PinBuffer()->GetPrivateRefCountEntry()->dynahash, which could be called
pretty often, but I have no idea what kind of pgbench stress test could
be used to demonstrate the gain (or lack of it).
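
Just to illustrate the libdivide route, here is a rough sketch using
libdivide's C API as I understand it (untested against dynahash, so
treat it as a sketch only):

    #include "libdivide.h"

    /* precompute the "magic" divider whenever max_bucket changes ... */
    struct libdivide_s64_t divider =
        libdivide_s64_gen((int64_t) (hctl->max_bucket + 1));

    /* ... so the hot path does a multiply/shift instead of an idiv */
    if (libdivide_s64_do(hctl->freeList[0].nentries, &divider) >=
        hctl->ffactor)
        (void) expand_table(hashp);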

-Jakub Wartak.


Re: Division in dynahash.c due to HASH_FFACTOR

From
Tomas Vondra
Date:
On Fri, Sep 04, 2020 at 07:01:41AM +0000, Jakub Wartak wrote:
>
>Greetings hackers,
>
>I have mixed feelings about whether this is a welcome contribution, as
>the potential gain is relatively small in my tests, but still I would
>like to point out that the HASH_FFACTOR functionality in dynahash.c
>could be removed or optimized (the default fill factor is always 1;
>there's not a single place inside the PostgreSQL repository that uses a
>custom fill factor other than DEF_FFACTOR=1). Because the functionality
>is present, there seems to be a division for every buffer access
>[BufTableLookup()] or every smgropen() call (every call to
>hash_search() is affected, provided it's not
>ShmemInitHash/HASH_PARTITION). This division is especially visible via
>perf on a single-process StartupXLOG WAL recovery process on a standby
>under heavy-duty 100% CPU conditions, as the top-1 entry is inside
>hash_search:
>
>   0x0000000000888751 <+449>:   idiv   r8
>   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338] <<-- perf annotate shows this as 30-40%, even with
>the default -O2; probably a CPU pipeline stall caused by the idiv above
>
>I've made a PoC test to skip that division assuming ffactor would be gone:
>               if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
>-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
>+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
>
>For a stream of 3.7GB of WAL I'm getting a consistent improvement of
>~4% (yes, I know it's small; that's why I have mixed feelings):
>gcc -O3: 104->100s
>gcc -O2: 108->104s
>pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there
>

Hmm, so it's the division that makes the difference? In that case,
wouldn't it make sense to just compute a threshold every time the hash
table is resized, and then do just the comparison. That is, compute

    nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;

and then do just

    hctl->freeList[0].nentries >= nentries_threshold

Of course, this assumes the threshold is calculated only rarely, maybe
that's not the case.
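
Filling that in as a sketch (nentries_threshold would be a hypothetical
new HASHHDR field, recomputed wherever max_bucket changes, e.g. in
expand_table):

    /* on resize: cache the threshold so inserts avoid the division */
    hctl->nentries_threshold =
        (long) (hctl->max_bucket + 1) * hctl->ffactor;

    /* on insert: a plain comparison */
    if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
        hctl->freeList[0].nentries >= hctl->nentries_threshold &&
        !has_seq_scans(hashp))
        (void) expand_table(hashp);

That would keep ffactor configurable while still avoiding the idiv.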

Also, what if you lower the fill factor, e.g. to 0.8? Does that improve
performance or not? I can't find any discussion about this in the
archives, but the dynamic hashing paper [1] seems to suggest it does
make a difference (see the tables at the end, but I haven't re-read the
paper recently so maybe it's irrelevant). Anyway, it'd be foolish to
remove the ffactor parameter to save on division only to lose the
ability to lower the factor and save more than that ...


As for the 4% - it's not much, but it's also not negligible. I'm sure
we've done micro-optimizations for smaller gains. The big question is
whether this is statistically significant, or whether it's just due to
random effects. It could easily be caused by changes in layout of the
binary, for example - that can happen quite easily. See [2] and [3].

The talk actually mentions a tool meant to eliminate this bias by
randomization, but I've never tried using it on PostgreSQL so I'm not
sure how compatible it is :-(


[1] https://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
[2] https://www.cis.upenn.edu/~cis501/previous/fall2016/papers/producing-wrong-data.pdf
[3] https://www.youtube.com/watch?v=r-TLSBdHe1A

>After removing HASH_FFACTOR, PostgreSQL still compiles...  Would
>removing it break some external API/extensions?  I saw several
>optimizations for the "idiv" instruction, e.g.
>https://github.com/ridiculousfish/libdivide  Or maybe there is some
>other idea to expose the bottlenecks of BufTableLookup()?  I also saw
>the codepath PinBuffer()->GetPrivateRefCountEntry()->dynahash, which
>could be called pretty often, but I have no idea what kind of pgbench
>stress test could be used to demonstrate the gain (or lack of it).
>
>-Jakub Wartak.
>

I don't think it would break a lot of stuff, but I'm kinda dubious
it's a measurable improvement for common workloads ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Division in dynahash.c due to HASH_FFACTOR

From
Alvaro Herrera
Date:
On 2020-Sep-04, Jakub Wartak wrote:

> After removing HASH_FFACTOR, PostgreSQL still compiles...  Would
> removing it break some external API/extensions?

FWIW, HASH_FFACTOR has *never* been used in Postgres core code.

https://postgr.es/m/20160418180711.55ac82c0@fujitsu already reported
that this flag was unused a few years ago.  And a search through
codesearch.debian.net shows that no software packaged by Debian uses
that flag either.

*If* any third-party extension is using HASH_FFACTOR, it can easily be
unbroken by removing the flag or #defining it to zero; the removal would
only be a problem if anyone showed that there is a performance loss by
using the default fillfactor for some dynahash table instead of their
custom value.

I think if we know that there's a 4% performance increase by removing
the division that comes with an ability to use a custom fillfactor that
nobody uses or has ever used, it makes sense to remove that ability.

... however, if we're really tense about improving some hash table's
performance, it might be worth trying to use simplehash.h instead of
dynahash.c for it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Division in dynahash.c due to HASH_FFACTOR

From
Thomas Munro
Date:
On Sat, Sep 5, 2020 at 2:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Sep-04, Jakub Wartak wrote:
> > After removing HASH_FFACTOR, PostgreSQL still compiles...  Would
> > removing it break some external API/extensions?
>
> FWIW, HASH_FFACTOR has *never* been used in Postgres core code.
>
> https://postgr.es/m/20160418180711.55ac82c0@fujitsu already reported
> that this flag was unused a few years ago.  And a search through
> codesearch.debian.net shows that no software packaged by Debian uses
> that flag either.
>
> *If* any third-party extension is using HASH_FFACTOR, it can easily be
> unbroken by removing the flag or #defining it to zero; the removal would
> only be a problem if anyone showed that there is a performance loss by
> using the default fillfactor for some dynahash table instead of their
> custom value.
>
> I think if we know that there's a 4% performance increase by removing
> the division that comes with an ability to use a custom fillfactor that
> nobody uses or has ever used, it makes sense to remove that ability.

I think Tomas is right, and the correct fix for this would be to
compute it up front every time the hash table size changes, but since
apparently no one ever used it, +1 for just removing it and leaving it
hard-coded.

For what it's worth, for dshash.c I just hard coded it to double the
hash table size whenever the number of elements in a partition
exceeded 75%.  Popular hash tables in standard libraries seem to use
either .75 or 1.0 as a default or only setting.  There's probably room
for discussion about numbers in those ranges, but I think the concept
of customisable load factors with a range much wider than that may be
more relevant for disk-based hash tables (like our hash indexes),
which have very different economics.  I think the name HASH_FFACTOR
may be a clue that this was possibly interference from disk-based hash
tables from the same Berkeley people: rather than "load factor", the
more usual term (?) for nentries/nbuckets in memory-based hash tables
as a way to model the number of expected collisions, they called it "fill
factor", which I guess is because in disk-based designs you actually
want a certain rate of collisions, because you're working not with
elements in an array and wanting to avoid collisions while not wasting
space, but rather with fixed-size buckets that you want to fill up,
but not overflow.  </archeological-speculation-mode>
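
(A fixed 0.75 factor doesn't need a division either, by the way.  A
hypothetical sketch with made-up names, not dshash's actual code:

    /* grow when entries exceed 3/4 of buckets, using only shifts;
     * (n >> 1) + (n >> 2) == 3n/4 for power-of-two n */
    if (partition->nentries > (nbuckets >> 1) + (nbuckets >> 2))
        resize(hash_table);

so even a non-integer factor needn't cost anything on the hot path.)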

> ... however, if we're really tense about improving some hash table's
> performance, it might be worth trying to use simplehash.h instead of
> dynahash.c for it.

Sure, but removing the dynahash IDIV also seems like a slam dunk
removal of a useless expensive feature.



Re: Division in dynahash.c due to HASH_FFACTOR

From
Jakub Wartak
Date:
Hi Thomas, Tomas :), Alvaro, hackers,

>> > After removing HASH_FFACTOR, PostgreSQL still compiles...  Would
>> > removing it break some external API/extensions?
>>
>> FWIW, HASH_FFACTOR has *never* been used in Postgres core code.
[..]
>> I think if we know that there's a 4% performance increase by removing
>> the division that comes with an ability to use a custom fillfactor that
>> nobody uses or has ever used, it makes sense to remove that ability.

Thomas wrote:
>I think Tomas is right, and the correct fix for this would be to
>compute it up front every time the hash table size changes, but since
>apparently no one ever used it, +1 for just removing it and leaving it
>hard-coded.
[..]
>For what it's worth, for dshash.c I just hard coded it to double the
>hash table size whenever the number of elements in a partition
>exceeded 75%.  Popular hash tables in standard libraries seem to use
>either .75 or 1.0 as a default or only setting.  There's probably room
>for discussion about numbers in those ranges, (..)

Tomas also asked the same thing ("what if you lower the fill factor, e.g. to 0.8? Does that improve performance or not?") and gave some interesting links about avoiding measurement bias. I couldn't find any simpler way to benchmark the real impact of it than DB crash-recovery (if anyone has an artificial workload generator that exercises PinBuffer()->hash_search(), please let me know).

So I've been testing it using WAL crash-recovery, following Thomas's approach [1]: always the same workload to replay, always on the same machine, same env: same 13GB of WAL to recover, 8GB db, 24GB shared buffers on top of 128GB RAM, NVMe with fsync off to make dynahash the primary bottleneck (BufTableLookup()+smgropen()), append-only workload, gcc -O3. Results are <1st time is the WAL recovery timing, 2nd time is the checkpoint before opening the DB>. There were four measurements per scenario; the first one is cut off to avoid noise:

1. DEF_FFACTOR=1 idiv on
103.971, 3.719
103.937, 3.683
104.934, 3.691

2. DEF_FFACTOR=1 idiv off
100.565, 3.71
101.595, 3.807
100.794, 3.691

3. DEF_FFACTOR=1.2 idiv on // some proof that nobody uses it
1599552729.041 postmaster 94510 FATAL:  failed to initialize hash table "SERIALIZABLEXID hash"

4. DEF_FFACTOR=0.8 idiv on // another proof? The crash below happens because ffactor is a long and so cannot hold fractional values: 0.8 truncates to 0 and the division in init_htab() traps. If it were a float...

Program terminated with signal 8, Arithmetic exception.
#0  0x00000000009a4119 in init_htab (nelem=4, hashp=0x17cd238) at dynahash.c:677
677             nbuckets = next_pow2_int((nelem - 1) / hctl->ffactor + 1);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.180.amzn1.x86_64

#0  0x00000000009a4119 in init_htab (nelem=4, hashp=0x17cd238) at dynahash.c:677
#1  hash_create (tabname=tabname@entry=0xb77f9a "Timezones", nelem=nelem@entry=4, info=info@entry=0x7ffd3055cf30, flags=flags@entry=16) at dynahash.c:529
#2  0x00000000009ecddf in init_timezone_hashtable () at pgtz.c:211
#3  pg_tzset (name=name@entry=0xb49fd5 "GMT") at pgtz.c:248
#4  0x00000000009ecfee in pg_timezone_initialize () at pgtz.c:372

5. DEF_FFACTOR=1 idiv on // just in case reproducer to validate measurement #1
104.333, 3.804
104.313, 3.772
103.951, 3.687

6. DEF_FFACTOR=2 idiv on
105.406, 3.783
105.33, 3.721
105.403, 3.777

7. DEF_FFACTOR=4 idiv on
105.803, 3.722
105.73, 3.728
106.104, 3.756

Some notes:
- SMgrRelationHash is initialized from the start at 400, while the DB here is a small 8GB one with only a couple of tables -> no expansion needed in the above test.
- The local backend-private overflow hash table for buffers, PrivateRefCountHash, is initialized at 100 and maybe it grows during the test.
- Googling for DEF_FFACTOR, I've found only one entry with a value <> 1, from 1993 (see [2]; what's interesting is the same DEF_SEGSIZE value after all those years).

Alvaro wrote:
>> ... however, if we're really tense about improving some hash table's
>> performance, it might be worth trying to use simplehash.h instead of
>> dynahash.c for it.

Thomas wrote:
> Sure, but removing the dynahash IDIV also seems like a slam dunk removal of a useless expensive feature.

I agree with both. I just thought it might be an interesting finding, as this idiv might (?) be present in other common paths like ReadBuffer*() / PinBuffer() (per some recent discussions, maybe on NUMA boxes), not just WAL recovery, and it seems relatively easy to improve.

-J.



Re: Division in dynahash.c due to HASH_FFACTOR

From
Thomas Munro
Date:
On Tue, Sep 8, 2020 at 11:17 PM Jakub Wartak <Jakub.Wartak@tomtom.com> wrote:
> I agree with both. I just thought it might be an interesting finding,
> as this idiv might (?) be present in other common paths like
> ReadBuffer*() / PinBuffer() (per some recent discussions, maybe on NUMA
> boxes), not just WAL recovery, and it seems relatively easy to improve.

I wrote a draft commit message for Jakub's proposed change (attached),
and did a little bit of testing, but I haven't seen a case where it
wins yet; I need to work on something else now but thought I'd share
this much anyway.  One observation is that the rule in init_htab had
better agree with the expansion rule in hash_search_with_hash_value,
otherwise you can size your hash table perfectly for n elements and
then it still has to expand when you insert the nth element, which is
why I changed >= to >.  Does this make sense?  Oh man, I don't like
the mash-up of int, long, uint32, Size dynahash uses in various places
and that are brought into relief by that comparison...
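
(Concretely, with made-up numbers: create a table with nelem = 256 and
the default ffactor, so init_htab picks nbuckets = 256 and max_bucket =
255.  Inserting the 256th element makes nentries == max_bucket + 1, so
with ">=" the perfectly sized table would expand on that very insert,
while with ">" it stays put.)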


Re: Division in dynahash.c due to HASH_FFACTOR

From
David Rowley
Date:
On Thu, 10 Sep 2020 at 14:55, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> I wrote a draft commit message for Jakub's proposed change (attached),
> and did a little bit of testing, but I haven't seen a case where it
> wins yet; I need to work on something else now but thought I'd share
> this much anyway.  One observation is that the rule in init_htab had
> better agree with the expansion rule in hash_search_with_hash_value,
> otherwise you can size your hash table perfectly for n elements and
> then it still has to expand when you insert the nth element, which is
> why I changed >= to >.  Does this make sense?  Oh man, I don't like
> the mash-up of int, long, uint32, Size dynahash uses in various places
> and that are brought into relief by that comparison...

I just did some benchmarking with this patch using the same recovery
benchmark that I used in [1] and also the two patches that I posted in
[2]. Additionally, I added a PANIC at the end of recovery so that I
could repeat the recovery over and over again with the same WAL.

Without 0001-Remove-custom-fill-factor-support-from-dynahash.c.patch,
the time in seconds reported for recovery was as follows:

Run 1 65.2
Run 2 65.41
Run 3 65.1
Run 4 64.86
Run 5 62.29
Run 6 67.06
Run 7 63.35
Run 8 63.1
Run 9 62.8
Run 10 62.15

Avg 64.132
Med 64.105

After patching I got:

Run 1 60.69
Run 2 63.13
Run 3 63.24
Run 4 62.13
Run 5 63.81
Run 6 60.27
Run 7 62.85
Run 8 63.45
Run 9 59.6
Run 10 63.16

Avg 62.233
Med 62.99

I was quite happy to see 59.6 seconds in there on run 9 as I'd been
trying hard to get that benchmark to go below 1 minute on my machine.

perf appears to indicate that less time was spent in
hash_search_with_hash_value()

Unpatched:
  26.96%  postgres  postgres            [.] PageRepairFragmentation
  12.07%  postgres  libc-2.31.so        [.] __memmove_avx_unaligned_erms
  10.13%  postgres  postgres            [.] hash_search_with_hash_value
   6.70%  postgres  postgres            [.] XLogReadBufferExtended

Patched:
  27.70%  postgres  postgres            [.] PageRepairFragmentation
  13.50%  postgres  libc-2.31.so        [.] __memmove_avx_unaligned_erms
   9.63%  postgres  postgres            [.] hash_search_with_hash_value
   6.44%  postgres  postgres            [.] XLogReadBufferExtended

I looked over the patch and the only thing I saw was that we might
also want to remove the following line:

#define DEF_FFACTOR    1 /* default fill factor */

Also, a couple of things I noticed when looking at performance
profiles in detail.

1) The most expensive user of hash_search_with_hash_value() comes from
BufTableLookup() which passes HASH_FIND as the HASHACTION.
2) The code that was doing the divide in the unpatched version only
triggered when HASHACTION was HASH_ENTER or HASH_ENTER_NULL.

The 2nd most costly call to hash_search_with_hash_value() came in via
hash_search() via smgropen(). That does use HASH_ENTER, which could
have triggered the divide code. The main caller of smgropen() was
XLogReadBufferExtended().

So, it looks unlikely that any gains we are seeing are from improved
buffer lookups. It's more likely they're coming from a more optimal
XLogReadBufferExtended().

David

[1] https://www.postgresql.org/message-id/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvo9n-nOb3b4PYFx%2Bw9mqd2SSUHm_oAs039eZnZLqFGcbQ%40mail.gmail.com



Re: Division in dynahash.c due to HASH_FFACTOR

From
Thomas Munro
Date:
On Mon, Sep 14, 2020 at 11:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I just did some benchmarking with this patch using the same recovery
> benchmark that I used in [1] and also the two patches that I posted in
> [2]. Additionally, I added a PANIC at the end of recovery so that I
> could repeat the recovery over and over again with the same WAL.
>
> [data]

    N           Min           Max        Median           Avg        Stddev
x  10         62.15         67.06         64.86        64.132     1.6188528
+  10          59.6         63.81         63.13        62.233     1.4983031
Difference at 95.0% confidence
    -1.899 +/- 1.46553
    -2.96108% +/- 2.28517%
    (Student's t, pooled s = 1.55974)
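
(That summary looks like ministat output; something like "ministat
unpatched.txt patched.txt", with one recovery time per line in each
file, should reproduce the comparison.)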

Thanks!  Hmm, small but apparently significant and in line with
Jakub's report, and I suppose the effect will be greater with other
nearby recovery performance patches applied that halve the times.
Annoyingly, I can't reproduce this speedup on my local i9-9900; maybe
it requires a different CPU...

> I looked over the patch and the only thing I saw was that we might
> also want to remove the following line:
>
> #define DEF_FFACTOR    1 /* default fill factor */

Right, thanks.  Fixed in the attached.

> The 2nd most costly call to hash_search_with_hash_value() came in via
> hash_search() via smgropen(). That does use HASH_ENTER, which could
> have triggered the divide code. The main caller of smgropen() was
> XLogReadBufferExtended().
>
> So, it looks unlikely that any gains we are seeing are from improved
> buffer lookups. It's more likely they're coming from more optimal
> XLogReadBufferExtended()

I think we call smgropen() twice for every buffer referenced in the
WAL: XLogReadBufferExtended() and again in
ReadBufferWithoutRelcache().  We could reduce it to once with some
refactoring, but I am looking into whether I can reduce it to zero as
a side-effect of another change, more soon...


Re: Division in dynahash.c due to HASH_FFACTOR

From
Thomas Munro
Date:
On Tue, Sep 15, 2020 at 9:56 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I looked over the patch and the only thing I saw was that we might
> > also want to remove the following line:
> >
> > #define DEF_FFACTOR    1 /* default fill factor */
>
> Right, thanks.  Fixed in the attached.

Pushed.  Thanks Jakub, everyone.

Separately, we really should tidy up the int/long/uint32/size_t
confusion in this module.  I know we have K&R C-era long-itude in
numerous other modules, but this one is a little more egregious in its
data type mixing.



Re: Division in dynahash.c due to HASH_FFACTOR

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Pushed.  Thanks Jakub, everyone.

BTW, looking over this patch, I wonder about

        /*
         * Can't split if running in partitioned mode, nor if frozen, nor if
         * table is the subject of any active hash_seq_search scans.  Strange
         * order of these tests is to try to check cheaper conditions first.
         */
        if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
            hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
            !has_seq_scans(hashp))
            (void) expand_table(hashp);

ISTM that getting rid of the division obviates the concern that the
nentries condition is too expensive, and therefore we should revert
to checking it first, on the grounds that that condition is most
likely to fail.
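
That is, roughly (just a sketch of the reordering):

        if (hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
            !IS_PARTITIONED(hctl) && !hashp->frozen &&
            !has_seq_scans(hashp))
            (void) expand_table(hashp);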

            regards, tom lane



Re: Division in dynahash.c due to HASH_FFACTOR

From
Tom Lane
Date:
I wrote:
> ISTM that getting rid of the division obviates the concern that the
> nentries condition is too expensive,

Also, we could make it slightly cheaper yet, by changing the condition
to

            hctl->freeList[0].nentries > (long) (hctl->max_bucket)

ie drop the +1.  I'd argue that this is actually a more faithful
rendition of the previous condition, since what we had before was
basically

            hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1)

            regards, tom lane



Re: Division in dynahash.c due to HASH_FFACTOR

From
Thomas Munro
Date:
On Sat, Sep 19, 2020 at 1:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > ISTM that getting rid of the division obviates the concern that the
> > nentries condition is too expensive,

True, that comment needed to go.

> Also, we could make it slightly cheaper yet, by changing the condition
> to
>
>             hctl->freeList[0].nentries > (long) (hctl->max_bucket)
>
> ie drop the +1.  I'd argue that this is actually a more faithful
> rendition of the previous condition, since what we had before was
> basically
>
>             hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1)

Agreed, and done.  Thanks!