Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From: Yura Sokolov
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date:
Msg-id: 4159ecd97e53318e6302d4091c27e97f@postgrespro.ru
In response to: Re: Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
David Rowley wrote 2021-04-29 02:51:
> On Thu, 29 Apr 2021 at 00:28, Yura Sokolov <y.sokolov@postgrespro.ru> 
> wrote:
>> The best result is with just:
>> 
>>     return (uint32)rnode->node.relNode;
>> 
>> ie, relNode could be taken without mixing at all.
>> relNode is unique inside a single database, and almost unique among
>> the whole cluster, since it is an Oid.
> 
> I admit to having tried that too just to almost eliminate the cost of
> hashing.  I just didn't consider it something we'd actually do.
> 
> The system catalogues are quite likely to all have the same
> relfilenode in all databases, so for workloads that have a large
> number of databases, looking up WAL records that touch the catalogues
> is going to be pretty terrible.

The only workload that could touch system catalogues in different
databases is recovery (and autovacuum?). Client backends can't be
connected to more than one database.

But nevertheless, you're right. I oversimplified it intentionally.
Originally I wrote:

    hashcode = (uint32)rnode->node.dbNode * 0xc2b2ae35;
    hashcode ^= (uint32)rnode->node.relNode;
    return hashcode;

But before sending the mail I cut the dbNode part.
Still, the main point stands: relNode can be put unmixed into the final
value. This way fewer collisions happen.
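
Put together, the full function would look roughly like this (a sketch
only; the function name is hypothetical, RelFileNodeBackend is the key
type from PostgreSQL's relfilenode.h, and 0xc2b2ae35 is one of
MurmurHash3's 32-bit finalizer constants). Something of this shape could
serve as an SH_HASH_KEY implementation for simplehash.h:

    /* Mix dbNode with a multiplicative constant, then fold in relNode
     * unmixed: relNode is an Oid and nearly unique across the cluster,
     * so it carries most of the entropy by itself. */
    static inline uint32
    smgr_rnode_hash(const RelFileNodeBackend *rnode)
    {
        uint32      hashcode;

        hashcode = (uint32) rnode->node.dbNode * 0xc2b2ae35;
        hashcode ^= (uint32) rnode->node.relNode;

        return hashcode;
    }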

> 
> The simplified hash function I wrote with just the relNode and dbNode
> would be the least I'd be willing to entertain.  However, I just
> wouldn't be surprised if there was a good reason for that being bad
> too.
> 
> 
>> So, I've repeated the benchmark with different numbers of partitions
>> (I tried to catch a higher fillfactor) and a smaller amount of
>> inserted data (since I don't want to stress my SSD).
>> 
>> partitions/ | dynahash | dynahash +  | simplehash | simplehash + |
>> fillfactor  |          | simple func |            | simple func  |
>> ------------+----------+-------------+------------+--------------+
>>  3500/0.43  |  3.73s   |  3.54s      |  3.58s     |  3.34s       |
>>  3200/0.78  |  3.64s   |  3.46s      |  3.47s     |  3.25s       |
>>  1500/0.74  |  3.18s   |  2.97s      |  3.03s     |  2.79s       |
>> 
>> Fillfactor is the effective fillfactor in simplehash with that number
>> of partitions.
>> I wasn't able to measure with a fillfactor close to 0.9 since it looks
>> like simplehash tends to grow much earlier due to SH_GROW_MAX_MOVE.
> 
> Thanks for testing that.
> 
> I also ran some tests last night to test the 0.75 vs 0.9 fillfactor to
> see if it made a difference.  The test was similar to last time, but I
> trimmed the number of rows inserted from 100 million down to 25
> million.  Last time I tested with 1000 partitions, this time with each
> of: 100 200 300 400 500 600 700 800 900 1000 partitions.  There didn't
> seem to be any point of testing lower than that as the minimum hash
> table size is 512.
> 
> The averages over 10 runs were:
> 
> nparts  ff75    ff90
> 100     21.898  22.226
> 200     23.105  25.493
> 300     25.274  24.251
> 400     25.139  25.611
> 500     25.738  25.454
> 600     26.656  26.820
> 700     27.577  27.102
> 800     27.608  27.546
> 900     27.284  28.186
> 1000    29.000  28.153
> 
> Or to summarise a bit, we could just look at the sum of all the
> results per fillfactor:
> 
> sum ff75 2592.79
> sum ff90 2608.42 100.6%
> 
> fillfactor 75 did come out slightly faster, but only just. It seems
> close enough that it might be better just to keep the 90 to save a
> little memory and improve caching elsewhere.  Also, from below, you
> can see that for 300, 500, 600, 700, 1000 tables tests, the hash
> tables ended up the same size, yet there's a bit of variability in the
> timing result.  So the 0.6% gain might just be noise.
> 
> I don't think it's worth making the fillfactor 75.

To be clear: I didn't change SH_FILLFACTOR. It was equal to 0.9.
I just wasn't able to catch a table with a fill factor above 0.78.
Looks like you got it with 900 partitions :-)

> 
> Another line of thought for making it go faster would be to do
> something like get rid of the hash status field from SMgrEntry.  That
> could be either coded into a single bit we'd borrow from the hash
> value, or it could be coded into the least significant bit of the data
> field.  A pointer to palloc'd memory should always be MAXALIGNed,
> which means at least the lower two bits are always zero.  We'd just
> need to make sure and do something like "data & ~((uintptr_t) 3)" to
> trim off the hash status bits before dereferencing the pointer.  That
> would make the SMgrEntry 12 bytes on a 64-bit machine.  However, it
> would also mean that some entries would span 2 cache lines, which
> might affect performance a bit.
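
For reference, the low-bit tagging described above would look roughly
like this (a sketch with hypothetical names, not from any posted patch;
it relies only on palloc'd pointers being MAXALIGNed, so the two low
bits are always zero):

    #include <stdint.h>

    /* Two low bits of a MAXALIGNed pointer are free to carry status. */
    #define ENTRY_STATUS_MASK   ((uintptr_t) 3)

    /* Pack a MAXALIGNed pointer and a 2-bit status into one word. */
    static inline uintptr_t
    entry_tag(void *data, int status)
    {
        return (uintptr_t) data | ((uintptr_t) status & ENTRY_STATUS_MASK);
    }

    /* The "data & ~((uintptr_t) 3)" above: strip the status bits
     * before dereferencing the pointer. */
    static inline void *
    entry_data(uintptr_t tagged)
    {
        return (void *) (tagged & ~ENTRY_STATUS_MASK);
    }

    static inline int
    entry_status(uintptr_t tagged)
    {
        return (int) (tagged & ENTRY_STATUS_MASK);
    }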

But then the data pointer itself will sit at an 8-byte-unaligned
offset. While x86 is mostly indifferent to this, I doubt this memory
saving will pay off.

regards,
Yura Sokolov.


