Division in dynahash.c due to HASH_FFACTOR - Mailing list pgsql-hackers

From: Jakub Wartak
Subject: Division in dynahash.c due to HASH_FFACTOR
Msg-id: VI1PR0701MB696044FC35013A96FECC7AC8F62D0@VI1PR0701MB6960.eurprd07.prod.outlook.com
Responses: Re: Division in dynahash.c due to HASH_FFACTOR
           Re: Division in dynahash.c due to HASH_FFACTOR
List: pgsql-hackers

Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as the potential gain in my tests is relatively
small, but I would still like to point out that the HASH_FFACTOR functionality in dynahash.c could be removed or
optimized. The default fill factor is always 1, and not a single place in the PostgreSQL repository uses a custom fill
factor other than DEF_FFACTOR=1. Because the functionality is present, there seems to be a division on every buffer
access [BufTableLookup()] or every smgropen() call (every call to hash_search() is affected, provided the table is not
created via ShmemInitHash/HASH_PARTITION). This division is especially visible via perf during single-process
StartupXLOG WAL recovery on a standby under heavy, 100% CPU conditions, where the top entry is inside hash_search:
   0x0000000000888751 <+449>:   idiv   r8
   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338]   <<-- perf annotate shows 30-40% here, even at the
                                                                      default -O2, probably due to CPU pipelining of
                                                                      the idiv above

I made a PoC that skips that division, assuming ffactor were removed entirely:
               if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&

For a 3.7 GB WAL stream I get a consistent improvement of ~4% (yes, I know it's small; that's why I have mixed
feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared (scale -s 100): stays more or less the same, so no positive impact there
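As a quick check of the ~4% figure, the relative reduction in WAL replay time for the two builds works out to:

```latex
\frac{104 - 100}{104} \approx 3.8\%\ (\texttt{-O3}), \qquad \frac{108 - 104}{108} \approx 3.7\%\ (\texttt{-O2})
```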

After removing HASH_FFACTOR, PostgreSQL still compiles... Would removing it break some external API or extensions? I
saw several optimizations for "idiv" that could apply here, e.g. see https://github.com/ridiculousfish/libdivide. Or
maybe there is some other idea for exposing the bottlenecks of BufTableLookup()? I also saw the code path
PinBuffer()->GetPrivateRefCountEntry()->dynahash, which could be called pretty often, but I have no idea what kind of
pgbench stress test could be used to demonstrate the gain (or lack of it).

-Jakub Wartak.

