Home > mailing lists

Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From	Yura Sokolov
Subject	Re: Use simplehash.h instead of dynahash in SMgr
Date	April 28, 2021 15:28:57
Msg-id	744eb6874c364e34b7639a85853cc3ea@postgrespro.ru Whole thread Raw
In response to	Re: Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

Tree view

David Rowley писал 2021-04-26 09:43:
> I tried that and it got a median result of 113.795 seconds over 14
> runs with this recovery benchmark test.
> 
> LOG:  size: 4096, members: 2032, filled: 0.496094, total chain: 1014,
> max chain: 6, avg chain: 0.499016, total_collisions: 428,
> max_collisions: 3, avg_collisions: 0.210630
> 
> I also tried the following hash function just to see how much
> performance might be left from speeding it up:
> 
> static inline uint32
> relfilenodebackend_hash(RelFileNodeBackend *rnode)
> {
> uint32 h;
> 
> h = pg_rotate_right32((uint32) rnode->node.relNode, 16) ^ ((uint32)
> rnode->node.dbNode);
> return murmurhash32(h);
> }
> 
> I got a median of 112.685 seconds over 14 runs with:
> 
> LOG:  size: 4096, members: 2032, filled: 0.496094, total chain: 1044,
> max chain: 7, avg chain: 0.513780, total_collisions: 438,
> max_collisions: 3, avg_collisions: 0.215551

The best result is with just:

    return (uint32)rnode->node.relNode;

ie, relNode could be taken without mixing at all.
relNode is unique inside single database, and almost unique among whole 
cluster
since it is Oid.

>> I'd like to see benchmark code. It quite interesting this place became
>> measurable at all.
> 
> Sure.
> 
> $ cat recoverybench_insert_hash.sh
> ....
> 
> David.

So, I've repeated benchmark with different number of partitons (I tried
to catch higher fillfactor) and less amount of inserted data (since I 
don't
want to stress my SSD).

partitions/ | dynahash | dynahash +  | simplehash | simplehash + |
fillfactor  |          | simple func |           | simple func  |
------------+----------+-------------+--------------+
  3500/0.43  |   3.73s  |   3.54s     |   3.58s    |   3.34s      |
  3200/0.78  |   3.64s  |   3.46s     |   3.47s    |   3.25s      |
  1500/0.74  |   3.18s  |   2.97s     |   3.03s    |   2.79s      |

Fillfactor is effective fillfactor in simplehash with than number of 
partitions.
I wasn't able to measure with fillfactor close to 0.9 since looks like
simplehash tends to grow much earlier due to SH_GROW_MAX_MOVE.

Simple function is hash function that returns only rnode->node.relNode.
I've test it both with simplehash and dynahash.
For dynahash also custom match function were made.

Conclusion:
- trivial hash function gives better results for both simplehash and 
dynahash,
- simplehash improves performance for both complex and trivial hash 
function,
- simplehash + trivial function perform best.

I'd like to hear other's people comments on trivial hash function. But 
since
generation of relation's Oid are not subject of human interventions, I'd 
recommend
to stick with trivial.

Since patch is simple, harmless and gives measurable improvement,
I think it is ready for commit fest.

regards,
Yura Sokolov.
Postgres Proffesional https://www.postgrespro.com

PS. David, please send patch once again since my mail client reattached 
files in
previous messages, and commit fest robot could think I'm author.

pgsql-hackers by date:

From: Aleksander Alekseev
Date: 28 April 2021, 15:22:39
Subject: Re: Some oversights in query_id calculation

From: Justin Pryzby
Date: 28 April 2021, 16:11:47
Subject: Re: Some doubious error messages and comments

Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

Previous

Next