Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From:            David Rowley
Subject:         Re: Use simplehash.h instead of dynahash in SMgr
Date:
Msg-id:          CAApHDvpXGmb=5X0ys29N=pa436tadC8tm9SK9cvwz9xfoZj4iQ@mail.gmail.com
In response to:  Re: Use simplehash.h instead of dynahash in SMgr (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses:       Re: Use simplehash.h instead of dynahash in SMgr
List:            pgsql-hackers
On Thu, 29 Apr 2021 at 00:28, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> The best result is with just:
>
> return (uint32)rnode->node.relNode;
>
> ie, relNode could be taken without mixing at all.
> relNode is unique inside a single database, and almost unique among the
> whole cluster since it is an Oid.

I admit to having tried that too, just to almost eliminate the cost of
hashing. I just didn't consider it something we'd actually do. The system
catalogues are quite likely to all have the same relfilenode in all
databases, so for workloads that have a large number of databases, looking
up WAL records that touch the catalogues is going to be pretty terrible.

The simplified hash function I wrote with just the relNode and dbNode would
be the least I'd be willing to entertain. However, I just wouldn't be
surprised if there was a good reason for that being bad too.

> So, I've repeated the benchmark with different numbers of partitions (I
> tried to catch a higher fillfactor) and less inserted data (since I don't
> want to stress my SSD).
>
> partitions/ | dynahash | dynahash +  | simplehash | simplehash + |
> fillfactor  |          | simple func |            | simple func  |
> ------------+----------+-------------+------------+--------------+
> 3500/0.43   | 3.73s    | 3.54s       | 3.58s      | 3.34s        |
> 3200/0.78   | 3.64s    | 3.46s       | 3.47s      | 3.25s        |
> 1500/0.74   | 3.18s    | 2.97s       | 3.03s      | 2.79s        |
>
> Fillfactor is the effective fillfactor in simplehash with that number of
> partitions. I wasn't able to measure with a fillfactor close to 0.9 since
> it looks like simplehash tends to grow much earlier due to
> SH_GROW_MAX_MOVE.

Thanks for testing that.

I also ran some tests last night to compare the 0.75 vs 0.9 fillfactor to
see if it made a difference. The test was similar to last time, but I
trimmed the number of rows inserted from 100 million down to 25 million.
Last time I tested with 1000 partitions; this time I tested with each of
100, 200, 300, 400, 500, 600, 700, 800, 900 and 1000 partitions. There
didn't seem to be any point in testing lower than that, as the minimum hash
table size is 512.

The averages over 10 runs were:

nparts  ff75    ff90
100     21.898  22.226
200     23.105  25.493
300     25.274  24.251
400     25.139  25.611
500     25.738  25.454
600     26.656  26.82
700     27.577  27.102
800     27.608  27.546
900     27.284  28.186
1000    29      28.153

Or to summarise a bit, we could just look at the sum of all the results per
fillfactor:

sum ff75  2592.79
sum ff90  2608.42  (100.6%)

Fillfactor 75 did come out slightly faster, but only just. It seems close
enough that it might be better just to keep the 90 to save a little memory
and improve caching elsewhere. Also, from the output below, you can see
that for the 300, 500, 600, 700 and 1000 table tests the hash tables ended
up the same size at both fillfactors, yet there's still a bit of
variability in the timing results. So the 0.6% gain might just be noise.

I don't think it's worth making the fillfactor 75.
drowley@amd3990x:~/recoverylogs$ grep -rH -m 1 "collisions"
ff75_tb100.log:LOG: size: 1024, members: 231, filled: 0.225586, total chain: 33, max chain: 2, avg chain: 0.142857, total_collisions: 20, max_collisions: 2, avg_collisions: 0.086580
ff90_tb100.log:LOG: size: 512, members: 231, filled: 0.451172, total chain: 66, max chain: 2, avg chain: 0.285714, total_collisions: 36, max_collisions: 2, avg_collisions: 0.155844
ff75_tb200.log:LOG: size: 1024, members: 431, filled: 0.420898, total chain: 160, max chain: 4, avg chain: 0.371230, total_collisions: 81, max_collisions: 3, avg_collisions: 0.187935
ff90_tb200.log:LOG: size: 512, members: 431, filled: 0.841797, total chain: 942, max chain: 9, avg chain: 2.185615, total_collisions: 134, max_collisions: 3, avg_collisions: 0.310905
ff90_tb300.log:LOG: size: 1024, members: 631, filled: 0.616211, total chain: 568, max chain: 9, avg chain: 0.900158, total_collisions: 158, max_collisions: 4, avg_collisions: 0.250396
ff75_tb300.log:LOG: size: 1024, members: 631, filled: 0.616211, total chain: 568, max chain: 9, avg chain: 0.900158, total_collisions: 158, max_collisions: 4, avg_collisions: 0.250396
ff75_tb400.log:LOG: size: 2048, members: 831, filled: 0.405762, total chain: 341, max chain: 4, avg chain: 0.410349, total_collisions: 162, max_collisions: 3, avg_collisions: 0.194946
ff90_tb400.log:LOG: size: 1024, members: 831, filled: 0.811523, total chain: 1747, max chain: 15, avg chain: 2.102286, total_collisions: 269, max_collisions: 3, avg_collisions: 0.323706
ff75_tb500.log:LOG: size: 2048, members: 1031, filled: 0.503418, total chain: 568, max chain: 5, avg chain: 0.550921, total_collisions: 219, max_collisions: 4, avg_collisions: 0.212415
ff90_tb500.log:LOG: size: 2048, members: 1031, filled: 0.503418, total chain: 568, max chain: 5, avg chain: 0.550921, total_collisions: 219, max_collisions: 4, avg_collisions: 0.212415
ff75_tb600.log:LOG: size: 2048, members: 1231, filled: 0.601074, total chain: 928, max chain: 7, avg chain: 0.753859, total_collisions: 298, max_collisions: 4, avg_collisions: 0.242080
ff90_tb600.log:LOG: size: 2048, members: 1231, filled: 0.601074, total chain: 928, max chain: 7, avg chain: 0.753859, total_collisions: 298, max_collisions: 4, avg_collisions: 0.242080
ff75_tb700.log:LOG: size: 2048, members: 1431, filled: 0.698730, total chain: 1589, max chain: 9, avg chain: 1.110412, total_collisions: 391, max_collisions: 4, avg_collisions: 0.273235
ff90_tb700.log:LOG: size: 2048, members: 1431, filled: 0.698730, total chain: 1589, max chain: 9, avg chain: 1.110412, total_collisions: 391, max_collisions: 4, avg_collisions: 0.273235
ff75_tb800.log:LOG: size: 4096, members: 1631, filled: 0.398193, total chain: 628, max chain: 6, avg chain: 0.385040, total_collisions: 296, max_collisions: 3, avg_collisions: 0.181484
ff90_tb800.log:LOG: size: 2048, members: 1631, filled: 0.796387, total chain: 2903, max chain: 12, avg chain: 1.779890, total_collisions: 515, max_collisions: 3, avg_collisions: 0.315757
ff75_tb900.log:LOG: size: 4096, members: 1831, filled: 0.447021, total chain: 731, max chain: 5, avg chain: 0.399235, total_collisions: 344, max_collisions: 3, avg_collisions: 0.187875
ff90_tb900.log:LOG: size: 2048, members: 1831, filled: 0.894043, total chain: 6364, max chain: 14, avg chain: 3.475696, total_collisions: 618, max_collisions: 4, avg_collisions: 0.337520
ff75_tb1000.log:LOG: size: 4096, members: 2031, filled: 0.495850, total chain: 1024, max chain: 6, avg chain: 0.504185, total_collisions: 416, max_collisions: 3, avg_collisions: 0.204825
ff90_tb1000.log:LOG: size: 4096, members: 2031, filled: 0.495850, total chain: 1024, max chain: 6, avg chain: 0.504185, total_collisions: 416, max_collisions: 3, avg_collisions: 0.204825

Another line of thought for making it go faster would be to do something
like get rid of the hash status field from SMgrEntry. That could either be
coded into a single bit we'd borrow from the hash value, or it could be
coded into the least significant bit of the data field. A pointer to
palloc'd memory should always be MAXALIGNed, which means at least the lower
two bits are always zero. We'd just need to make sure to do something like
"data & ~((uintptr_t) 3)" to trim off the hash status bits before
dereferencing the pointer. That would make the SMgrEntry 12 bytes on a
64-bit machine. However, it would also mean that some entries would span 2
cache lines, which might affect performance a bit.

> PS. David, please send the patch once again since my mail client
> reattached files in previous messages, and the commitfest robot could
> think I'm the author.

Authors are listed manually in the CF app. The app will pick up .patch
files from the latest email in the thread and the CF bot will test those.
So it does pay to be pretty careful when attaching patches to threads that
are in the CF app. That's the reason I added the .txt extension to the
recovery panic patch. The CF bot machines would have complained about that.
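For illustration only, here is a minimal sketch of the kind of simplified
hash function mentioned near the top of this message, mixing just dbNode
and relNode with the helpers from common/hashfn.h. The function name is
made up for this sketch and it is not necessarily what the actual patch
does:

/*
 * Sketch only: hash just dbNode and relNode of an SMgr hash key,
 * ignoring spcNode and the backend id.
 */
#include "common/hashfn.h"			/* murmurhash32(), hash_combine() */
#include "storage/relfilenode.h"	/* RelFileNodeBackend */

static inline uint32
relfilenodebackend_hash_sketch(const RelFileNodeBackend *rnode)
{
	/*
	 * relNode alone is nearly unique, but system catalogues tend to share
	 * relfilenodes across databases, so fold in dbNode as well to avoid
	 * systematic collisions on clusters with many databases.
	 */
	return hash_combine(murmurhash32((uint32) rnode->node.dbNode),
						murmurhash32((uint32) rnode->node.relNode));
}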
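And to make the pointer-tagging idea above a little more concrete, here is
a rough sketch of borrowing the low-order bits of the data pointer for the
entry status. The struct layout, names and bit assignments are only
illustrative, not taken from any patch:

/*
 * Sketch only: pack the entry status into the two low-order bits of the
 * palloc'd data pointer.  palloc'd chunks are MAXALIGNed, so those bits
 * are always zero in the raw pointer and can be borrowed.
 */
#include "postgres.h"
#include "storage/smgr.h"		/* SMgrRelationData */

#define SMGRENTRY_STATUS_MASK	((uintptr_t) 3)

typedef struct SMgrEntry
{
	uint32		hash;	/* hash value of the entry's key */
	uintptr_t	data;	/* SMgrRelationData pointer with status in bits 0-1 */
} SMgrEntry;

/* Trim off the status bits before dereferencing the pointer. */
static inline SMgrRelationData *
SMgrEntryGetData(const SMgrEntry *entry)
{
	return (SMgrRelationData *) (entry->data & ~SMGRENTRY_STATUS_MASK);
}

/* Read just the status bits. */
static inline int
SMgrEntryGetStatus(const SMgrEntry *entry)
{
	return (int) (entry->data & SMGRENTRY_STATUS_MASK);
}

/* Store a pointer and status together; palloc guarantees alignment. */
static inline void
SMgrEntrySet(SMgrEntry *entry, SMgrRelationData *ptr, int status)
{
	Assert(((uintptr_t) ptr & SMGRENTRY_STATUS_MASK) == 0);
	entry->data = (uintptr_t) ptr | (uintptr_t) status;
}

With the status folded into the pointer the entry is just a uint32 plus a
tagged pointer, though, as mentioned above, packing entries that tightly
means some of them would straddle a cache line boundary.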