Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers
From | Yura Sokolov |
---|---|
Subject | Re: Use simplehash.h instead of dynahash in SMgr |
Date | |
Msg-id | 4b7616c612b9c445a401025f74ed57bf@postgrespro.ru Whole thread Raw |
In response to | Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use simplehash.h instead of dynahash in SMgr
Re: Use simplehash.h instead of dynahash in SMgr |
List | pgsql-hackers |
David Rowley писал 2021-04-24 18:58: > Hackers, > > Last year, when working on making compactify_tuples() go faster for > 19c60ad69, I did quite a bit of benchmarking of the recovery process. > The next thing that was slow after compactify_tuples() was the hash > lookups done in smgropen(). > > Currently, we use dynahash hash tables to store the SMgrRelation so we > can perform fast lookups by RelFileNodeBackend. However, I had in mind > that a simplehash table might perform better. So I tried it... > > The attached converts the hash table lookups done in smgr.c to use > simplehash instead of dynahash. > > This does require a few changes in simplehash.h to make it work. The > reason being is that RelationData.rd_smgr points directly into the > hash table entries. This works ok for dynahash as that hash table > implementation does not do any reallocations of existing items or move > any items around in the table, however, simplehash moves entries > around all the time, so we can't point any pointers directly at the > hash entries and expect them to be valid after adding or removing > anything else from the table. > > To work around that, I've just made an additional type that serves as > the hash entry type that has a pointer to the SMgrRelationData along > with the hash status and hash value. It's just 16 bytes (or 12 on > 32-bit machines). I opted to keep the hash key in the > SMgrRelationData rather than duplicating it as it keeps the SMgrEntry > struct nice and small. We only need to dereference the SMgrRelation > pointer when we find an entry with the same hash value. The chances > are quite good that an entry with the same hash value is the one that > we want, so any additional dereferences to compare the key are not > going to happen very often. > > I did experiment with putting the hash key in SMgrEntry and found it > to be quite a bit slower. I also did try to use hash_bytes() but > found building a hash function that uses murmurhash32 to be quite a > bit faster. > > Benchmarking > =========== > > I did some of that. It made my test case about 10% faster. > > The test case was basically inserting 100 million rows one at a time > into a hash partitioned table with 1000 partitions and 2 int columns > and a primary key on one of those columns. It was about 12GB of WAL. I > used a hash partitioned table in the hope to create a fairly > random-looking SMgr hash table access pattern. Hopefully something > similar to what might happen in the real world. > > Over 10 runs of recovery, master took an average of 124.89 seconds. > The patched version took 113.59 seconds. About 10% faster. > > I bumped shared_buffers up to 10GB, max_wal_size to 20GB and > checkpoint_timeout to 60 mins. > > To make the benchmark more easily to repeat I patched with the > attached recovery_panic.patch.txt. This just PANICs at the end of > recovery so that the database shuts down before performing the end of > recovery checkpoint. Just start the database up again to do another > run. > > I did 10 runs. The end of recovery log message reported: > > master (aa271209f) > CPU: user: 117.89 s, system: 5.70 s, elapsed: 123.65 s > CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s > CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.20 s > CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s > CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s > CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s > CPU: user: 120.60 s, system: 5.82 s, elapsed: 126.49 s > CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s > CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s > CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s > > master + v1 patch > CPU: user: 106.90 s, system: 4.45 s, elapsed: 111.39 s > CPU: user: 107.31 s, system: 5.98 s, elapsed: 113.35 s > CPU: user: 107.14 s, system: 5.58 s, elapsed: 112.77 s > CPU: user: 105.79 s, system: 5.64 s, elapsed: 111.48 s > CPU: user: 105.78 s, system: 5.80 s, elapsed: 111.63 s > CPU: user: 113.18 s, system: 6.21 s, elapsed: 119.45 s > CPU: user: 107.74 s, system: 4.57 s, elapsed: 112.36 s > CPU: user: 107.42 s, system: 4.62 s, elapsed: 112.09 s > CPU: user: 106.54 s, system: 4.65 s, elapsed: 111.24 s > CPU: user: 113.24 s, system: 6.86 s, elapsed: 120.16 s > > I wrote this patch a few days ago. I'm only posting it now as I know a > couple of other people have expressed an interest in working on this. > I didn't really want any duplicate efforts, so thought I'd better post > it now before someone else goes and writes a similar patch. > > I'll park this here and have another look at it when the PG15 branch > opens. > > David Hi, David It is quite interesting result. Simplehash being open-addressing with linear probing is friendly for cpu cache. I'd recommend to define SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is suitable most for such kind of hash table. > + /* rotate hashkey left 1 bit at each step */ > + hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0); > + hashkey ^= murmurhash32((uint32) rnode->node.dbNode); Why do you use so strange rotation expression? I know compillers are able to translage `h = (h << 1) | (h >> 31)` to single rotate instruction. Do they recognize construction in your code as well? Your construction looks more like "multiplate-modulo" operation in 32bit Galois field . It is widely used operation in cryptographic, but it is used modulo some primitive polynomial, and 0x100000001 is not such polynomial. 0x1000000c5 is, therefore it should be: hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 0xc5 : 0); or hashkey = (hashkey << 1) | ((uint32)((int32)hashkey >> 31) & 0xc5); But why don't just use hash_combine(uint32 a, uint32 b) instead (defined in hashfn.h)? Yep, it could be a bit slower, but is it critical? > - * smgrclose() -- Close and delete an SMgrRelation object. > + * smgrclose() -- Close and delete an SMgrRelation object but don't > + * remove from the SMgrRelationHash table. I believe `smgrclose_internal()` should be in this comment. Still I don't believe it worth to separate smgrclose_internal from smgrclose. Is there measurable performance improvement from this change? Even if there is, it will be lesser with SH_FILLFACTOR 0.75 . As well I don't support modification simplehash.h for SH_ENTRY_INITIALIZER, SH_ENTRY_CLEANUP and SH_TRUNCATE. The initialization could comfortably live in smgropen and the cleanup in smgrclose. And then SH_TRUNCATE doesn't mean much. Summary: regards, Yura Sokolov
Attachment
pgsql-hackers by date: