Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: Use simplehash.h instead of dynahash in SMgr
Date
Msg-id 4b7616c612b9c445a401025f74ed57bf@postgrespro.ru
Whole thread Raw
In response to Use simplehash.h instead of dynahash in SMgr  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Use simplehash.h instead of dynahash in SMgr
Re: Use simplehash.h instead of dynahash in SMgr
List pgsql-hackers
David Rowley писал 2021-04-24 18:58:
> Hackers,
> 
> Last year, when working on making compactify_tuples() go faster for
> 19c60ad69, I did quite a bit of benchmarking of the recovery process.
> The next thing that was slow after compactify_tuples() was the hash
> lookups done in smgropen().
> 
> Currently, we use dynahash hash tables to store the SMgrRelation so we
> can perform fast lookups by RelFileNodeBackend. However, I had in mind
> that a simplehash table might perform better. So I tried it...
> 
> The attached converts the hash table lookups done in smgr.c to use
> simplehash instead of dynahash.
> 
> This does require a few changes in simplehash.h to make it work.  The
> reason being is that RelationData.rd_smgr points directly into the
> hash table entries.  This works ok for dynahash as that hash table
> implementation does not do any reallocations of existing items or move
> any items around in the table, however, simplehash moves entries
> around all the time, so we can't point any pointers directly at the
> hash entries and expect them to be valid after adding or removing
> anything else from the table.
> 
> To work around that, I've just made an additional type that serves as
> the hash entry type that has a pointer to the SMgrRelationData along
> with the hash status and hash value. It's just 16 bytes (or 12 on
> 32-bit machines).  I opted to keep the hash key in the
> SMgrRelationData rather than duplicating it as it keeps the SMgrEntry
> struct nice and small.  We only need to dereference the SMgrRelation
> pointer when we find an entry with the same hash value.  The chances
> are quite good that an entry with the same hash value is the one that
> we want, so any additional dereferences to compare the key are not
> going to happen very often.
> 
> I did experiment with putting the hash key in SMgrEntry and found it
> to be quite a bit slower.  I also did try to use hash_bytes() but
> found building a hash function that uses murmurhash32 to be quite a
> bit faster.
> 
> Benchmarking
> ===========
> 
> I did some of that.  It made my test case about 10% faster.
> 
> The test case was basically inserting 100 million rows one at a time
> into a hash partitioned table with 1000 partitions and 2 int columns
> and a primary key on one of those columns. It was about 12GB of WAL. I
> used a hash partitioned table in the hope to create a fairly
> random-looking SMgr hash table access pattern.  Hopefully something
> similar to what might happen in the real world.
> 
> Over 10 runs of recovery, master took an average of 124.89 seconds.
> The patched version took 113.59 seconds. About 10% faster.
> 
> I bumped shared_buffers up to 10GB, max_wal_size to 20GB and
> checkpoint_timeout to 60 mins.
> 
> To make the benchmark more easily to repeat I patched with the
> attached recovery_panic.patch.txt.  This just PANICs at the end of
> recovery so that the database shuts down before performing the end of
> recovery checkpoint.  Just start the database up again to do another
> run.
> 
> I did 10 runs.  The end of recovery log message reported:
> 
> master (aa271209f)
> CPU: user: 117.89 s, system: 5.70 s, elapsed: 123.65 s
> CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s
> CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.20 s
> CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s
> CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s
> CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s
> CPU: user: 120.60 s, system: 5.82 s, elapsed: 126.49 s
> CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s
> CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s
> CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s
> 
> master + v1 patch
> CPU: user: 106.90 s, system: 4.45 s, elapsed: 111.39 s
> CPU: user: 107.31 s, system: 5.98 s, elapsed: 113.35 s
> CPU: user: 107.14 s, system: 5.58 s, elapsed: 112.77 s
> CPU: user: 105.79 s, system: 5.64 s, elapsed: 111.48 s
> CPU: user: 105.78 s, system: 5.80 s, elapsed: 111.63 s
> CPU: user: 113.18 s, system: 6.21 s, elapsed: 119.45 s
> CPU: user: 107.74 s, system: 4.57 s, elapsed: 112.36 s
> CPU: user: 107.42 s, system: 4.62 s, elapsed: 112.09 s
> CPU: user: 106.54 s, system: 4.65 s, elapsed: 111.24 s
> CPU: user: 113.24 s, system: 6.86 s, elapsed: 120.16 s
> 
> I wrote this patch a few days ago. I'm only posting it now as I know a
> couple of other people have expressed an interest in working on this.
> I didn't really want any duplicate efforts, so thought I'd better post
> it now before someone else goes and writes a similar patch.
> 
> I'll park this here and have another look at it when the PG15 branch 
> opens.
> 
> David

Hi, David

It is quite interesting result. Simplehash being open-addressing with
linear probing is friendly for cpu cache. I'd recommend to define
SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
suitable most for such kind of hash table.

> +    /* rotate hashkey left 1 bit at each step */
> +    hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
> +    hashkey ^= murmurhash32((uint32) rnode->node.dbNode);

Why do you use so strange rotation expression? I know compillers are 
able
to translage `h = (h << 1) | (h >> 31)` to single rotate instruction.
Do they recognize construction in your code as well?

Your construction looks more like "multiplate-modulo" operation in 32bit
Galois field . It is widely used operation in cryptographic, but it is
used modulo some primitive polynomial, and 0x100000001 is not such
polynomial. 0x1000000c5 is, therefore it should be:

     hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 0xc5 : 0);
or
     hashkey = (hashkey << 1) | ((uint32)((int32)hashkey >> 31) & 0xc5);

But why don't just use hash_combine(uint32 a, uint32 b) instead (defined
in hashfn.h)? Yep, it could be a bit slower, but is it critical?

> - *    smgrclose() -- Close and delete an SMgrRelation object.
> + *    smgrclose() -- Close and delete an SMgrRelation object but don't
> + *    remove from the SMgrRelationHash table.

I believe `smgrclose_internal()` should be in this comment.

Still I don't believe it worth to separate smgrclose_internal from
smgrclose. Is there measurable performance improvement from this
change? Even if there is, it will be lesser with SH_FILLFACTOR 0.75 .

As well I don't support modification simplehash.h for 
SH_ENTRY_INITIALIZER,
SH_ENTRY_CLEANUP and SH_TRUNCATE. The initialization could comfortably
live in smgropen and the cleanup in smgrclose. And then SH_TRUNCATE
doesn't mean much.

Summary:

regards,
Yura Sokolov
Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: decoupling table and index vacuum
Next
From: David Rowley
Date:
Subject: Re: Use simplehash.h instead of dynahash in SMgr