Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From David Rowley
Subject Use simplehash.h instead of dynahash in SMgr
Date
Msg-id CAApHDvpkWOGLh_bYg7jproXN8B2g2T9dWDcqsmKsXG5+WwZaqw@mail.gmail.com
Whole thread Raw
Responses Re: Use simplehash.h instead of dynahash in SMgr
Re: Use simplehash.h instead of dynahash in SMgr
List pgsql-hackers
Hackers,

Last year, when working on making compactify_tuples() go faster for
19c60ad69, I did quite a bit of benchmarking of the recovery process.
The next thing that was slow after compactify_tuples() was the hash
lookups done in smgropen().

Currently, we use dynahash hash tables to store the SMgrRelation so we
can perform fast lookups by RelFileNodeBackend. However, I had in mind
that a simplehash table might perform better. So I tried it...

The attached converts the hash table lookups done in smgr.c to use
simplehash instead of dynahash.

This does require a few changes in simplehash.h to make it work.  The
reason being is that RelationData.rd_smgr points directly into the
hash table entries.  This works ok for dynahash as that hash table
implementation does not do any reallocations of existing items or move
any items around in the table, however, simplehash moves entries
around all the time, so we can't point any pointers directly at the
hash entries and expect them to be valid after adding or removing
anything else from the table.

To work around that, I've just made an additional type that serves as
the hash entry type that has a pointer to the SMgrRelationData along
with the hash status and hash value. It's just 16 bytes (or 12 on
32-bit machines).  I opted to keep the hash key in the
SMgrRelationData rather than duplicating it as it keeps the SMgrEntry
struct nice and small.  We only need to dereference the SMgrRelation
pointer when we find an entry with the same hash value.  The chances
are quite good that an entry with the same hash value is the one that
we want, so any additional dereferences to compare the key are not
going to happen very often.

I did experiment with putting the hash key in SMgrEntry and found it
to be quite a bit slower.  I also did try to use hash_bytes() but
found building a hash function that uses murmurhash32 to be quite a
bit faster.

Benchmarking
===========

I did some of that.  It made my test case about 10% faster.

The test case was basically inserting 100 million rows one at a time
into a hash partitioned table with 1000 partitions and 2 int columns
and a primary key on one of those columns. It was about 12GB of WAL. I
used a hash partitioned table in the hope to create a fairly
random-looking SMgr hash table access pattern.  Hopefully something
similar to what might happen in the real world.

Over 10 runs of recovery, master took an average of 124.89 seconds.
The patched version took 113.59 seconds. About 10% faster.

I bumped shared_buffers up to 10GB, max_wal_size to 20GB and
checkpoint_timeout to 60 mins.

To make the benchmark more easily to repeat I patched with the
attached recovery_panic.patch.txt.  This just PANICs at the end of
recovery so that the database shuts down before performing the end of
recovery checkpoint.  Just start the database up again to do another
run.

I did 10 runs.  The end of recovery log message reported:

master (aa271209f)
CPU: user: 117.89 s, system: 5.70 s, elapsed: 123.65 s
CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s
CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.20 s
CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s
CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s
CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s
CPU: user: 120.60 s, system: 5.82 s, elapsed: 126.49 s
CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s
CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s
CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s

master + v1 patch
CPU: user: 106.90 s, system: 4.45 s, elapsed: 111.39 s
CPU: user: 107.31 s, system: 5.98 s, elapsed: 113.35 s
CPU: user: 107.14 s, system: 5.58 s, elapsed: 112.77 s
CPU: user: 105.79 s, system: 5.64 s, elapsed: 111.48 s
CPU: user: 105.78 s, system: 5.80 s, elapsed: 111.63 s
CPU: user: 113.18 s, system: 6.21 s, elapsed: 119.45 s
CPU: user: 107.74 s, system: 4.57 s, elapsed: 112.36 s
CPU: user: 107.42 s, system: 4.62 s, elapsed: 112.09 s
CPU: user: 106.54 s, system: 4.65 s, elapsed: 111.24 s
CPU: user: 113.24 s, system: 6.86 s, elapsed: 120.16 s

I wrote this patch a few days ago. I'm only posting it now as I know a
couple of other people have expressed an interest in working on this.
I didn't really want any duplicate efforts, so thought I'd better post
it now before someone else goes and writes a similar patch.

I'll park this here and have another look at it when the PG15 branch opens.

David

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: compute_query_id and pg_stat_statements
Next
From: Magnus Hagander
Date:
Subject: Re: compute_query_id and pg_stat_statements