Good day, all.
I benchmarked the patch on a 2-socket Xeon 5220 CPU @ 2.20GHz.
I used the "benchmark" we use to reproduce problems with SLRU on
our customers' setups.
In contrast to Shawn's tests, I concentrated on the bad case: a
lot of contention.
slru-funcs.sql - function definitions
  - the functions create a lot of subtransactions to stress subtrans
  - and select random rows "for share" to stress multixacts
slru-call.sql - function call for the benchmark
slru-ballast.sql - randomly selects 1000 consecutive rows
  "for update skip locked" to stress multixacts
patch1 - make SLRU buffers configurable
patch2 - make "8-associative banks"
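To illustrate the difference between a plain buffer increase and
"8-associative banks", here is a toy model of the slot lookup. It is
a sketch in Python, not the patch's C code: the names (BANK_SIZE,
find_slot_linear, find_slot_banked) and the modulo mapping from page
number to bank are assumptions made only for this illustration.

```python
# Toy model of SLRU slot lookup. NOT the patch's code: all names and the
# modulo mapping are assumptions made only for illustration.
BANK_SIZE = 8  # "8-associative": a page may live in one of 8 slots


def find_slot_linear(page_in_slot, pageno):
    """Scan every buffer slot. Returns (slot, slots_examined)."""
    for slot, page in enumerate(page_in_slot):
        if page == pageno:
            return slot, slot + 1
    return None, len(page_in_slot)


def find_slot_banked(page_in_slot, pageno):
    """Scan only the bank the page maps to. Returns (slot, slots_examined)."""
    nbanks = len(page_in_slot) // BANK_SIZE
    start = (pageno % nbanks) * BANK_SIZE
    for i in range(BANK_SIZE):
        if page_in_slot[start + i] == pageno:
            return start + i, i + 1
    return None, BANK_SIZE


if __name__ == "__main__":
    page_in_slot = [None] * 1024
    page_in_slot[839] = 1000  # last slot of the bank that page 1000 maps to
    print(find_slot_linear(page_in_slot, 1000))  # hit after a long scan
    print(find_slot_banked(page_in_slot, 1000))  # hit within one bank
    print(find_slot_linear(page_in_slot, 4242))  # miss: scans all slots
    print(find_slot_banked(page_in_slot, 4242))  # miss: scans one bank
```

In this model a miss costs a scan of all slots in the linear case
but never more than 8 slots in the banked case, whatever the total
number of buffers.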
The benchmark was run with pgbench, initialized with scale 1 to
induce contention.
pgbench -i -s 1 testdb
Benchmark 1:
- low number of connections (50), 60% slru-call, 40% slru-ballast
pgbench -f slru-call.sql@60 -f slru-ballast.sql@40 -c 50 -j 75 -P 1 -T 30 testdb
version | subtrans | multixact | tps
        | buffers  | offs/memb | func+ballast
--------+----------+-----------+-------------
master  | 32       | 8/16      | 184+119
patch1  | 32       | 8/16      | 184+119
patch1  | 1024     | 8/16      | 121+77
patch1  | 1024     | 512/1024  | 118+75
patch2  | 32       | 8/16      | 190+122
patch2  | 1024     | 8/16      | 190+125
patch2  | 1024     | 512/1024  | 190+127
As you can see, this test case degrades when the SLRU buffers are
simply increased. But using a "hash table" in the form of
"associative buckets" keeps performance stable.
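To put numbers on the degradation, the relative change against
master can be computed from the Benchmark 1 table. This is only
arithmetic on the reported tps values; the helper name pct_change is
made up for this sketch.

```python
# Benchmark 1 tps as (func, ballast), copied from the table above.
# Keys are (version, subtrans buffers, multixact offs/memb).
results = {
    ("master", 32, "8/16"): (184, 119),
    ("patch1", 32, "8/16"): (184, 119),
    ("patch1", 1024, "8/16"): (121, 77),
    ("patch1", 1024, "512/1024"): (118, 75),
    ("patch2", 32, "8/16"): (190, 122),
    ("patch2", 1024, "8/16"): (190, 125),
    ("patch2", 1024, "512/1024"): (190, 127),
}
baseline = results[("master", 32, "8/16")]


def pct_change(value, base):
    """Percent change of value relative to base, rounded to 0.1."""
    return round(100.0 * (value - base) / base, 1)


for key, (func, ballast) in results.items():
    print(key,
          pct_change(func, baseline[0]),
          pct_change(ballast, baseline[1]))
```

With 1024 subtrans buffers patch1 comes out about 34-37% below
master, while patch2 stays within about +3% to +7% of master in
every configuration.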
Benchmark 2:
- high connection number (600), 98% slru-call, 2% slru-ballast
pgbench -f slru-call.sql@98 -f slru-ballast.sql@2 -c 600 -j 75 -P 1 -T 30 testdb
I don't report "ballast" tps here: at 2% of the load they are too
small and very noisy.
version | subtrans | multixact | tps
        | buffers  | offs/memb | func
--------+----------+-----------+------
master  | 32       | 8/16      | 13
patch1  | 32       | 8/16      | 13
patch1  | 1024     | 8/16      | 31
patch1  | 1024     | 512/1024  | 53
patch2  | 32       | 8/16      | 12
patch2  | 1024     | 8/16      | 34
patch2  | 1024     | 512/1024  | 67
In this case a simple buffer increase does help. But "buckets"
increase the performance gain further.
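The size of each effect can be read off the Benchmark 2 table as
ratios. Again this is only arithmetic on the reported numbers; the
dictionary layout and the helper name ratio are made up for this
sketch.

```python
# Benchmark 2 func tps, copied from the table above.
# Keys are (subtrans buffers, multixact offs/memb).
MASTER_TPS = 13
patch1 = {(32, "8/16"): 13, (1024, "8/16"): 31, (1024, "512/1024"): 53}
patch2 = {(32, "8/16"): 12, (1024, "8/16"): 34, (1024, "512/1024"): 67}


def ratio(a, b):
    """a / b rounded to two decimals."""
    return round(a / b, 2)


for cfg in patch1:
    print(cfg,
          "patch1/master:", ratio(patch1[cfg], MASTER_TPS),
          "patch2/master:", ratio(patch2[cfg], MASTER_TPS),
          "patch2/patch1:", ratio(patch2[cfg], patch1[cfg]))
```

With the largest settings this gives about 4.1x over master for
patch1 and about 5.2x for patch2, so the "buckets" add roughly 26%
on top of the plain buffer increase.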
I didn't include results for the third part of the patch
("Pack SLRU...") here because I didn't see any major performance
gain from it, and it makes up a large part of the patch diff.
Rebased versions of the first two parts of the patch are attached.
regards,
Yura Sokolov