On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> There are results with 5364b357 reverted.
What exactly is this test?
I am assuming it is a read-only -M prepared pgbench run where the data fits in shared buffers. However, if you can share the exact details, I can try a similar test.
Crazy that this has such a negative impact. Amit, can you reproduce that?
I will try it.
Good.
Okay, I have done some performance testing of read-only tests with the configuration you suggested, to see the impact.
pin_unpin - latest version of pin unpin patch on top of HEAD.
pin_unpin_clog_32 - pin_unpin + change clog buffers to 32
Client_Count/Patch_ver        64       128
pin_unpin                 330280    133586
pin_unpin_clog_32         388244    132388
This shows that at 64 client count, the performance is better with 32 clog buffers. However, I think this is mostly attributable to the fact that contention seems to have shifted to ProcArrayLock, as indicated to an extent in Alexander's mail. I will try once with the Cache the snapshot patch as well, and with clog buffers as 64.
I went ahead and tried with the Cache the snapshot patch and with clog buffers as 64; below is the performance data:
Description of patches
pin_unpin - latest version of pin unpin patch on top of HEAD.
pin_unpin_clog_32 - pin_unpin + change clog buffers to 32
pin_unpin_cache_snapshot - pin_unpin + Cache the snapshot
pin_unpin_clog_64 - pin_unpin + change clog buffers to 64
Client_Count/Patch_ver        64       128
pin_unpin                 330280    133586
pin_unpin_clog_32         388244    132388
pin_unpin_cache_snapshot  412149    144799
pin_unpin_clog_64         391472    132951
The above data seems to indicate that the Cache the snapshot patch will make performance go further up with clog buffers as 128 (HEAD). I will take the performance data with pin_unpin + clog buffers as 32 + Cache the snapshot, but the above seems a good enough indication that making clog buffers 128 is a good move, considering we will one day improve GetSnapshotData, either by the Cache the snapshot technique or some other way. Also, making clog buffers 64 instead of 128 seems to address the regression (at the very least in the above tests), but for read-write performance, clog buffers as 128 has better numbers, though the difference between 64 and 128 is not very high.
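
To make the comparison easier to read, here is a small sketch that expresses each configuration as a percentage change against plain pin_unpin at both client counts. The figures are copied from the data above; the helper names (percent_change, relative_to) are only illustrative, and the unit of the figures is not stated in this mail, so the script treats them as opaque throughput numbers.

```python
# Figures copied from the performance data above, keyed by client count.
# The unit is not stated in the mail; only the ratios are used here.
RESULTS = {
    "pin_unpin": {64: 330280, 128: 133586},
    "pin_unpin_clog_32": {64: 388244, 128: 132388},
    "pin_unpin_cache_snapshot": {64: 412149, 128: 144799},
    "pin_unpin_clog_64": {64: 391472, 128: 132951},
}


def percent_change(baseline, value):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def relative_to(results, baseline_name):
    """Map every other configuration to its percent change per client count."""
    base = results[baseline_name]
    return {
        name: {clients: percent_change(base[clients], value)
               for clients, value in row.items()}
        for name, row in results.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, row in relative_to(RESULTS, "pin_unpin").items():
        cells = "  ".join(f"{clients}: {delta:+.1f}%"
                          for clients, delta in sorted(row.items()))
        print(f"{name:<26}{cells}")
```

Read this way, the smaller clog buffer settings help at 64 clients but are slightly below pin_unpin at 128 clients, while Cache the snapshot is ahead at both client counts.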