This developed a slight merge conflict. I've rebased the attached version, and I also took the step of getting rid of buf_table.c, as I think I proposed somewhere upthread. This avoids the overhead of constructing a BufferTag only to copy it into a BufferLookupEnt, plus some function calls and so forth. A quick-and-dirty test suggests this might not have cut down on the 1-client overhead much, but I think it's worth doing anyway: it's certainly saving a few cycles, and I don't think it's complicating anything measurably.
Performance data at some of the configurations.
Configuration and Db Details
----------------------------------------------
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
checkpoint_segments = 300
checkpoint_timeout = 25min
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5min
Scale_factor = 5000
HEAD (commit id - 168a809d)
Below is the median of 3 runs for the pgbench read-only
(using -M prepared) configuration.
Shared_buffers=8GB

Client Count (tps)      1       8      16      32      64     128     256
HEAD                17748  119106  164949  246632  216763  183177  173055
HEAD + patch        17802  119721  167422  298746  457863  422621  356756

Shared_buffers=16GB

Client Count (tps)      1       8      16      32      64     128     256
HEAD                18139  113265  169217  270172  310936  238490  215308
HEAD + patch        17900  119960  174196  314866  448238  425916  347919

Observations as per data
--------------------------------------
a. It improves tps by a large margin at higher client counts
(roughly 2x or more at 64 clients and above with shared_buffers=8GB).
b. As shared_buffers increases, the relative gain shrinks (the gain
at shared_buffers=16GB is smaller than at shared_buffers=8GB).
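
To make the two observations easier to check, here is a small sketch that computes the patch/HEAD tps ratio per client count from the numbers posted above. It is only a convenience calculation on the reported medians, not part of the patch or the test setup; the names (results, speedups) are made up for illustration.

```python
# Median tps values copied from the tables above, ordered by client count.
clients = [1, 8, 16, 32, 64, 128, 256]
results = {
    "8GB": {
        "head":  [17748, 119106, 164949, 246632, 216763, 183177, 173055],
        "patch": [17802, 119721, 167422, 298746, 457863, 422621, 356756],
    },
    "16GB": {
        "head":  [18139, 113265, 169217, 270172, 310936, 238490, 215308],
        "patch": [17900, 119960, 174196, 314866, 448238, 425916, 347919],
    },
}


def speedups(head, patch):
    """Return the patch/HEAD tps ratio for each client count."""
    return [p / h for h, p in zip(head, patch)]


if __name__ == "__main__":
    for shared_buffers, data in results.items():
        print(f"shared_buffers={shared_buffers}")
        for n, ratio in zip(clients, speedups(data["head"], data["patch"])):
            print(f"  {n:>3} clients: {ratio:.2f}x")
```

A ratio near 1.0 means no visible change; the ratios at 64 clients and above are where the difference between the 8GB and 16GB runs shows up.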
I think the patch is valuable for such loads even though it shows
less benefit at higher shared_buffers values, although we might want
to verify once that it doesn't topple at configurations such as