I too have benchmarked this patch on a large machine (128 CPUs, 520GB RAM, Intel x86-64 architecture) and would like to share my observations. (Please note that, as I had to re-verify the readings for a few client counts, it took me some time to share these test results.)
Great! Thank you very much for testing.
Case3: Data fits in shared buffer, Read-write workload:
-----------------------------------------------------------------------------
In this case, I could see that the tps on head and patch are very close to each other, with a small variation of (+-)3-4% which I assume is run-to-run variation. PFA result sheet 'results-readwrite-300-1000-SF' containing the test results.
I wouldn't say it's just a variation. It looks like a relatively small but noticeable regression with the patch.
Following Andres' comment [1], I made a version of the patch (pgxact-align-3.patch) which aligns PGXACT to 16 bytes.
That rules out the situation where a single PGXACT is spread over two cache lines.
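
To illustrate the arithmetic behind this, here is a minimal sketch. It assumes a 12-byte PGXACT, 64-byte cache lines, and an array that starts on a cache line boundary; all three are assumptions made for illustration, not values taken from the patch, and the function names are hypothetical.

```python
CACHE_LINE = 64   # assumed cache line size in bytes
ENTRY_SIZE = 12   # assumed sizeof(PGXACT); illustrative only


def straddles(offset, size=ENTRY_SIZE, line=CACHE_LINE):
    """Return True if the bytes [offset, offset + size) cross a cache line boundary."""
    return offset // line != (offset + size - 1) // line


def count_straddling(n_entries, stride, size=ENTRY_SIZE, line=CACHE_LINE):
    """Count array entries that span two cache lines.

    The array is assumed to start at a cache-line-aligned address,
    and entry i is placed at byte offset i * stride.
    """
    return sum(straddles(i * stride, size, line) for i in range(n_entries))


if __name__ == "__main__":
    for label, stride in (("packed (12-byte stride)", 12),
                          ("16-byte aligned", 16),
                          ("cache line aligned", 64)):
        print(label, count_straddling(16, stride))
```

Under these assumptions, the packed layout has entries that span two cache lines, while both the 16-byte and the cache line strides have none, because 16 divides 64 evenly. The difference between the two aligned variants is that with a 16-byte stride four entries still share one cache line, whereas with cache line alignment each entry has a line to itself.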
The results of the read-only tests are attached. We can see that 16-byte alignment gives a speedup in the read-only tests, but it's a bit smaller than the speedup from the cache line alignment version.
The read-write tests are now running. Hopefully the 16-byte alignment version of the patch won't cause a regression in the read-write benchmark.