On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund <andres@anarazel.de> wrote:
> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes performance: with
> max_connections=400 and the previous patches applied I get ~1220000 tps,
> with 402 I get ~1620000 tps. This sorta confirms that we're dealing with
> an alignment/sharing related issue.
>
> Padding PGXACT out to a full cache line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not the
> absolute amount!) changes between fast and slow settings.
>
> To me it makes intuitive sense why you'd want each PGXACT on a separate
> cache line: they're constantly dirtied via SnapshotResetXmin(). Indeed,
> making SnapshotResetXmin() return immediately propels performance up to
> ~1720000 tps without any other changes. Additionally, cacheline-padding
> PGXACT speeds things up to ~1750000 tps.
Padding PGXACT to a full cache line looks like a great improvement. There are few enough PGXACTs that the bytes wasted on padding shouldn't matter. But could the padding have any other negative side-effects?
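
For concreteness, here is a minimal sketch of the padding being discussed, using the same union trick PostgreSQL already uses for LWLockPadded. The names PGXACTLike, PaddedPGXACT, and MY_CACHE_LINE_SIZE are made up for illustration; this is not the actual patch:

    #include <stdint.h>

    /* Assumed line size for the sketch; PostgreSQL provides
     * PG_CACHE_LINE_SIZE in pg_config_manual.h for this purpose. */
    #define MY_CACHE_LINE_SIZE 64

    /* Stand-in for the dense per-backend struct (the real PGXACT
     * holds xid, xmin, vacuumFlags, etc.). */
    typedef struct PGXACTLike
    {
        uint32_t xid;
        uint32_t xmin;
        uint8_t  vacuumFlags;
    } PGXACTLike;

    /* Each element occupies a full cache line, so one backend
     * dirtying its xmin in SnapshotResetXmin() no longer invalidates
     * the line holding its neighbors' entries. */
    typedef union PaddedPGXACT
    {
        PGXACTLike xact;
        char       pad[MY_CACHE_LINE_SIZE];
    } PaddedPGXACT;

    /* The shared array would then be allocated as
     *     PaddedPGXACT allPgXact[...];
     * instead of a tightly packed PGXACT[]. */
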