Re: Move PinBuffer and UnpinBuffer to atomics - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Move PinBuffer and UnpinBuffer to atomics
Date
Msg-id CAPpHfdt9bbHj5nNGazuDyw=ZotNbOVznofmwPMeBBxu3sxyYNA@mail.gmail.com
In response to Re: Move PinBuffer and UnpinBuffer to atomics  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Apr 12, 2016 at 5:12 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund <andres@anarazel.de> wrote:
I did get access to the machine (thanks!). My testing shows that
performance is sensitive to various parameters influencing memory
allocation. E.g. twiddling with max_connections changes
performance. With max_connections=400 and the previous patches applied I
get ~1220000 tps, with 402 ~1620000 tps.  This sorta confirms that we're
dealing with an alignment/sharing related issue.

Padding PGXACT to a full cache-line seems to take care of the largest
part of the performance irregularity. I looked at perf profiles and saw
that most cache misses stem from there, and that the percentage (not
absolute amount!) changes between fast/slow settings.

To me it makes intuitive sense why you'd want PGXACTs to be on separate
cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
making it immediately return propels performance up to 1720000, without
other changes. Additionally cacheline-padding PGXACT speeds things up to
1750000 tps.

It seems like padding PGXACT to a full cache line is a great improvement.  We don't have so many PGXACTs that we need to care about the bytes wasted on padding.
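As an aside, here is a minimal, self-contained C sketch of the cache-line padding technique being discussed; it illustrates the general idea only and is not the actual patch. Each frequently-dirtied per-backend entry is embedded in a union padded out to a full cache line, so a store into one entry cannot invalidate the cache line holding its neighbour. The 128-byte line size and all of the names (MyXactEntry, MyXactEntryPadded, xactArray) are assumptions made for this example, not PostgreSQL's actual definitions.

/*
 * Illustrative sketch of cache-line padding (not the actual patch).
 * A write to xactArray[i] dirties only the line owned by entry i,
 * so concurrent updates by different backends no longer false-share.
 */
#include <stdint.h>

#define CACHE_LINE_SIZE 128        /* assumed line size for this sketch */

typedef struct MyXactEntry
{
    uint32_t    xid;               /* fields dirtied very frequently,  */
    uint32_t    xmin;              /* e.g. by snapshot bookkeeping     */
    uint8_t     vacuumFlags;
} MyXactEntry;

/* Pad each entry out to a full cache line. */
typedef union MyXactEntryPadded
{
    MyXactEntry xact;
    char        pad[CACHE_LINE_SIZE];
} MyXactEntryPadded;

/* Shared array: entries i and i+1 never share a cache line. */
static MyXactEntryPadded xactArray[512];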

Yes, it generally seems like a good idea, but I am not sure it is a complete fix for the variation in performance we are seeing when we change shared memory structures.  Andres suggested to me on IM that I take performance data on an x86 machine with PGXACT padded, and that data is below:

median of 3, 5-min runs

Client_Count/Patch_ver       8      64     128
HEAD                     59708  329560  173655
PATCH                    61480  379798  157580
 
Here, at 128 client-count, the performance with the patch still seems to show variation.  The highest tps with the patch (170363) is close to HEAD (175718).  This could be run-to-run variation, but I think it indicates that there are more places where we might need such padding, or might need to optimize them so that they are aligned.

I can do some more experiments along similar lines, but I am out on vacation and might not be able to access the machine for 3-4 days.

Could you share details of the hardware you used?  I could try to find something similar to reproduce this.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
