Re: Move PinBuffer and UnpinBuffer to atomics - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Move PinBuffer and UnpinBuffer to atomics
Date
Msg-id CAPpHfduh9ThT1hgd2LVM95uKdtk6UbpKS2b4cyP1YejstMK6sA@mail.gmail.com
Whole thread Raw
In response to Re: Move PinBuffer and UnpinBuffer to atomics  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Apr 12, 2016 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-11 14:40:29 -0700, Andres Freund wrote:
> On 2016-04-11 12:17:20 -0700, Andres Freund wrote:
> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~1220000 tps, with 402 ~1620000 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
>
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
>
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 1720000, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 1750000 tps.
>
> But I'm unclear why the magnitude of the effect depends on other
> allocations. With the previously posted patches allPgXact is always
> cacheline-aligned.

I've spent considerable amount experimenting around this. The alignment
of allPgXact does *not* apear to play a significant role; rather it
apears to be the the "distance" between the allPgXact and pgprocno
arrays.

Alexander, could you post dmidecode output, and install numactl &
numastat on the machine? I wonder if the box has cluster-on-die
activated or not.

Dmidecode output is in the attachment.  Numactl & numastat are installed.
 
Do I see correctly that this is a system that could
potentially have 8 sockets, but actually has only four? Because I see
physical id     : 3 in /proc/cpuinfo only going up to three (from zero),
not 7? And there's only 144 processorcs, while each E7-8890 v3 should
have 36 threads.

There are definitely 4 of used sockets.  I'm not sure about potential count though.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Parser extensions (maybe for 10?)
Next
From: tushar
Date:
Subject: Re: Choosing parallel_degree