Skylake-S warning - Mailing list pgsql-hackers

From Daniel Wood
Subject Skylake-S warning
Date
Msg-id 802677091.158786.1538602180341@connect.xfinity.com
Responses Re: Skylake-S warning  (Andres Freund <andres@anarazel.de>)
Re: Skylake-S warning  (Adrien Nayrat <adrien.nayrat@anayrat.info>)
List pgsql-hackers

If you are running benchmarks, or are a customer currently impacted by GetSnapshotData() on high-end multi-socket systems, be wary of Skylake-S.


Performance differences of nearly 2X can be seen on select-only pgbench (pgbench -S) due to nothing but unlucky choices of max_connections.  The setup: scale 1000, 192 local clients, on a 2-socket, 48-core Skylake-S (Xeon Platinum 8175M @ 2.50 GHz) system.


Results from 5 runs varying max_connections from 400 to 405:


max_connections      TPS
400               677639
401              1146776
402              1122140
403               765664
404               671455
405              1190277

...
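To put a number on the spread in the runs listed above, here is a minimal sketch that computes the ratio of the best to the worst throughput. The helper name tps_spread is hypothetical and only for illustration; the input values are the TPS figures from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of the highest to the lowest throughput in a set of runs.
 * tps_spread is an illustrative helper, not part of pgbench or PostgreSQL. */
static double
tps_spread(const long *tps, size_t n)
{
    assert(n > 0);

    long lo = tps[0];
    long hi = tps[0];

    for (size_t i = 1; i < n; i++)
    {
        if (tps[i] < lo)
            lo = tps[i];
        if (tps[i] > hi)
            hi = tps[i];
    }
    return (double) hi / (double) lo;
}
```

For the six values above (best 1190277 at 405, worst 671455 at 404) this comes out to roughly 1.77.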


perf top shows GetSnapshotData() at about 21% in the runs with good numbers and about 48% in the runs with bad numbers.


This problem is not seen on a 2-socket, 32-core Haswell system.  Being a one-man show, I lack some of the diagnostic tools to drill down further.  My suspicion is that Intel's lowering of the L2 associativity from 8 (Haswell) to 4 (Skylake-S) may be the cause.  The other possibility is that at higher core counts the shared 16-way associative inclusive L3 cache becomes insufficient.  Perhaps that is why Intel has moved to an exclusive L3 cache on Skylake-SP.


If this is indeed just disadvantageous placement of structures/arrays in memory, then you might also find that, after upgrading, a previously good choice for max_connections becomes a bad choice if things move around.
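To illustrate why placement can matter in a set-associative cache, here is a rough sketch of the textbook mapping from an address to a cache set. The function names and the example geometry (256 KB, 64-byte lines) are assumptions for illustration only, not the measured geometry of the system above; real hardware indexes on physical addresses and may hash the index bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of sets in a cache of the given capacity, associativity and line size. */
static size_t
cache_num_sets(size_t cache_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    return cache_bytes / (ways * line_bytes);
}

/* Simple modulo mapping of an address to a set index (no index hashing). */
static size_t
cache_set_index(uint64_t addr, size_t cache_bytes, size_t ways, size_t line_bytes)
{
    size_t sets = cache_num_sets(cache_bytes, ways, line_bytes);

    assert(sets > 0);
    return (size_t) ((addr / line_bytes) % sets);
}
```

Under this model, lines whose addresses differ by a multiple of (sets * line_bytes) compete for the same set, and only as many of them as there are ways can be resident at once. Shifting a hot array by changing the size of what is allocated before it changes which other hot lines it competes with.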


NOTE: The statement

    int pgprocno = pgprocnos[index];

is where the big increase in time occurs in GetSnapshotData().

The pgprocnos array is largely read-only once all connections are established, easily fits in the L1 cache, and is not adjacent to anything else that would cause invalidations.
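For readers unfamiliar with the access pattern, here is a simplified sketch of the kind of indirection that statement performs: a dense array of slot numbers is walked, and each entry is used to index a second per-backend array. This is not PostgreSQL's actual code; DemoXact and collect_running_xids are hypothetical names, and the real loop does considerably more work per entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for a per-backend transaction slot. */
typedef struct DemoXact
{
    uint32_t xid;               /* 0 means "no transaction running" */
} DemoXact;

/* Walk the dense index array and collect the nonzero xids it points at.
 * Returns the number of xids written to out (out must hold nprocs entries). */
static size_t
collect_running_xids(const int *procnos, size_t nprocs,
                     const DemoXact *all_xacts, uint32_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < nprocs; i++)
    {
        int      procno = procnos[i];   /* the load singled out in the NOTE */
        uint32_t xid = all_xacts[procno].xid;

        if (xid != 0)
            out[count++] = xid;
    }
    return count;
}
```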


NOTE2: It is unclear why PG needs to support over 64K sessions.  At about 10MB per backend (at the low end), the empty backends alone would consume 640GB of memory!  Changing pgprocnos from int to short gives me the following results.


max_connections      TPS
400               780119
401              1129286
402              1263093
403               887021
404               679891
405              1218118
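The arithmetic behind NOTE2 can be checked with a small sketch. It assumes 64-byte cache lines, 4-byte int and 2-byte short (written as int32_t and int16_t), an array that starts on a cache-line boundary, and one array entry per connection; the real array has somewhat more entries than max_connections. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; 64 is typical for x86-64. */
#define CACHE_LINE_BYTES 64

/* Cache lines needed for nelems elements of elem_size bytes each,
 * assuming the array starts on a cache-line boundary. */
static size_t
cache_lines_needed(size_t nelems, size_t elem_size)
{
    assert(elem_size > 0);
    return (nelems * elem_size + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}

/* Memory in GB (1 GB = 1024 MB) for nbackends idle backends
 * at mb_per_backend MB each. */
static uint64_t
idle_backend_memory_gb(uint64_t nbackends, uint64_t mb_per_backend)
{
    return nbackends * mb_per_backend / 1024;
}
```

With 400 entries, cache_lines_needed(400, sizeof(int32_t)) is 25 lines and cache_lines_needed(400, sizeof(int16_t)) is 13, so the narrower type roughly halves the number of lines the loop has to touch. idle_backend_memory_gb(65536, 10) gives the 640 quoted above.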


While this change is significant on large Skylake systems, it is likely just a trivial improvement on other systems or workloads.
