Thread: heads up: Fix for intel hardware bug will lead to performance regressions
Hi,

Upcoming versions of the linux kernel (and apparently also windows and
others) will include a new feature that apparently has been implemented
in haste to work around an intel hardware bug.

https://lwn.net/SubscriberLink/741878/eaff7b24627c41a2/

The fix, split userland / kernel pagetables, is going to be merged in
the next version of the linux kernel and is being backported to older
point releases. The backport of a complex, invasive new feature signals
that this concerns a significant issue.

There's plenty of speculation about what exactly the vulnerability is.
I don't want to go into that here.

The fix will unfortunately cause performance regressions. Depending on
the hardware version and kernel version (the fix will not be backported
to every version), hardware features (PCID / ASID) will be used to
reduce the impact.

pti (page table isolation) is the workaround, which can be
enabled/disabled via boot parameters. nopcid disables the use of the
hardware feature (PCID support) that reduces the impact of the
workaround.

readonly pgbench (tpch-like), 16 clients, i7-6820HQ CPU (skylake):

pti=off:
tps = 236629.778328

pti=on:
tps = 220791.228297 (~0.93x)

pti=on, nopcid:
tps = 198959.801459 (~0.84x)

To get closer to the worst case, I've also measured:

pgbench SELECT 1, 16 clients, i7-6820HQ CPU (skylake):

pti=off:
tps = 420490.162391

pti=on:
tps = 350746.065039 (~0.83x)

pti=on, nopcid:
tps = 324269.903152 (~0.77x)

Note that real-world scenarios will probably see a somewhat smaller
impact, as this was measured over loopback unix sockets, which have
smaller overhead themselves than proper TCP sockets + actual network.

The rumor mill has it that details about the vulnerability will be
un-embargoed in the next few days.

Greetings,

Andres Freund
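For readers who want to re-derive the "~0.93x"-style figures quoted in the message above, here is a minimal sketch. The tps values are copied from the message; the names RESULTS and relative_throughput are made up for this illustration and are not part of pgbench.

```python
# tps figures copied from the message above (i7-6820HQ, 16 clients).
RESULTS = {
    "readonly pgbench": {
        "pti=off": 236629.778328,
        "pti=on": 220791.228297,
        "pti=on, nopcid": 198959.801459,
    },
    "SELECT 1": {
        "pti=off": 420490.162391,
        "pti=on": 350746.065039,
        "pti=on, nopcid": 324269.903152,
    },
}


def relative_throughput(runs: dict, baseline: str = "pti=off") -> dict:
    """Return each configuration's tps as a fraction of the baseline tps."""
    base = runs[baseline]
    return {name: tps / base for name, tps in runs.items()}


if __name__ == "__main__":
    for workload, runs in RESULTS.items():
        for name, ratio in relative_throughput(runs).items():
            print(f"{workload:18s} {name:16s} {ratio:.2f}x")
```

Rounded to two decimals, the ratios agree with the ones given in the message.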
Re: heads up: Fix for intel hardware bug will lead to performance regressions
From
Robert Haas
Date:
On Tue, Jan 2, 2018 at 5:23 PM, Andres Freund <andres@anarazel.de> wrote:
> To get closer to the worst case, I've also measured:
>
> pgbench SELECT 1, 16 clients, i7-6820HQ CPU (skylake):
>
> pti=off:
> tps = 420490.162391
>
> pti=on:
> tps = 350746.065039 (~0.83x)
>
> pti=on, nopcid:
> tps = 324269.903152 (~0.77x)
>
>
> Note that real-world scenarios probably will see somewhat smaller
> impact, as this was measured over a loopback unix sockets which'll have
> smaller overhead itself than proper TCP sockets + actual network.

What about scenarios with longer-running queries?

Is it feasible to think about reducing the number of system calls we
issue in cases that weren't previously worth optimizing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: heads up: Fix for intel hardware bug will lead to performance regressions
From
Thomas Munro
Date:
On Fri, Jan 5, 2018 at 6:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 2, 2018 at 5:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> Note that real-world scenarios probably will see somewhat smaller
>> impact, as this was measured over a loopback unix sockets which'll have
>> smaller overhead itself than proper TCP sockets + actual network.
>
> What about scenarios with longer-running queries?
>
> Is it feasible to think about reducing the number of system calls we
> issue in cases that weren't previously worth optimizing?

Maybe the places where syscall rate is controlled by arbitrary buffer
sizes? Examples: 8kB BufFile buffers and 128kB replication stream
buffers. Just an idea, not sure if it's worth looking into; maybe we
already spend enough time filling those buffers that a 50% syscall
markup won't hurt.

--
Thomas Munro
http://www.enterprisedb.com
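The trade-off raised in the message above can be put into rough numbers. This is a back-of-the-envelope sketch only: it assumes one write-style syscall per full buffer, and the cost units passed to relative_slowdown are hypothetical, not measurements from PostgreSQL. The function names are made up for this illustration.

```python
def syscalls_needed(total_bytes: int, buffer_size: int) -> int:
    """Syscalls needed to move total_bytes if each full buffer costs one call.

    Uses ceiling division so a final partial buffer still counts as one call.
    """
    return -(-total_bytes // buffer_size)


def relative_slowdown(fill_cost: float, syscall_cost: float, markup: float) -> float:
    """Per-buffer time after a syscall markup, divided by the time before.

    fill_cost and syscall_cost are in arbitrary but identical units;
    markup=0.5 models the "50% syscall markup" mentioned in the thread.
    """
    before = fill_cost + syscall_cost
    after = fill_cost + syscall_cost * (1.0 + markup)
    return after / before


if __name__ == "__main__":
    one_gib = 1 << 30
    print(syscalls_needed(one_gib, 8 * 1024))    # 8kB buffers
    print(syscalls_needed(one_gib, 128 * 1024))  # 128kB buffers
    # Hypothetical: filling the buffer costs 10x the syscall itself.
    print(relative_slowdown(fill_cost=10.0, syscall_cost=1.0, markup=0.5))
```

Under this model a larger buffer cuts the syscall count proportionally, but the overall slowdown from a markup stays small whenever filling the buffer dominates the per-buffer time, which is the caveat the message itself raises.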
Re: heads up: Fix for intel hardware bug will lead to performance regressions
From
Thomas Munro
Date:
On Mon, Jan 8, 2018 at 2:38 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Fri, Jan 5, 2018 at 6:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Jan 2, 2018 at 5:23 PM, Andres Freund <andres@anarazel.de> wrote:
>>> Note that real-world scenarios probably will see somewhat smaller
>>> impact, as this was measured over a loopback unix sockets which'll have
>>> smaller overhead itself than proper TCP sockets + actual network.
>>
>> What about scenarios with longer-running queries?
>>
>> Is it feasible to think about reducing the number of system calls we
>> issue in cases that weren't previously worth optimizing?
>
> Maybe the places where syscall rate is controlled by arbitrary buffer
> sizes? Examples: 8kB BufFile buffers and 128kB replication stream
> buffers. Just an idea, not sure if it's worth looking into; maybe we
> already spend enough time filling those buffers that a 50% syscall
> markup won't hurt.

Also pgarch.c, syncrep.c, walsender.c and walreceiver.c use
PostmasterIsAlive() every time through their loops[1] generating extra
syscalls, one instance of which has caused complaints before[1] on a
system where the syscall was expensive (arguably because that kernel
needs some work but still, it's an example of the thing you asked
about).

[1] https://www.postgresql.org/message-id/flat/20160915135755.GC19008%40genua.de

--
Thomas Munro
http://www.enterprisedb.com
Re: heads up: Fix for intel hardware bug will lead to performance regressions
From
Michael Paquier
Date:
On Mon, Jan 8, 2018 at 1:32 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Also pgarch.c, syncrep.c, walsender.c and walreceiver.c use
> PostmasterIsAlive() every time through their loops[1] generating extra
> syscalls, one instance of which has caused complaints before[1] on a
> system where the syscall was expensive (arguably because that kernel
> needs some work but still, it's an example of the thing you asked
> about).
>
> [1] https://www.postgresql.org/message-id/flat/20160915135755.GC19008%40genua.de

Or we could replace calls to PostmasterIsAlive() by checks on
WL_POSTMASTER_DEATH? At least for the WAL sender portion it looks
doable.

--
Michael
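The difference between the two approaches discussed above can be sketched outside PostgreSQL. The following is a generic Unix illustration, not PostgreSQL code: it assumes parent death is signalled by EOF on a pipe, and the function names are invented for this example. The first function spends one extra read() per loop iteration on the liveness check; the second folds the check into the wait the loop performs anyway, which is the general idea behind waiting on a death event rather than polling for it.

```python
import os
import select


def is_parent_alive_polling(death_fd: int) -> bool:
    """Liveness check that costs one read() syscall every time it is called.

    death_fd must be non-blocking. While the write end is open and empty,
    read() fails with EAGAIN (BlockingIOError), meaning "still alive".
    """
    try:
        os.read(death_fd, 1)
    except BlockingIOError:
        return True
    # EOF (or unexpected data) means the write end is gone.
    return False


def wait_for_work_or_death(work_fd: int, death_fd: int, timeout: float) -> str:
    """Single wait that reports work, parent death, or timeout.

    The death pipe is simply one more fd in the set the loop already
    waits on, so no separate per-iteration liveness syscall is needed.
    """
    readable, _, _ = select.select([work_fd, death_fd], [], [], timeout)
    if death_fd in readable:
        return "death"
    if work_fd in readable:
        return "work"
    return "timeout"


if __name__ == "__main__":
    death_r, death_w = os.pipe()
    work_r, work_w = os.pipe()
    os.set_blocking(death_r, False)
    print(is_parent_alive_polling(death_r))
    os.close(death_w)  # simulate the parent exiting
    print(wait_for_work_or_death(work_r, death_r, 0.0))
```

Whether a given loop can be converted depends on whether it already blocks in a suitable wait; a loop that never sleeps has nowhere to fold the check into.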
Re: heads up: Fix for intel hardware bug will lead to performance regressions
From
Andres Freund
Date:
On 2018-01-08 14:38:20 +1300, Thomas Munro wrote:
> Just an idea, not sure if it's worth looking into; maybe we already
> spend enough time filling those buffers that a 50% syscall markup
> won't hurt.

Yea, I suspect that won't make a huge difference - copying an 8kb
buffer is typically a lot more than the overhead.

The big problem for the demonstrated slowness is really that we send a
lot of tiny packets back and forth and wait for them, and that's
obviously going to be performance sensitive to syscall speed.
Pipelining helps a lot, but isn't that generally applicable...

TBH, I don't really see that much we can do from our side for readonly
OLTP with prepared statements.

Greetings,

Andres Freund
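To make the "tiny packets back and forth" point concrete, here is a toy sketch that uses a made-up newline-delimited protocol over a local socket pair. It is not the PostgreSQL wire protocol, and all function names are invented for the illustration. Without pipelining the client issues one send and then blocks for each reply; with pipelining it hands all requests to the kernel in one sendall() call and then drains the replies.

```python
import socket
import threading


def echo_server(conn: socket.socket) -> None:
    """Answer every newline-terminated request with one 'ok:<request>' line."""
    buf = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            conn.sendall(b"ok:" + line + b"\n")
    conn.close()


def read_lines(sock: socket.socket, n: int) -> list:
    """Block until n complete reply lines have arrived, then return them."""
    buf = b""
    while buf.count(b"\n") < n:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf.split(b"\n")[:n]


def run_unpipelined(sock: socket.socket, queries: list):
    """One sendall() and one blocking wait per query."""
    replies, sends = [], 0
    for q in queries:
        sock.sendall(q + b"\n")
        sends += 1
        replies += read_lines(sock, 1)
    return replies, sends


def run_pipelined(sock: socket.socket, queries: list):
    """All queries go out in a single sendall(); replies are drained after."""
    sock.sendall(b"".join(q + b"\n" for q in queries))
    return read_lines(sock, len(queries)), 1


if __name__ == "__main__":
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(server,), daemon=True)
    worker.start()
    print(run_unpipelined(client, [b"q1", b"q2", b"q3"]))
    print(run_pipelined(client, [b"q1", b"q2", b"q3"]))
    client.close()
    worker.join(timeout=5)
```

The pipelined variant only works when later requests do not depend on earlier replies, which fits the remark above that pipelining is not generally applicable.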