Re: Possible performance regression in version 10.1 with pgbench read-write tests. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Possible performance regression in version 10.1 with pgbench read-write tests.
Date
Msg-id 20180720205541.pdh5lpwrhzxfsg3z@alap3.anarazel.de
In response to Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2018-07-20 16:43:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-20 15:35:39 -0400, Tom Lane wrote:
> >> In any case, I strongly resist making performance-based changes on
> >> the basis of one test on one kernel and one hardware platform.
> 
> > Sure, it'd be good to do more of that. But from a theoretical POV it's
> > quite logical that posix semas sharing cachelines is bad for
> > performance, if there's any contention. When backed by futexes -
> > i.e. all non ancient linux machines - the hot path just does a cmpxchg
> > of the *userspace* data (I've copied the relevant code below).
> 
> Here's the thing: the hot path is of little or no interest, because
> if we are in the sema code at all, we are expecting to block.

Note that we're also using semas for ProcArrayGroupClearXid(), which is
pretty commonly hot for pgbench-style workloads, and where the expected
wait times are very short.


> It's possible that the bigger picture here is that the kernel boys
> optimized for the "uncontended" path to the point where they broke
> performance of the blocking path.  It's hard to see how they could
> have broke it to the point of being slower than the SysV sema API,
> though.

I don't think that's a likely explanation, given that just adding padding
to the *userspace* portion of the futexes improved performance quite
significantly.
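
To illustrate the kind of padding meant here, a minimal sketch (not the
actual patch): each semaphore is rounded up to its own cache line, so a
cmpxchg on one semaphore doesn't bounce the line holding its neighbour.
The names PaddedSem, CACHE_LINE_SIZE and sems_share_cache_line are made
up for this example, and the 64-byte line size is an assumption that
holds on typical x86-64 hardware but not everywhere.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical for x86-64). */
#define CACHE_LINE_SIZE 64

/* sizeof(sem_t) rounded up to a whole number of cache lines. */
#define PADDED_SEM_SIZE \
	(((sizeof(sem_t) + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE) * CACHE_LINE_SIZE)

/*
 * Hypothetical padded wrapper: the union is as large as the padding
 * array, and the alignment request makes every array element start on
 * a cache line boundary.
 */
typedef union PaddedSem
{
	_Alignas(CACHE_LINE_SIZE) sem_t sem;
	char		pad[PADDED_SEM_SIZE];
} PaddedSem;

_Static_assert(sizeof(PaddedSem) % CACHE_LINE_SIZE == 0,
			   "PaddedSem must occupy whole cache lines");

/* Returns 1 if the two semaphores start on the same cache line. */
static int
sems_share_cache_line(const PaddedSem *a, const PaddedSem *b)
{
	return ((uintptr_t) a / CACHE_LINE_SIZE) ==
		((uintptr_t) b / CACHE_LINE_SIZE);
}
```

If the array lives in shared memory, the base address has to be aligned
to the cache line size as well, otherwise the padding alone doesn't
guarantee that elements don't straddle lines.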


> On my RHEL6 machine, with unmodified HEAD and 8 sessions (since I've
> only got 8 cores) but other parameters matching Mithun's example,
> I just got

It's *really* common to have more actual clients than CPUs for OLTP
workloads, so I don't think it's insane to test with more clients.

Greetings,

Andres Freund

