Re: Reduce ProcArrayLock contention - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Reduce ProcArrayLock contention
Msg-id: 20150821183112.GG8552@awork2.anarazel.de
In response to: Re: Reduce ProcArrayLock contention  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Reduce ProcArrayLock contention  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 2015-08-21 14:08:36 -0400, Robert Haas wrote:
> On Wed, Aug 19, 2015 at 11:39 AM, Andres Freund <andres@anarazel.de> wrote:
> > There's a bunch of issues with those two blocks afaics:
> >
> > 1) The first block (in one backend) might read INVALID_PGPROCNO before
> >    ever locking the semaphore if a second backend quickly enough writes
> >    INVALID_PGPROCNO. That way the semaphore will be permanently out of
> >    "balance".
> 
> Yes, this is a clear bug.  I think the fix is to do one unconditional
> PGSemaphoreLock(&proc->sem) just prior to the loop.

I don't think that fixes it completely; see below.
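
For concreteness, the follower-side wait under discussion looks roughly like
the first loop below, and Robert's suggestion like the second. This is only a
sketch reusing names from the thread (nextClearXidElem, INVALID_PGPROCNO), not
the actual patch code:

/* current shape: the loop body may never run at all ... */
while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
    PGSemaphoreLock(&proc->sem);

/*
 * ... in which case the leader's PGSemaphoreUnlock() leaves the semaphore
 * count permanently incremented.  With one unconditional lock up front the
 * wakeup is always consumed:
 */
PGSemaphoreLock(&proc->sem);
while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
    PGSemaphoreLock(&proc->sem);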

> > 2) There's no memory barriers around dealing with nextClearXidElem in
> >    the first block. Afaics there needs to be a read barrier before
> >    returning, otherwise it's e.g. not guaranteed that the woken up
> >    backend sees its own xid set to InvalidTransactionId.
> 
> I can't believe it actually works like that.  Surely a semaphore
> operation is a full barrier.  Otherwise this could fail:

> P1: a = 0;
> P1: PGSemaphoreLock(&P1);
> P2: a = 1;
> P2: PGSemaphoreUnlock(&P1);
> P1: Assert(a == 1);
> 
> Do you really think that fails anywhere?

No, if it's paired like that, I don't think it's allowed to fail.

But, as the code stands, there's absolutely no guarantee you're not
seeing something like:
P1: a = 0;
P1: b = 0;
P1: PGSemaphoreLock(&P1);
P2: a = 1;
P2: PGSemaphoreUnlock(&P1); -- unrelated wakeup, e.g. left over from an earlier ProcSendSignal()
P1: Assert(a == 1 && b == 1);
P2: b = 1;
P2: PGSemaphoreUnlock(&P1);

If the pairing is like this there are no guarantees anymore, right? Even
if a and b were both set before P1's assert, the assert would still be
allowed to fail, because the stores to a and b need not each be visible
yet, since there's no enforced ordering.
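
To illustrate the ordering I'd want guaranteed, here's a rough sketch; the
explicit barrier placement is just my illustration, not something taken from
the patch:

/* leader, for each follower it has dealt with: */
/* ... clear the follower's xid etc. ... */
pg_write_barrier();                 /* publish those stores first */
pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
PGSemaphoreUnlock(&proc->sem);

/* follower, before returning to its caller: */
PGSemaphoreLock(&proc->sem);
while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
    PGSemaphoreLock(&proc->sem);    /* re-check, the wakeup may be unrelated */
pg_read_barrier();                  /* only now rely on the leader's stores */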


> > 3) If a follower returns before the leader has actually finished waking
> >    that respective backend up we can get into trouble:
...
> So I don't see the problem.

Don't have time (nor spare brain capacity) to answer in detail right
now.

> > I don't think it's a good idea to use the variable name in PROC_HDR and
> > PGPROC, it's confusing.
> 
> It made sense to me, but I don't care that much if you want to change it.

Please.

> > How hard did you try checking whether this causes regressions? This
> > increases the number of atomics in the commit path a fair bit. I doubt
> > it's really bad, but it seems like a good idea to benchmark something
> > like a single full-throttle writer and a large number of readers.
> 
> Hmm, that's an interesting point.  My analysis was that it really only
> increased the atomics in the cases where we otherwise would have gone
> to sleep, and I figured that the extra atomics were a small price to
> pay for not sleeping.

You're probably right that it won't have a big impact, if any. It seems
easy enough to test though, and it's really the only sane adversarial
scenario I could come up with...

> It's worth point out, though, that your lwlock.c atomics changes have
> the same effect.

To some degree, yes. I tried to benchmark adversarial scenarios rather
extensively because of that... I couldn't make it regress, presumably
because putting a backend on the waitlist only "hard" contended with other
exclusive lockers, not with the shared lockers, which could still make
progress.

> Before, if there were any shared lockers on an LWLock and an
> exclusive-locker came along, the exclusive locker would take and
> release the spinlock and go to sleep.

That often spun on a spinlock which was being continuously acquired, so
it was hard to ever get that far...
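
To illustrate why: with the atomics-based lwlock the shared-acquire fast path
is a single compare-and-exchange on the state word, very roughly like this
(heavily simplified sketch; the real code in lwlock.c retries on CAS failure
and deals with the waiter flag):

uint32 old_state = pg_atomic_read_u32(&lock->state);

if ((old_state & LW_VAL_EXCLUSIVE) == 0 &&
    pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                   old_state + LW_VAL_SHARED))
    return true;    /* shared lock taken; no spinlock, no sleeping */

Shared lockers keep going through that path even while an exclusive locker is
queueing itself on the wait list, which is why the queueing mostly contends
only with other exclusive lockers.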

> The changes we made to the clock sweep are more of the same.

Yea.

> Instead of taking an lwlock, sweeping the clock hand across many
> buffers, and releasing the lwlock, we now perform an atomic operation
> for every buffer.  That's obviously "worse" in terms of the total
> number of atomic operations, and if you construct a case where every
> backend sweeps 10 or 20 buffers yet none of them would have contended
> with each other, it might lose.  But it's actually much better and
> more scalable in the real world.

I think we probably should optimize that bit of code at some point;
right now the bottleneck appears to be elsewhere though, namely all the
buffer header locks, which are rather likely not to be in any cache at
all.
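
For reference, that per-buffer step is now essentially a single fetch-and-add
on the shared clock hand, roughly like this (sketch only, wraparound
bookkeeping in freelist.c elided):

uint32 victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

victim = victim % NBuffers;     /* each caller advances the hand by one */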

> I think this is a pattern we'll see with atomics over and over again:
> more atomic operations overall in order to reduce the length of time
> for which particular resources are unavailable to other processes.

Yea, agreed.

Greetings,

Andres Freund


