Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)
Date
Msg-id 27787.1563045173@sss.pgh.pa.us
In response to Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)  (Martijn van Oosterhout <kleptog@gmail.com>)
Responses Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)
List pgsql-hackers
Martijn van Oosterhout <kleptog@gmail.com> writes:
> Please find attached updated versions of the patches, I've now tested
> them. Also attached is a reproduction script to verify that they
> actually work.

I looked through these (a bit cursorily).

I'm generally on board with the idea of 0001, but not with the patch
details.  As coded, asyncQueueAdvanceTail is supposing that it can
examine the shared QUEUE_HEAD and QUEUE_TAIL pointers without any
lock whatsoever.  That's probably unsafe, and if it is safe for some
reason, you haven't made the argument why.  Moreover, it seems
unnecessary to make any such assumption.  Why not put back the
advanceTail tests you removed, but adjust them so that advanceTail
isn't set true unless QUEUE_HEAD and QUEUE_TAIL point to different
pages?  (Note that in the existing coding, those tests are made
while holding an appropriate lock, so it's safe to look at those
pointers there.)

It might be a good idea to make a macro encapsulating this new,
more complicated rule for setting advanceTail, instead of relying
on keeping the various call sites in sync.
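
Roughly what I have in mind, as an untested sketch (the macro name is
invented; QUEUE_POS_PAGE, QUEUE_HEAD and QUEUE_TAIL are from my
recollection of async.c, and the caller must hold the lock protecting
those pointers):

    /*
     * Worth trying to advance the tail only when that could actually
     * free an SLRU page, i.e. when head and tail are on different
     * pages.  Must be evaluated while holding the queue lock.
     */
    #define QUEUE_TAIL_CAN_ADVANCE() \
        (QUEUE_POS_PAGE(QUEUE_TAIL) != QUEUE_POS_PAGE(QUEUE_HEAD))

so that each call site is just "advanceTail = QUEUE_TAIL_CAN_ADVANCE();".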

More attention to comments is also needed.  For instance, you've
made a lie out of the documentation of the tail pointer:

    QueuePosition tail;         /* the global tail is equivalent to the pos of
                                 * the "slowest" backend */

It needs to say something like "is <= the pos of the slowest backend",
instead.  I think the explanation of why this algorithm is good could
use more effort, too.
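
Perhaps along the lines of:

    QueuePosition tail;         /* the global tail is <= the pos of the
                                 * "slowest" backend, but is not
                                 * guaranteed to be exactly equal to it */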

Comments for 0002 are about the same: for no explained reason, and
certainly no savings, you've put the notify_all test in an unsafe
place rather than a safe one (viz, two lines down, *after* taking
the relevant lock).  And 0002 needs more commentary about why
its optimization is safe and useful, too.  In particular it's
not obvious why QUEUE_HEAD being on a different page from QUEUE_TAIL
has anything to do with whether we should wake up other backends.
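
Structurally I'd expect something more like this (I'm paraphrasing from
memory, and haven't checked which direction of the page test the patch
actually intends):

    LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);

    /*
     * Only here is it safe to look at the shared queue pointers,
     * so this is where the test belongs.
     */
    notify_all = (QUEUE_POS_PAGE(QUEUE_TAIL) == QUEUE_POS_PAGE(QUEUE_HEAD));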

I'm not very persuaded by 0003, mainly because it seems likely to
me that 0001 and 0002 will greatly reduce the possibility that
the early-exit can happen.  So it seems like it's adding cycles
(in a spot where we hold exclusive lock) without a good chance of
saving any cycles.

Taking a step back in hopes of seeing the bigger picture ...
as you already noted, these changes don't really fix the "thundering
herd of wakeups" problem, they just arrange for it to happen
once per SLRU page rather than once per message.  I wonder if we
could improve matters by stealing an idea from the sinval code:
when we're trying to cause advance of the global QUEUE_TAIL, waken
only the slowest backend, and have it waken the next-slowest after
it's done.  In sinval there are some additional provisions to prevent
a nonresponsive backend from delaying matters for other backends,
but I think maybe we don't need that here.  async.c doesn't have
anything equivalent to sinval reset, so there's no chance of
overruling a slow backend's failure to advance its pos pointer,
so there's not much reason not to just wait till it does do so.
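
In pseudo-C, the shape would be something like the below (the function
name and QUEUE_POS_PRECEDES are invented for illustration, and the
locking is hand-waved):

    /*
     * Sketch only, not patch code: called by a backend right after it
     * has advanced its own pos.  Lock acquisition/release omitted.
     */
    static void
    asyncQueueChainTailWakeup(void)
    {
        int         slowestIdx = -1;
        pid_t       slowestPid = InvalidPid;
        QueuePosition min = QUEUE_HEAD;

        /* find the backend whose pos now lags furthest behind */
        for (int i = 1; i <= MaxBackends; i++)
        {
            if (QUEUE_BACKEND_PID(i) == InvalidPid)
                continue;
            if (QUEUE_POS_PRECEDES(QUEUE_BACKEND_POS(i), min))
            {
                min = QUEUE_BACKEND_POS(i);
                slowestIdx = i;
                slowestPid = QUEUE_BACKEND_PID(i);
            }
        }

        /*
         * If that laggard is still a page or more behind QUEUE_HEAD,
         * poke only it; once it has caught up it repeats this and
         * wakes the next-slowest.  Otherwise advance the tail ourselves.
         */
        if (slowestIdx >= 0 &&
            QUEUE_POS_PAGE(min) != QUEUE_POS_PAGE(QUEUE_HEAD))
            SendProcSignal(slowestPid, PROCSIG_NOTIFY_INTERRUPT, slowestIdx);
        else
            asyncQueueAdvanceTail();
    }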

A related idea is to awaken only one backend at a time when we
send a new message (i.e., advance QUEUE_HEAD) but I think that
would likely be bad.  The hazard with the chained-wakeups method
is that a slow backend blocks wakeup of everything else.  We don't
care about that hugely for QUEUE_TAIL advance, because we're just
hoping to free some SLRU space.  But for QUEUE_HEAD advance it'd
mean failing to deliver notifies in a timely way, which we do care
about.  (Also, if I remember correctly, the processing on that side
only requires shared lock so it's less of a problem if many backends
do it at once.)

            regards, tom lane


