Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0) |
Date | |
Msg-id | 27787.1563045173@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] Improve performance of NOTIFY over many databases (issueblocking on AccessExclusiveLock on object 0 of class 1262 of database 0) (Martijn van Oosterhout <kleptog@gmail.com>) |
Responses |
Re: [PATCH] Improve performance of NOTIFY over many databases (issueblocking on AccessExclusiveLock on object 0 of class 1262 of database 0)
|
List | pgsql-hackers |
Martijn van Oosterhout <kleptog@gmail.com> writes: > Please find attached updated versions of the patches, I've now tested > them. Also attached is a reproduction script to verify that they > actually work. I looked through these (a bit cursorily). I'm generally on board with the idea of 0001, but not with the patch details. As coded, asyncQueueAdvanceTail is supposing that it can examine the shared QUEUE_HEAD and QUEUE_TAIL pointers without any lock whatsoever. That's probably unsafe, and if it is safe for some reason, you haven't made the argument why. Moreover, it seems unnecessary to make any such assumption. Why not put back the advanceTail tests you removed, but adjust them so that advanceTail isn't set true unless QUEUE_HEAD and QUEUE_TAIL point to different pages? (Note that in the existing coding, those tests are made while holding an appropriate lock, so it's safe to look at those pointers there.) It might be a good idea to make a macro encapsulating this new, more complicated rule for setting advanceTail, instead of relying on keeping the various call sites in sync. More attention to comments is also needed. For instance, you've made a lie out of the documentation of the tail pointer: QueuePosition tail; /* the global tail is equivalent to the pos of * the "slowest" backend */ It needs to say something like "is <= the pos of the slowest backend", instead. I think the explanation of why this algorithm is good could use more effort, too. Comments for 0002 are about the same: for no explained reason, and certainly no savings, you've put the notify_all test in an unsafe place rather than a safe one (viz, two lines down, *after* taking the relevant lock). And 0002 needs more commentary about why its optimization is safe and useful, too. In particular it's not obvious why QUEUE_HEAD being on a different page from QUEUE_TAIL has anything to do with whether we should wake up other backends. I'm not very persuaded by 0003, mainly because it seems likely to me that 0001 and 0002 will greatly reduce the possibility that the early-exit can happen. So it seems like it's adding cycles (in a spot where we hold exclusive lock) without a good chance of saving any cycles. Taking a step back in hopes of seeing the bigger picture ... as you already noted, these changes don't really fix the "thundering herd of wakeups" problem, they just arrange for it to happen once per SLRU page rather than once per message. I wonder if we could improve matters by stealing an idea from the sinval code: when we're trying to cause advance of the global QUEUE_TAIL, waken only the slowest backend, and have it waken the next-slowest after it's done. In sinval there are some additional provisions to prevent a nonresponsive backend from delaying matters for other backends, but I think maybe we don't need that here. async.c doesn't have anything equivalent to sinval reset, so there's no chance of overruling a slow backend's failure to advance its pos pointer, so there's not much reason not to just wait till it does do so. A related idea is to awaken only one backend at a time when we send a new message (i.e., advance QUEUE_HEAD) but I think that would likely be bad. The hazard with the chained-wakeups method is that a slow backend blocks wakeup of everything else. We don't care about that hugely for QUEUE_TAIL advance, because we're just hoping to free some SLRU space. But for QUEUE_HEAD advance it'd mean failing to deliver notifies in a timely way, which we do care about. (Also, if I remember correctly, the processing on that side only requires shared lock so it's less of a problem if many backends do it at once.) regards, tom lane
pgsql-hackers by date: