Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers
From:           Arseniy Mukhin
Subject:        Re: Optimize LISTEN/NOTIFY
Date:
Msg-id:         CAE7r3M+TKWcvBjHZnj-_TVyUyD7G2v-WQDxQGPmu_dMcinWR0Q@mail.gmail.com
In response to: Re: Optimize LISTEN/NOTIFY (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
On Thu, Oct 16, 2025 at 12:39 PM Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Oct 15, 2025, at 16:16, Tom Lane wrote:
> > Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> writes:
> >> I think "Direct advancement" is a good idea. But the way it's
> >> implemented now has a concurrency bug. Listeners store its current
> >> position in the local variable 'pos' during the reading in
> >> asyncQueueReadAllNotifications() and don't hold NotifyQueueLock. It
> >> means that some notifier can directly advance the listener's position
> >> while the listener has an old value in the local variable. The same
> >> time we use listener positions to find out the limit we can truncate
> >> the queue in asyncQueueAdvanceTail().
> >
> > Good catch!
>
> I've implemented the three ideas presented below, attached as .txt files
> that are diffs on top of v19, which has these changes since v17:

Thank you for the new version and all implementations!

> 0002-optimize_listen_notify-v19.patch:
> * Improve wording of top comment per request from Chao Li.
> * Add initChannelHash call to top of SignalBackends,
>   to fix bug reported by Arseniy Mukhin.
>
> > I think we can perhaps salvage the idea if we invent a separate
> > "advisory" queue position field, which tells its backend "hey,
> > you could skip as far as here if you want", but is not used for
> > purposes of SLRU truncation.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt1.txt

    pos = QUEUE_BACKEND_POS(i);

    /* Direct advancement for idle backends at the old head */
    if (pendingNotifies != NULL &&
        QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite))
    {
        QUEUE_BACKEND_ADVISORY_POS(i) = queueHeadAfterWrite;

If we have several notifying backends, it looks like only the first one
will be able to do direct advancement here. The next notifying backend
will fail on QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite), as we don't
wake up the listener, so pos will be the same as it was for the first
notifying backend.
It seems that to accumulate direct advancement from several notifying
backends, we need to compare queueHeadBeforeWrite with advisoryPos here.
And we also need to advance advisoryPos to the listener's position after
reading, if advisoryPos falls behind.

Minute of brainstorming. I also thought about a workload that can
probably be met quite often. Let's say we have a sequence of
notifications:

    F F F T F F F T F F F T

Here F is a notification from a channel we don't care about and T is the
opposite. It seems that after the first 'T' notification it will be more
difficult for notifying backends to do 'direct advancement', as there
will be some lag before the listener reads the notification and advances
its position. Not sure if it's a problem; it probably depends on the
intensity of notifications. But maybe we can use a bit more
sophisticated data structure here? Something like a list of skip ranges.
Every entry in the list is a range (pos1, pos2) that the listener can
skip during reading. So instead of advancing advisoryPos, every
notifying backend should add a skip range to the list. Notifying
backends can merge neighbouring ranges: (pos1, pos2) & (pos2, pos3) ->
(pos1, pos3). We also can limit the number of entries, to 5 for example.
Listeners on their side should clear the list before reading and skip
all ranges from it. What do you think? Is it overkill?

> > Alternatively, split the queue pos
> > into "this is where to read next" and "this is as much as I'm
> > definitively done with", where the second field gets advanced at
> > the end of asyncQueueReadAllNotifications.  Not sure which
> > view would be less confusing (in the end I guess they're nearly
> > the same thing, differently explained).
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt2.txt

IMHO it's a little bit more confusing than the first option.
Two points I noticed:

1) We have a fast path in asyncQueueReadAllNotifications():

    if (QUEUE_POS_EQUAL(pos, head))
    {
        /* Nothing to do, we have read all notifications already. */
        return;
    }

Should we update donePos here? It looks like donePos may never be
updated without it.

2) In SignalBackends():

    /* Signal backends that have fallen too far behind */
    lag = asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD),
                             QUEUE_POS_PAGE(pos));
    if (lag >= QUEUE_CLEANUP_DELAY)
    {
        pid = QUEUE_BACKEND_PID(i);
        Assert(pid != InvalidPid);
        QUEUE_BACKEND_WAKEUP_PENDING(i) = true;
        pids[count] = pid;
        procnos[count] = i;
        count++;
    }

Should we use donePos here, as it is responsible for queue truncation
now?

> > A different line of thought could be to get rid of
> > asyncQueueReadAllNotifications's optimization of moving the
> > queue pos only once, per
> >
> > * (We could alternatively retake NotifyQueueLock and move the position
> > * before handling each individual message, but that seems like too much
> > * lock traffic.)
> >
> > Since we only need shared lock to advance our own queue pos,
> > maybe that wouldn't be too awful.  Not sure.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt3.txt

Hmm, it seems we still have the race: at the beginning of
asyncQueueReadAllNotifications we read pos into a local variable and
release the lock. IIUC, to avoid the race without introducing another
field here, the listener needs to hold the lock until it updates its
position, so that the notifying backend cannot change it concurrently.

Best regards,
Arseniy Mukhin