Thread: LISTEN/NOTIFY and notification timing guarantees
The proposed new implementation of listen/notify works by shoving all of a transaction's outgoing notifies into the global queue during pre-commit, then sending PROCSIG_NOTIFY_INTERRUPT to listening backends post-commit. When a listening backend scans the queue, if it hits a message from a transaction that has neither committed nor aborted yet, it abandons queue scanning, expecting to resume scanning when it gets another PROCSIG_NOTIFY_INTERRUPT. This means that a transaction that is still hanging fire on commit can block receipt of notifies from already-committed transactions, if they queued after it did. While the old implementation never made any hard guarantees about the time interval between commit and receipt of notify, it still seems to me that there are some potential surprises here, and I don't recall if they were all analyzed in the previous discussions. So bear with me a second:

1. In the previous code, a transaction "hanging fire on commit" (ie, with pg_listener changes made, but not committed) would be holding exclusive lock on pg_listener. So it would block things even worse than now, as we couldn't queue new items either. An uncommitted queue entry seems to behave about the same as that lock would, in that it prevents all listeners from seeing any new messages. AFAICS, therefore, this isn't objectionable in itself.

2. Since the pre-commit code releases AsyncQueueLock between pages, it is possible for the messages of different transactions to get interleaved in the queue. That not only means they'd be delivered interleaved, but also that a listener could deliver some notifications of a transaction and only later (perhaps many transactions later) deliver the rest. The existing code can also deliver notifications of different transactions interleaved, but AFAICS it can never deliver some notifications of one transaction and then deliver more of them in a different batch. By the time any listener gets to scan pg_listener, a sending transaction is either committed or not; it cannot commit partway through a scan (because of the locking done on pg_listener).

3. It is possible for a backend's own self-notifies to not be delivered immediately after commit, if they are queued behind some other uncommitted transaction's messages. That wasn't possible before either.

I'm not sure how probable it is that applications might be coded in a way that relies on the properties lost according to point #2 or #3. It seems rather scary though, particularly because if there were such a dependency, it would be easy to never see the misbehavior during testing.

We could fix #2 by not releasing AsyncQueueLock between pages when queuing messages. This has no obvious downsides as far as I can see; if anything it ought to save some cycles and contention.

We could fix #3 by re-instituting the special code path that previously existed for self-notifies, ie send them to the client directly from AtCommit_Notify and ignore self-notifies coming back from the queue. This would mean that a backend might see its own self-notifies in a different order relative to other backends' messages than other backends do --- but that was the case in the old coding as well. I think preserving the property that self-notifies are delivered immediately upon commit might be more important than that.

Comments?

regards, tom lane
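To make points #1 and #3 concrete, here is a small Python model of the reader side as described above; it is not the patch's C code. A listener walks the queue from its saved position, delivers entries from committed transactions, and stops at the first entry whose transaction is still in progress. Skipping entries from aborted transactions is an assumption of the model, and `Entry`, `read_queue` and the status strings are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    xid: int       # sending transaction (hypothetical model, not the real queue layout)
    channel: str


def read_queue(queue, pos, status):
    """Deliver what a listener can see starting at pos; return (channels, new_pos).

    status maps xid -> "committed", "aborted" or "in_progress".
    """
    delivered = []
    while pos < len(queue):
        state = status[queue[pos].xid]
        if state == "in_progress":
            break  # give up; resume when the next PROCSIG_NOTIFY_INTERRUPT arrives
        if state == "committed":
            delivered.append(queue[pos].channel)
        pos += 1   # move past committed entries and (assumed) silently drop aborted ones
    return delivered, pos


# Transaction 3 is this backend's own, already committed;
# transaction 2 queued earlier and is still in progress.
queue = [Entry(1, "a"), Entry(2, "b"), Entry(3, "self_note")]
status = {1: "committed", 2: "in_progress", 3: "committed"}

first, pos = read_queue(queue, 0, status)     # only "a": xid 2 blocks xid 3 (point #3)
status[2] = "committed"
second, pos = read_queue(queue, pos, status)  # "b" and "self_note" arrive later
```

The backend's own committed notify stays invisible until the unrelated transaction 2 finishes, which is exactly the delay described in point #3.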
On Mon, Feb 15, 2010 at 3:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure how probable it is that applications might be coded in a
> way that relies on the properties lost according to point #2 or #3.

Your observations are all correct as far as I can tell.

One question regarding #2: Is a client application able to tell whether or not it has received all notifications from one batch? i.e. does PQnotifies() return NULL only when the backend has sent over the complete batch of notifications, or could it also return NULL while a batch is still being transmitted but the client-side buffer just happens to be empty?

> We could fix #2 by not releasing AsyncQueueLock between pages when
> queuing messages. This has no obvious downsides as far as I can see;
> if anything it ought to save some cycles and contention.

Currently transactions with a small number of notifications can deliver their notifications and then proceed with their commit, while transactions with many notifications need to stay there longer, so the current behavior is fair in this respect. Changing the locking strategy makes the small-volume transactions wait for the bigger ones. Also, currently readers can already start reading while writers are still writing (until they hit the first uncommitted transaction of their database).

> I think preserving the
> property that self-notifies are delivered immediately upon commit might
> be more important than that.

Fine with me, sounds reasonable :-)

Joachim
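Joachim's fairness point and Tom's point #2 are two sides of the same scheduling question. The sketch below is a simplified Python model with hypothetical names (`enqueue`, `scan`), one queue slot per notification, and aborted transactions ignored. It assumes the worst case in which writers that release the lock between pages take turns one page at a time, and compares that with a writer that holds the lock for all its pages.

```python
def enqueue(writers, page_size, hold_lock_across_pages):
    """Build the queue as a list of (xid, channel) pairs.

    writers maps xid -> channels, in the order the writers start queuing.
    Releasing the lock between pages is modelled as round-robin, one page per turn.
    """
    pending = {xid: list(chans) for xid, chans in writers.items()}
    queue = []
    while any(pending.values()):
        for xid, chans in pending.items():
            n = len(chans) if hold_lock_across_pages else page_size
            queue.extend((xid, ch) for ch in chans[:n])
            del chans[:n]
    return queue


def scan(queue, pos, in_progress):
    """Deliver entries until the first one from an in-progress transaction."""
    batch = []
    while pos < len(queue) and queue[pos][0] not in in_progress:
        batch.append(queue[pos][1])
        pos += 1
    return batch, pos


writers = {1: ["a1", "a2", "a3"], 2: ["b1", "b2"]}

# Lock released between pages; xid 1 has committed, xid 2 is still in progress.
interleaved = enqueue(writers, page_size=2, hold_lock_across_pages=False)
batch1, pos = scan(interleaved, 0, in_progress={2})       # a1, a2
batch2, pos = scan(interleaved, pos, in_progress=set())   # b1, b2, a3: xid 1 split in two

# Lock held across pages: xid 1's notifications stay together.
contiguous = enqueue(writers, page_size=2, hold_lock_across_pages=True)
whole, _ = scan(contiguous, 0, in_progress={2})           # a1, a2, a3
```

The contiguous variant removes the split delivery, but in the same model a writer with many pages now makes every smaller writer wait behind it, which is the fairness cost Joachim describes.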
Joachim Wieland <joe@mcknight.de> writes:
> One question regarding #2: Is a client application able to tell
> whether or not it has received all notifications from one batch? i.e.
> does PQnotifies() return NULL only when the backend has sent over the
> complete batch of notifications or could it also return NULL while a
> batch is still being transmitted but the client-side buffer just
> happens to be empty?

That's true, it's difficult for the client to be sure whether it's gotten all the available notifications. It could wait a little bit to see if more arrive, but there's no sure upper bound for how long is enough. If you really need it, though, you could send a query (perhaps just a dummy empty-string query). In the old implementation, the query response would mark a point of guaranteed consistency in the notification responses: you would have gotten all or none of the messages from any particular sending transaction, and furthermore there could not be any missing messages from transactions that committed before one that you saw a message from.

The latter property is probably the bigger issue really, and I'm afraid that even with contiguous queuing we'd not be able to guarantee it, so maybe we have a problem even with my proposed #2 fix. Maybe we should go back to the existing scheme whereby a writer takes a lock it holds through commit, so that entries in the queue are guaranteed to be in commit order. It wouldn't lock out readers, just other writers.

regards, tom lane
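The second property Tom describes (no missing messages from transactions that committed before one you saw) can be checked mechanically. In the Python sketch below, all names are hypothetical and a `threading.Lock` stands in for the proposed writer lock. `missing_earlier_commits` shows the property failing when queue order differs from commit order, even though each transaction's entries are contiguous. The threaded part shows that holding the lock from queuing through commit makes the two orders coincide.

```python
import threading


def missing_earlier_commits(queue, commit_order, in_progress):
    """Committed xids a reader has not seen although they committed before one it has seen.

    Assumes every xid in the queue that is not in progress has committed.
    """
    seen = []
    for xid in queue:                    # reader stops at the first in-progress entry
        if xid in in_progress:
            break
        seen.append(xid)
    if not seen:
        return []
    latest = max(commit_order.index(x) for x in seen)
    return [x for x in commit_order[:latest] if x not in seen]


# Queued X, Z, Y, but Y committed before X; Z is still in progress.
violation = missing_earlier_commits(["X", "Z", "Y"], ["Y", "X"], in_progress={"Z"})

# With a writer lock held from queuing through commit, queue order == commit order.
queue, commit_order = [], []
writer_lock = threading.Lock()


def commit_with_notifies(xid, n_msgs):
    with writer_lock:                    # taken before queuing (pre-commit)
        queue.extend([xid] * n_msgs)     # this transaction's notifies
        commit_order.append(xid)         # "commit" while still holding the lock
    # released only after commit, so the next writer queues behind us


threads = [threading.Thread(target=commit_with_notifies, args=(x, 3)) for x in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Collapse each run of identical xids to get the order transactions appear in the queue.
queue_order = [x for i, x in enumerate(queue) if i == 0 or queue[i - 1] != x]
```

Whatever order the threads happen to run in, `queue_order` equals `commit_order`; readers are untouched by the lock, which matches the remark that it would only lock out other writers.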
I wrote:
> ...
> 3. It is possible for a backend's own self-notifies to not be delivered
> immediately after commit, if they are queued behind some other
> uncommitted transaction's messages. That wasn't possible before either.
> ... We could fix
> #3 by re-instituting the special code path that previously existed for
> self-notifies, ie send them to the client directly from AtCommit_Notify
> and ignore self-notifies coming back from the queue. This would mean
> that a backend might see its own self-notifies in a different order
> relative to other backends' messages than other backends do --- but that
> was the case in the old coding as well. I think preserving the
> property that self-notifies are delivered immediately upon commit might
> be more important than that.

I modified the patch to do that, but after a while realized that there are more worms in this can than I'd thought. What I had done was to add the NotifyMyFrontEnd() calls to the post-commit cleanup function for async.c. However, that is a horribly bad place to put it because of the non-negligible probability of a failure. An encoding conversion failure, for example, becomes a "PANIC: cannot abort transaction NNN, it was already committed".

The reason we have not seen any such behavior in the field is that in the historical coding, self-notifies are actually sent *pre commit*. So if they do happen to fail you get a transaction rollback and no backend crash. Of course, if some notifies went out before we got to the one that failed, the app might have taken action based on a notify for some event that now didn't happen; so that's not exactly ideal either.

So right now I'm not sure what to do. We could adopt the historical policy of sending self-notifies pre-commit, but that doesn't seem tremendously appetizing from the standpoint of transactional integrity. Or we could do it the way Joachim's submitted patch does, but I'm quite sure somebody will complain about the delay involved. Another possibility is to force a ProcessIncomingNotifies scan to occur before we reach ReadyForQuery if we sent any notifies in the just-finished transaction --- but that won't help if there are uncommitted messages in front of ours. So it would only really improve matters if we forced queuing order to match commit order, as I was speculating about earlier.

Thoughts?

regards, tom lane
On Tue, Feb 16, 2010 at 6:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another possibility is to force a ProcessIncomingNotifies scan to occur
> before we reach ReadyForQuery if we sent any notifies in the
> just-finished transaction --- but that won't help if there are
> uncommitted messages in front of ours.

What about dealing with self-notifies in memory? i.e. copy them into a subcontext of TopMemoryContext in precommit and commit as usual. Then as a first step in ProcessIncomingNotifies() deliver whatever is in memory and then delete the context. While reading the queue, ignore all self-notifies there. If we abort for some reason, delete the context in AtAbort_Notify(). Would that work?

Joachim
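Joachim's suggestion can be modelled in a few lines. In this Python sketch a plain per-backend list stands in for the TopMemoryContext subcontext, and `Backend`, `precommit`, `abort` and `process_incoming` are hypothetical names loosely mirroring the steps he lists. Commit-status checks and per-backend queue positions are deliberately left out, so each call rescans the whole queue.

```python
class Backend:
    def __init__(self, pid):
        self.pid = pid
        self.pending_self = []   # stands in for the TopMemoryContext subcontext
        self.delivered = []      # what has been sent to this backend's frontend

    def precommit(self, shared_queue, channels):
        shared_queue.extend((self.pid, ch) for ch in channels)  # queued as usual
        self.pending_self = list(channels)                      # private copy

    def abort(self):
        self.pending_self = []   # like deleting the context in AtAbort_Notify()

    def process_incoming(self, shared_queue):
        self.delivered.extend(self.pending_self)  # first step: in-memory self-notifies
        self.pending_self = []
        for sender, ch in shared_queue:
            if sender != self.pid:                 # ignore our own entries in the queue
                self.delivered.append(ch)


shared = [(99, "from_other")]    # another backend's notify, queued earlier
me = Backend(pid=42)
me.precommit(shared, ["mine"])
me.process_incoming(shared)      # self-notify first, then the rest of the queue

rolled_back = Backend(pid=43)
rolled_back.precommit([], ["never_seen"])
rolled_back.abort()
rolled_back.process_incoming([])
```

Here `me` ends up with `["mine", "from_other"]`: its own notify jumps ahead of one queued earlier by another backend, the same ordering difference Tom accepted for the old self-notify path. `rolled_back` receives nothing.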
Tom Lane wrote:
> We could adopt the historical policy of sending self-notifies
> pre-commit, but that doesn't seem tremendously appetizing from the
> standpoint of transactional integrity.

But one traditional aspect of transactional integrity is that a transaction always sees *its own* uncommitted work. Wouldn't the historical policy of PostgreSQL self-notifies be consistent with that?

-Kevin
On Tue, Feb 16, 2010 at 1:31 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane wrote:
>> We could adopt the historical policy of sending self-notifies
>> pre-commit, but that doesn't seem tremendously appetizing from the
>> standpoint of transactional integrity.
>
> But one traditional aspect of transactional integrity is that a
> transaction always sees *its own* uncommitted work.

True, but notifications aren't sent until the transaction commits anyway. At the time when an application receives its self-notifies, it has already committed the transaction, so there is no uncommitted work anymore.

> Wouldn't the
> historical policy of PostgreSQL self-notifies be consistent with
> that?

No. The policy is also to not see the committed work if for some reason the transaction had to roll back during commit. In this case we'd also expect to get no notification from this transaction at all, and this is what is violated here.

Joachim
Joachim Wieland <joe@mcknight.de> writes:
> On Tue, Feb 16, 2010 at 1:31 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Tom Lane wrote:
>>> We could adopt the historical policy of sending self-notifies
>>> pre-commit, but that doesn't seem tremendously appetizing from the
>>> standpoint of transactional integrity.
>>
>> But one traditional aspect of transactional integrity is that a
>> transaction always sees *its own* uncommitted work.
> True but notifications aren't sent until the transaction commits
> anyway. At the time when an application receives its self-notifies, it
> has already committed the transaction so there is no uncommitted work
> anymore.

Right. The application's view is that it sends COMMIT and gets any self-notifies back as part of the response to that. What is worrisome is that the notifies come out just before the actual commit and so it's still (barely) possible for the transaction to abort. In which case it should not have sent the notifies, and indeed did not send them as far as any other client is concerned. We really ought to try to make a similar guarantee for self-notifies.

After sleeping on it I'm fairly convinced that we should approach it like this:

1. No special data path for self-notifies; we expect to pull them back out of the queue just like anything else.

2. Add an extra lock to serialize writers to the queue, so that messages are guaranteed to be added to the queue in commit order. As long as notify-sending is nearly the last thing in the pre-commit sequence, this doesn't seem to me to be a huge concurrency hit (certainly no worse than the existing implementation) and the improved semantics guarantee seems worth it.

3. When a transaction has sent notifies, perform an extra ProcessIncomingNotifies scan after finishing up post-commit work (so that an error wouldn't result in PANIC) but before we issue ReadyForQuery to the frontend.

This will mean that what the client sees is

    CommandComplete message for COMMIT (or NOTIFY)
    NotificationResponse messages, including self-notifies
    ReadyForQuery

where the notifies are guaranteed to arrive in commit order. This compares to the historical behavior of

    NotificationResponse messages for self-notifies
    CommandComplete message for COMMIT (or NOTIFY)
    ReadyForQuery
    NotificationResponse messages for other transactions

where there's no particular guarantee about ordering of notifies from different transactions. At least for users of libpq, postponing the self-notifies till after CommandComplete won't make any difference, because libpq reads to the ReadyForQuery message before deciding the query is done.

regards, tom lane
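The two message sequences above can be compared with a toy model. This Python sketch is not protocol code. `backend_messages` (a hypothetical name) emits message names in the orders Tom lists, assuming the queue is already in commit order and that the extra scan in step 3 picks up everything queued so far. `seen_by_pqexec` (also hypothetical) mimics libpq reading through ReadyForQuery before it treats the command as done.

```python
def backend_messages(queue, my_xid, scheme):
    """queue: (xid, channel) pairs in commit order; returns (message, payload) tuples."""
    if scheme == "proposed":
        return ([("CommandComplete", "COMMIT")]
                + [("NotificationResponse", ch) for _, ch in queue]  # self-notifies included
                + [("ReadyForQuery", None)])
    if scheme == "historical":
        own = [ch for xid, ch in queue if xid == my_xid]
        others = [ch for xid, ch in queue if xid != my_xid]
        return ([("NotificationResponse", ch) for ch in own]       # sent pre-commit
                + [("CommandComplete", "COMMIT"), ("ReadyForQuery", None)]
                + [("NotificationResponse", ch) for ch in others])
    raise ValueError(scheme)


def seen_by_pqexec(messages):
    """Notifies a libpq-style client has absorbed when it stops at ReadyForQuery."""
    seen = []
    for kind, payload in messages:
        if kind == "ReadyForQuery":
            break
        if kind == "NotificationResponse":
            seen.append(payload)
    return seen


queue = [(500, "orders"), (501, "mine")]   # 501 is our own transaction
proposed = backend_messages(queue, 501, "proposed")
historical = backend_messages(queue, 501, "historical")
```

Under both schemes the self-notify `"mine"` has already been read when the client's command call returns, which is why moving it after CommandComplete is harmless for libpq users; only the proposed sequence also delivers the earlier-committed `"orders"` in commit order before ReadyForQuery.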
On Tue, 2010-02-16 at 10:38 -0500, Tom Lane wrote:
> 2. Add an extra lock to serialize writers to the queue, so that messages
> are guaranteed to be added to the queue in commit order.

I assume this is a heavyweight lock, correct?

Regards,
Jeff Davis
On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. Add an extra lock to serialize writers to the queue, so that messages
> are guaranteed to be added to the queue in commit order.  As long as

fwiw, I think you're definitely on the right track. IMO, any scenario where an issued notification ends up being deferred for an indefinite period of time without alerting the issuer should be avoided if at all possible. Just to clarify though, does your proposal block all notifiers if any uncommitted transaction issued a notify?

merlin
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2010-02-16 at 10:38 -0500, Tom Lane wrote:
>> 2. Add an extra lock to serialize writers to the queue, so that messages
>> are guaranteed to be added to the queue in commit order.

> I assume this is a heavyweight lock, correct?

Yeah, that seems the easiest way to do it. I think an LWLock could be made to work, but releasing it on error might be a bit funky.

regards, tom lane
Merlin Moncure <mmoncure@gmail.com> writes:
> On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Add an extra lock to serialize writers to the queue, so that messages
>> are guaranteed to be added to the queue in commit order.  As long as
> fwiw, I think you're definitely on the right track. IMO, any scenario
> where an issued notification ends up being deferred for an indefinite
> period of time without alerting the issuer should be avoided if at all
> possible. Just to clarify though, does your proposal block all
> notifiers if any uncommitted transaction issued a notify?

It will block other notifiers until the transaction releases its locks, which should happen pretty promptly --- there are no user-accessible reasons for it to wait.

regards, tom lane
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 2. Add an extra lock to serialize writers to the queue, so that messages
>>> are guaranteed to be added to the queue in commit order.  As long as
>
>> fwiw, I think you're definitely on the right track. IMO, any scenario
>> where an issued notification ends up being deferred for an indefinite
>> period of time without alerting the issuer should be avoided if at all
>> possible. Just to clarify though, does your proposal block all
>> notifiers if any uncommitted transaction issued a notify?
>
> It will block other notifiers until the transaction releases its locks,
> which should happen pretty promptly --- there are no user-accessible
> reasons for it to wait.

I have heard of reasons to want to be able to have some actions run at COMMIT time. You probably recall Jan's proposal of a commit time timestamp. The particular implementation may have fallen by the wayside, but the reasons for wanting such things remain. Indeed an "on commit" trigger hook would be a mighty valuable thing to support things like (but not restricted to) commit timestamps.

It's conceivable that "clustering issues" might introduce some somewhat more "user-accessible" hooks that could cost something here. Certainly not true today, but plausibly foreseeable...

-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://www3.sympatico.ca/cbbrowne/lsf.html
Beauty is the first test: there is no permanent place in the world for ugly mathematics.