Thread: LISTEN/NOTIFY and notification timing guarantees

LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

14 February 2010, 22:31:48

The proposed new implementation of listen/notify works by shoving all
of a transaction's outgoing notifies into the global queue during
pre-commit, then sending PROCSIG_NOTIFY_INTERRUPT to listening backends
post-commit.  When a listening backend scans the queue, if it hits a
message from a transaction that has neither committed nor aborted yet, it
abandons queue scanning, expecting to resume scanning when it gets
another PROCSIG_NOTIFY_INTERRUPT.  This means that a transaction that is
still hanging fire on commit can block receipt of notifies from
already-committed transactions, if they queued after it did.  While the
old implementation never made any hard guarantees about the time
interval between commit and receipt of notify, it still seems to me that
there are some potential surprises here, and I don't recall if they were
all analyzed in the previous discussions.  So bear with me a second:

1. In the previous code, a transaction "hanging fire on commit" (ie,
with pg_listener changes made, but not committed) would be holding
exclusive lock on pg_listener.  So it would block things even worse
than now, as we couldn't queue new items either.  An uncommitted queue
entry seems to behave about the same as that lock would, in that it
prevents all listeners from seeing any new messages.  AFAICS, therefore,
this isn't objectionable in itself.

2. Since the pre-commit code releases AsyncQueueLock between pages,
it is possible for the messages of different transactions to get
interleaved in the queue, which not only means that they'd be delivered
interleaved but also that it's possible for a listener to deliver some
notifications of a transaction, and only later (perhaps many
transactions later) deliver the rest.  The existing code can also
deliver notifications of different transactions interleaved, but AFAICS
it can never deliver some notifications of one transaction and then
deliver more of them in a different batch.  By the time any listener
gets to scan pg_listener, a sending transaction is either committed or
not, it cannot commit partway through a scan (because of the locking
done on pg_listener).

3. It is possible for a backend's own self-notifies to not be delivered
immediately after commit, if they are queued behind some other
uncommitted transaction's messages.  That wasn't possible before either.

I'm not sure how probable it is that applications might be coded in a
way that relies on the properties lost according to point #2 or #3.
It seems rather scary though, particularly because if there were such a
dependency, it would be easy to never see the misbehavior during testing.

We could fix #2 by not releasing AsyncQueueLock between pages when
queuing messages.  This has no obvious downsides as far as I can see;
if anything it ought to save some cycles and contention.  We could fix
#3 by re-instituting the special code path that previously existed for
self-notifies, ie send them to the client directly from AtCommit_Notify
and ignore self-notifies coming back from the queue.  This would mean
that a backend might see its own self-notifies in a different order
relative to other backends' messages than other backends do --- but that
was the case in the old coding as well.  I think preserving the
property that self-notifies are delivered immediately upon commit might
be more important than that.

Comments?
        regards, tom lane
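
The scanning rule described in the opening paragraph can be sketched as a toy model. This is only an illustration under stated assumptions: `Entry`, `scan_queue` and the `status` dictionary are hypothetical names, not taken from async.c, and the queue is a plain in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    xid: int       # id of the sending transaction (hypothetical field)
    payload: str


def scan_queue(queue, pos, status):
    """Deliver entries starting at pos and stop at the first entry whose
    transaction is still in progress. Returns (delivered, new_pos)."""
    delivered = []
    while pos < len(queue):
        entry = queue[pos]
        state = status[entry.xid]
        if state == "in_progress":
            break  # abandon the scan; resume on the next interrupt
        if state == "committed":
            delivered.append(entry.payload)
        # entries of aborted transactions are skipped silently
        pos += 1
    return delivered, pos


queue = [Entry(100, "a1"), Entry(101, "b1"), Entry(102, "c1")]
status = {100: "committed", 101: "in_progress", 102: "committed"}

# Transaction 102 has committed, but its entry sits behind 101's
# uncommitted one, so the first scan cannot reach it.
first, pos = scan_queue(queue, 0, status)

# Once 101 finishes (here it aborts), the next scan picks up 102's message.
status[101] = "aborted"
second, pos = scan_queue(queue, pos, status)
print(first, second)
```

The first scan delivers only the message of transaction 100; the message of the already-committed transaction 102 is held back until 101 resolves, which is the blocking effect discussed above.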

Re: LISTEN/NOTIFY and notification timing guarantees

From

Joachim Wieland

Date:

15 February 2010, 07:41:45

On Mon, Feb 15, 2010 at 3:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure how probable it is that applications might be coded in a
> way that relies on the properties lost according to point #2 or #3.

Your observations are all correct as far as I can tell.

One question regarding #2: Is a client application able to tell
whether or not it has received all notifications from one batch? i.e.
does PQnotifies() return NULL only when the backend has sent over the
complete batch of notifications or could it also return NULL while a
batch is still being transmitted but the client-side buffer just
happens to be empty?

> We could fix #2 by not releasing AsyncQueueLock between pages when
> queuing messages.  This has no obvious downsides as far as I can see;
> if anything it ought to save some cycles and contention.

Currently transactions with a small number of notifications can
deliver their notifications and then proceed with their commit while
transactions with many notifications need to stay there longer, so the
current behavior is fair in this respect. Changing the locking
strategy makes the small volume transactions wait for the bigger ones.
Also currently readers can already start reading while writers are
still writing (until they hit the first uncommitted transaction of
their database).

> I think preserving the
> property that self-notifies are delivered immediately upon commit might
> be more important than that.

Fine with me, sounds reasonable  :-)

Joachim
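
The trade-off between the two locking strategies can be made concrete with a small deterministic model. It is a sketch only: `PAGE_SIZE`, the round-robin schedule and both function names are assumptions chosen for illustration, and a real system would interleave writers in whatever order the lock happens to be granted.

```python
PAGE_SIZE = 2  # hypothetical number of messages per queue page


def pages(msgs, size=PAGE_SIZE):
    """Split one transaction's messages into page-sized chunks."""
    return [msgs[i:i + size] for i in range(0, len(msgs), size)]


def enqueue_releasing_between_pages(batches):
    """Model a lock that is dropped after every page: writers take turns,
    each appending one page before the next writer gets in."""
    queue = []
    pending = [pages(b) for b in batches]
    while any(pending):
        for writer_pages in pending:
            if writer_pages:
                queue.extend(writer_pages.pop(0))
    return queue


def enqueue_holding_lock(batches):
    """Model a lock held for a writer's whole batch: batches stay contiguous."""
    queue = []
    for batch in batches:
        queue.extend(batch)
    return queue


big = [f"A{i}" for i in range(1, 5)]  # a transaction with four notifies
small = ["B1"]                        # a transaction with a single notify

interleaved = enqueue_releasing_between_pages([big, small])
contiguous = enqueue_holding_lock([big, small])
print(interleaved)
print(contiguous)
```

In the first result the small transaction's message lands in the middle of the big one's, so a listener could deliver part of A, then B, then the rest of A. In the second result A stays contiguous, at the price of B waiting behind all of A, which is the fairness concern raised above.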

Re: LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

15 February 2010, 13:50:01

Joachim Wieland <joe@mcknight.de> writes:
> One question regarding #2: Is a client application able to tell
> whether or not it has received all notifications from one batch? i.e.
> does PQnotifies() return NULL only when the backend has sent over the
> complete batch of notifications or could it also return NULL while a
> batch is still being transmitted but the client-side buffer just
> happens to be empty?

That's true, it's difficult for the client to be sure whether it's
gotten all the available notifications.  It could wait a little bit
to see if more arrive but there's no sure upper bound for how long
is enough.  If you really need it, though, you could send a query
(perhaps just a dummy empty-string query).  In the old implementation,
the query response would mark a point of guaranteed consistency in the
notification responses: you would have gotten all or none of the
messages from any particular sending transaction, and furthermore
there could not be any missing messages from transactions that committed
before one that you saw a message from.

The latter property is probably the bigger issue really, and I'm afraid
that even with contiguous queuing we'd not be able to guarantee it, so
maybe we have a problem even with my proposed #2 fix.  Maybe we should
go back to the existing scheme whereby a writer takes a lock it holds
through commit, so that entries in the queue are guaranteed to be in
commit order.  It wouldn't lock out readers, just other writers.
        regards, tom lane

Re: LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

16 February 2010, 01:20:34

I wrote:
> ...
> 3. It is possible for a backend's own self-notifies to not be delivered
> immediately after commit, if they are queued behind some other
> uncommitted transaction's messages.  That wasn't possible before either.
> ...  We could fix
> #3 by re-instituting the special code path that previously existed for
> self-notifies, ie send them to the client directly from AtCommit_Notify
> and ignore self-notifies coming back from the queue.  This would mean
> that a backend might see its own self-notifies in a different order
> relative to other backends' messages than other backends do --- but that
> was the case in the old coding as well.  I think preserving the
> property that self-notifies are delivered immediately upon commit might
> be more important than that.

I modified the patch to do that, but after a while realized that there
are more worms in this can than I'd thought.  What I had done was to add
the NotifyMyFrontEnd() calls to the post-commit cleanup function for
async.c.  However, that is a horribly bad place to put it because of the
non-negligible probability of a failure.  An encoding conversion
failure, for example, becomes a "PANIC:  cannot abort transaction NNN,
it was already committed".

The reason we have not seen any such behavior in the field is that
in the historical coding, self-notifies are actually sent *pre commit*.
So if they do happen to fail you get a transaction rollback and no
backend crash.  Of course, if some notifies went out before we got to
the one that failed, the app might have taken action based on a notify
for some event that now didn't happen; so that's not exactly ideal
either.

So right now I'm not sure what to do.  We could adopt the historical
policy of sending self-notifies pre-commit, but that doesn't seem
tremendously appetizing from the standpoint of transactional
integrity.  Or we could do it the way Joachim's submitted patch does,
but I'm quite sure somebody will complain about the delay involved.
Another possibility is to force a ProcessIncomingNotifies scan to occur
before we reach ReadyForQuery if we sent any notifies in the
just-finished transaction --- but that won't help if there are
uncommitted messages in front of ours.  So it would only really improve
matters if we forced queuing order to match commit order, as I was
speculating about earlier.

Thoughts?
        regards, tom lane

Re: LISTEN/NOTIFY and notification timing guarantees

From

Joachim Wieland

Date:

16 February 2010, 04:28:27

On Tue, Feb 16, 2010 at 6:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another possibility is to force a ProcessIncomingNotifies scan to occur
> before we reach ReadyForQuery if we sent any notifies in the
> just-finished transaction --- but that won't help if there are
> uncommitted messages in front of ours.

What about dealing with self-notifies in memory? i.e. copy them into a
subcontext of TopMemoryContext in precommit and commit as usual. Then
as a first step in ProcessIncomingNotifies() deliver whatever is in
memory and then delete the context. While reading the queue, ignore
all self-notifies there. If we abort for some reason, delete the
context in AtAbort_Notify(). Would that work?

Joachim

Re: LISTEN/NOTIFY and notification timing guarantees

From

"Kevin Grittner"

Date:

16 February 2010, 08:31:31

Tom Lane wrote:
> We could adopt the historical policy of sending self-notifies
> pre-commit, but that doesn't seem tremendously appetizing from the
> standpoint of transactional integrity.

But one traditional aspect of transactional integrity is that a
transaction always sees *its own* uncommitted work.  Wouldn't the
historical policy of PostgreSQL self-notifies be consistent with
that?
-Kevin

Re: LISTEN/NOTIFY and notification timing guarantees

From

Joachim Wieland

Date:

16 February 2010, 09:04:16

On Tue, Feb 16, 2010 at 1:31 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane wrote:
>> We could adopt the historical policy of sending self-notifies
>> pre-commit, but that doesn't seem tremendously appetizing from the
>> standpoint of transactional integrity.
>
> But one traditional aspect of transactional integrity is that a
> transaction always sees *its own* uncommitted work.

True but notifications aren't sent until the transaction commits
anyway. At the time when an application receives its self-notifies, it
has already committed the transaction so there is no uncommitted work
anymore.

> Wouldn't the
> historical policy of PostgreSQL self-notifies be consistent with
> that?

No. The policy is also to not see the committed work if for some
reason the transaction had to roll back during commit. In this case
we'd also expect to get no notification from this transaction at all,
and this is what is violated here.

Joachim

Re: LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

16 February 2010, 11:38:55

Joachim Wieland <joe@mcknight.de> writes:
> On Tue, Feb 16, 2010 at 1:31 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Tom Lane wrote:
>>> We could adopt the historical policy of sending self-notifies
>>> pre-commit, but that doesn't seem tremendously appetizing from the
>>> standpoint of transactional integrity.
>> 
>> But one traditional aspect of transactional integrity is that a
>> transaction always sees *its own* uncommitted work.

> True but notifications aren't sent until the transaction commits
> anyway. At the time when an application receives its self-notifies, it
> has already committed the transaction so there is no uncommitted work
> anymore.

Right.  The application's view is that it sends COMMIT and gets any
self-notifies back as part of the response to that.  What is worrisome
is that the notifies come out just before the actual commit and so it's
still (barely) possible for the transaction to abort.  In which case it
should not have sent the notifies, and indeed did not send them as far
as any other client is concerned.  We really ought to try to make a
similar guarantee for self-notifies.

After sleeping on it I'm fairly convinced that we should approach it
like this:

1. No special data path for self-notifies; we expect to pull them back
out of the queue just like anything else.

2. Add an extra lock to serialize writers to the queue, so that messages
are guaranteed to be added to the queue in commit order.  As long as
notify-sending is nearly the last thing in the pre-commit sequence,
this doesn't seem to me to be a huge concurrency hit (certainly no worse
than the existing implementation) and the improved semantics guarantee
seems worth it.

3. When a transaction has sent notifies, perform an extra
ProcessIncomingNotifies scan after finishing up post-commit work
(so that an error wouldn't result in PANIC) but before we issue
ReadyForQuery to the frontend.  This will mean that what the client
sees is

        CommandComplete message for COMMIT (or NOTIFY)
        NotificationResponse messages, including self-notifies
        ReadyForQuery

where the notifies are guaranteed to arrive in commit order.
This compares to the historical behavior of

        NotificationResponse messages for self-notifies
        CommandComplete message for COMMIT (or NOTIFY)
        ReadyForQuery
        NotificationResponse messages for other transactions

where there's no particular guarantee about ordering of notifies
from different transactions.  At least for users of libpq, postponing
the self-notifies till after CommandComplete won't make any difference,
because libpq reads to the ReadyForQuery message before deciding the
query is done.
        regards, tom lane
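
The three-point plan above can be sketched as a single-threaded model. It is an illustration under assumptions, not the patch: `NotifyQueue`, `Session` and the message strings are hypothetical, `threading.Lock` merely stands in for the proposed writer lock, and aborts and per-entry transaction states are left out.

```python
import threading


class NotifyQueue:
    def __init__(self):
        self.entries = []                    # (sender, payload) in commit order
        self.writer_lock = threading.Lock()  # stand-in for the extra writer lock


class Session:
    def __init__(self, name, queue):
        self.name = name
        self.queue = queue
        self.pos = 0
        self.pending = []

    def notify(self, payload):
        self.pending.append(payload)

    def process_incoming(self):
        """Read everything queued since the last scan, self-notifies included."""
        new = self.queue.entries[self.pos:]
        self.pos = len(self.queue.entries)
        return [f"NotificationResponse {sender}:{payload}" for sender, payload in new]

    def commit(self):
        sent_any = bool(self.pending)
        # Point 2: writers are serialized, and the lock is held through the
        # commit, so queue order matches commit order.
        with self.queue.writer_lock:
            self.queue.entries.extend((self.name, p) for p in self.pending)
            self.pending = []
            # ... the actual commit would happen here, lock still held ...
        messages = ["CommandComplete COMMIT"]
        # Point 3: extra scan after post-commit work, before ReadyForQuery.
        if sent_any:
            messages += self.process_incoming()
        messages.append("ReadyForQuery")
        return messages


queue = NotifyQueue()
a, b = Session("a", queue), Session("b", queue)

b.notify("b1")
b_messages = b.commit()

a.notify("a1")
a_messages = a.commit()
print(a_messages)
```

Session `a` had not scanned the queue before its own commit, so its response carries the earlier notification from `b` followed by its own self-notify, in commit order, and both arrive between CommandComplete and ReadyForQuery.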

Re: LISTEN/NOTIFY and notification timing guarantees

From

Jeff Davis

Date:

16 February 2010, 14:30:27

On Tue, 2010-02-16 at 10:38 -0500, Tom Lane wrote:
> 2. Add an extra lock to serialize writers to the queue, so that messages
> are guaranteed to be added to the queue in commit order.

I assume this is a heavyweight lock, correct?

Regards,
        Jeff Davis

Re: LISTEN/NOTIFY and notification timing guarantees

From

Merlin Moncure

Date:

16 February 2010, 15:34:40

On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. Add an extra lock to serialize writers to the queue, so that messages
> are guaranteed to be added to the queue in commit order.  As long as

fwiw, I think you're definitely on the right track.  IMO, any scenario
where an issued notification ends up being deferred for an indefinite
period of time without alerting the issuer should be avoided if at all
possible.  Just to clarify though, does your proposal block all
notifiers if any uncommitted transaction issued a notify?

merlin

Re: LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

16 February 2010, 17:13:19

Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2010-02-16 at 10:38 -0500, Tom Lane wrote:
>> 2. Add an extra lock to serialize writers to the queue, so that messages
>> are guaranteed to be added to the queue in commit order.

> I assume this is a heavyweight lock, correct?

Yeah, that seems the easiest way to do it.  I think an LWLock could be
made to work, but releasing it on error might be a bit funky.
        regards, tom lane

Re: LISTEN/NOTIFY and notification timing guarantees

From

Tom Lane

Date:

16 February 2010, 17:15:26

Merlin Moncure <mmoncure@gmail.com> writes:
> On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Add an extra lock to serialize writers to the queue, so that messages
>> are guaranteed to be added to the queue in commit order.  As long as

> fwiw, I think you're definitely on the right track.  IMO, any scenario
> where an issued notification ends up being deferred for an indefinite
> period of time without alerting the issuer should be avoided if at all
> possible.  Just to clarify though, does your proposal block all
> notifiers if any uncommitted transaction issued a notify?

It will block other notifiers until the transaction releases its locks,
which should happen pretty promptly --- there are no user-accessible
reasons for it to wait.
        regards, tom lane

Re: LISTEN/NOTIFY and notification timing guarantees

From

Chris Browne

Date:

16 February 2010, 19:21:49

tgl@sss.pgh.pa.us (Tom Lane) writes:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Tue, Feb 16, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 2. Add an extra lock to serialize writers to the queue, so that messages
>>> are guaranteed to be added to the queue in commit order.  As long as
>
>> fwiw, I think you're definitely on the right track.  IMO, any scenario
>> where an issued notification ends up being deferred for an indefinite
>> period of time without alerting the issuer should be avoided if at all
>> possible.  Just to clarify though, does your proposal block all
>> notifiers if any uncommitted transaction issued a notify?
>
> It will block other notifiers until the transaction releases its locks,
> which should happen pretty promptly --- there are no user-accessible
> reasons for it to wait.

I have heard of reasons to want to be able to have some actions run at
COMMIT time.

You probably recall Jan's proposal of a commit time timestamp.  The
particular implementation may have fallen by the wayside, but the
reasons to want such things do continue to exist.  Indeed an "on commit"
trigger hook would be a mighty valuable thing to support things like
(but not restricted to) commit timestamps.

It's conceivable that "clustering issues" might introduce some somewhat
more "user-accessible" hooks that could cost something here.  Certainly
not true today, but plausibly foreseeable...
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://www3.sympatico.ca/cbbrowne/lsf.html
Beauty is the first test: there is no permanent place in the world for
ugly mathematics.