Re: Listen / Notify - what to do when the queue is full - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Listen / Notify - what to do when the queue is full
Date
Msg-id 1263884925.15228.39.camel@jdavis
Whole thread Raw
In response to Re: Listen / Notify - what to do when the queue is full  (Joachim Wieland <joe@mcknight.de>)
Responses Re: Listen / Notify - what to do when the queue is full
Re: Listen / Notify - what to do when the queue is full
List pgsql-hackers
Initial comments:

* compiler warnings

ipci.c: In function ‘CreateSharedMemoryAndSemaphores’:
ipci.c:228: warning: implicit declaration of function ‘AsyncShmemInit’

* 2PC

Adds complexity, and I didn't see any clear, easy solution after
reading the thread. I don't see this as a showstopper, so I'd leave
this until later.

* Hot Standby

It would be nice to have NOTIFY work with HS, but I don't think that's
going to happen this cycle. I don't think this is a showstopper,
either.

* ASCII?

I tend to think that encoding should be handled properly here, or we
should use BYTEA. My reasoning is that TEXT is a datatype, whereas ASCII
is not, and I don't see a reason to invent it now. We might as well use
real TEXT (with encoding support) or use BYTEA which works fine for
ASCII anyway.

* send_notify()

Someone suggested a function like this, which sounds useful to me. Not
a showstopper, but if it's not too difficult it might be worth
including. (Incidentally, the function signature should match the type
of the payload, which is unclear if we're going to use ASCII.)

* AsyncCommitOrderLock

This is a big red flag to me. The name by itself is scary. The fact
that it is held across a series of fairly interesting operations is
even scarier.

As you say in the comments, it's acquired before recording the
transaction commit in the clog, and released afterward. What you
actually have is something like:
   AtCommit_NotifyBeforeCommit();
   HOLD_INTERRUPTS();
   s->state = TRANS_COMMIT;
   latestXid = RecordTransactionCommit(false);
   TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);
   ProcArrayEndTransaction(MyProc, latestXid);
   CallXactCallbacks(XACT_EVENT_COMMIT);
   ResourceOwnerRelease(TopTransactionResourceOwner,                        RESOURCE_RELEASE_BEFORE_LOCKS,
         true, true);
 
   AtEOXact_Buffers(true);
   AtEOXact_RelationCache(true);
   AtEarlyCommit_Snapshot();
   AtEOXact_Inval(true);
   smgrDoPendingDeletes(true);
   AtEOXact_MultiXact();
   AtCommit_NotifyAfterCommit();

That is a lot of stuff happening between the acquisition and
release. There are two things particularly scary about this (to me):
* holding an LWLock while performing I/O (in smgrDoPendingDeletes())
* holding an LWLock while acquiring another LWLock (e.g.  ProcArrayEndTransaction())

An LWLock-based deadlock is a hard deadlock -- not detected by the
deadlock detector, and there's not much you can do even if it were --
right in the transaction-completing code. That means that the whole
system would be locked up. I'm not sure that such a deadlock condition
exists, but it seems likely (if not, it's only because other areas of
the code avoided this practice), and hard to prove that it's safe. And
it's probably bad for performance to increase the length of time
transactions are serialized, even if only for cases that involve
LISTEN/NOTIFY.

I believe this needs a re-think. What is the real purpose for
AsyncCommitOrderLock, and can we acheive that another way? It seems
that you're worried about a transaction that issues a LISTEN and
committing not getting a notification from a NOTIFYing transaction
that commits concurrently (and slightly after the LISTEN). But
SignalBackends() is called after transaction commit, and should signal
all backends who committed a LISTEN before that time, right?

* The transaction IDs are used because Send_Notify() is called before
the AsyncCommitOrderLock acquire, and so the backend could potentially
be reading uncommitted notifications that are "about" to be committed
(or aborted). Then, the queue is not read further until that transaction
completes. That's not really commented effectively, and I suspect the
process could be simpler. For instance, why can't the backend always
read all of the data from the queue, notifying if the transaction is
committed and saving to a local list otherwise (which would be checked
on the next wakeup)?

* Can you clarify the big comment at the top of async.c? It's a helpful
overview, but it glosses over some of the touchy synchronization steps
going on. I find myself trying to make a map of these steps myself.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: plpgsql: open for execute - add USING clause
Next
From: Simon Riggs
Date:
Subject: Re: Streaming replication, and walsender during recovery