Thread: LISTEN/NOTIFY testing woes

LISTEN/NOTIFY testing woes

From
Tom Lane
Date:
I ran into a couple of issues while trying to devise a regression test
illustrating the LISTEN-in-serializable-transaction issue Mark Dilger
reported.  The first one is that an isolation test in which we expect
to see a cross-process NOTIFY immediately after a COMMIT turns out to
be not very stable: on my machine, it works as long as you're just
running the isolation tests by themselves, but it usually falls over
if I'm running check-world with any amount of parallelism.  The reason
for this seems to be that incoming notifies are only checked for when
we're about to wait for client input.  At that point we've already
sent the ReadyForQuery ('Z') protocol message, which will cause libpq
to stand down from looking for more input and return a null from
PQgetResult().  Depending on timing, the following Notify protocol
messages might arrive quickly enough that isolationtester.c sees them
before it goes off to do something else, but that's not very reliable.

In the case of self-notifies, postgres.c ensures that those get
transmitted to the frontend *before* ReadyForQuery, and this is what
makes self-notify cases stable enough to survive buildfarm testing.

I'm a bit surprised, now that I've seen this effect, that the existing
cross-session notify tests in async-notify.spec haven't given us
problems in the buildfarm.  (Maybe, now that I just pushed those into
the back branches, we'll start to see some failures?)  Anyway, what
I propose to do about this is patch 0001 attached, which tweaks
postgres.c to ensure that any cross-session notifies that arrived
during the just-finished transaction are also guaranteed to be sent
to the client before, not after, ReadyForQuery.

Another thing that I discovered while testing this is that as of HEAD,
you can't run "make installcheck" for the isolation tests more than
once without restarting the server.  If you do, you get a test result
mismatch because the async-notify test's first invocation of
pg_notification_queue_usage() returns a positive value.  Which is
entirely unsurprising, because the previous iteration ensured that
it would, and we've done nothing to make the queue tail advance since
then.

This seems both undesirable for our own testing, and rather bogus
from users' standpoints as well.  However, I think a simple fix is
available: just make the SQL pg_notification_queue_usage() function
advance the queue tail before measuring, as in 0002 below.  This will
restore the behavior of that function to what it was before 51004c717,
and it doesn't seem like it'd cost any performance in any plausible
use-cases.

0002 is only needed in HEAD, but I'll have to back-patch 0001 as
far as 9.6, to support a test case for the problem Mark discovered
and to ensure that back-patching b10f40bf0 doesn't cause any issues.

BTW, the fix and test case for Mark's issue look like 0003.  Without
the 0001 patch, it's unstable exactly when the "listener2: NOTIFY "c1"
with payload "" from notifier" message comes out.  But modulo that
issue, this test case reliably shows the assertion failure in the
back branches.

            regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d7a72c0..3b85e48 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4194,7 +4194,18 @@ PostgresMain(int argc, char *argv[],
             }
             else
             {
+                /* Send out notify signals and transmit self-notifies */
                 ProcessCompletedNotifies();
+
+                /*
+                 * Also process incoming notifies, if any.  This is mostly to
+                 * ensure stable behavior in tests: if any notifies were
+                 * received during the just-finished transaction, they'll be
+                 * seen by the client before ReadyForQuery is.
+                 */
+                if (notifyInterruptPending)
+                    ProcessNotifyInterrupt();
+
                 pgstat_report_stat(false);
 
                 set_ps_display("idle", false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3209d0..cce2858 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1554,6 +1573,9 @@ pg_notification_queue_usage(PG_FUNCTION_ARGS)
 {
     double        usage;
 
+    /* Advance the queue tail so we don't report a too-large result */
+    asyncQueueAdvanceTail();
+
     LWLockAcquire(AsyncQueueLock, LW_SHARED);
     usage = asyncQueueUsage();
     LWLockRelease(AsyncQueueLock);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8fe38c3..fa994c9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2113,6 +2113,14 @@ CommitTransaction(void)
     AtEOXact_LargeObject(true);

     /*
+     * Insert notifications sent by NOTIFY commands into the queue.  This
+     * should be late in the pre-commit sequence to minimize time spent
+     * holding the notify-insertion lock.  However, this could result in
+     * creating a snapshot, so we must do it before serializable cleanup.
+     */
+    PreCommit_Notify();
+
+    /*
      * Mark serializable transaction as complete for predicate locking
      * purposes.  This should be done as late as we can put it and still allow
      * errors to be raised for failure patterns found at commit.  This is not
@@ -2122,13 +2130,6 @@ CommitTransaction(void)
     if (!is_parallel_worker)
         PreCommit_CheckForSerializationFailure();

-    /*
-     * Insert notifications sent by NOTIFY commands into the queue.  This
-     * should be late in the pre-commit sequence to minimize time spent
-     * holding the notify-insertion lock.
-     */
-    PreCommit_Notify();
-
     /* Prevent cancel/die interrupt while cleaning up */
     HOLD_INTERRUPTS();

@@ -2344,6 +2345,9 @@ PrepareTransaction(void)
     /* close large objects before lower-level cleanup */
     AtEOXact_LargeObject(true);

+    /* let NOTIFY do its thing, too */
+    PrePrepare_Notify();
+
     /*
      * Mark serializable transaction as complete for predicate locking
      * purposes.  This should be done as late as we can put it and still allow
@@ -2351,8 +2355,6 @@ PrepareTransaction(void)
      */
     PreCommit_CheckForSerializationFailure();

-    /* NOTIFY will be handled below */
-
     /*
      * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
      * this transaction.  Having the prepared xact hold locks on another
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3209d0..cce2858 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -851,13 +851,16 @@ Async_UnlistenOnExit(int code, Datum arg)
 }

 /*
- * AtPrepare_Notify
+ * PrePrepare_Notify
+ *
+ *        This is called during setup of the prepare phase of a two-phase
+ *        transaction, at a point corresponding to PreCommit_Notify.
  *
- *        This is called at the prepare phase of a two-phase
- *        transaction.  Save the state for possible commit later.
+ * We don't currently support having done anything related to LISTEN/NOTIFY
+ * in a prepared transaction, so just throw an error.
  */
 void
-AtPrepare_Notify(void)
+PrePrepare_Notify(void)
 {
     /* It's not allowed to have any pending LISTEN/UNLISTEN/NOTIFY actions */
     if (pendingActions || pendingNotifies)
@@ -867,6 +870,22 @@ AtPrepare_Notify(void)
 }

 /*
+ * AtPrepare_Notify
+ *
+ *        This is called in the prepare phase of a two-phase transaction,
+ *        at a point where we should save LISTEN/NOTIFY state.
+ *
+ * We don't currently support having done anything related to LISTEN/NOTIFY
+ * in a prepared transaction; this is just a placeholder for future work.
+ */
+void
+AtPrepare_Notify(void)
+{
+    /* PrePrepare_Notify should have thrown the error already */
+    Assert(!(pendingActions || pendingNotifies));
+}
+
+/*
  * PreCommit_Notify
  *
  *        This is called at transaction commit, before actually committing to
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 90b682f..d18b74b 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -42,6 +42,7 @@ extern void AtCommit_Notify(void);
 extern void AtAbort_Notify(void);
 extern void AtSubCommit_Notify(void);
 extern void AtSubAbort_Notify(void);
+extern void PrePrepare_Notify(void);
 extern void AtPrepare_Notify(void);
 extern void ProcessCompletedNotifies(void);

diff --git a/src/test/isolation/expected/async-notify.out b/src/test/isolation/expected/async-notify.out
index 7ad26b7..7942778 100644
--- a/src/test/isolation/expected/async-notify.out
+++ b/src/test/isolation/expected/async-notify.out
@@ -1,4 +1,4 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions

 starting permutation: listenc notify1 notify2 notify3 notifyf
 step listenc: LISTEN c1; LISTEN c2;
@@ -83,6 +83,17 @@ listener: NOTIFY "c1" with payload "" from notifier
 listener: NOTIFY "c2" with payload "payload" from notifier
 listener: NOTIFY "c2" with payload "" from notifier

+starting permutation: l2listen l2begin notify1 lbegins llisten lcommit l2commit l2stop
+step l2listen: LISTEN c1;
+step l2begin: BEGIN;
+step notify1: NOTIFY c1;
+step lbegins: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step llisten: LISTEN c1; LISTEN c2;
+step lcommit: COMMIT;
+step l2commit: COMMIT;
+listener2: NOTIFY "c1" with payload "" from notifier
+step l2stop: UNLISTEN *;
+
 starting permutation: llisten lbegin usage bignotify usage
 step llisten: LISTEN c1; LISTEN c2;
 step lbegin: BEGIN;
diff --git a/src/test/isolation/specs/async-notify.spec b/src/test/isolation/specs/async-notify.spec
index daf7bef..a7b2600 100644
--- a/src/test/isolation/specs/async-notify.spec
+++ b/src/test/isolation/specs/async-notify.spec
@@ -41,8 +41,18 @@ session "listener"
 step "llisten"    { LISTEN c1; LISTEN c2; }
 step "lcheck"    { SELECT 1 AS x; }
 step "lbegin"    { BEGIN; }
+step "lbegins"    { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step "lcommit"    { COMMIT; }
 teardown        { UNLISTEN *; }

+# In some tests we need a second listener, just to block the queue.
+
+session "listener2"
+step "l2listen"    { LISTEN c1; }
+step "l2begin"    { BEGIN; }
+step "l2commit"    { COMMIT; }
+step "l2stop"    { UNLISTEN *; }
+

 # Trivial cases.
 permutation "listenc" "notify1" "notify2" "notify3" "notifyf"
@@ -59,6 +69,10 @@ permutation "llisten" "notify1" "notify2" "notify3" "notifyf" "lcheck"
 # Again, with local delivery too.
 permutation "listenc" "llisten" "notify1" "notify2" "notify3" "notifyf" "lcheck"

+# Check for bug when initial listen is only action in a serializable xact,
+# and notify queue is not empty
+permutation "l2listen" "l2begin" "notify1" "lbegins" "llisten" "lcommit" "l2commit" "l2stop"
+
 # Verify that pg_notification_queue_usage correctly reports a non-zero result,
 # after submitting notifications while another connection is listening for
 # those notifications and waiting inside an active transaction.  We have to

Re: LISTEN/NOTIFY testing woes

From
Mark Dilger
Date:

On 11/23/19 5:01 PM, Tom Lane wrote:
> I ran into a couple of issues while trying to devise a regression test
> illustrating the LISTEN-in-serializable-transaction issue Mark Dilger
> reported.  The first one is that an isolation test in which we expect
> to see a cross-process NOTIFY immediately after a COMMIT turns out to
> be not very stable: on my machine, it works as long as you're just
> running the isolation tests by themselves, but it usually falls over
> if I'm running check-world with any amount of parallelism.  The reason
> for this seems to be that incoming notifies are only checked for when
> we're about to wait for client input.  At that point we've already
> sent the ReadyForQuery ('Z') protocol message, which will cause libpq
> to stand down from looking for more input and return a null from
> PQgetResult().  Depending on timing, the following Notify protocol
> messages might arrive quickly enough that isolationtester.c sees them
> before it goes off to do something else, but that's not very reliable.

Thanks for working on this, Tom.

I have finished reading and applying your three patches and have moved 
on to testing them.  I hope to finish the review soon.




-- 
Mark Dilger



Re: LISTEN/NOTIFY testing woes

From
Martijn van Oosterhout
Date:
Hoi Tom,

On Sun, 24 Nov 2019 at 02:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> This seems both undesirable for our own testing, and rather bogus
> from users' standpoints as well.  However, I think a simple fix is
> available: just make the SQL pg_notification_queue_usage() function
> advance the queue tail before measuring, as in 0002 below.  This will
> restore the behavior of that function to what it was before 51004c717,
> and it doesn't seem like it'd cost any performance in any plausible
> use-cases.

This was one of those open points in the previous patches where it
wasn't quite clear what the correct behaviour should be. This fixes
the issue, but the question in my mind is: do we want to document this
fact and can users rely on this behaviour? If we go with the argument
that the delay in cleaning up should be entirely invisible, then I
guess this patch is the correct one that makes the made changes
invisible. Arguably not doing this means we'd have to document the
values are possibly out of date.

So I think this patch does the right thing.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/



Re: LISTEN/NOTIFY testing woes

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@gmail.com> writes:
> On Sun, 24 Nov 2019 at 02:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems both undesirable for our own testing, and rather bogus
>> from users' standpoints as well.  However, I think a simple fix is
>> available: just make the SQL pg_notification_queue_usage() function
>> advance the queue tail before measuring, as in 0002 below.  This will
>> restore the behavior of that function to what it was before 51004c717,
>> and it doesn't seem like it'd cost any performance in any plausible
>> use-cases.

> This was one of those open points in the previous patches where it
> wasn't quite clear what the correct behaviour should be. This fixes
> the issue, but the question in my mind is: do we want to document this
> fact and can users rely on this behaviour? If we go with the argument
> that the delay in cleaning up should be entirely invisible, then I
> guess this patch is the correct one that makes the made changes
> invisible. Arguably not doing this means we'd have to document the
> values are possibly out of date.

> So I think this patch does the right thing.

Thanks for looking!  In the light of morning, there's one other
thing bothering me about this patch: it means that the function has
side-effects, even though those effects are at the implementation
level and shouldn't be user-visible.  We do already have it marked
"volatile", so that's OK, but I notice that it's not parallel
restricted.  The isolation test still passes when I set
force_parallel_mode = regress, so apparently it works to run this
code in a parallel worker, but that seems pretty scary to me;
certainly nothing in async.c was written with that in mind.
I think we'd be well advised to adjust pg_proc.dat to mark
pg_notification_queue_usage() as parallel-restricted, so that
it only executes in the main session process.  It's hard to
see any use-case for parallelizing it that would justify even
a small chance of new bugs.

            regards, tom lane



Re: LISTEN/NOTIFY testing woes

From
Mark Dilger
Date:

On 11/23/19 8:50 PM, Mark Dilger wrote:
> 
> 
> On 11/23/19 5:01 PM, Tom Lane wrote:
>> I ran into a couple of issues while trying to devise a regression test
>> illustrating the LISTEN-in-serializable-transaction issue Mark Dilger
>> reported.  The first one is that an isolation test in which we expect
>> to see a cross-process NOTIFY immediately after a COMMIT turns out to
>> be not very stable: on my machine, it works as long as you're just
>> running the isolation tests by themselves, but it usually falls over
>> if I'm running check-world with any amount of parallelism.  The reason
>> for this seems to be that incoming notifies are only checked for when
>> we're about to wait for client input.  At that point we've already
>> sent the ReadyForQuery ('Z') protocol message, which will cause libpq
>> to stand down from looking for more input and return a null from
>> PQgetResult().  Depending on timing, the following Notify protocol
>> messages might arrive quickly enough that isolationtester.c sees them
>> before it goes off to do something else, but that's not very reliable.
> 
> Thanks for working on this, Tom.
> 
> I have finished reading and applying your three patches and have moved 
> on to testing them.  I hope to finish the review soon.

After applying all three patches, the stress test that originally
uncovered the assert in predicate.c no longer triggers any asserts.
`check-world` passes.  The code and comments look good.

Your patches are ready for commit.

-- 
Mark Dilger



Re: LISTEN/NOTIFY testing woes

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> On 11/23/19 8:50 PM, Mark Dilger wrote:
>> I have finished reading and applying your three patches and have moved 
>> on to testing them.  I hope to finish the review soon.

> After applying all three patches, the stress test that originally
> uncovered the assert in predicate.c no longer triggers any asserts.
> `check-world` passes.  The code and comments look good.

Thanks for reviewing!

After sleeping on it, I'm not really happy with what I did in
PrepareTransaction (that is, invent a separate PrePrepare_Notify
function).  The idea was to keep that looking parallel to what
CommitTransaction does, and preserve infrastructure against the
day that somebody gets motivated to allow LISTEN or NOTIFY in
a prepared transaction.  But on second thought, what would surely
happen when that feature gets added is just that AtPrepare_Notify
would serialize the pending LISTEN/NOTIFY actions into the 2PC
state file.  There wouldn't be any need for PrePrepare_Notify,
so there's no point in introducing that.  I'll just move the
comment saying that nothing has to happen at that point for NOTIFY.

            regards, tom lane



Re: LISTEN/NOTIFY testing woes

From
Mark Dilger
Date:

On 11/24/19 10:39 AM, Tom Lane wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
>> On 11/23/19 8:50 PM, Mark Dilger wrote:
>>> I have finished reading and applying your three patches and have moved
>>> on to testing them.  I hope to finish the review soon.
> 
>> After applying all three patches, the stress test that originally
>> uncovered the assert in predicate.c no longer triggers any asserts.
>> `check-world` passes.  The code and comments look good.
> 
> Thanks for reviewing!
> 
> After sleeping on it, I'm not really happy with what I did in
> PrepareTransaction (that is, invent a separate PrePrepare_Notify
> function).  The idea was to keep that looking parallel to what
> CommitTransaction does, and preserve infrastructure against the
> day that somebody gets motivated to allow LISTEN or NOTIFY in
> a prepared transaction.  But on second thought, what would surely
> happen when that feature gets added is just that AtPrepare_Notify
> would serialize the pending LISTEN/NOTIFY actions into the 2PC
> state file.  There wouldn't be any need for PrePrepare_Notify,
> so there's no point in introducing that.  I'll just move the
> comment saying that nothing has to happen at that point for NOTIFY.

I looked at that.  I thought it was an interesting decision to
factor out that error to its own function while leaving a
similar error inline just a little below in xact.c:

   if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot PREPARE a transaction that has operated 
on temporary objects")));

I assumed you had factored it out in anticipation of supporting notify
here in the future.  If you want to backtrack that decision and leave it
inline, you would still keep the test rather than just a comment, right?
It sounds like you intend to let AtPrepare_Notify catch the problem
later, but since that's just an Assert and not an ereport(ERROR), that
provides less error checking for non-assert builds.


-- 
Mark Dilger



Re: LISTEN/NOTIFY testing woes

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> On 11/24/19 10:39 AM, Tom Lane wrote:
>> After sleeping on it, I'm not really happy with what I did in
>> PrepareTransaction (that is, invent a separate PrePrepare_Notify
>> function).  The idea was to keep that looking parallel to what
>> CommitTransaction does, and preserve infrastructure against the
>> day that somebody gets motivated to allow LISTEN or NOTIFY in
>> a prepared transaction.  But on second thought, what would surely
>> happen when that feature gets added is just that AtPrepare_Notify
>> would serialize the pending LISTEN/NOTIFY actions into the 2PC
>> state file.  There wouldn't be any need for PrePrepare_Notify,
>> so there's no point in introducing that.  I'll just move the
>> comment saying that nothing has to happen at that point for NOTIFY.

> I assumed you had factored it out in anticipation of supporting notify
> here in the future.  If you want to backtrack that decision and leave it
> inline, you would still keep the test rather than just a comment, right?

No, there wouldn't be any error condition; that's just needed because the
feature isn't implemented yet.  So I'll leave that alone; the only thing
that needs to happen now in the PREPARE code path is to adjust the one
comment.

            regards, tom lane



Re: LISTEN/NOTIFY testing woes

From
Mark Dilger
Date:

On 11/24/19 11:04 AM, Tom Lane wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
>> On 11/24/19 10:39 AM, Tom Lane wrote:
>>> After sleeping on it, I'm not really happy with what I did in
>>> PrepareTransaction (that is, invent a separate PrePrepare_Notify
>>> function).  The idea was to keep that looking parallel to what
>>> CommitTransaction does, and preserve infrastructure against the
>>> day that somebody gets motivated to allow LISTEN or NOTIFY in
>>> a prepared transaction.  But on second thought, what would surely
>>> happen when that feature gets added is just that AtPrepare_Notify
>>> would serialize the pending LISTEN/NOTIFY actions into the 2PC
>>> state file.  There wouldn't be any need for PrePrepare_Notify,
>>> so there's no point in introducing that.  I'll just move the
>>> comment saying that nothing has to happen at that point for NOTIFY.
> 
>> I assumed you had factored it out in anticipation of supporting notify
>> here in the future.  If you want to backtrack that decision and leave it
>> inline, you would still keep the test rather than just a comment, right?
> 
> No, there wouldn't be any error condition; that's just needed because the
> feature isn't implemented yet.  So I'll leave that alone; the only thing
> that needs to happen now in the PREPARE code path is to adjust the one
> comment.

Ok.

-- 
Mark Dilger