LISTEN/NOTIFY testing woes - Mailing list pgsql-hackers

From Tom Lane
Subject LISTEN/NOTIFY testing woes
Date
Msg-id 13881.1574557302@sss.pgh.pa.us
Whole thread Raw
Responses Re: LISTEN/NOTIFY testing woes  (Mark Dilger <hornschnorter@gmail.com>)
Re: LISTEN/NOTIFY testing woes  (Martijn van Oosterhout <kleptog@gmail.com>)
List pgsql-hackers
I ran into a couple of issues while trying to devise a regression test
illustrating the LISTEN-in-serializable-transaction issue Mark Dilger
reported.  The first one is that an isolation test in which we expect
to see a cross-process NOTIFY immediately after a COMMIT turns out to
be not very stable: on my machine, it works as long as you're just
running the isolation tests by themselves, but it usually falls over
if I'm running check-world with any amount of parallelism.  The reason
for this seems to be that incoming notifies are only checked for when
we're about to wait for client input.  At that point we've already
sent the ReadyForQuery ('Z') protocol message, which will cause libpq
to stand down from looking for more input and return a null from
PQgetResult().  Depending on timing, the following Notify protocol
messages might arrive quickly enough that isolationtester.c sees them
before it goes off to do something else, but that's not very reliable.

In the case of self-notifies, postgres.c ensures that those get
transmitted to the frontend *before* ReadyForQuery, and this is what
makes self-notify cases stable enough to survive buildfarm testing.

I'm a bit surprised, now that I've seen this effect, that the existing
cross-session notify tests in async-notify.spec haven't given us
problems in the buildfarm.  (Maybe, now that I just pushed those into
the back branches, we'll start to see some failures?)  Anyway, what
I propose to do about this is patch 0001 attached, which tweaks
postgres.c to ensure that any cross-session notifies that arrived
during the just-finished transaction are also guaranteed to be sent
to the client before, not after, ReadyForQuery.

Another thing that I discovered while testing this is that as of HEAD,
you can't run "make installcheck" for the isolation tests more than
once without restarting the server.  If you do, you get a test result
mismatch because the async-notify test's first invocation of
pg_notification_queue_usage() returns a positive value.  Which is
entirely unsurprising, because the previous iteration ensured that
it would, and we've done nothing to make the queue tail advance since
then.

This seems both undesirable for our own testing, and rather bogus
from users' standpoints as well.  However, I think a simple fix is
available: just make the SQL pg_notification_queue_usage() function
advance the queue tail before measuring, as in 0002 below.  This will
restore the behavior of that function to what it was before 51004c717,
and it doesn't seem like it'd cost any performance in any plausible
use-cases.

0002 is only needed in HEAD, but I'll have to back-patch 0001 as
far as 9.6, to support a test case for the problem Mark discovered
and to ensure that back-patching b10f40bf0 doesn't cause any issues.

BTW, the fix and test case for Mark's issue look like 0003.  Without
the 0001 patch, it's unstable exactly when the "listener2: NOTIFY "c1"
with payload "" from notifier" message comes out.  But modulo that
issue, this test case reliably shows the assertion failure in the
back branches.

            regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d7a72c0..3b85e48 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4194,7 +4194,18 @@ PostgresMain(int argc, char *argv[],
             }
             else
             {
+                /* Send out notify signals and transmit self-notifies */
                 ProcessCompletedNotifies();
+
+                /*
+                 * Also process incoming notifies, if any.  This is mostly to
+                 * ensure stable behavior in tests: if any notifies were
+                 * received during the just-finished transaction, they'll be
+                 * seen by the client before ReadyForQuery is.
+                 */
+                if (notifyInterruptPending)
+                    ProcessNotifyInterrupt();
+
                 pgstat_report_stat(false);
 
                 set_ps_display("idle", false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3209d0..cce2858 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1554,6 +1573,9 @@ pg_notification_queue_usage(PG_FUNCTION_ARGS)
 {
     double        usage;
 
+    /* Advance the queue tail so we don't report a too-large result */
+    asyncQueueAdvanceTail();
+
     LWLockAcquire(AsyncQueueLock, LW_SHARED);
     usage = asyncQueueUsage();
     LWLockRelease(AsyncQueueLock);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8fe38c3..fa994c9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2113,6 +2113,14 @@ CommitTransaction(void)
     AtEOXact_LargeObject(true);

     /*
+     * Insert notifications sent by NOTIFY commands into the queue.  This
+     * should be late in the pre-commit sequence to minimize time spent
+     * holding the notify-insertion lock.  However, this could result in
+     * creating a snapshot, so we must do it before serializable cleanup.
+     */
+    PreCommit_Notify();
+
+    /*
      * Mark serializable transaction as complete for predicate locking
      * purposes.  This should be done as late as we can put it and still allow
      * errors to be raised for failure patterns found at commit.  This is not
@@ -2122,13 +2130,6 @@ CommitTransaction(void)
     if (!is_parallel_worker)
         PreCommit_CheckForSerializationFailure();

-    /*
-     * Insert notifications sent by NOTIFY commands into the queue.  This
-     * should be late in the pre-commit sequence to minimize time spent
-     * holding the notify-insertion lock.
-     */
-    PreCommit_Notify();
-
     /* Prevent cancel/die interrupt while cleaning up */
     HOLD_INTERRUPTS();

@@ -2344,6 +2345,9 @@ PrepareTransaction(void)
     /* close large objects before lower-level cleanup */
     AtEOXact_LargeObject(true);

+    /* let NOTIFY do its thing, too */
+    PrePrepare_Notify();
+
     /*
      * Mark serializable transaction as complete for predicate locking
      * purposes.  This should be done as late as we can put it and still allow
@@ -2351,8 +2355,6 @@ PrepareTransaction(void)
      */
     PreCommit_CheckForSerializationFailure();

-    /* NOTIFY will be handled below */
-
     /*
      * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
      * this transaction.  Having the prepared xact hold locks on another
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3209d0..cce2858 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -851,13 +851,16 @@ Async_UnlistenOnExit(int code, Datum arg)
 }

 /*
- * AtPrepare_Notify
+ * PrePrepare_Notify
+ *
+ *        This is called during setup of the prepare phase of a two-phase
+ *        transaction, at a point corresponding to PreCommit_Notify.
  *
- *        This is called at the prepare phase of a two-phase
- *        transaction.  Save the state for possible commit later.
+ * We don't currently support having done anything related to LISTEN/NOTIFY
+ * in a prepared transaction, so just throw an error.
  */
 void
-AtPrepare_Notify(void)
+PrePrepare_Notify(void)
 {
     /* It's not allowed to have any pending LISTEN/UNLISTEN/NOTIFY actions */
     if (pendingActions || pendingNotifies)
@@ -867,6 +870,22 @@ AtPrepare_Notify(void)
 }

 /*
+ * AtPrepare_Notify
+ *
+ *        This is called in the prepare phase of a two-phase transaction,
+ *        at a point where we should save LISTEN/NOTIFY state.
+ *
+ * We don't currently support having done anything related to LISTEN/NOTIFY
+ * in a prepared transaction; this is just a placeholder for future work.
+ */
+void
+AtPrepare_Notify(void)
+{
+    /* PrePrepare_Notify should have thrown the error already */
+    Assert(!(pendingActions || pendingNotifies));
+}
+
+/*
  * PreCommit_Notify
  *
  *        This is called at transaction commit, before actually committing to
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 90b682f..d18b74b 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -42,6 +42,7 @@ extern void AtCommit_Notify(void);
 extern void AtAbort_Notify(void);
 extern void AtSubCommit_Notify(void);
 extern void AtSubAbort_Notify(void);
+extern void PrePrepare_Notify(void);
 extern void AtPrepare_Notify(void);
 extern void ProcessCompletedNotifies(void);

diff --git a/src/test/isolation/expected/async-notify.out b/src/test/isolation/expected/async-notify.out
index 7ad26b7..7942778 100644
--- a/src/test/isolation/expected/async-notify.out
+++ b/src/test/isolation/expected/async-notify.out
@@ -1,4 +1,4 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions

 starting permutation: listenc notify1 notify2 notify3 notifyf
 step listenc: LISTEN c1; LISTEN c2;
@@ -83,6 +83,17 @@ listener: NOTIFY "c1" with payload "" from notifier
 listener: NOTIFY "c2" with payload "payload" from notifier
 listener: NOTIFY "c2" with payload "" from notifier

+starting permutation: l2listen l2begin notify1 lbegins llisten lcommit l2commit l2stop
+step l2listen: LISTEN c1;
+step l2begin: BEGIN;
+step notify1: NOTIFY c1;
+step lbegins: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step llisten: LISTEN c1; LISTEN c2;
+step lcommit: COMMIT;
+step l2commit: COMMIT;
+listener2: NOTIFY "c1" with payload "" from notifier
+step l2stop: UNLISTEN *;
+
 starting permutation: llisten lbegin usage bignotify usage
 step llisten: LISTEN c1; LISTEN c2;
 step lbegin: BEGIN;
diff --git a/src/test/isolation/specs/async-notify.spec b/src/test/isolation/specs/async-notify.spec
index daf7bef..a7b2600 100644
--- a/src/test/isolation/specs/async-notify.spec
+++ b/src/test/isolation/specs/async-notify.spec
@@ -41,8 +41,18 @@ session "listener"
 step "llisten"    { LISTEN c1; LISTEN c2; }
 step "lcheck"    { SELECT 1 AS x; }
 step "lbegin"    { BEGIN; }
+step "lbegins"    { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step "lcommit"    { COMMIT; }
 teardown        { UNLISTEN *; }

+# In some tests we need a second listener, just to block the queue.
+
+session "listener2"
+step "l2listen"    { LISTEN c1; }
+step "l2begin"    { BEGIN; }
+step "l2commit"    { COMMIT; }
+step "l2stop"    { UNLISTEN *; }
+

 # Trivial cases.
 permutation "listenc" "notify1" "notify2" "notify3" "notifyf"
@@ -59,6 +69,10 @@ permutation "llisten" "notify1" "notify2" "notify3" "notifyf" "lcheck"
 # Again, with local delivery too.
 permutation "listenc" "llisten" "notify1" "notify2" "notify3" "notifyf" "lcheck"

+# Check for bug when initial listen is only action in a serializable xact,
+# and notify queue is not empty
+permutation "l2listen" "l2begin" "notify1" "lbegins" "llisten" "lcommit" "l2commit" "l2stop"
+
 # Verify that pg_notification_queue_usage correctly reports a non-zero result,
 # after submitting notifications while another connection is listening for
 # those notifications and waiting inside an active transaction.  We have to

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: backup manifests
Next
From: Steve Singer
Date:
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace onthe fly