Re: NOTIFY does not work as expected - Mailing list pgsql-bugs

From Tom Lane
Subject Re: NOTIFY does not work as expected
Date
Msg-id 16416.1539970591@sss.pgh.pa.us
Whole thread Raw
In response to Re: NOTIFY does not work as expected  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: NOTIFY does not work as expected
List pgsql-bugs
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> That distinction was introduced because people (IIRC you actually) were
>> worried that we'd be less likely to get error messages out to the
>> client. Especially when you check unconditionally before actually doing
>> the write, it's going to be far less likely that we are able to send
>> something out to the client.

> Far less likely than what?  If we got a ProcDie signal we'd more often
> than not have handled it long before reaching here.  If we hadn't, though,
> we could arrive here with ProcDiePending set but the latch clear, in which
> case we fail to honor the interrupt until the client does something.
> Don't really think that's acceptable :-(.  I'm also not seeing why it's
> okay to commit ProcDie hara-kiri immediately if the socket is
> write-blocked but not otherwise --- the two cases are going to look about
> the same from the client side.

I spent some more time thinking about this.  It's possible to fix the bug
while preserving the current behavior for ProcDiePending if we adjust the
ProcessClientXXXInterrupt code to set the process latch in the cases where
ProcDiePending is true but we're not waiting for the socket to come ready.
See attached variant of 0001.

I am not, however, convinced that this is better rather than just more
complicated.

If we're willing to accept a ProcDie interrupt during secure_read at all,
I don't see why not to do it even if we got some data.  We'll accept the
interrupt anyway the next time something happens to do
CHECK_FOR_INTERRUPTS; and it's unlikely that that would not be till after
we'd completed the query, so the net effect is just going to be that we
waste some cycles first.

Likewise, I see little merit in holding off ProcDie during secure_write.
If we could try to postpone the interrupt until a message boundary, so as
to avoid losing protocol sync, there would be value in that --- but this
code is at the wrong level to implement any such behavior, and it's
not trying to.  So we still have the situation that the interrupt service
is being postponed without any identifiable gain in predictability of
behavior.

In short, I don't think that the existing logic here does anything useful
to meet the concern I had, and so I wouldn't mind throwing it away.

            regards, tom lane

PS: this patch extends the ProcessClientXXXInterrupt API to distinguish
before/during/after calls.  As written, there are only two behaviors
so we could've stuck with the bool API, but I thought this was clearer.

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d349d7c..66f61ce 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,6 +145,9 @@ secure_read(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /* Deal with any already-pending interrupt condition. */
+    ProcessClientReadInterrupt(+1);
+
 retry:
 #ifdef USE_SSL
     waitfor = 0;
@@ -197,7 +200,7 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientReadInterrupt(true);
+            ProcessClientReadInterrupt(0);

             /*
              * We'll retry the read. Most likely it will return immediately
@@ -209,11 +212,10 @@ retry:
     }

     /*
-     * Process interrupts that happened while (or before) receiving. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or non-blocking,
+     * or hard-failed) read.
      */
-    ProcessClientReadInterrupt(false);
+    ProcessClientReadInterrupt(-1);

     return n;
 }
@@ -248,6 +250,9 @@ secure_write(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /* Deal with any already-pending interrupt condition. */
+    ProcessClientWriteInterrupt(+1);
+
 retry:
     waitfor = 0;
 #ifdef USE_SSL
@@ -283,23 +288,22 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientWriteInterrupt(true);
+            ProcessClientWriteInterrupt(0);

             /*
              * We'll retry the write. Most likely it will return immediately
-             * because there's still no data available, and we'll wait for the
-             * socket to become ready again.
+             * because there's still no buffer space available, and we'll wait
+             * for the socket to become ready again.
              */
         }
         goto retry;
     }

     /*
-     * Process interrupts that happened while (or before) sending. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or non-blocking,
+     * or hard-failed) write.
      */
-    ProcessClientWriteInterrupt(false);
+    ProcessClientWriteInterrupt(-1);

     return n;
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6e13d14..23c04fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -315,7 +315,7 @@ interactive_getc(void)

     c = getc(stdin);

-    ProcessClientReadInterrupt(true);
+    ProcessClientReadInterrupt(-1);

     return c;
 }
@@ -520,35 +520,44 @@ ReadCommand(StringInfo inBuf)
 /*
  * ProcessClientReadInterrupt() - Process interrupts specific to client reads
  *
- * This is called just after low-level reads. That might be after the read
- * finished successfully, or it was interrupted via interrupt.
+ * This is called just before and after low-level reads.
+ * 'before' is +1 if about to read, 0 if no data was available to read and
+ * we plan to retry, -1 if done reading.
  *
  * Must preserve errno!
  */
 void
-ProcessClientReadInterrupt(bool blocked)
+ProcessClientReadInterrupt(int before)
 {
     int            save_errno = errno;

     if (DoingCommandRead)
     {
-        /* Check for general interrupts that arrived while reading */
+        /* Check for general interrupts that arrived before/while reading */
         CHECK_FOR_INTERRUPTS();

-        /* Process sinval catchup interrupts that happened while reading */
+        /* Process sinval catchup interrupts, if any */
         if (catchupInterruptPending)
             ProcessCatchupInterrupt();

-        /* Process sinval catchup interrupts that happened while reading */
+        /* Process notify interrupts, if any */
         if (notifyInterruptPending)
             ProcessNotifyInterrupt();
     }
-    else if (ProcDiePending && blocked)
+    else if (ProcDiePending)
     {
         /*
-         * We're dying. It's safe (and sane) to handle that now.
+         * We're dying.  If there is no data available to read, then it's safe
+         * (and sane) to handle that now.  If we haven't tried to read yet,
+         * make sure the process latch is set, so that if there is no data
+         * then we'll come back here and die.  If we're done reading, also
+         * make sure the process latch is set, as we might've undesirably
+         * cleared it while reading.
          */
-        CHECK_FOR_INTERRUPTS();
+        if (before == 0)
+            CHECK_FOR_INTERRUPTS();
+        else
+            SetLatch(MyLatch);
     }

     errno = save_errno;
@@ -557,36 +566,50 @@ ProcessClientReadInterrupt(bool blocked)
 /*
  * ProcessClientWriteInterrupt() - Process interrupts specific to client writes
  *
- * This is called just after low-level writes. That might be after the read
- * finished successfully, or it was interrupted via interrupt. 'blocked' tells
- * us whether the
+ * This is called just before and after low-level writes.
+ * 'before' is +1 if about to write, 0 if no data could be written and we plan
+ * to retry, -1 if done writing.
  *
  * Must preserve errno!
  */
 void
-ProcessClientWriteInterrupt(bool blocked)
+ProcessClientWriteInterrupt(int before)
 {
     int            save_errno = errno;

-    /*
-     * We only want to process the interrupt here if socket writes are
-     * blocking to increase the chance to get an error message to the client.
-     * If we're not blocked there'll soon be a CHECK_FOR_INTERRUPTS(). But if
-     * we're blocked we'll never get out of that situation if the client has
-     * died.
-     */
-    if (ProcDiePending && blocked)
+    if (ProcDiePending)
     {
         /*
-         * We're dying. It's safe (and sane) to handle that now. But we don't
-         * want to send the client the error message as that a) would possibly
-         * block again b) would possibly lead to sending an error message to
-         * the client, while we already started to send something else.
+         * We're dying.  If it's not possible to write, then we should handle
+         * that immediately, else a stuck client could indefinitely delay our
+         * response to the signal.  If we haven't tried to write yet, make
+         * sure the process latch is set, so that if the write would block
+         * then we'll come back here and die.  If we're done writing, also
+         * make sure the process latch is set, as we might've undesirably
+         * cleared it while writing.
          */
-        if (whereToSendOutput == DestRemote)
-            whereToSendOutput = DestNone;
+        if (before == 0)
+        {
+            /*
+             * Don't mess with whereToSendOutput if ProcessInterrupts wouldn't
+             * do anything.
+             */
+            if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
+            {
+                /*
+                 * We don't want to send the client the error message, as a)
+                 * that would possibly block again, and b) it would likely
+                 * lead to loss of protocol sync because we may have already
+                 * sent a partial protocol message.
+                 */
+                if (whereToSendOutput == DestRemote)
+                    whereToSendOutput = DestNone;

-        CHECK_FOR_INTERRUPTS();
+                CHECK_FOR_INTERRUPTS();
+            }
+        }
+        else
+            SetLatch(MyLatch);
     }

     errno = save_errno;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 63b4e48..b46c05c 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -71,8 +71,8 @@ extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
                                                                  * handler */
-extern void ProcessClientReadInterrupt(bool blocked);
-extern void ProcessClientWriteInterrupt(bool blocked);
+extern void ProcessClientReadInterrupt(int before);
+extern void ProcessClientWriteInterrupt(int before);

 extern void process_postgres_switches(int argc, char *argv[],
                           GucContext ctx, const char **dbname);

pgsql-bugs by date:

Previous
From: Martin Varady
Date:
Subject: Re: BUG #15445: Difference between two dates is not an integer
Next
From: Andres Freund
Date:
Subject: Re: NOTIFY does not work as expected