Re: allow online change primary_conninfo - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: allow online change primary_conninfo
Date
Msg-id 20190919.181133.15766613.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: allow online change primary_conninfo  (Sergei Kornilov <sk@zsrv.org>)
Responses Re: allow online change primary_conninfo
List pgsql-hackers
Hello. I looked this and have some comments.

At Wed, 28 Aug 2019 12:49:46 +0300, Sergei Kornilov <sk@zsrv.org> wrote in
<34084371566985786@sas1-0a6c2e2b59d7.qloud-c.yandex.net>
sk> Hello
sk>
sk> Updated patch attached. (also I merged into one file)
sk>
sk> > + <para>
sk> > + WAL receiver will be restarted after <varname>primary_slot_name</varname>
sk> > + was changed.
sk> >           </para>
sk> > The sentence sounds strange. Here is a suggestion:
sk> > The WAL receiver is restarted after an update of primary_slot_name (or
sk> > primary_conninfo).
sk>
sk> Changed.

-          This parameter can only be set at server start.
+          This parameter can only be set in the <filename>postgresql.conf</filename>
+          file or on the server command line.

I'm not sure it's good to change the context of this
description. This was mentioning that changing of this parameter
requires server (re)start. So if we want to be on the same
context after rewriting, it would be like "This parameter can be
set any time and causes WAL receiver restart with the new setting
if the server is in standby mode."

And If I'm not missing something, I don't find an (explict)
paramter of postmaster for setting primary_conninfo.



sk> > The comment at the top of the call of ProcessStartupSigHup() in
sk> > HandleStartupProcInterrupts() needs to be updated as it mentions a
sk> > configuration file re-read, but that's not the case anymore.
sk>
sk> Changed to "Process any requests or signals received recently." like in other places, e.g. syslogger
sk>
sk> > pendingRestartSource's name does not represent what it does, as it is
sk> > only associated with the restart of a WAL receiver when in streaming
sk> > state, and that's a no-op for the archive mode and the local mode.
sk>
sk> I renamed to pendingWalRcvRestart and replaced switch with simple condition.
sk>
sk> > So when shutting down the WAL receiver after a parameter change, what
sk> > happens is that the startup process waits for retrieve_retry_interval
sk> > before moving back to the archive mode. Only after scanning again the
sk> > archives do we restart a new WAL receiver.
sk>
sk> As I answered earlier, here is no switch to archive or wal_retrieve_retry_interval waiting in my patch. I recheck
oncurrent revision too: 

@@ -11787,48 +11793,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                     if (!StandbyMode)
                         return false;

-                    /*
-                     * If primary_conninfo is set, launch walreceiver to try
-                     * to stream the missing WAL.
-                     *
-                     * If fetching_ckpt is true, RecPtr points to the initial
-                     * checkpoint location. In that case, we use RedoStartLSN

Mentioning code, the movement of the code block makes the
surrounding swtich almost useless and the structure and the
result looks somewhat strange.

Couldn't we do the same thing by just skipping the wait and
setting lastSourceFailed to true in the case of intentional
walreceiver restart?

The attached is an incomplete (fraction) patch to sketch what is
in my mind. With something like this and making
ProcessStartupSigHup kill walreceiver would work.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6876537b62..61b235f8f7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11881,10 +11881,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                      * obtaining the requested WAL. We're going to loop back
                      * and retry from the archive, but if it hasn't been long
                      * since last attempt, sleep wal_retrieve_retry_interval
-                     * milliseconds to avoid busy-waiting.
+                     * milliseconds to avoid busy-waiting. We don't wait if
+                     * explicitly requested to restart.
                      */
                     now = GetCurrentTimestamp();
-                    if (!TimestampDifferenceExceeds(last_fail_time, now,
+                    if (!pendingWalRcvRestart &&
+                        !TimestampDifferenceExceeds(last_fail_time, now,
                                                     wal_retrieve_retry_interval))
                     {
                         long        secs,
@@ -11905,6 +11907,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                     }
                     last_fail_time = now;
                     currentSource = XLOG_FROM_ARCHIVE;
+
+                    /*
+                     * If wal receiver is requested to restart, we skip the
+                     * next XLOG_FROM_ARCHIVE to immediately starting it.
+                     */
+                    if (pendingWalRcvRestart)
+                    {
+                        lastSourceFailed = true;
+                        continue;
+                    }
+
                     break;
 
                 default:

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: one line doc patch for v12
Next
From: Amit Kapila
Date:
Subject: Re: one line doc patch for v12