Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Problem while setting the fpw with SIGHUP
Date
Msg-id 20180420.173354.43303926.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Problem while setting the fpw with SIGHUP  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
I noticed that the previous patch is a mixture with another
patch.. sorry.

At Thu, 19 Apr 2018 19:11:43 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1LZ8UBfOMWMHAQGZ0_5N7aWPOXgUYTvCV+J2SXkpmrjRg@mail.gmail.com>
> On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier <michael@paquier.xyz> wrote:
> > On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
> >> I would just document the risks.  If the documentation says that you
> >> can't rely on the value until after the next checkpoint, or whatever
> >> the rule is, then I think we're fine.  I don't think that we really
> >> have the infrastructure to do any better; if we try, we'll just end up
> >> with odd warts.  Documenting the current set of warts is less churn
> >> and less work.
> >
> > The last version of the patch proposed has eaten this diff which was
> > part of one of the past versions (v2-0001-Change-FPW-handling.patch from
> > https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
> > +        The default is <literal>on</literal>. The change of the parameter takes
> > +        effect at the next checkpoint time.
> > So there were some documentation about the beHavior change for what it's
> > worth.
> >
> > And, er, actually, I was thinking again about the case where a user
> > wants to disable full_page_writes temporarily to do some bulk load and
> > then re-enable it.  With the patch proposed to actually update the FPW
> > effect at checkpoint time, then a user would need to issue a manual
> > checkpoint after updating the configuration and reloading, which may
> > create more I/O than he'd want to pay for, then a second checkpoint
> > would need to be issued after the configuration comes back again.
> >
> 
> Why a second checkpoint?  One checkpoint either manual or automatic
> should be enough to make the setting effective.

One significant point in my first patch is anyway the FPW is
useless untill the second checkpoint starts. In the sense of the
timing when *useful* FPW comes back, the second checkpoint is
required regardless of the patch. As Amit said, there is no
difference whether it is manual or automatic.

On the other hand the timing when FPW is actually turned off is
different. Focusing on user's view, one can run bulkload
immediately under the current behavior but should wait for the
next checkpoint starts with the first patch, which can be caused
manually but may be delayed after the currently running
checkpoint if any.

I think that starting the *first* checkpoint before bulkload is
not such a bother but on the other hand the behavior can lead to
FPW flood for those who are accostomed to the current behavior.

The attached first patch is the bugfix proposed in this thread.
The second fixes the cocurrent update problem only.

The third changes the behavior so that turning-on happenes only
on checkpoints and turning-off happens any time.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 9b3496ceb6f39b017698225fef289974bd01fb07 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 20 Apr 2018 15:01:26 +0900
Subject: [PATCH 1/3] Correctly attach FPI to the first record after a
 checkpoint

XLogInsert fails to attach a required FPIs to the first record after a
checkpoint if no other record has been written after full_page_writes
was turned on. This makes incosistency between fpw flag of the
checkpoint record and the following record's FPW status. This patch
makes XLogInsertRecord to cause a recomputation when the given record
is generated during FPW is off but found that the flag has been turned
on.
---
 src/backend/access/transam/xlog.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08dc9ba031..27753f7321 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -937,7 +937,7 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
  *
  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
  * WAL record applies to, that were not included in the record as full page
- * images.  If fpw_lsn >= RedoRecPtr, the function does not perform the
+ * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
  * record is always inserted.
@@ -970,6 +970,7 @@ XLogInsertRecord(XLogRecData *rdata,
                                info == XLOG_SWITCH);
     XLogRecPtr    StartPos;
     XLogRecPtr    EndPos;
+    bool        prevDoPageWrites = doPageWrites;
 
     /* we assume that all of the record header is in the first chunk */
     Assert(rdata->len >= SizeOfXLogRecord);
@@ -1022,7 +1023,8 @@ XLogInsertRecord(XLogRecData *rdata,
      * This can only happen just after a checkpoint, so it's better to be slow
      * in this case and fast otherwise.
      *
-     * If we aren't doing full-page writes then RedoRecPtr doesn't actually
+     * If doPageWrites was just turned on, we force a recomputation. Otherwise
+     * if we aren't doing full-page writes then RedoRecPtr doesn't actually
      * affect the contents of the XLOG record, so we'll update our local copy
      * but not force a recomputation.  (If doPageWrites was just turned off,
      * we could recompute the record without full pages, but we choose not to
@@ -1035,7 +1037,9 @@ XLogInsertRecord(XLogRecData *rdata,
     }
     doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
 
-    if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
+    if (doPageWrites &&
+        (!prevDoPageWrites ||
+         (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
     {
         /*
          * Oops, some buffer now needs to be backed up that the caller didn't
-- 
2.16.3

From cd6f3162ed9ed678a61b6080a40f004920684b55 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 20 Apr 2018 17:27:49 +0900
Subject: [PATCH 2/3] step1

---
 src/backend/access/transam/xlog.c     | 19 ++++++++-----
 src/backend/postmaster/checkpointer.c | 51 ++++++++++++++++++++++++-----------
 src/include/catalog/pg_control.h      |  2 +-
 src/include/postmaster/bgwriter.h     |  1 +
 4 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 27753f7321..72cc880139 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7580,14 +7580,15 @@ StartupXLOG(void)
     oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
 
     /*
-     * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-     * record before resource manager writes cleanup WAL records or checkpoint
-     * record is written.
+     * Update full_page_writes in shared memory with the lastest value before
+     * resource manager writes cleanup WAL records or checkpoint record is
+     * written. We don't need to write XLOG_FPW_CHANGE since this just
+     * reflects the status at the last redo'ed record. No lock is required
+     * since startup is the only updator of the flag at this
+     * point. Checkpointer will take over after SharedRecoveryInProgress is
+     * turned false.
      */
     Insert->fullPageWrites = lastFullPageWrites;
-    LocalSetXLogInsertAllowed();
-    UpdateFullPageWrites();
-    LocalXLogInsertAllowed = -1;
 
     if (InRecovery)
     {
@@ -7829,10 +7830,14 @@ StartupXLOG(void)
      * If this was a fast promotion, request an (online) checkpoint now. This
      * isn't required for consistency, but the last restartpoint might be far
      * back, and in case of a crash, recovering from it might take a longer
-     * than is appropriate now that we're not in standby mode anymore.
+     * than is appropriate now that we're not in standby mode anymore.  If
+     * checkpoint is not needed, tell checkpointer to update shared
+     * configuration instead.
      */
     if (fast_promoted)
         RequestCheckpoint(CHECKPOINT_FORCE);
+    else
+        WakeupCheckpointer();
 }
 
 /*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4b452e7cee..23d404f082 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -368,20 +368,26 @@ CheckpointerMain(void)
         {
             got_SIGHUP = false;
             ProcessConfigFile(PGC_SIGHUP);
-
-            /*
-             * Checkpointer is the last process to shut down, so we ask it to
-             * hold the keys for a range of other tasks required most of which
-             * have nothing to do with checkpointing at all.
-             *
-             * For various reasons, some config values can change dynamically
-             * so the primary copy of them is held in shared memory to make
-             * sure all backends see the same value.  We make Checkpointer
-             * responsible for updating the shared memory copy if the
-             * parameter setting changes because of SIGHUP.
-             */
-            UpdateSharedMemoryConfig();
         }
+
+        /*
+         * Checkpointer is the last process to shut down, so we ask it to hold
+         * the keys for a range of other tasks required most of which have
+         * nothing to do with checkpointing at all.
+         *
+         * For various reasons, some config values can change dynamically so
+         * the primary copy of them is held in shared memory to make sure all
+         * backends see the same value.  We make Checkpointer responsible for
+         * updating the shared memory copy if the parameter setting changes
+         * because of SIGHUP.
+         *
+         * Since some of the shared memory copy are in sync only after
+          * recovery ends, skipped updated are not refrected until the next
+          * SIGHUP. So call UpdateSharedMemoryConfig() regardless of SIGHUP in
+          * order to sync them without waiting the next reload.
+         */
+        UpdateSharedMemoryConfig();
+
         if (checkpoint_requested)
         {
             checkpoint_requested = false;
@@ -1090,6 +1096,19 @@ RequestCheckpoint(int flags)
     }
 }
 
+/*
+ * WakeupCheckpointer
+ *
+ * Requests checkpointer to wake up. This causes per-loop processing
+ * regardless of the necessity of a checkpoint.
+ */
+void
+WakeupCheckpointer(void)
+{
+    if (IsUnderPostmaster && CheckpointerShmem->checkpointer_pid != 0)
+        kill(CheckpointerShmem->checkpointer_pid, SIGINT);
+}
+
 /*
  * ForwardFsyncRequest
  *        Forward a file-fsync request from a backend to the checkpointer
@@ -1361,9 +1380,11 @@ UpdateSharedMemoryConfig(void)
 
     /*
      * If full_page_writes has been changed by SIGHUP, we update it in shared
-     * memory and write an XLOG_FPW_CHANGE record.
+     * memory and write an XLOG_FPW_CHANGE record. We don't do this until
+     * recovery ends.
      */
-    UpdateFullPageWrites();
+    if (!RecoveryInProgress())
+        UpdateFullPageWrites();
 
     elog(DEBUG2, "checkpointer updated shared memory configuration values");
 }
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..a7e22e0f5f 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -38,7 +38,7 @@ typedef struct CheckPoint
     TimeLineID    ThisTimeLineID; /* current TLI */
     TimeLineID    PrevTimeLineID; /* previous TLI, if this record begins a new
                                  * timeline (equals ThisTimeLineID otherwise) */
-    bool        fullPageWrites; /* current full_page_writes */
+    bool        fullPageWrites; /* full_page_writes at REDO start point */
     uint32        nextXidEpoch;    /* higher-order bits of nextXid */
     TransactionId nextXid;        /* next free XID */
     Oid            nextOid;        /* next free OID */
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
index 941c6aba7d..3aac1606a0 100644
--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -29,6 +29,7 @@ extern void BackgroundWriterMain(void) pg_attribute_noreturn();
 extern void CheckpointerMain(void) pg_attribute_noreturn();
 
 extern void RequestCheckpoint(int flags);
+extern void WakeupCheckpointer(void);
 extern void CheckpointWriteDelay(int flags, double progress);
 
 extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
-- 
2.16.3

From 8cb3bb6cf49d64eee0f27fb9f2264abf0cf89f70 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 20 Apr 2018 17:28:56 +0900
Subject: [PATCH 3/3] step2

---
 doc/src/sgml/config.sgml              |  5 ++++-
 src/backend/access/transam/xlog.c     | 15 ++++++++++++---
 src/backend/postmaster/checkpointer.c | 14 ++++----------
 src/include/access/xlog.h             |  2 +-
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d5f2d23c4..d39f0c12d4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2598,7 +2598,10 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
-        The default is <literal>on</literal>.
+
+        The default is <literal>on</literal>. Turning of of this variable is
+        immediately takes effect and turning on takes effect at the next
+        checkpoint time.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 72cc880139..efb3abd979 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8678,6 +8678,9 @@ CreateCheckPoint(int flags)
      */
     last_important_lsn = GetLastImportantRecPtr();
 
+    /* update fullPageWrites in shared memory if turned on */
+    UpdateFullPageWrites(true);
+
     /*
      * We must block concurrent insertions while examining insert state to
      * determine the checkpoint REDO pointer.
@@ -9554,11 +9557,13 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * Processes only turning on if turn_on is true or turning off if not.
+ *
+ * Note: this function assumes there is no other process running concurrently
+ * that could update it.
  */
 void
-UpdateFullPageWrites(void)
+UpdateFullPageWrites(bool turn_on)
 {
     XLogCtlInsert *Insert = &XLogCtl->Insert;
 
@@ -9572,6 +9577,10 @@ UpdateFullPageWrites(void)
     if (fullPageWrites == Insert->fullPageWrites)
         return;
 
+    /* return if we don't handle the change now  */
+    if ((turn_on && !fullPageWrites) || (!turn_on && fullPageWrites))
+        return;
+
     START_CRIT_SECTION();
 
     /*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 23d404f082..60d1f6b3b0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -332,12 +332,6 @@ CheckpointerMain(void)
      */
     PG_SETMASK(&UnBlockSig);
 
-    /*
-     * Ensure all shared memory values are set correctly for the config. Doing
-     * this here ensures no race conditions from other concurrent updaters.
-     */
-    UpdateSharedMemoryConfig();
-
     /*
      * Advertise our latch that backends can use to wake us up while we're
      * sleeping.
@@ -1379,12 +1373,12 @@ UpdateSharedMemoryConfig(void)
     SyncRepUpdateSyncStandbysDefined();
 
     /*
-     * If full_page_writes has been changed by SIGHUP, we update it in shared
-     * memory and write an XLOG_FPW_CHANGE record. We don't do this until
-     * recovery ends.
+     * If full_page_writes has been turned off by SIGHUP, we update it in
+     * shared memory and write an XLOG_FPW_CHANGE record. We don't do this
+     * until recovery ends.
      */
     if (!RecoveryInProgress())
-        UpdateFullPageWrites();
+        UpdateFullPageWrites(false);
 
     elog(DEBUG2, "checkpointer updated shared memory configuration values");
 }
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..dc90d64a2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -270,7 +270,7 @@ extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
-extern void UpdateFullPageWrites(void);
+extern void UpdateFullPageWrites(bool turn_on);
 extern void GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
-- 
2.16.3


pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Built-in connection pooling
Next
From: Konstantin Knizhnik
Date:
Subject: Re: Built-in connection pooling