Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?
Date
Msg-id 20210716.163130.393769596852922603.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
At Thu, 15 Jul 2021 22:50:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2021/07/07 16:11, Kyotaro Horiguchi wrote:
> > The doc of wal_log_hints says that "*even* for non-critical
> > modifications of so-called hint bits", which seems to me implies it is
> > following full_page_writes (and I think it is nonsense otherwise, as
> > you suspect).
> > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when
> > the symbol was introduced. As my understanding XLogInsert did not have
> > an ability to enforce FPIs before the commit. The code comment above
> > is older than that commit. So it seems to me a thinko that
> > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE.
> > I think the attached fixes that thinko.
> 
> With the patch, I got the following error during crash recovery.
> I guess this happened because XLOG_FPI_FOR_HINT record had
> no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT
> assumes it contains backup blocks.
> 
> FATAL: unexpected XLogReadBufferForRedo result when restoring backup
> block
> CONTEXT: WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel
> 1663/13236/16385, blk 0

Sorry, I missed that the XLogReadBufferForRedo is expected to return
BLK_RESTORED.  And XLogReadBufferForRedo errors out when it tries to
read nonexistent page without having an FPI (this happens for FSM
pages).  Rather than teaching XLogReadBufferExtended to behave
differrently for the case, I choosed to avoid trying to load the page
when the corresponding FPI block is missing in XLOG_FPI_FOR_HINT, as
if the record itself did not exist at all.

Since differently from XLOG_FPI, XLOG_FPI_FOR_HINT has only one page
reference at most, but in the attached the decision whether to read
the page or not is made for each block.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From a6ce75300d0d0e0234ff0292fda0fe8c2a168ada Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 7 Jul 2021 15:34:41 +0900
Subject: [PATCH v2] Make FPI_FOR_HINT follow standard FPI emitting policy

Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed
XLogSaveBufferForHint to enforce FPI but the caller didn't intend to
do so.  Restore the intended behavior that FPI_FOR_HINT follows
full_page_writes.
---
 src/backend/access/transam/xlog.c       | 20 ++++++++++++++++----
 src/backend/access/transam/xloginsert.c |  5 +----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index edb15fe58d..dcd71c01fb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10126,7 +10126,10 @@ xlog_redo(XLogReaderState *record)
     uint8        info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
     XLogRecPtr    lsn = record->EndRecPtr;
 
-    /* in XLOG rmgr, backup blocks are only used by XLOG_FPI records */
+    /*
+     * in XLOG rmgr, backup blocks are only used by XLOG_FPI and
+     * XLOG_FPI_FOR_HINT records.
+     */
     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
            !XLogRecHasAnyBlockRefs(record));
 
@@ -10345,14 +10348,23 @@ xlog_redo(XLogReaderState *record)
          *
          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
          * WAL- logged because of a hint bit update. They are only generated
-         * when checksums are enabled. There is no difference in handling
-         * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different info
-         * code just to distinguish them for statistics purposes.
+         * when checksums and/or wal_log_hints are enabled. The only difference
+         * in handling XLOG_FPI and XLOG_FPI_FOR_HINT records is the latter is
+         * allowed to be missing actual block image. In that case the record is
+         * effectively a NOP record and should not even try to read the page
+         * from disk.
          */
         for (uint8 block_id = 0; block_id <= record->max_block_id; block_id++)
         {
             Buffer        buffer;
 
+            if (!XLogRecHasBlockImage(record, block_id))
+            {
+                if (info == XLOG_FPI)
+                    elog(ERROR, "missing full page image in XLOG_FPI record");
+                continue;
+            }
+
             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
             UnlockReleaseBuffer(buffer);
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 3d2c9c3e8c..e596a0470a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum,
 {
     registered_buffer *regbuf;
 
-    /* This is currently only used to WAL-log a full-page image of a page */
-    Assert(flags & REGBUF_FORCE_IMAGE);
     Assert(begininsert_called);
 
     if (block_id >= max_registered_block_id)
@@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
     if (lsn <= RedoRecPtr)
     {
-        int            flags;
+        int            flags = 0;
         PGAlignedBlock copied_buffer;
         char       *origdata = (char *) BufferGetBlock(buffer);
         RelFileNode rnode;
@@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
         XLogBeginInsert();
 
-        flags = REGBUF_FORCE_IMAGE;
         if (buffer_std)
             flags |= REGBUF_STANDARD;
 
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: refactoring basebackup.c
Next
From: Greg Nancarrow
Date:
Subject: Re: row filtering for logical replication