Thread: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
"zwj"
Date:
Thank you for reply.
You are right and the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a
checkpoint while data checksum is on.

But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And I read a comment in function MarkBufferDirtyHint:
```
/*
 * If we need to protect hint bit updates from torn writes, WAL-log a
 * full page image of the page. This full page image is only necessary
 * if the hint bit update is the first change to the page since the
 * last checkpoint.
 *
 * We don't check full_page_writes here because that logic is included
 * when we call XLogInsert() since the value changes dynamically.
 */

```
However, the code tell me it has nothing to do with full_page_writes. I can't figure it out. 

--
Zhang Wenjie

------------------ 原始邮件 ------------------
发件人: "Japin Li" <japinli@hotmail.com>;
发送时间: 2021年7月6日(星期二) 晚上7:04
收件人: "zwj"<757634191@qq.com>;
抄送: "pgsql-hackers"<pgsql-hackers@lists.postgresql.org>;
主题: Re: Why is XLOG_FPI_FOR_HINT always need backups?


On Tue, 06 Jul 2021 at 17:58, zwj <757634191@qq.com> wrote:
> Hi all,
>
> When I read the source code file src/backend/access/transam/xloginsert.c, I get something confused me.
> In the function XLogSaveBufferForHint, the flags are always REGBUF_FORCE_IMAGE which means it is always need backups.
> Is it right? Why do not check the full_page_writes?

The documentation [1] says:

------------------------------------------------------------------------------
wal_log_hints (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of
each disk page to WAL during the first modification of that page after a
checkpoint, even for non-critical modifications of so-called hint bits.
------------------------------------------------------------------------------

Does that mean whether the full_page_writes enable or not, if the wal_log_hints
enabled, we always write the entire content of each disk page to WAL? If I'm
right, should we mention this in wal_log_hints?

[1] https://www.postgresql.org/docs/current/runtime-config-wal.html

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Kyotaro Horiguchi
Date:
Hello.

At Tue, 6 Jul 2021 20:42:23 +0800, "zwj" <757634191@qq.com> wrote in 
> But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And I
reada comment in function MarkBufferDirtyHint:
 
> 
> /*
>  * If we need to protect hint bit updates from torn writes, WAL-log a
>  * full page image of the page. This full page image is only necessary
>  * if the hint bit update is the first change to the page since the
>  * last checkpoint.
>  *
>  * We don't check full_page_writes here because that logic is included
>  * when we call XLogInsert() since the value changes dynamically.
>  */
> 
> However, the code tell me it has nothing to do with full_page_writes. I can't figure it out.

The doc of wal_log_hints says that "*even* for non-critical
modifications of so-called hint bits", which seems to me implies it is
following full_page_writes (and I think it is nonsense otherwise, as
you suspect).

XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when
the symbol was introduced. As my understanding XLogInsert did not have
an ability to enforce FPIs before the commit. The code comment above
is older than that commit. So it seems to me a thinko that
XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE.

I think the attached fixes that thinko.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From d0d3d52953eda6451c212f4994fdc21a8284f6c1 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 7 Jul 2021 15:34:41 +0900
Subject: [PATCH] Make FPI_FOR_HINT follow standard FPI emitting policy

Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed
XLogSaveBufferForHint to enforce FPI but the caller didn't intend to
do so.  Restore the intended behavior that FPI_FOR_HINT follows
full_page_writes.
---
 src/backend/access/transam/xloginsert.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 3d2c9c3e8c..e596a0470a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum,
 {
     registered_buffer *regbuf;
 
-    /* This is currently only used to WAL-log a full-page image of a page */
-    Assert(flags & REGBUF_FORCE_IMAGE);
     Assert(begininsert_called);
 
     if (block_id >= max_registered_block_id)
@@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
     if (lsn <= RedoRecPtr)
     {
-        int            flags;
+        int            flags = 0;
         PGAlignedBlock copied_buffer;
         char       *origdata = (char *) BufferGetBlock(buffer);
         RelFileNode rnode;
@@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
         XLogBeginInsert();
 
-        flags = REGBUF_FORCE_IMAGE;
         if (buffer_std)
             flags |= REGBUF_STANDARD;
 
-- 
2.27.0


Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Fujii Masao
Date:

On 2021/07/07 16:11, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Tue, 6 Jul 2021 20:42:23 +0800, "zwj" <757634191@qq.com> wrote in
>> But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And
Iread a comment in function MarkBufferDirtyHint:
 
>>
>> /*
>>   * If we need to protect hint bit updates from torn writes, WAL-log a
>>   * full page image of the page. This full page image is only necessary
>>   * if the hint bit update is the first change to the page since the
>>   * last checkpoint.
>>   *
>>   * We don't check full_page_writes here because that logic is included
>>   * when we call XLogInsert() since the value changes dynamically.
>>   */
>>
>> However, the code tell me it has nothing to do with full_page_writes. I can't figure it out.
> 
> The doc of wal_log_hints says that "*even* for non-critical
> modifications of so-called hint bits", which seems to me implies it is
> following full_page_writes (and I think it is nonsense otherwise, as
> you suspect).
> 
> XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when
> the symbol was introduced. As my understanding XLogInsert did not have
> an ability to enforce FPIs before the commit. The code comment above
> is older than that commit. So it seems to me a thinko that
> XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE.
> 
> I think the attached fixes that thinko.

With the patch, I got the following error during crash recovery.
I guess this happened because XLOG_FPI_FOR_HINT record had
no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT
assumes it contains backup blocks.

FATAL:  unexpected XLogReadBufferForRedo result when restoring backup block
CONTEXT:  WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel 1663/13236/16385, blk 0

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Kyotaro Horiguchi
Date:
At Thu, 15 Jul 2021 22:50:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2021/07/07 16:11, Kyotaro Horiguchi wrote:
> > The doc of wal_log_hints says that "*even* for non-critical
> > modifications of so-called hint bits", which seems to me implies it is
> > following full_page_writes (and I think it is nonsense otherwise, as
> > you suspect).
> > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when
> > the symbol was introduced. As my understanding XLogInsert did not have
> > an ability to enforce FPIs before the commit. The code comment above
> > is older than that commit. So it seems to me a thinko that
> > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE.
> > I think the attached fixes that thinko.
> 
> With the patch, I got the following error during crash recovery.
> I guess this happened because XLOG_FPI_FOR_HINT record had
> no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT
> assumes it contains backup blocks.
> 
> FATAL: unexpected XLogReadBufferForRedo result when restoring backup
> block
> CONTEXT: WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel
> 1663/13236/16385, blk 0

Sorry, I missed that the XLogReadBufferForRedo is expected to return
BLK_RESTORED.  And XLogReadBufferForRedo errors out when it tries to
read nonexistent page without having an FPI (this happens for FSM
pages).  Rather than teaching XLogReadBufferExtended to behave
differrently for the case, I choosed to avoid trying to load the page
when the corresponding FPI block is missing in XLOG_FPI_FOR_HINT, as
if the record itself did not exist at all.

Since differently from XLOG_FPI, XLOG_FPI_FOR_HINT has only one page
reference at most, but in the attached the decision whether to read
the page or not is made for each block.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From a6ce75300d0d0e0234ff0292fda0fe8c2a168ada Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 7 Jul 2021 15:34:41 +0900
Subject: [PATCH v2] Make FPI_FOR_HINT follow standard FPI emitting policy

Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed
XLogSaveBufferForHint to enforce FPI but the caller didn't intend to
do so.  Restore the intended behavior that FPI_FOR_HINT follows
full_page_writes.
---
 src/backend/access/transam/xlog.c       | 20 ++++++++++++++++----
 src/backend/access/transam/xloginsert.c |  5 +----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index edb15fe58d..dcd71c01fb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10126,7 +10126,10 @@ xlog_redo(XLogReaderState *record)
     uint8        info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
     XLogRecPtr    lsn = record->EndRecPtr;
 
-    /* in XLOG rmgr, backup blocks are only used by XLOG_FPI records */
+    /*
+     * in XLOG rmgr, backup blocks are only used by XLOG_FPI and
+     * XLOG_FPI_FOR_HINT records.
+     */
     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
            !XLogRecHasAnyBlockRefs(record));
 
@@ -10345,14 +10348,23 @@ xlog_redo(XLogReaderState *record)
          *
          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
          * WAL- logged because of a hint bit update. They are only generated
-         * when checksums are enabled. There is no difference in handling
-         * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different info
-         * code just to distinguish them for statistics purposes.
+         * when checksums and/or wal_log_hints are enabled. The only difference
+         * in handling XLOG_FPI and XLOG_FPI_FOR_HINT records is the latter is
+         * allowed to be missing actual block image. In that case the record is
+         * effectively a NOP record and should not even try to read the page
+         * from disk.
          */
         for (uint8 block_id = 0; block_id <= record->max_block_id; block_id++)
         {
             Buffer        buffer;
 
+            if (!XLogRecHasBlockImage(record, block_id))
+            {
+                if (info == XLOG_FPI)
+                    elog(ERROR, "missing full page image in XLOG_FPI record");
+                continue;
+            }
+
             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
             UnlockReleaseBuffer(buffer);
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 3d2c9c3e8c..e596a0470a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum,
 {
     registered_buffer *regbuf;
 
-    /* This is currently only used to WAL-log a full-page image of a page */
-    Assert(flags & REGBUF_FORCE_IMAGE);
     Assert(begininsert_called);
 
     if (block_id >= max_registered_block_id)
@@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
     if (lsn <= RedoRecPtr)
     {
-        int            flags;
+        int            flags = 0;
         PGAlignedBlock copied_buffer;
         char       *origdata = (char *) BufferGetBlock(buffer);
         RelFileNode rnode;
@@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
         XLogBeginInsert();
 
-        flags = REGBUF_FORCE_IMAGE;
         if (buffer_std)
             flags |= REGBUF_STANDARD;
 
-- 
2.27.0


Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Fujii Masao
Date:

On 2021/07/16 16:31, Kyotaro Horiguchi wrote:
> Sorry, I missed that the XLogReadBufferForRedo is expected to return
> BLK_RESTORED.  And XLogReadBufferForRedo errors out when it tries to
> read nonexistent page without having an FPI (this happens for FSM
> pages).  Rather than teaching XLogReadBufferExtended to behave
> differrently for the case, I choosed to avoid trying to load the page
> when the corresponding FPI block is missing in XLOG_FPI_FOR_HINT, as
> if the record itself did not exist at all.
> 
> Since differently from XLOG_FPI, XLOG_FPI_FOR_HINT has only one page
> reference at most, but in the attached the decision whether to read
> the page or not is made for each block.

Thanks for updating the patch! It basically looks good to me.

         * Full-page image (FPI) records contain nothing else but a backup
         * block (or multiple backup blocks). Every block reference must
         * include a full-page image - otherwise there would be no point in
         * this record.

The above comment also needs to be updated?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Kyotaro Horiguchi
Date:
At Sat, 17 Jul 2021 00:14:34 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Thanks for updating the patch! It basically looks good to me.
> 
>          * Full-page image (FPI) records contain nothing else but a backup
>          * block (or multiple backup blocks). Every block reference must
>          * include a full-page image - otherwise there would be no point in
>          * this record.
> 
> The above comment also needs to be updated?

In short, no.  In contrast to the third paragraph, the first paragraph
should be thought that it is describing XLOG_FPI.  However, actually
it is not super obvious so it's better to make it clearer. Addition to
that, it seems to me (yes, to *me*) somewhat confused between "block
reference", "backup block" and "full-page image". So I'd like to
adjust the paragraph as the following.

> * XLOG_FPI records contain nothing else but one or more block
> * references. Every block reference must include a full-page image
> * regardless of the full_page_writes setting - otherwise there would
> * be no point in this record.

FYI (or, for the record), the first paragraph got to the current shape
by the commit 9155580fd5, where XLOG_FPI was modified to be able to
hold more than one block references.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From cd54ae028878fac0a97baebf71e39cb9b518e885 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 7 Jul 2021 15:34:41 +0900
Subject: [PATCH v3] Make FPI_FOR_HINT follow standard FPI emitting policy

Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed
XLogSaveBufferForHint to enforce FPI but the caller didn't intend to
do so.  Restore the intended behavior that FPI_FOR_HINT follows
full_page_writes.
---
 src/backend/access/transam/xlog.c       | 30 +++++++++++++++++--------
 src/backend/access/transam/xloginsert.c |  5 +----
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ee9515139..ff10fef0d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10148,7 +10148,10 @@ xlog_redo(XLogReaderState *record)
     uint8        info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
     XLogRecPtr    lsn = record->EndRecPtr;
 
-    /* in XLOG rmgr, backup blocks are only used by XLOG_FPI records */
+    /*
+     * in XLOG rmgr, backup blocks are only used by XLOG_FPI and
+     * XLOG_FPI_FOR_HINT records.
+     */
     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
            !XLogRecHasAnyBlockRefs(record));
 
@@ -10356,25 +10359,34 @@ xlog_redo(XLogReaderState *record)
     else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
     {
         /*
-         * Full-page image (FPI) records contain nothing else but a backup
-         * block (or multiple backup blocks). Every block reference must
-         * include a full-page image - otherwise there would be no point in
-         * this record.
+         * XLOG_FPI records contain nothing else but one or more block
+         * references. Every block reference must include a full-page image
+         * regardless of the full_page_writes setting - otherwise there would
+         * be no point in this record.
          *
          * No recovery conflicts are generated by these generic records - if a
          * resource manager needs to generate conflicts, it has to define a
          * separate WAL record type and redo routine.
          *
          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
-         * WAL- logged because of a hint bit update. They are only generated
-         * when checksums are enabled. There is no difference in handling
-         * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different info
-         * code just to distinguish them for statistics purposes.
+         * WAL-logged because of a hint bit update. They are only generated
+         * when checksums and/or wal_log_hints are enabled. The only difference
+         * in handling XLOG_FPI and XLOG_FPI_FOR_HINT records is that the
+         * latter is allowed to be missing actual block image. In that case the
+         * record is effectively a NOP record and should not even try to read
+         * the page from disk.
          */
         for (uint8 block_id = 0; block_id <= record->max_block_id; block_id++)
         {
             Buffer        buffer;
 
+            if (!XLogRecHasBlockImage(record, block_id))
+            {
+                if (info == XLOG_FPI)
+                    elog(ERROR, "missing full page image in XLOG_FPI record");
+                continue;
+            }
+
             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
             UnlockReleaseBuffer(buffer);
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 3d2c9c3e8c..e596a0470a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum,
 {
     registered_buffer *regbuf;
 
-    /* This is currently only used to WAL-log a full-page image of a page */
-    Assert(flags & REGBUF_FORCE_IMAGE);
     Assert(begininsert_called);
 
     if (block_id >= max_registered_block_id)
@@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
     if (lsn <= RedoRecPtr)
     {
-        int            flags;
+        int            flags = 0;
         PGAlignedBlock copied_buffer;
         char       *origdata = (char *) BufferGetBlock(buffer);
         RelFileNode rnode;
@@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 
         XLogBeginInsert();
 
-        flags = REGBUF_FORCE_IMAGE;
         if (buffer_std)
             flags |= REGBUF_STANDARD;
 
-- 
2.27.0


Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Fujii Masao
Date:

On 2021/07/19 10:16, Kyotaro Horiguchi wrote:
> At Sat, 17 Jul 2021 00:14:34 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> Thanks for updating the patch! It basically looks good to me.
>>
>>          * Full-page image (FPI) records contain nothing else but a backup
>>          * block (or multiple backup blocks). Every block reference must
>>          * include a full-page image - otherwise there would be no point in
>>          * this record.
>>
>> The above comment also needs to be updated?
> 
> In short, no.  In contrast to the third paragraph, the first paragraph
> should be thought that it is describing XLOG_FPI.  However, actually
> it is not super obvious so it's better to make it clearer. Addition to
> that, it seems to me (yes, to *me*) somewhat confused between "block
> reference", "backup block" and "full-page image". So I'd like to
> adjust the paragraph as the following.

Understood. Thanks for updating the patch!

I slightly modified the comments and pushed the patch. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?

From
Kyotaro Horiguchi
Date:
At Wed, 21 Jul 2021 11:23:20 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> I slightly modified the comments and pushed the patch. Thanks!

Thank you for commiting this!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center