Thread: 回复: Why is XLOG_FPI_FOR_HINT always need backups?
Thank you for reply.
You are right and the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a
checkpoint while data checksum is on. But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And I read a comment in function MarkBufferDirtyHint:
```
/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
```
On Tue, 06 Jul 2021 at 17:58, zwj <757634191@qq.com> wrote:
> Hi all,
>
> When I read the source code file src/backend/access/transam/xloginsert.c, I get something confused me.
> In the function XLogSaveBufferForHint, the flags are always REGBUF_FORCE_IMAGE which means it is always need backups.
> Is it right? Why do not check the full_page_writes?
The documentation [1] says:
------------------------------------------------------------------------------
wal_log_hints (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of
each disk page to WAL during the first modification of that page after a
checkpoint, even for non-critical modifications of so-called hint bits.
------------------------------------------------------------------------------
Does that mean whether the full_page_writes enable or not, if the wal_log_hints
enabled, we always write the entire content of each disk page to WAL? If I'm
right, should we mention this in wal_log_hints?
[1] https://www.postgresql.org/docs/current/runtime-config-wal.html
--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
However, the code tell me it has nothing to do with full_page_writes. I can't figure it out.
--
Zhang Wenjie
Zhang Wenjie
------------------ 原始邮件 ------------------
发件人: "Japin Li" <japinli@hotmail.com>;
发送时间: 2021年7月6日(星期二) 晚上7:04
收件人: "zwj"<757634191@qq.com>;
抄送: "pgsql-hackers"<pgsql-hackers@lists.postgresql.org>;
主题: Re: Why is XLOG_FPI_FOR_HINT always need backups?
On Tue, 06 Jul 2021 at 17:58, zwj <757634191@qq.com> wrote:
> Hi all,
>
> When I read the source code file src/backend/access/transam/xloginsert.c, I get something confused me.
> In the function XLogSaveBufferForHint, the flags are always REGBUF_FORCE_IMAGE which means it is always need backups.
> Is it right? Why do not check the full_page_writes?
The documentation [1] says:
------------------------------------------------------------------------------
wal_log_hints (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of
each disk page to WAL during the first modification of that page after a
checkpoint, even for non-critical modifications of so-called hint bits.
------------------------------------------------------------------------------
Does that mean whether the full_page_writes enable or not, if the wal_log_hints
enabled, we always write the entire content of each disk page to WAL? If I'm
right, should we mention this in wal_log_hints?
[1] https://www.postgresql.org/docs/current/runtime-config-wal.html
--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
Hello. At Tue, 6 Jul 2021 20:42:23 +0800, "zwj" <757634191@qq.com> wrote in > But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And I reada comment in function MarkBufferDirtyHint: > > /* > * If we need to protect hint bit updates from torn writes, WAL-log a > * full page image of the page. This full page image is only necessary > * if the hint bit update is the first change to the page since the > * last checkpoint. > * > * We don't check full_page_writes here because that logic is included > * when we call XLogInsert() since the value changes dynamically. > */ > > However, the code tell me it has nothing to do with full_page_writes. I can't figure it out. The doc of wal_log_hints says that "*even* for non-critical modifications of so-called hint bits", which seems to me implies it is following full_page_writes (and I think it is nonsense otherwise, as you suspect). XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when the symbol was introduced. As my understanding XLogInsert did not have an ability to enforce FPIs before the commit. The code comment above is older than that commit. So it seems to me a thinko that XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE. I think the attached fixes that thinko. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From d0d3d52953eda6451c212f4994fdc21a8284f6c1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Wed, 7 Jul 2021 15:34:41 +0900 Subject: [PATCH] Make FPI_FOR_HINT follow standard FPI emitting policy Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed XLogSaveBufferForHint to enforce FPI but the caller didn't intend to do so. Restore the intended behavior that FPI_FOR_HINT follows full_page_writes. --- src/backend/access/transam/xloginsert.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 3d2c9c3e8c..e596a0470a 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum, { registered_buffer *regbuf; - /* This is currently only used to WAL-log a full-page image of a page */ - Assert(flags & REGBUF_FORCE_IMAGE); Assert(begininsert_called); if (block_id >= max_registered_block_id) @@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) if (lsn <= RedoRecPtr) { - int flags; + int flags = 0; PGAlignedBlock copied_buffer; char *origdata = (char *) BufferGetBlock(buffer); RelFileNode rnode; @@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) XLogBeginInsert(); - flags = REGBUF_FORCE_IMAGE; if (buffer_std) flags |= REGBUF_STANDARD; -- 2.27.0
On 2021/07/07 16:11, Kyotaro Horiguchi wrote: > Hello. > > At Tue, 6 Jul 2021 20:42:23 +0800, "zwj" <757634191@qq.com> wrote in >> But I wonder whether it is necessary or not while my file system can protect the blocks of database to be torn. And Iread a comment in function MarkBufferDirtyHint: >> >> /* >> * If we need to protect hint bit updates from torn writes, WAL-log a >> * full page image of the page. This full page image is only necessary >> * if the hint bit update is the first change to the page since the >> * last checkpoint. >> * >> * We don't check full_page_writes here because that logic is included >> * when we call XLogInsert() since the value changes dynamically. >> */ >> >> However, the code tell me it has nothing to do with full_page_writes. I can't figure it out. > > The doc of wal_log_hints says that "*even* for non-critical > modifications of so-called hint bits", which seems to me implies it is > following full_page_writes (and I think it is nonsense otherwise, as > you suspect). > > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when > the symbol was introduced. As my understanding XLogInsert did not have > an ability to enforce FPIs before the commit. The code comment above > is older than that commit. So it seems to me a thinko that > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE. > > I think the attached fixes that thinko. With the patch, I got the following error during crash recovery. I guess this happened because XLOG_FPI_FOR_HINT record had no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT assumes it contains backup blocks. FATAL: unexpected XLogReadBufferForRedo result when restoring backup block CONTEXT: WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel 1663/13236/16385, blk 0 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Thu, 15 Jul 2021 22:50:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > On 2021/07/07 16:11, Kyotaro Horiguchi wrote: > > The doc of wal_log_hints says that "*even* for non-critical > > modifications of so-called hint bits", which seems to me implies it is > > following full_page_writes (and I think it is nonsense otherwise, as > > you suspect). > > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when > > the symbol was introduced. As my understanding XLogInsert did not have > > an ability to enforce FPIs before the commit. The code comment above > > is older than that commit. So it seems to me a thinko that > > XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE. > > I think the attached fixes that thinko. > > With the patch, I got the following error during crash recovery. > I guess this happened because XLOG_FPI_FOR_HINT record had > no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT > assumes it contains backup blocks. > > FATAL: unexpected XLogReadBufferForRedo result when restoring backup > block > CONTEXT: WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel > 1663/13236/16385, blk 0 Sorry, I missed that the XLogReadBufferForRedo is expected to return BLK_RESTORED. And XLogReadBufferForRedo errors out when it tries to read nonexistent page without having an FPI (this happens for FSM pages). Rather than teaching XLogReadBufferExtended to behave differrently for the case, I choosed to avoid trying to load the page when the corresponding FPI block is missing in XLOG_FPI_FOR_HINT, as if the record itself did not exist at all. Since differently from XLOG_FPI, XLOG_FPI_FOR_HINT has only one page reference at most, but in the attached the decision whether to read the page or not is made for each block. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From a6ce75300d0d0e0234ff0292fda0fe8c2a168ada Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Wed, 7 Jul 2021 15:34:41 +0900 Subject: [PATCH v2] Make FPI_FOR_HINT follow standard FPI emitting policy Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed XLogSaveBufferForHint to enforce FPI but the caller didn't intend to do so. Restore the intended behavior that FPI_FOR_HINT follows full_page_writes. --- src/backend/access/transam/xlog.c | 20 ++++++++++++++++---- src/backend/access/transam/xloginsert.c | 5 +---- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index edb15fe58d..dcd71c01fb 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10126,7 +10126,10 @@ xlog_redo(XLogReaderState *record) uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; XLogRecPtr lsn = record->EndRecPtr; - /* in XLOG rmgr, backup blocks are only used by XLOG_FPI records */ + /* + * in XLOG rmgr, backup blocks are only used by XLOG_FPI and + * XLOG_FPI_FOR_HINT records. + */ Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT || !XLogRecHasAnyBlockRefs(record)); @@ -10345,14 +10348,23 @@ xlog_redo(XLogReaderState *record) * * XLOG_FPI_FOR_HINT records are generated when a page needs to be * WAL- logged because of a hint bit update. They are only generated - * when checksums are enabled. There is no difference in handling - * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different info - * code just to distinguish them for statistics purposes. + * when checksums and/or wal_log_hints are enabled. The only difference + * in handling XLOG_FPI and XLOG_FPI_FOR_HINT records is the latter is + * allowed to be missing actual block image. In that case the record is + * effectively a NOP record and should not even try to read the page + * from disk. */ for (uint8 block_id = 0; block_id <= record->max_block_id; block_id++) { Buffer buffer; + if (!XLogRecHasBlockImage(record, block_id)) + { + if (info == XLOG_FPI) + elog(ERROR, "missing full page image in XLOG_FPI record"); + continue; + } + if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED) elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block"); UnlockReleaseBuffer(buffer); diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 3d2c9c3e8c..e596a0470a 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum, { registered_buffer *regbuf; - /* This is currently only used to WAL-log a full-page image of a page */ - Assert(flags & REGBUF_FORCE_IMAGE); Assert(begininsert_called); if (block_id >= max_registered_block_id) @@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) if (lsn <= RedoRecPtr) { - int flags; + int flags = 0; PGAlignedBlock copied_buffer; char *origdata = (char *) BufferGetBlock(buffer); RelFileNode rnode; @@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) XLogBeginInsert(); - flags = REGBUF_FORCE_IMAGE; if (buffer_std) flags |= REGBUF_STANDARD; -- 2.27.0
On 2021/07/16 16:31, Kyotaro Horiguchi wrote: > Sorry, I missed that the XLogReadBufferForRedo is expected to return > BLK_RESTORED. And XLogReadBufferForRedo errors out when it tries to > read nonexistent page without having an FPI (this happens for FSM > pages). Rather than teaching XLogReadBufferExtended to behave > differrently for the case, I choosed to avoid trying to load the page > when the corresponding FPI block is missing in XLOG_FPI_FOR_HINT, as > if the record itself did not exist at all. > > Since differently from XLOG_FPI, XLOG_FPI_FOR_HINT has only one page > reference at most, but in the attached the decision whether to read > the page or not is made for each block. Thanks for updating the patch! It basically looks good to me. * Full-page image (FPI) records contain nothing else but a backup * block (or multiple backup blocks). Every block reference must * include a full-page image - otherwise there would be no point in * this record. The above comment also needs to be updated? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Sat, 17 Jul 2021 00:14:34 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Thanks for updating the patch! It basically looks good to me. > > * Full-page image (FPI) records contain nothing else but a backup > * block (or multiple backup blocks). Every block reference must > * include a full-page image - otherwise there would be no point in > * this record. > > The above comment also needs to be updated? In short, no. In contrast to the third paragraph, the first paragraph should be thought that it is describing XLOG_FPI. However, actually it is not super obvious so it's better to make it clearer. Addition to that, it seems to me (yes, to *me*) somewhat confused between "block reference", "backup block" and "full-page image". So I'd like to adjust the paragraph as the following. > * XLOG_FPI records contain nothing else but one or more block > * references. Every block reference must include a full-page image > * regardless of the full_page_writes setting - otherwise there would > * be no point in this record. FYI (or, for the record), the first paragraph got to the current shape by the commit 9155580fd5, where XLOG_FPI was modified to be able to hold more than one block references. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From cd54ae028878fac0a97baebf71e39cb9b518e885 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Wed, 7 Jul 2021 15:34:41 +0900 Subject: [PATCH v3] Make FPI_FOR_HINT follow standard FPI emitting policy Commit 2c03216d831160bedd72d45f712601b6f7d03f1c changed XLogSaveBufferForHint to enforce FPI but the caller didn't intend to do so. Restore the intended behavior that FPI_FOR_HINT follows full_page_writes. --- src/backend/access/transam/xlog.c | 30 +++++++++++++++++-------- src/backend/access/transam/xloginsert.c | 5 +---- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2ee9515139..ff10fef0d3 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10148,7 +10148,10 @@ xlog_redo(XLogReaderState *record) uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; XLogRecPtr lsn = record->EndRecPtr; - /* in XLOG rmgr, backup blocks are only used by XLOG_FPI records */ + /* + * in XLOG rmgr, backup blocks are only used by XLOG_FPI and + * XLOG_FPI_FOR_HINT records. + */ Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT || !XLogRecHasAnyBlockRefs(record)); @@ -10356,25 +10359,34 @@ xlog_redo(XLogReaderState *record) else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT) { /* - * Full-page image (FPI) records contain nothing else but a backup - * block (or multiple backup blocks). Every block reference must - * include a full-page image - otherwise there would be no point in - * this record. + * XLOG_FPI records contain nothing else but one or more block + * references. Every block reference must include a full-page image + * regardless of the full_page_writes setting - otherwise there would + * be no point in this record. * * No recovery conflicts are generated by these generic records - if a * resource manager needs to generate conflicts, it has to define a * separate WAL record type and redo routine. * * XLOG_FPI_FOR_HINT records are generated when a page needs to be - * WAL- logged because of a hint bit update. They are only generated - * when checksums are enabled. There is no difference in handling - * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different info - * code just to distinguish them for statistics purposes. + * WAL-logged because of a hint bit update. They are only generated + * when checksums and/or wal_log_hints are enabled. The only difference + * in handling XLOG_FPI and XLOG_FPI_FOR_HINT records is that the + * latter is allowed to be missing actual block image. In that case the + * record is effectively a NOP record and should not even try to read + * the page from disk. */ for (uint8 block_id = 0; block_id <= record->max_block_id; block_id++) { Buffer buffer; + if (!XLogRecHasBlockImage(record, block_id)) + { + if (info == XLOG_FPI) + elog(ERROR, "missing full page image in XLOG_FPI record"); + continue; + } + if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED) elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block"); UnlockReleaseBuffer(buffer); diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 3d2c9c3e8c..e596a0470a 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -287,8 +287,6 @@ XLogRegisterBlock(uint8 block_id, RelFileNode *rnode, ForkNumber forknum, { registered_buffer *regbuf; - /* This is currently only used to WAL-log a full-page image of a page */ - Assert(flags & REGBUF_FORCE_IMAGE); Assert(begininsert_called); if (block_id >= max_registered_block_id) @@ -995,7 +993,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) if (lsn <= RedoRecPtr) { - int flags; + int flags = 0; PGAlignedBlock copied_buffer; char *origdata = (char *) BufferGetBlock(buffer); RelFileNode rnode; @@ -1022,7 +1020,6 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) XLogBeginInsert(); - flags = REGBUF_FORCE_IMAGE; if (buffer_std) flags |= REGBUF_STANDARD; -- 2.27.0
On 2021/07/19 10:16, Kyotaro Horiguchi wrote: > At Sat, 17 Jul 2021 00:14:34 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> Thanks for updating the patch! It basically looks good to me. >> >> * Full-page image (FPI) records contain nothing else but a backup >> * block (or multiple backup blocks). Every block reference must >> * include a full-page image - otherwise there would be no point in >> * this record. >> >> The above comment also needs to be updated? > > In short, no. In contrast to the third paragraph, the first paragraph > should be thought that it is describing XLOG_FPI. However, actually > it is not super obvious so it's better to make it clearer. Addition to > that, it seems to me (yes, to *me*) somewhat confused between "block > reference", "backup block" and "full-page image". So I'd like to > adjust the paragraph as the following. Understood. Thanks for updating the patch! I slightly modified the comments and pushed the patch. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Wed, 21 Jul 2021 11:23:20 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > I slightly modified the comments and pushed the patch. Thanks! Thank you for commiting this! regards. -- Kyotaro Horiguchi NTT Open Source Software Center