Multiple full page writes in a single checkpoint? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Multiple full page writes in a single checkpoint?
Msg-id 20210203230556.GB11069@momjian.us
Responses Re: Multiple full page writes in a single checkpoint?
List pgsql-hackers
Cluster file encryption plans to use the LSN and page number as the
nonce for heap/index pages.  I am looking into the use of a unique nonce
during hint bit changes.  (You need to use a new nonce for re-encrypting
a page that changes.)
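
As a minimal sketch of the nonce idea: the 12-byte layout (8-byte LSN
followed by 4-byte page number) and the name page_nonce are assumptions
made for illustration, not the format used by the patch.

```python
import struct


def page_nonce(lsn: int, block_number: int) -> bytes:
    """Pack a 64-bit LSN and a 32-bit page number into a 12-byte nonce.

    Hypothetical layout, big-endian, chosen only for this illustration.
    """
    return struct.pack(">QI", lsn, block_number)


# Writing the same page twice under the same LSN would reuse a nonce ...
assert page_nonce(0x16B3F80, 7) == page_nonce(0x16B3F80, 7)
# ... while a new LSN for the rewritten page yields a fresh one.
assert page_nonce(0x16B3F80, 7) != page_nonce(0x16B4010, 7)
```

The point of the sketch is only that nonce uniqueness for a given page
reduces to LSN uniqueness across writes of that page.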

wal_log_hints already gives us a unique nonce for the first hint bit
change on a page during a checkpoint, but we only encrypt on page write
to the file system, so I am researching whether wal_log_hints will
already generate a unique LSN for every page write to the file system,
even if there are multiple hint-bit-caused page writes to the file
system during a single checkpoint.  (We already know this works across
multiple checkpoints.)

Our docs on full_page_writes state:

    When this parameter is on, the
    <productname>PostgreSQL</productname> server writes the entire
    content of each disk page to WAL during the first modification
    of that page after a checkpoint.

and wal_log_hints states:

    When this parameter is <literal>on</literal>, the
    <productname>PostgreSQL</productname> server writes the entire
    content of each disk page to WAL during the first modification of
    that page after a checkpoint, even for non-critical modifications
    of so-called hint bits.

However, imagine these steps:

1.  checkpoint starts
2.  page is modified by row or hint bit change
3.  page gets a new LSN and is marked as dirty
4.  page image is flushed to WAL
5.  page is written to disk and marked as clean
6.  page is modified by data or hint bit change
7.  page gets a new LSN and is marked as dirty
8.  page image is flushed to WAL
9.  checkpoint completes
10. page is written to disk and marked as clean

Is the above case valid, and would it cause two full page writes to WAL?
More specifically, wouldn't it cause every write of the page to the file
system to use a new LSN?
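
The question can be explored with a toy model of the ten steps.  This is
only a sketch: ToyPage and both logging policies are hypothetical, it
models hint bit changes only, and which policy (if either) matches the
server is exactly what is being asked.

```python
class ToyPage:
    """One buffer page plus just enough state to track LSNs at write time."""

    def __init__(self, log_every_dirtying: bool):
        # True:  any change that dirties a clean page gets a new LSN.
        # False: only the first change after a checkpoint start does.
        self.log_every_dirtying = log_every_dirtying
        self.lsn = 0
        self.dirty = False
        self.next_lsn = 1
        self.checkpoint_lsn = 0
        self.disk_write_lsns = []

    def start_checkpoint(self):
        self.checkpoint_lsn = self.next_lsn
        self.next_lsn += 1

    def hint_bit_change(self):
        first_since_checkpoint = self.lsn < self.checkpoint_lsn
        clean_and_logged = self.log_every_dirtying and not self.dirty
        if first_since_checkpoint or clean_and_logged:
            # A page image goes to WAL and the page takes that record's LSN.
            self.lsn = self.next_lsn
            self.next_lsn += 1
        self.dirty = True

    def write_to_disk(self):
        if self.dirty:
            # The LSN on the page at this moment is what a nonce would use.
            self.disk_write_lsns.append(self.lsn)
            self.dirty = False


def run_steps(log_every_dirtying: bool) -> list:
    page = ToyPage(log_every_dirtying)
    page.start_checkpoint()   # step 1
    page.hint_bit_change()    # steps 2-4
    page.write_to_disk()      # step 5
    page.hint_bit_change()    # steps 6-8
    page.write_to_disk()      # steps 9-10
    return page.disk_write_lsns


def nonce_reused(write_lsns: list) -> bool:
    return len(set(write_lsns)) != len(write_lsns)
```

Under the first policy the two disk writes carry different LSNs; under
the second they carry the same LSN, so run_steps(False) reports a reused
nonce for the second write.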

If so, this means wal_log_hints is sufficient to guarantee a new nonce
for every page image, even for multiple hint bit changes and page writes
during a single checkpoint, and there is then no need for a hint bit
counter on the page --- the unique LSN does that for us.  I know
wal_log_hints was designed to fix torn pages, but it seems to also do
exactly what cluster file encryption needs.

If the above is all true, should we update the docs, READMEs, or C
comments about this?  I think the cluster file encryption patch would at
least need to document that we must keep this behavior.  Given the way
full page writes are processed during crash recovery, I don't think
wal_log_hints needs to behave this way for checksum purposes, so nothing
else currently depends on it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



