Hi,
On 2021-02-03 18:05:56 -0500, Bruce Momjian wrote:
> log_hint_bits already gives us a unique nonce for the first hint bit
> change on a page during a checkpoint, but we only encrypt on page write
> to the file system, so I am researching if log_hint_bits will already
> generate a unique LSN for every page write to the file system, even if
> there are multiple hint-bit-caused page writes to the file system during
> a single checkpoint. (We already know this works for multiple
> checkpoints.)
No, it won't:
> However, imagine these steps:
>
> 1. checkpoint starts
> 2. page is modified by row or hint bit change
> 3. page gets a new LSN and is marked as dirty
> 4. page image is flushed to WAL
> 5. page is written to disk and marked as clean
> 6. page is modified by data or hint bit change
> 7. page gets a new LSN and is marked as dirty
> 8. page image is flushed to WAL
> 9. checkpoint completes
> 10. page is written to disk and marked as clean
>
> Is the above case valid, and would it cause two full page writes to WAL?
> More specifically, wouldn't it cause every write of the page to the file
> system to use a new LSN?
No. 8) won't happen. Look e.g. at XLogSaveBufferForHint():
    /*
     * Update RedoRecPtr so that we can make the right decision
     */
    RedoRecPtr = GetRedoRecPtr();

    /*
     * We assume page LSN is first data on *every* page that can be passed to
     * XLogInsert, whether it has the standard page layout or not. Since we're
     * only holding a share-lock on the page, we must take the buffer header
     * lock when we look at the LSN.
     */
    lsn = BufferGetLSNAtomic(buffer);

    if (lsn <= RedoRecPtr)
        /* wal log hint bit */
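The RedoRecPtr is determined at 1) and doesn't change between 4) and
8). The LSN for 4) has to be *past* the RedoRecPtr from 1), so when the
hint bit is set in 6) the lsn <= RedoRecPtr check fails. Therefore we
don't do another FPW, and the page keeps the LSN from 4).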
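
To make the sequence concrete, here is a toy model of steps 1) through
10) for the hint-bit-only case. It is only a sketch: every toy_* name,
the ToyPage struct and the LSN values are invented for illustration and
are not PostgreSQL code. The only thing taken from the real function is
the lsn <= RedoRecPtr comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for a buffer page: just its LSN and a dirty flag. */
typedef struct ToyPage
{
    XLogRecPtr  lsn;
    bool        dirty;
} ToyPage;

static XLogRecPtr toy_insert_pos = 100; /* assumed current WAL position */
static XLogRecPtr toy_redo_ptr = 0;     /* redo pointer of the checkpoint */

/* Pretend to log a full page image and return the LSN it was given. */
static XLogRecPtr
toy_log_full_page(void)
{
    toy_insert_pos += 10;
    return toy_insert_pos;
}

/* Set a hint bit; returns true if a full page image was WAL-logged. */
static bool
toy_set_hint_bit(ToyPage *page)
{
    bool        logged = false;

    /* Same comparison as in the snippet quoted above. */
    if (page->lsn <= toy_redo_ptr)
    {
        page->lsn = toy_log_full_page();
        logged = true;
    }
    page->dirty = true;
    return logged;
}

/* Write the page out; returns the LSN an encryption nonce would be based on. */
static XLogRecPtr
toy_write_page(ToyPage *page)
{
    page->dirty = false;
    return page->lsn;
}

/* Steps 1) to 10) with hint bit changes only; true if both writes share an LSN. */
static bool
toy_two_hint_writes_share_lsn(void)
{
    ToyPage     page = {.lsn = 50, .dirty = false};

    toy_redo_ptr = toy_insert_pos;                  /* 1) checkpoint starts */
    bool        fpw1 = toy_set_hint_bit(&page);     /* 2) to 4): FPW, new LSN */
    XLogRecPtr  w1 = toy_write_page(&page);         /* 5) first write */
    bool        fpw2 = toy_set_hint_bit(&page);     /* 6): no FPW, LSN unchanged */
    XLogRecPtr  w2 = toy_write_page(&page);         /* 10) second write */

    assert(fpw1 && !fpw2);
    return w1 == w2;
}
```

In this model toy_two_hint_writes_share_lsn() returns true: the two
writes of the page carry the same LSN, which is exactly the nonce reuse
being asked about.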
Changing this is *completely* infeasible. In a lot of workloads it'd
cause a *massive* explosion of WAL volume. Like quadratically. You'll
need to find another way to generate a nonce.
In the non-hint bit case you'll automatically have a higher LSN in 7/8
though. So you won't need to do anything about getting a higher nonce.
For the hint bit case in 8 you could consider just using any LSN generated
after 4 (preferably already flushed to disk) - but that seems somewhat
ugly from a debuggability POV :/. Alternatively you could just create a
tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
flushes when the pages are dirtied.
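
A rough sketch of the second alternative, again as a toy model and not
as a proposed patch. The toy_* names, the ToyPage struct and the record
sizes are made up, and the rule "emit a dummy record when a hint bit
dirties a clean page that doesn't get an FPW" is an assumption about how
such a scheme could decide when a new LSN is needed.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for a buffer page. */
typedef struct ToyPage
{
    XLogRecPtr  lsn;
    bool        dirty;
} ToyPage;

static XLogRecPtr toy_insert_pos = 100; /* assumed current WAL position */
static XLogRecPtr toy_redo_ptr = 0;     /* redo pointer of the checkpoint */
static unsigned toy_dummy_records = 0;  /* extra WAL records the scheme costs */

/* Pretend to insert a WAL record of the given size and return its LSN. */
static XLogRecPtr
toy_wal_insert(XLogRecPtr size)
{
    toy_insert_pos += size;
    return toy_insert_pos;
}

/* Hint bit change that always leaves the page with a not-yet-written LSN. */
static void
toy_set_hint_bit(ToyPage *page)
{
    if (page->lsn <= toy_redo_ptr)
    {
        /* Existing rule: first change since the checkpoint gets an FPW. */
        page->lsn = toy_wal_insert(10);
    }
    else if (!page->dirty)
    {
        /*
         * Assumed new rule: the page was already written with its current
         * LSN, so insert a tiny record purely to obtain a fresh LSN.
         */
        page->lsn = toy_wal_insert(1);
        toy_dummy_records++;
    }
    page->dirty = true;
}

/* Write the page out; returns the LSN used as the nonce. */
static XLogRecPtr
toy_write_page(ToyPage *page)
{
    page->dirty = false;
    return page->lsn;
}

/* Two hint-bit-only writes in one checkpoint; true if their LSNs differ. */
static bool
toy_hint_writes_get_distinct_lsns(void)
{
    ToyPage     page = {.lsn = 50, .dirty = false};

    toy_redo_ptr = toy_insert_pos;          /* checkpoint starts */
    toy_set_hint_bit(&page);                /* FPW, new LSN */
    XLogRecPtr  w1 = toy_write_page(&page);
    toy_set_hint_bit(&page);                /* dummy record, new LSN */
    XLogRecPtr  w2 = toy_write_page(&page);

    return w1 != w2;
}
```

Here the second write gets its own LSN at the cost of one extra record
per clean-to-dirty transition caused by a hint bit. Repeated hint bit
changes on an already dirty page don't add further records in this
model.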
Greetings,
Andres Freund