During a post-commit review of bc22dc0 "Get rid of WALBufMappingLock", I got
suspicious of this older AdvanceXLInsertBuffer() code:
/*
* If online backup is not in progress, mark the header to indicate
* that WAL records beginning in this page have removable backup
* blocks. This allows the WAL archiver to know whether it is safe to
* compress archived WAL data by transforming full-block records into
* the non-full-block format. It is sufficient to record this at the
* page level because we force a page switch (in fact a segment
* switch) when starting a backup, so the flag will be off before any
* records can be written during the backup. At the end of a backup,
* the last page will be marked as all unsafe when perhaps only part
* is unsafe, but at worst the archiver would miss the opportunity to
* compress a few records.
*/
if (Insert->runningBackups == 0)
NewPage->xlp_info |= XLP_BKP_REMOVABLE;
No lock stops runningBackups from changing while walwriter runs that code.
The attached test case reproduces setting XLP_BKP_REMOVABLE on a page that
should not have it:
2025-07-04 16:35:17.322 PDT [1496038] PANIC: initialization of 0/1000000 not reflecting backup started at 0/F00028
The test patch is for v17, but a similar test reaches the same race condition
in master (equivalent to v18 for $SUBJECT purposes, I bet).
Nothing in core reacts to XLP_BKP_REMOVABLE, and I'm not aware of non-core
code that reacts to it and is still maintained.
https://codesearch.debian.net/search?q=XLP_BKP_REMOVABLE&literal=1&perpkg=1
finds none, and 2016-2018 discussion
postgr.es/m/flat/579297F8.7020107%40anastigmatix.net found none. I wonder if
ceasing to set XLP_BKP_REMOVABLE for v19 is the best fix.
I don't plan to address this myself, so I'm leaving this here.
Thanks,
nm