Hello, Peter.
> AFAICT that's not true, at least not in any practical sense. See the
> comment in the middle of MarkBufferDirtyHint() that begins with "If we
> must not write WAL, due to a relfilenode-specific...", and see the
> "Checksums" section at the end of src/backend/storage/page/README. The
> last paragraph in the README is particularly relevant:
I have attached a TAP test to demonstrate how easily checksums on the standby and the primary start to differ. The test covers two scenarios, one for heap and one for index (and the bit is set on both the standby and the primary).
Yes, MarkBufferDirtyHint does not mark the page as dirty in that case, so hint bits on the standby can easily be lost. But it leaves the page dirty if it already is dirty (or the page could be marked dirty by WAL replay later). So hint bits can easily be flushed and taken into account during checksum calculation on both the standby and the primary.
> "We can set the hint, just not dirty the page as a result so the hint
> is lost when we evict the page or shutdown"
Yes, it is not allowed to mark a page as dirty because of hints on a standby, because we could end up with this:
CHECKPOINT
SET HINT BIT
TORN FLUSH + CRASH = BROKEN CHECKSUM, SERVER FAULT
But this scenario is totally fine:
CHECKPOINT
FPI (page is still dirty)
SET HINT BIT
TORN FLUSH + CRASH = PAGE IS RECOVERED, EVERYTHING IS OK
And, as a result, this is fine too:
CHECKPOINT
FPI WITH MASKED LP_DEAD (page is still dirty)
SET HINT BIT
TORN FLUSH + CRASH = PAGE IS RECOVERED + LP_DEAD MASKED AGAIN IF STANDBY
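To make the first two scenarios concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the page is a byte string with a CRC32 in its first 4 bytes (not the real PostgreSQL checksum or page layout), the "hint bit" is one bit in the last byte, and a torn flush writes only the first half of the page. All function names are made up for illustration.

```python
import zlib
from typing import Optional

PAGE_SIZE = 8192
CSUM_LEN = 4  # toy checksum kept in the first bytes of the page (assumption)


def with_checksum(body: bytes) -> bytes:
    """Build a page: CRC32 of the body, then the body itself."""
    return zlib.crc32(body).to_bytes(CSUM_LEN, "little") + body


def checksum_ok(page: bytes) -> bool:
    """Verify the stored toy checksum against the page body."""
    stored = int.from_bytes(page[:CSUM_LEN], "little")
    return stored == zlib.crc32(page[CSUM_LEN:])


def set_hint(page: bytes) -> bytes:
    """Set a toy hint bit in the last byte and recompute the checksum."""
    body = bytearray(page[CSUM_LEN:])
    body[-1] |= 0x01
    return with_checksum(bytes(body))


def torn_flush(on_disk: bytes, in_memory: bytes) -> bytes:
    """Crash mid-write: only the first half of the new image reaches disk."""
    half = PAGE_SIZE // 2
    return in_memory[:half] + on_disk[half:]


def recover(on_disk: bytes, fpi: Optional[bytes]) -> bytes:
    """Replay from the last checkpoint: an FPI replaces the page wholesale."""
    return on_disk if fpi is None else fpi


checkpointed = with_checksum(bytes(PAGE_SIZE - CSUM_LEN))

# Scenario 1: CHECKPOINT, SET HINT BIT, TORN FLUSH + CRASH (no FPI in WAL).
torn_1 = torn_flush(checkpointed, set_hint(checkpointed))
scenario_1 = recover(torn_1, fpi=None)

# Scenario 2: CHECKPOINT, FPI, SET HINT BIT, TORN FLUSH + CRASH.
fpi = with_checksum(b"x" + bytes(PAGE_SIZE - CSUM_LEN - 1))
torn_2 = torn_flush(checkpointed, set_hint(fpi))
scenario_2 = recover(torn_2, fpi=fpi)

print("scenario 1 checksum ok:", checksum_ok(scenario_1))
print("scenario 2 checksum ok:", checksum_ok(scenario_2))
```

In this model the first scenario ends with a page that fails verification, because the new checksum landed on disk but the hint bit did not. The second ends with a valid page, because replay overwrites the torn page with the FPI; the hint bit is simply lost.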
So, my point here: it is fine to mask LP_DEAD bits during replay because they already differ between the standby and the primary. And it is fine to set and flush hint bits (and LP_DEAD bits) on the standby because they can already be flushed easily (we just need to consider minRecoveryPoint and, probably, OldestXmin from the primary in the case of LP_DEAD, to make promotion easy).
>> And `btree_mask` (and other mask functions) already used for consistency checks to exclude LP_DEAD.
> I don't see how that is relevant. btree_mask() is only used by
> wal_consistency_checking, which is mostly just for Postgres hackers.
I was thinking about the possibility of reusing these functions for masking during replay.
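Roughly, the idea looks like the toy Python sketch below. It is not a copy of btree_mask(), and the function names are hypothetical. It assumes an index page reduced to a list of line pointer flags, with the flag values as defined in itemid.h, and it assumes that masking at replay time would turn LP_DEAD hints back into LP_NORMAL.

```python
from typing import List

# Line pointer flag values as defined in src/include/storage/itemid.h.
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3


def mask_lp_dead(lp_flags: List[int]) -> List[int]:
    """Hypothetical replay-time mask: treat LP_DEAD hints as LP_NORMAL."""
    return [LP_NORMAL if flag == LP_DEAD else flag for flag in lp_flags]


def apply_fpi(fpi_lp_flags: List[int], on_standby: bool) -> List[int]:
    """Install an FPI; on a standby, drop the primary's LP_DEAD hints first."""
    if on_standby:
        return mask_lp_dead(fpi_lp_flags)
    return list(fpi_lp_flags)


# The FPI from the primary carries two LP_DEAD hints on an index page.
fpi_flags = [LP_NORMAL, LP_DEAD, LP_UNUSED, LP_DEAD]

print("standby after replay:", apply_fpi(fpi_flags, on_standby=True))
print("primary crash recovery:", apply_fpi(fpi_flags, on_standby=False))
```

With this, the standby never sees the primary's LP_DEAD bits and is free to set its own, while crash recovery on the primary restores the FPI unchanged.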
Thanks,
Michail.