Hi Andrey,
> 0. checkpointer is going to flush a heap buffer but waits on content lock
> 1. client is resetting PD_ALL_VISIBLE from page
> 2. postmaster is killed and command client to go down
> 3. client calls LWLockReleaseAll() at ProcKill() (?)
> 4. checkpointer flushes buffer with reset PG_ALL_VISIBLE that is not WAL-logged to standby
> 5. subsequent deletes do not log resetting this bit
> 6. deleted data is observable on standby with IndexOnlyScan
Thanks for investigating this in more detail. If this is indeed what
happens it is a violation of the "log before changing" approach. For
this reason we have PageHeaderData.pd_lsn for instance - to make sure
pages are evicted only *after* the record that changed it is written
to disk (because WAL records can't be applied to pages from the
future).
I guess the intent here could be to do an optimization of some sort
but the facts that 1. the instance can be killed at any time and 2.
there might be replicas - were not considered.
> Any idea how to fix this?
IMHO: logging the changes first, then allowing to evict the page.