Lack of PageSetLSN in heap_xlog_visible - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Lack of PageSetLSN in heap_xlog_visible
Date
Msg-id fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru
Responses Re: Lack of PageSetLSN in heap_xlog_visible  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi hackers!

heap_xlog_visible does not bump the heap page LSN when setting the
all-visible flag on it.
There is a long comment explaining why:

         /*
          * We don't bump the LSN of the heap page when setting the visibility
          * map bit (unless checksums or wal_hint_bits is enabled, in which
          * case we must), because that would generate an unworkable volume of
          * full-page writes.  This exposes us to torn page hazards, but since
          * we're not inspecting the existing page contents in any way, we
          * don't care.
          *
          * However, all operations that clear the visibility map bit *do* bump
          * the LSN, and those operations will only be replayed if the XLOG LSN
          * follows the page LSN.  Thus, if the page LSN has advanced past our
          * XLOG record's LSN, we mustn't mark the page all-visible, because
          * the subsequent update won't be replayed to clear the flag.
          */
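
The replay rule described in this comment can be illustrated with a toy model in C. This is only a sketch: ToyLsn, ToyHeapPage, toy_redo_clear and toy_redo_set_visible are hypothetical names invented for illustration, not PostgreSQL types or functions. It assumes a record is replayed only when its LSN follows the page LSN, that clearing the flag bumps the page LSN, and that setting it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not PostgreSQL's types. */
typedef uint64_t ToyLsn;

typedef struct ToyHeapPage
{
    ToyLsn lsn;         /* LSN of the last record that bumped this page */
    bool   all_visible; /* models the PD_ALL_VISIBLE flag */
} ToyHeapPage;

/*
 * Replay a record that clears the all-visible flag.  It is skipped when the
 * page LSN is already at or past the record, and it bumps the page LSN.
 * Returns true if the record was applied.
 */
static bool
toy_redo_clear(ToyHeapPage *page, ToyLsn record_lsn)
{
    assert(page != NULL);
    if (record_lsn <= page->lsn)
        return false;
    page->all_visible = false;
    page->lsn = record_lsn;
    return true;
}

/*
 * Replay a record that sets the all-visible flag.  It is skipped under the
 * same LSN test, but the page LSN is deliberately left unchanged.
 * Returns true if the record was applied.
 */
static bool
toy_redo_set_visible(ToyHeapPage *page, ToyLsn record_lsn)
{
    assert(page != NULL);
    if (record_lsn <= page->lsn)
        return false;
    page->all_visible = true;
    return true;
}
```

In this model, a page that already carries LSN 200 (because a later clearing record reached disk) ignores a "set" record at LSN 100, so the flag is not resurrected when the clearing record at LSN 200 is itself skipped.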

But it is still not clear to me that not bumping the LSN here is
correct if wal_log_hints is set.
In this case the VM page will have a larger LSN than the heap page,
because visibilitymap_set bumps the LSN of the VM page. This means that,
in theory, after recovery we may have a page that is marked as
all-visible in the VM but does not have PD_ALL_VISIBLE set in its page
header. That violates the VM constraint:

  * When we *set* a visibility map during VACUUM, we must write WAL.  This may
  * seem counterintuitive, since the bit is basically a hint: if it is clear,
  * it may still be the case that every tuple on the page is visible to all
  * transactions; we just don't know that for certain.  The difficulty is that
  * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
  * on the page itself, and the visibility map bit.  If a crash occurs after the
  * visibility map page makes it to disk and before the updated heap page makes
  * it to disk, redo must set the bit on the heap page.  Otherwise, the next
  * insert, update, or delete on the heap page will fail to realize that the
  * visibility map bit must be cleared, possibly causing index-only scans to
  * return wrong answers.
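
The hazard this comment describes can be sketched with another toy model in C. All names here (ToyState, toy_modify_page, toy_index_only_scan_skips_heap) are hypothetical and exist only for illustration. The model assumes that a heap modification clears the visibility map bit only when it sees the page-level flag set, and that an index-only scan trusts the visibility map bit alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two bits that are normally set together. */
typedef struct ToyState
{
    bool page_all_visible; /* models PD_ALL_VISIBLE on the heap page */
    bool vm_bit;           /* models the bit in the visibility map */
} ToyState;

/*
 * Model an insert, update, or delete on the heap page.  The visibility map
 * bit is cleared only when the page flag says it might be set.
 */
static void
toy_modify_page(ToyState *s)
{
    assert(s != NULL);
    if (s->page_all_visible)
    {
        s->page_all_visible = false;
        s->vm_bit = false;
    }
}

/* Model an index-only scan, which skips the heap when the VM bit is set. */
static bool
toy_index_only_scan_skips_heap(const ToyState *s)
{
    assert(s != NULL);
    return s->vm_bit;
}
```

Starting from a consistent state (both bits set), a modification clears both. Starting from the inconsistent state (visibility map bit set, page flag clear), the modification leaves the visibility map bit set, and the modeled index-only scan still skips the heap.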
