At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
> >> can't leave the block as all visible or all frozen). I think the issue is
> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
> >> that neither of those ever update the FSM, regardless of FPI?
> >
> > Yes, updates to the FSM are never logged. Forcing replay of
> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> >
>
> I think I was missing something. I imaged your situation is that FPI
> is replayed during crash recovery after the crashed server vacuums the
> page and marked it as all-frozen. But this situation is also resolved
> by that solution.
# HEAP2_CLEAN is issued in lazy_vacuum_page
It will work, but I'm not sure it is the right direction for HEAP2_FREEZE_PAGE to touch the FSM.
As Masahiko said, the situation must be created by HEAP2_VISIBLE without a preceding HEAP2_CLEAN, or by HEAP2_VISIBLE preceded by a HEAP2_CLEAN that carries an FPI. I think only the latter can happen. The comment in heap_xlog_clean below is right in general, but if a page filled with tuples becomes almost empty and freezable through this cleanup, a problematic situation like this occurs.
I now think this is not the cause of the problem I am seeing. I made the replay of FREEZE_PAGE update the FSM (both with and without FPI), but that did not fix it. With frequent crashes, it still accumulated a lot of frozen and empty (but full according to the FSM) pages. I also set up replica streaming and turned off crashing on the master, and the FSM of the replica stays accurate, so the WAL stream and replay logic are doing the right thing on the replica.
I now think the dirtied FSM pages are somehow not getting marked as dirty, or are getting marked as dirty but somehow the checkpoint is skipping them. It looks like MarkBufferDirtyHint does do some operations unlocked, which could explain lost updates, but it seems unlikely that this would happen often enough to account for the number of lost updates I am seeing.
> /*
>  * Update the FSM as well.
>  *
>  * XXX: Don't do this if the page was restored from full page image. We
>  * don't bother to update the FSM in that case, it doesn't need to be
>  * totally accurate anyway.
>  */
What does that save us? If we restored from FPI, we already have the block in memory (we don't need to see the old version, just the new one), so it doesn't save us a random read IO.
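
To make the scenario concrete, here is a toy model of the replay decision described in that comment. It is only a sketch, not PostgreSQL code: ToyState, toy_replay_clean and toy_fsm_is_stale are hypothetical names, and the model assumes a single heap page whose FSM entry can only be refreshed at replay time, since FSM updates are not logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not PostgreSQL's API. */
#define TOY_PAGE_SIZE 8192

typedef struct ToyState
{
    size_t heap_free;   /* actual free bytes on the heap page */
    size_t fsm_free;    /* free bytes the FSM believes the page has */
} ToyState;

/*
 * Replay one cleanup record that leaves `free_after` bytes free.
 *
 * restored_from_fpi: the page content came from a full page image.
 * skip_fsm_on_fpi:   mimic the behavior the XXX comment describes.
 */
static void
toy_replay_clean(ToyState *st, size_t free_after,
                 bool restored_from_fpi, bool skip_fsm_on_fpi)
{
    assert(free_after <= TOY_PAGE_SIZE);

    /* Either way, the heap page ends up in its post-cleanup state. */
    st->heap_free = free_after;

    /* FSM updates are not logged, so replay is the only chance to fix it. */
    if (restored_from_fpi && skip_fsm_on_fpi)
        return;

    st->fsm_free = free_after;
}

/* True when the FSM reports less free space than the page really has. */
static bool
toy_fsm_is_stale(const ToyState *st)
{
    return st->fsm_free < st->heap_free;
}
```

Starting from a full page (both fields zero) and replaying a cleanup that frees most of it, the model leaves the FSM stale only when the record carried an FPI and the skip is enabled. That is the "empty but full according to FSM" state discussed above. The model says nothing about the dirty-buffer theory, which would need a different experiment.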