Fwd: free space map and visibility map - Mailing list pgsql-hackers

From Jeff Janes
Subject Fwd: free space map and visibility map
Date
Msg-id CAMkU=1zKfqGePWG+qqKthmWERBn8UAA2_9Sb+qTUUREhFkqLCA@mail.gmail.com
In response to Re: [HACKERS] free space map and visibility map  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: free space map and visibility map  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
I accidentally sent this off-list, sending to the list now:

On Sun, Mar 26, 2017 at 10:38 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
> > > >> can't leave the block as all visible or all frozen).  I think the issue
> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
> > > >> that neither of those ever update the FSM, regardless of FPI?
> > > >
> > > > Yes, updates to the FSM are never logged.  Forcing replay of
> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > >
> > > I think I was missing something. I imagined your situation is that an FPI
> > > is replayed during crash recovery after the crashed server vacuums the
> > > page and marks it as all-frozen. But this situation is also resolved
> > > by that solution.
> >
> > # HEAP2_CLEAN is issued in lazy_vacuum_page
> >
> > It will work, but I'm not sure it is the right direction for
> > HEAP2_FREEZE_PAGE to touch the FSM.
> >
> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> > think only the latter can happen. The comment in heap_xlog_clean
> > below is right generally but if a page filled with tuples becomes
> > almost empty and freezable by this cleanup, a problematic
> > situation like this occurs.
> >
>
> I now think this is not the cause of the problem I am seeing.  I made the
> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> did not fix it.  With frequent crashes, it still accumulated a lot of
> frozen and empty (but full according to FSM) pages.  I also set up replica
> streaming and turned off crashing on the master, and the FSM of the replica
> stays accurate, so the WAL stream and replay logic is doing the right thing
> on the replica.
>
> I now think the dirtied FSM pages are somehow not getting marked as dirty,
> or are getting marked as dirty but somehow the checkpoint is skipping
> them.  It looks like MarkBufferDirtyHint does do some operations unlocked
> which could explain lost update, but it seems unlikely that that would
> happen often enough to see the amount of lost updates I am seeing.

Hmm. Clearing the dirty hint already seems to be protected by an
exclusive lock. And I think the problem can occur without any locking failure.

Besides the FPI case, the FSM update is also omitted when the record LSN
is older than the page LSN. If the heap page is evicted but the FSM page is
not, after vacuuming and before a power cut, replaying HEAP2_CLEAN skips
the FSM update even though no FPI is attached. Of course this
cannot occur on a standby. One FSM page covers about 4k heap pages,
so an FSM page can stay in the buffer pool far longer than heap pages.

This corresponds to action == BLK_DONE case, right?
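
A minimal sketch of the case being discussed, written as a self-contained toy model rather than PostgreSQL source. The enum names follow the redo actions named in this thread; ToyPage, ToyRecord, toy_redo_action and toy_replay_updates_fsm are hypothetical names introduced only for illustration, and the "block not found" case is left out. It assumes the FSM is touched only when the record is actually re-applied, as the quoted heap_xlog_clean comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not PostgreSQL source. Enum names follow the thread. */
typedef enum
{
    BLK_NEEDS_REDO,     /* page is older than the record: apply the change */
    BLK_DONE,           /* page already contains the change: skip it */
    BLK_RESTORED        /* page was replaced by a full-page image (FPI) */
} ToyRedoAction;

/* Hypothetical stand-ins for a heap page and a WAL record. */
typedef struct { uint64_t lsn; } ToyPage;
typedef struct { uint64_t lsn; bool has_fpi; } ToyRecord;

/* Decide what replay does with one block reference. */
static ToyRedoAction
toy_redo_action(const ToyPage *page, const ToyRecord *rec)
{
    assert(page != NULL && rec != NULL);

    if (rec->has_fpi)
        return BLK_RESTORED;
    if (rec->lsn <= page->lsn)
        return BLK_DONE;        /* heap page reached disk before the crash */
    return BLK_NEEDS_REDO;
}

/* Assumed rule: the FSM is updated only when the record is re-applied. */
static bool
toy_replay_updates_fsm(const ToyPage *page, const ToyRecord *rec)
{
    return toy_redo_action(page, rec) == BLK_NEEDS_REDO;
}
```

In this model, a heap page that was written out after vacuum (page LSN at or past the record LSN) yields BLK_DONE during crash recovery, so the FSM update is skipped even though the record carries no FPI. If the matching FSM page never reached disk, the stale "full" entry survives the crash.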
 

ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When a
page is already empty on entering lazy_scan_heap, or a page of a
non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
issued to set ALL_FROZEN.

Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in the previous mail, I prefer the
latter.)

When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but not on BLK_DONE), it solves the problem I was seeing.  That still leaves me wondering why the problem doesn't show up on the standby because, unlike BLK_DONE, BLK_RESTORED should have the same issue on a standby as it does on a recovering master, shouldn't it? Maybe the difference is that the existence of a replication slot delays the cleanup in a way that causes a different pattern of WAL records.
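
To make the experiment concrete, here is a rough sketch of the two policies as a toy model, not as the actual patch. The enum names follow the redo actions named in this thread; ToyFsmPolicy, toy_policy_original, toy_policy_patched and toy_count_fsm_updates are hypothetical names used only for illustration. The "original" policy assumes the behavior described by the heap_xlog_clean comment quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not PostgreSQL source. Enum names follow the thread. */
typedef enum
{
    BLK_NEEDS_REDO,     /* record is re-applied to the page */
    BLK_DONE,           /* page already contains the change */
    BLK_RESTORED        /* page was restored from a full-page image */
} ToyRedoAction;

typedef bool (*ToyFsmPolicy)(ToyRedoAction action);

/* Assumed current behavior: FSM is updated only when the record is re-applied. */
static bool
toy_policy_original(ToyRedoAction action)
{
    return action == BLK_NEEDS_REDO;
}

/* Experimental behavior: also update on BLK_RESTORED, still not on BLK_DONE. */
static bool
toy_policy_patched(ToyRedoAction action)
{
    return action == BLK_NEEDS_REDO || action == BLK_RESTORED;
}

/* Count how many replayed records would touch the FSM under a policy. */
static int
toy_count_fsm_updates(const ToyRedoAction *actions, size_t n, ToyFsmPolicy policy)
{
    int updates = 0;

    assert(actions != NULL && policy != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (policy(actions[i]))
            updates++;
    }
    return updates;
}
```

For a replay sequence of BLK_RESTORED, BLK_DONE, BLK_NEEDS_REDO, BLK_RESTORED, the original policy touches the FSM once and the patched policy three times. Right after a checkpoint most records carry an FPI, so with frequent crashes the original policy would skip most FSM updates in this model.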


> > > /*
> > >  * Update the FSM as well.
> > >  *
> > >  * XXX: Don't do this if the page was restored from full page image. We
> > >  * don't bother to update the FSM in that case, it doesn't need to be
> > >  * totally accurate anyway.
> > >  */
> >
>
> What does that save us?  If we restored from FPI, we already have the block
> in memory (we don't need to see the old version, just the new one), so it
> doesn't save us a random read IO.

Updates on random pages can cause visits to many unloaded FSM
pages. It may be intending to avoid that.

But I think that that would be no worse for BLK_RESTORED than it is for BLK_NEEDS_REDO.  Why optimize only one of the cases, if it is worth optimizing either one?

Cheers,

Jeff
