Re: free space map and visibility map - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: free space map and visibility map
Date
Msg-id CAD21AoAnCw8y37dJSEhdbpue5H1v5FVLKUA_uh2MhZe331HNyw@mail.gmail.com
In response to Re: [HACKERS] free space map and visibility map  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: free space map and visibility map  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in
> <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
>> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
>> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
>> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
>> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
>> > > >> then can't leave the block as all visible or all frozen).  I think
>> > > >> the issue here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading
>> > > >> this correctly, that neither of those ever updates the FSM,
>> > > >> regardless of FPI?
>> > > >
>> > > > Yes, updates to the FSM are never logged.  Forcing replay of
>> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> > > >
>> > >
>> > > I think I was missing something. I imagined your situation is that an
>> > > FPI is replayed during crash recovery after the crashed server vacuumed
>> > > the page and marked it as all-frozen. But this situation is also
>> > > resolved by that solution.
>> >
>> > # HEAP2_CLEAN is issued in lazy_vacuum_page
>> >
>> > It will work, but I'm not sure it is the right direction for
>> > HEAP2_FREEZE_PAGE to touch the FSM.
>> >
>> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
>> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
>> > think only the latter can happen. The comment in heap_xlog_clean
>> > below is right generally but if a page filled with tuples becomes
>> > almost empty and freezable by this cleanup, a problematic
>> > situation like this occurs.
>> >
>>
>> I now think this is not the cause of the problem I am seeing.  I made the
>> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
>> did not fix it.  With frequent crashes, it still accumulated a lot of
>> frozen and empty (but full according to FSM) pages.  I also set up replica
>> streaming and turned off crashing on the master, and the FSM of the replica
>> stays accurate, so the WAL stream and replay logic is doing the right thing
>> on the replica.
>>
>> I now think the dirtied FSM pages are somehow not getting marked as dirty,
>> or are getting marked as dirty but somehow the checkpoint is skipping
>> them.  It looks like MarkBufferDirtyHint does do some operations unlocked
>> which could explain lost update, but it seems unlikely that that would
>> happen often enough to see the amount of lost updates I am seeing.
>
> Hmm.. clearing the dirty hint already seems to be protected by an
> exclusive lock. And I think this can occur without a lock failure.
>
> Other than by an FPI, the FSM update is omitted when the record LSN
> is older than the page LSN. If the heap page is evicted but the FSM
> page is not, after vacuuming and before a power cut, replaying
> HEAP2_CLEAN skips the update of the FSM even though no FPI is
> attached. Of course this cannot occur on a standby. One FSM page
> covers about 4k heap pages, so an FSM page can stay in memory far
> longer than heap pages.
>
> ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When
> a page is already empty on entering lazy_scan_heap, or a page of a
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update the FSM (in addition to FREEZE_PAGE), or by doing the same in
> heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> latter.)

Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
might be unnecessary), because the problem happens on a page that is
full according to the FSM but is actually empty and marked as
all-visible or all-frozen. Although heap_xlog_clean loads the heap page
into memory for the redo operation, forcing heap_xlog_clean to update
the FSM might be overkill for this problem, because it would happen on
every page that is marked as neither all-visible nor all-frozen.
Basically, 100% accuracy of the FSM is not required. On the other hand,
if we make heap_xlog_visible update the FSM, it has to load both the
heap page and the FSM page, which can also be overhead. Another idea is
that we can make the record replayed by heap_xlog_visible carry the
free space of the corresponding heap page, and then update the FSM from
it during recovery.
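
To compare the options above, here is a toy model of the replay decision. It is only a sketch, not PostgreSQL code: ToyHeapPage, ToyRecord, FsmPolicy, toy_redo and toy_crash_scenario are all made-up names, and the free space stored in the record is an assumption that corresponds to the last idea. The scenario is the one described in the quoted mail: the emptied heap page reached disk before the crash, but the FSM page did not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 4               /* toy relation size (assumption) */

typedef struct {
    uint64_t lsn;               /* LSN of the last change applied to the page */
    uint32_t free_bytes;        /* actual free space on the heap page */
} ToyHeapPage;

typedef struct {
    uint64_t lsn;               /* LSN of this WAL record */
    uint32_t blk;               /* heap block the record refers to */
    uint32_t free_after;        /* free space after the change (hypothetical field) */
} ToyRecord;

typedef enum {
    FSM_ONLY_ON_REDO,           /* update FSM only when the record is actually applied */
    FSM_FORCE_FROM_PAGE,        /* always update FSM from the heap page (needs the page) */
    FSM_FROM_RECORD             /* always update FSM from the value carried in the record */
} FsmPolicy;

static ToyHeapPage heap[NBLOCKS];
static uint32_t fsm[NBLOCKS];   /* free bytes the FSM believes each block has */

static void toy_redo(const ToyRecord *rec, FsmPolicy policy)
{
    assert(rec->blk < NBLOCKS);

    ToyHeapPage *page = &heap[rec->blk];
    /* A record at or before the page LSN is already reflected in the page. */
    bool needs_redo = rec->lsn > page->lsn;

    if (needs_redo) {
        page->free_bytes = rec->free_after;
        page->lsn = rec->lsn;
    }

    switch (policy) {
    case FSM_ONLY_ON_REDO:
        if (needs_redo)
            fsm[rec->blk] = page->free_bytes;
        break;
    case FSM_FORCE_FROM_PAGE:
        fsm[rec->blk] = page->free_bytes;
        break;
    case FSM_FROM_RECORD:
        fsm[rec->blk] = rec->free_after;
        break;
    }
}

/* Heap page for block 0 was flushed (LSN 100, nearly empty); FSM still says "full". */
static uint32_t toy_crash_scenario(FsmPolicy policy)
{
    heap[0].lsn = 100;
    heap[0].free_bytes = 8000;
    fsm[0] = 0;

    ToyRecord rec = { .lsn = 100, .blk = 0, .free_after = 8000 };
    toy_redo(&rec, policy);
    return fsm[0];
}
```

In this model only FSM_ONLY_ON_REDO leaves the FSM entry stale after replay. The other two repair it, and they differ in cost: one has to read the heap page, the other needs a larger record.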

>
>> > > /*
>> > >  * Update the FSM as well.
>> > >  *
>> > >  * XXX: Don't do this if the page was restored from full page image. We
>> > >  * don't bother to update the FSM in that case, it doesn't need to be
>> > >  * totally accurate anyway.
>> > >  */
>> >
>>
>> What does that save us?  If we restored from FPI, we already have the block
>> in memory (we don't need to see the old version, just the new one), so it
>> doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM
> pages. It may be intended to avoid that. Or, especially for
> INSERT, successive operations tend to occur on the same heap
> page, so the cost of updating the FSM each time would not be
> negligible. The FSM then tells a lie that the page has spare
> space, but that does no harm. But I think things are different
> for operations that increase free space.
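
I agree the two directions are not symmetric. As a rough sketch (a toy model with made-up names ToyRel and toy_find_block, not the real FSM search), an overestimate gets corrected as soon as an inserter visits the page, while an underestimate means the page is never visited at all:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NBLOCKS 3
#define TOY_NO_BLOCK (-1)       /* caller would have to extend the relation */

typedef struct {
    uint32_t actual_free[TOY_NBLOCKS];  /* real free bytes on each heap page */
    uint32_t fsm_free[TOY_NBLOCKS];     /* what the FSM believes */
} ToyRel;

/* Pick a block with room for `need` bytes, trusting the FSM first. */
static int toy_find_block(ToyRel *rel, uint32_t need)
{
    assert(need > 0);

    for (int blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (rel->fsm_free[blk] < need)
            continue;           /* FSM says "no room": the page is never looked at */
        if (rel->actual_free[blk] >= need)
            return blk;
        /* FSM overestimated: record the real value and keep searching. */
        rel->fsm_free[blk] = rel->actual_free[blk];
    }
    return TOY_NO_BLOCK;
}
```

With an FSM that claims 8000 free bytes for a full page, the first call fixes the entry and moves on. With an FSM that claims 0 for an empty page, the call returns TOY_NO_BLOCK and the entry stays wrong until something else (for example a later vacuum) rewrites it.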

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


