Thread: [HACKERS] free space map and visibility map

[HACKERS] free space map and visibility map

From

Jeff Janes

Date:

17 March 2017, 21:37:50

With some intensive crash-recovery testing, I've run into a situation where I get some bad table bloat. There will be large swaths of the table which are empty (all results from heap_page_items other than lp are either zero or NULL), but have zero available space in the fsm, and are marked as all-visible and all-frozen in the vm.

I guess it is a result of a crash causing updates to the fsm to be lost. Then due to the (crash-recovered) visibility map showing them as all visible and all frozen, vacuum never touches the pages again, so the fsm never gets corrected.

'VACUUM (DISABLE_PAGE_SKIPPING) foo;' does fix it, but that seems to be the only thing that will.

Is there a way to improve this, short of making updates to the fsm be a wal-logged operation?

It is probably not a very pressing issue, as crashes are normally pretty rare, I would hope. But it seems worth improving if there is a good way to do so.

Cheers,

Jeff

Re: [HACKERS] free space map and visibility map

From

Masahiko Sawada

Date:

19 March 2017, 00:09:42

On Fri, Mar 17, 2017 at 9:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> With some intensive crash-recovery testing, I've run into a situation where
> I get some bad table bloat.  There will be large swaths of the table which
> are empty (all results from heap_page_items other than lp are either zero or
> NULL), but have zero available space in the fsm, and are marked as
> all-visible and all-frozen in the vm.
>
> I guess it is a result of a crash causing updates to the fsm to be lost.
> Then due to the (crash-recovered) visibility map showing them as all visible
> and all frozen, vacuum never touches the pages again, so the fsm never gets
> corrected.

I guess this happens only if heap_xlog_clean applies an FPI, right?
An FSM update can be lost, but the FSM is updated again when a
HEAP2_CLEAN record is replayed during crash recovery.

>
> 'VACUUM (DISABLE_PAGE_SKIPPING) foo;'   does fix it, but that seems to be
> the only thing that will.

If the above is correct, another option is to allow
heap_xlog_clean to update the FSM even when applying an FPI.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] free space map and visibility map

From

Jeff Janes

Date:

19 March 2017, 00:42:09

On Sat, Mar 18, 2017 at 2:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Mar 17, 2017 at 9:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> With some intensive crash-recovery testing, I've run into a situation where
> I get some bad table bloat. There will be large swaths of the table which
> are empty (all results from heap_page_items other than lp are either zero or
> NULL), but have zero available space in the fsm, and are marked as
> all-visible and all-frozen in the vm.
>
> I guess it is a result of a crash causing updates to the fsm to be lost.
> Then due to the (crash-recovered) visibility map showing them as all visible
> and all frozen, vacuum never touches the pages again, so the fsm never gets
> corrected.

I guess that this happens only if heap_xlog_clean applies FPI. Right?
Updating fsm can be lost but fsm is updated by replaying HEAP2_CLEAN
record during crash recovery.

Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then can't leave the block as all visible or all frozen). I think the issue here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly, that neither of those ever updates the FSM, regardless of FPI?

I don't know how to test the issue of which record is most responsible. I could turn off FPW globally and see what happens, with some tweaking to my testing harness.

Cheers,

Jeff

Re: [HACKERS] free space map and visibility map

From

Robert Haas

Date:

20 March 2017, 17:28:30

On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
> can't leave the block as all visible or all frozen).  I think the issue is
> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
> that neither of those ever update the FSM, regardless of FPI?

Yes, updates to the FSM are never logged.  Forcing replay of
HEAP2_FREEZE_PAGE to update the FSM might be a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] free space map and visibility map

From

Masahiko Sawada

Date:

21 March 2017, 20:15:26

On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
>> can't leave the block as all visible or all frozen).  I think the issue is
>> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
>> that neither of those ever update the FSM, regardless of FPI?
>
> Yes, updates to the FSM are never logged.  Forcing replay of
> HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>

I think I was missing something. I had imagined your situation was
that an FPI is replayed during crash recovery after the crashed server
had vacuumed the page and marked it as all-frozen. But that situation
is also resolved by that solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] free space map and visibility map

From

Kyotaro HORIGUCHI

Date:

24 March 2017, 05:01:28

At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
> >> can't leave the block as all visible or all frozen).  I think the issue is
> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
> >> that neither of those ever update the FSM, regardless of FPI?
> >
> > Yes, updates to the FSM are never logged.  Forcing replay of
> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> >
> 
> I think I was missing something. I imaged your situation is that FPI
> is replayed during crash recovery after the crashed server vacuums the
> page and marked it as all-frozen. But this situation is also resolved
> by that solution.

# HEAP2_CLEAN is issued in lazy_vacuum_page

It will work, but I'm not sure it is the right direction for
HEAP2_FREEZE_PAGE to touch the FSM.

As Masahiko said, the situation must be created by HEAP2_VISIBLE
without a preceding HEAP2_CLEAN, or by HEAP2_CLEAN with an FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is generally right, but if a page filled with tuples becomes
almost empty and freezable through this cleanup, a problematic
situation like this occurs.

> /*
>  * Update the FSM as well.
>  *
>  * XXX: Don't do this if the page was restored from full page image. We
>  * don't bother to update the FSM in that case, it doesn't need to be
>  * totally accurate anyway.
>  */
> if (action == BLK_NEEDS_REDO)
>     XLogRecordPageWithFreeSpace(rnode, blkno, freespace);

HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE do something similar. All of
these reduce free space, but HEAP2_CLEAN increases it. HEAP2_CLEAN
occurs less frequently than the other three, so I suppose HEAP2_CLEAN
could always update the FSM.
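
As a rough illustration of the condition quoted above, here is a toy model (not PostgreSQL code) of how a page can end up empty on disk while the FSM still reports it as full. The names PageState, replay_clean and fsm_is_stale are made up for this sketch, and the RedoAction enum is only a stand-in for the redo outcomes mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the redo outcomes named in the quoted code. */
typedef enum { BLK_NEEDS_REDO, BLK_DONE, BLK_RESTORED } RedoAction;

/* Hypothetical pairing of what the heap page holds and what the FSM claims. */
typedef struct {
    uint32_t heap_free;   /* free bytes actually on the heap page */
    uint32_t fsm_free;    /* free bytes the FSM believes the page has */
} PageState;

/*
 * Toy replay of a cleanup record. The heap page ends up with
 * freespace_after bytes free in every case, but the FSM is refreshed
 * only when the record was applied as a normal redo, mirroring the
 * "action == BLK_NEEDS_REDO" test in the quoted snippet.
 */
void replay_clean(PageState *p, RedoAction action, uint32_t freespace_after)
{
    assert(p != NULL);
    p->heap_free = freespace_after;
    if (action == BLK_NEEDS_REDO)
        p->fsm_free = freespace_after;
}

/* Returns 1 when the FSM under-reports the free space of the page. */
int fsm_is_stale(const PageState *p)
{
    return p->fsm_free < p->heap_free;
}
```

Under these assumptions, replaying with BLK_RESTORED leaves fsm_free at its old value, and if the page is then marked all-visible or all-frozen, nothing in the model would ever correct it.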

Even if the page is not frozen, a similar situation arises
with just ALL_VISIBLE. Without any updates on the page, the free space
information for the page won't be corrected until the next
freezing (or 'aggressive') vacuum occurs.

From this point of view, HEAP2_FREEZE_PAGE is not responsible for
updating the FSM. But if we think that always updating the FSM on
HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
to go.

(I don't understand the reason for skipping the FSM update only for FPI. This seems to have been introduced by f8f42279.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] free space map and visibility map

From

Masahiko Sawada

Date:

24 March 2017, 06:58:56

On Fri, Mar 24, 2017 at 11:01 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
>> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
>> >> can't leave the block as all visible or all frozen).  I think the issue is
>> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
>> >> that neither of those ever update the FSM, regardless of FPI?
>> >
>> > Yes, updates to the FSM are never logged.  Forcing replay of
>> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> >
>>
>> I think I was missing something. I imaged your situation is that FPI
>> is replayed during crash recovery after the crashed server vacuums the
>> page and marked it as all-frozen. But this situation is also resolved
>> by that solution.
>
> # HEAP2_CLEAN is issued in lazy_vacuum_page
>
> It will work but I'm not sure it is right direction for
> HEAP2_FREEZE_PAGE to touch FSM.
>
> As Masahiko said, the situation must be created by HEAP2_VISIBLE
> without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> think only the latter can happen. The comment in heap_xlog_clean
> below is right generally but if a page filled with tuples becomes
> almost empty and freezable by this cleanup, a problematic
> situation like this occurs.
>
>> /*
>>  * Update the FSM as well.
>>  *
>>  * XXX: Don't do this if the page was restored from full page image. We
>>  * don't bother to update the FSM in that case, it doesn't need to be
>>  * totally accurate anyway.
>>  */
>> if (action == BLK_NEEDS_REDO)
>>       XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
>
> HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE does the similar. All of
> these reduces freespace but HEAP2_CLEAN increases. HEAP2_CLEAN
> occurs infrequently than the three. So I suppose HEAP2_CLEAN may
> always update FSM.
>
> Even if the page is not frozen, the similar situation is made
> with just ALL_VISIBLE. Without any updates on the page, freespace
> information for the page won't be corrected until the next
> freezing(or 'aggressive') vacuum occurs.
>
> From this point of view, HEAP2_FREEZE_PAGE is not responsible for
> updating FSM. But if we see that always updating FSM on
> HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
> to go.
>
> (I don't understand the reason for skipping updating FSM only for
>  FPI. This seems introduced by f8f42279)
>

This code was introduced by e9816533e39be464227b748ee5eeb3d9f688cd76,
and the discussion is here [1].
ISTM that this code was implemented on the assumption that every page
will be vacuumed eventually. But now that we have the freeze map and
pages might never be vacuumed again, it would be worth reconsidering
that behavior.

[1] https://www.postgresql.org/message-id/flat/49072021.7010801%40enterprisedb.com#49072021.7010801@enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: free space map and visibility map

From

Jeff Janes

Date:

26 March 2017, 05:53:47

On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
> >> can't leave the block as all visible or all frozen). I think the issue is
> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
> >> that neither of those ever update the FSM, regardless of FPI?
> >
> > Yes, updates to the FSM are never logged. Forcing replay of
> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> >
>
> I think I was missing something. I imaged your situation is that FPI
> is replayed during crash recovery after the crashed server vacuums the
> page and marked it as all-frozen. But this situation is also resolved
> by that solution.

# HEAP2_CLEAN is issued in lazy_vacuum_page

It will work but I'm not sure it is right direction for
HEAP2_FREEZE_PAGE to touch FSM.

As Masahiko said, the situation must be created by HEAP2_VISIBLE
without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is right generally but if a page filled with tuples becomes
almost empty and freezable by this cleanup, a problematic
situation like this occurs.

I now think this is not the cause of the problem I am seeing. I made the replay of FREEZE_PAGE update the FSM (both with and without FPI), but that did not fix it. With frequent crashes, it still accumulated a lot of frozen and empty (but full according to FSM) pages. I also set up replica streaming and turned off crashing on the master, and the FSM of the replica stays accurate, so the WAL stream and replay logic is doing the right thing on the replica.

I now think the dirtied FSM pages are somehow not getting marked as dirty, or are getting marked as dirty but somehow the checkpoint is skipping them. It looks like MarkBufferDirtyHint does do some operations unlocked, which could explain a lost update, but it seems unlikely that that would happen often enough to explain the number of lost updates I am seeing.

> /*
> * Update the FSM as well.
> *
> * XXX: Don't do this if the page was restored from full page image. We
> * don't bother to update the FSM in that case, it doesn't need to be
> * totally accurate anyway.
> */

What does that save us? If we restored from FPI, we already have the block in memory (we don't need to see the old version, just the new one), so it doesn't save us a random read IO.

Cheers,

Jeff

Re: free space map and visibility map

From

Kyotaro HORIGUCHI

Date:

27 March 2017, 08:38:27

At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in
<CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.
> > gmail.com>
> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com>
> > wrote:
> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com>
> > wrote:
> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
> > then
> > > >> can't leave the block as all visible or all frozen).  I think the
> > issue is
> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this
> > correctly,
> > > >> that neither of those ever update the FSM, regardless of FPI?
> > > >
> > > > Yes, updates to the FSM are never logged.  Forcing replay of
> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > >
> > > I think I was missing something. I imaged your situation is that FPI
> > > is replayed during crash recovery after the crashed server vacuums the
> > > page and marked it as all-frozen. But this situation is also resolved
> > > by that solution.
> >
> > # HEAP2_CLEAN is issued in lazy_vacuum_page
> >
> > It will work but I'm not sure it is right direction for
> > HEAP2_FREEZE_PAGE to touch FSM.
> >
> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> > think only the latter can happen. The comment in heap_xlog_clean
> > below is right generally but if a page filled with tuples becomes
> > almost empty and freezable by this cleanup, a problematic
> > situation like this occurs.
> >
> 
> I now think this is not the cause of the problem I am seeing.  I made the
> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> did not fix it.  With frequent crashes, it still accumulated a lot of
> frozen and empty (but full according to FSM) pages.  I also set up replica
> streaming and turned off crashing on the master, and the FSM of the replica
> stays accurate, so the WAL stream and replay logic is doing the right thing
> on the replica.
> 
> I now think the dirtied FSM pages are somehow not getting marked as dirty,
> or are getting marked as dirty but somehow the checkpoint is skipping
> them.  It looks like MarkBufferDirtyHint does do some operations unlocked
> which could explain lost update, but it seems unlikely that that would
> happen often enough to see the amount of lost updates I am seeing.

Hmm.. clearing the dirty hint already seems to be protected by an
exclusive lock. And I think the problem can occur without a locking
failure.

Other than by FPI, the FSM update is also omitted when the record LSN
is older than the page LSN. If a heap page is evicted but the FSM page
is not, after vacuuming and before a power cut, replaying HEAP2_CLEAN
skips the FSM update even though no FPI is attached. Of course this
cannot occur on a standby. One FSM page covers about 4k heap pages,
so an FSM page can stay in the buffer pool far longer than heap pages.
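
The "about 4k" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes an FSM page stores a binary tree with one byte per node, 8 kB blocks, a 24-byte page header, and a 4-byte per-page slot hint. All of these constants and function names are assumptions made for this sketch; the authoritative definitions live in the PostgreSQL source and depend on the build.

```c
#include <stdint.h>

enum {
    BLOCK_SIZE = 8192,       /* assumed default block size */
    PAGE_HEADER_SIZE = 24,   /* assumed aligned page header size */
    SLOT_HINT_SIZE = 4       /* assumed per-page "next slot" hint */
};

/* Bytes left for tree nodes on one FSM page (one byte per node). */
uint32_t fsm_nodes_per_page(void)
{
    return BLOCK_SIZE - PAGE_HEADER_SIZE - SLOT_HINT_SIZE;
}

/* Inner nodes of the binary tree on one FSM page. */
uint32_t fsm_non_leaf_nodes(void)
{
    return BLOCK_SIZE / 2 - 1;
}

/* Leaf slots, i.e. heap pages tracked by one bottom-level FSM page. */
uint32_t fsm_leaf_slots(void)
{
    return fsm_nodes_per_page() - fsm_non_leaf_nodes();
}

/* Heap bytes whose free space is tracked by one bottom-level FSM page. */
uint64_t heap_bytes_per_fsm_leaf_page(void)
{
    return (uint64_t) fsm_leaf_slots() * BLOCK_SIZE;
}
```

With these assumed constants, one FSM page tracks 4069 heap pages, roughly 32 MB of heap, which would be consistent with an FSM page being touched far more often than any single heap page it covers.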

ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When a
page is already empty on entering lazy_scan_heap, or a page of a
non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
issued to set ALL_FROZEN.

Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in the previous mail, I prefer the
latter.)

> > > /*
> > >  * Update the FSM as well.
> > >  *
> > >  * XXX: Don't do this if the page was restored from full page image. We
> > >  * don't bother to update the FSM in that case, it doesn't need to be
> > >  * totally accurate anyway.
> > >  */
> >
> 
> What does that save us?  If we restored from FPI, we already have the block
> in memory (we don't need to see the old version, just the new one), so it
> doesn't save us a random read IO.

Updates on random pages can cause visits to many unloaded FSM
pages. It may be intended to avoid that. Or, especially for
INSERT, successive operations tend to occur on the same heap
page, so the cost of recalculating the FSM each time would not be
negligible. The FSM then tells a lie that the page has spare space,
but that does no harm. But I think that things are
different for operations that increase free space.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: free space map and visibility map

From

Masahiko Sawada

Date:

27 March 2017, 10:49:08

On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in
<CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
>> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
>> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
>> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.
>> > gmail.com>
>> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com>
>> > wrote:
>> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com>
>> > wrote:
>> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
>> > then
>> > > >> can't leave the block as all visible or all frozen).  I think the
>> > issue is
>> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this
>> > correctly,
>> > > >> that neither of those ever update the FSM, regardless of FPI?
>> > > >
>> > > > Yes, updates to the FSM are never logged.  Forcing replay of
>> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> > > >
>> > >
>> > > I think I was missing something. I imaged your situation is that FPI
>> > > is replayed during crash recovery after the crashed server vacuums the
>> > > page and marked it as all-frozen. But this situation is also resolved
>> > > by that solution.
>> >
>> > # HEAP2_CLEAN is issued in lazy_vacuum_page
>> >
>> > It will work but I'm not sure it is right direction for
>> > HEAP2_FREEZE_PAGE to touch FSM.
>> >
>> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
>> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
>> > think only the latter can happen. The comment in heap_xlog_clean
>> > below is right generally but if a page filled with tuples becomes
>> > almost empty and freezable by this cleanup, a problematic
>> > situation like this occurs.
>> >
>>
>> I now think this is not the cause of the problem I am seeing.  I made the
>> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
>> did not fix it.  With frequent crashes, it still accumulated a lot of
>> frozen and empty (but full according to FSM) pages.  I also set up replica
>> streaming and turned off crashing on the master, and the FSM of the replica
>> stays accurate, so the WAL stream and replay logic is doing the right thing
>> on the replica.
>>
>> I now think the dirtied FSM pages are somehow not getting marked as dirty,
>> or are getting marked as dirty but somehow the checkpoint is skipping
>> them.  It looks like MarkBufferDirtyHint does do some operations unlocked
>> which could explain lost update, but it seems unlikely that that would
>> happen often enough to see the amount of lost updates I am seeing.
>
> Hmm.. clearing dirty hint seems already protected by exclusive
> lock. And I think it can occur without lock failure.
>
> Other than by FPI, FSM update is omitted when record LSN is older
> than page LSN. If heap page is evicted but FSM page is not after
> vacuuming and before power cut, replaying HEAP2_CLEAN skips
> update of FSM even though FPI is not attached. Of course this
> cannot occur on standby. One FSM page covers as many heap pages
> as about 4k, so FSM can stay far longer than heap pages.
>
> ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> is already empty when entering lazy_scan_heap, or a page of
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update FSM (addition to FREEZE_PAGE), or the same in
> heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> latter.)

Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
might be unnecessary), because the problem happens on a page that is
full according to the FSM but is empty and marked as all-visible or
all-frozen. Though heap_xlog_clean loads the heap page into memory
for the redo operation, forcing heap_xlog_clean to update the FSM might
be overkill for this problem, because the update would then happen on
every page that is marked neither all-visible nor all-frozen.
Basically, 100% accuracy of the FSM is not required. On the other hand,
if we make heap_xlog_visible update the FSM, it has to load both the
heap page and the FSM page, which can also be overhead. Another idea is
to have the HEAP2_VISIBLE record carry the free space of the
corresponding heap page, and then update the FSM during recovery.

>
>> > > /*
>> > >  * Update the FSM as well.
>> > >  *
>> > >  * XXX: Don't do this if the page was restored from full page image. We
>> > >  * don't bother to update the FSM in that case, it doesn't need to be
>> > >  * totally accurate anyway.
>> > >  */
>> >
>>
>> What does that save us?  If we restored from FPI, we already have the block
>> in memory (we don't need to see the old version, just the new one), so it
>> doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM
> pages. It may be intending to avoid that. Or, especially for
> INSERT, successive operations tends to occur on the same heap
> page, the complexity of calculating FSM wouldn't be so small
> relatively. FSM tells a lie that the page has spare space after
> that but it doesn't harm. But I think that the things are
> different for operations that increments free space.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: free space map and visibility map

From

Kyotaro HORIGUCHI

Date:

28 March 2017, 07:28:34

I'd like to have a comment from Heikki or Tom.

At Mon, 27 Mar 2017 16:49:08 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAnCw8y37dJSEhdbpue5H1v5FVLKUA_uh2MhZe331HNyw@mail.gmail.com>
> On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Other than by FPI, FSM update is omitted when record LSN is older
> > than page LSN. If heap page is evicted but FSM page is not after
> > vacuuming and before power cut, replaying HEAP2_CLEAN skips
> > update of FSM even though FPI is not attached. Of course this
> > cannot occur on standby. One FSM page covers as many heap pages
> > as about 4k, so FSM can stay far longer than heap pages.
> >
> > ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> > is already empty when entering lazy_scan_heap, or a page of
> > non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> > issued to set ALL_FROZEN.
> >
> > Perhaps the problem will be fixed by forcing heap_xlog_visible to
> > update FSM (addition to FREEZE_PAGE), or the same in
> > heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> > latter.)
> 
> Maybe it's enough just to make both heap_xlog_visible and
> heap_xlog_freeze_page forcibly updates the FSM (heap_xlog_freeze_page
> might be unnecessary). Because the problem happens on the page that is
> full according to FSM but is empty and marked as all-visible or

It would work and is straightforward.

Currently the FSM seems to be treated as a part of the heap from the
WAL's point of view. From that point of view, the problem is that
heap_xlog_clean omits the FSM update in certain cases. My only concern
is whether updating heap information from a visibility map record is
right or not. The code intends to reduce FSM updates without causing
problems. For the insert/update cases, the problem is that too-large
free space information in the FSM can cause needless fetches of heap
pages. But things are a bit different for the clean case. There the
problem is too-small free space information, which causes
everlastingly empty pages.

I dug out the original discussion. The mention of this was found
here.

https://www.postgresql.org/message-id/24334.1225205478%40sss.pgh.pa.us

Tom Lane wrote:
| Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
| > One issue with this patch is that it doesn't update the FSM at all when 
| > pages are restored from full page images. It would require fetching the 
| > page and checking the free space on it, or peeking into the size of the 
| > backup block data, and I'm not sure if it's worth the extra code to do that.
| 
| I'd vote not to bother, at least not in the first cut.  As you say, 100%
| accuracy isn't required, and I think that in typical scenarios an
| insert/update that causes a page to become full would be relatively less
| likely to have a full-page image.

This is the 'first cut' shape, which hadn't caused an apparent
problem without ALL_FROZEN.

> all-frozen. Though heap_xlog_clean loads the heap page to the memory
> for redo operation, forcing heap_xlog_clean to update FSM might be
> overkill for this solution. Because it can happen on every pages that
> are not marked as neither all-visible nor all-frozen. Basically 100%

I'm not sure that it is definitely not overkill, but it seems
to me the same as the 20% rule of the insert/update cases. We must
avoid 0% or too-small (under 20%?) FSM info on heap_clean,
especially in the FREEZE case.

> accuracy of FSM is not required. On the other hand, if we makes

Yes, what is needed here is not accuracy, but a minimum guarantee
not to cause a critical problem.

> heap_xlog_visible updates the FSM, it requires to load both heap page
> and FSM page, which can also be overhead. Another idea is, we can
> heap_xlog_visible to have the freespace of corresponding heap page,
> and then update FSM during recovery.

I hadn't considered that. Counting free space from visibility log
records is worse from an I/O perspective. It seems somewhat arbitrary,
but having the free space in VM records seems to work.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Fwd: free space map and visibility map

From

Jeff Janes

Date:

28 March 2017, 18:50:58

I accidentally sent this off-list, sending to the list now:

On Sun, Mar 26, 2017 at 10:38 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.
> > gmail.com>
> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com>
> > wrote:
> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com>
> > wrote:
> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which
> > then
> > > >> can't leave the block as all visible or all frozen). I think the
> > issue is
> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this
> > correctly,
> > > >> that neither of those ever update the FSM, regardless of FPI?
> > > >
> > > > Yes, updates to the FSM are never logged. Forcing replay of
> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > >
> > > I think I was missing something. I imaged your situation is that FPI
> > > is replayed during crash recovery after the crashed server vacuums the
> > > page and marked it as all-frozen. But this situation is also resolved
> > > by that solution.
> >
> > # HEAP2_CLEAN is issued in lazy_vacuum_page
> >
> > It will work but I'm not sure it is right direction for
> > HEAP2_FREEZE_PAGE to touch FSM.
> >
> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> > think only the latter can happen. The comment in heap_xlog_clean
> > below is right generally but if a page filled with tuples becomes
> > almost empty and freezable by this cleanup, a problematic
> > situation like this occurs.
> >
>
> I now think this is not the cause of the problem I am seeing. I made the
> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> did not fix it. With frequent crashes, it still accumulated a lot of
> frozen and empty (but full according to FSM) pages. I also set up replica
> streaming and turned off crashing on the master, and the FSM of the replica
> stays accurate, so the WAL stream and replay logic is doing the right thing
> on the replica.
>
> I now think the dirtied FSM pages are somehow not getting marked as dirty,
> or are getting marked as dirty but somehow the checkpoint is skipping
> them. It looks like MarkBufferDirtyHint does do some operations unlocked
> which could explain lost update, but it seems unlikely that that would
> happen often enough to see the amount of lost updates I am seeing.

> Hmm.. clearing dirty hint seems already protected by exclusive
> lock. And I think it can occur without lock failure.
>
> Other than by FPI, FSM update is omitted when record LSN is older
> than page LSN. If heap page is evicted but FSM page is not after
> vacuuming and before power cut, replaying HEAP2_CLEAN skips
> update of FSM even though FPI is not attached. Of course this
> cannot occur on standby. One FSM page covers as many heap pages
> as about 4k, so FSM can stay far longer than heap pages.

This corresponds to action == BLK_DONE case, right?

> ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> is already empty when entering lazy_scan_heap, or a page of
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update FSM (addition to FREEZE_PAGE), or the same in
> heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> latter.)

When I make heap_xlog_clean update FSM even on BLK_RESTORED (but not on BLK_DONE), it solves the problem I was seeing. Which still leaves me wondering why the problem doesn't show up on the standby because, unlike BLK_DONE, BLK_RESTORED should have the same issue on standby as it does on a recovering master, shouldn't it? Maybe the difference is that the existence of a replication slot delays the clean up in a way that causes a different pattern of WAL records.

> > > /*
> > > * Update the FSM as well.
> > > *
> > > * XXX: Don't do this if the page was restored from full page image. We
> > > * don't bother to update the FSM in that case, it doesn't need to be
> > > * totally accurate anyway.
> > > */
> >
>
> What does that save us? If we restored from FPI, we already have the block
> in memory (we don't need to see the old version, just the new one), so it
> doesn't save us a random read IO.

> Updates on random pages can cause visits to many unloaded FSM
> pages. It may be intending to avoid that.

But I think that that would be no worse for BLK_RESTORED than it is for BLK_NEEDS_REDO. Why optimize only one of the cases, if it is worth optimizing either one?

Cheers,

Jeff

Attachment

fsm_clean.patch

Re: free space map and visibility map

From

Kyotaro HORIGUCHI

Date:

29 March 2017, 04:40:07

Hello,

At Tue, 28 Mar 2017 08:50:58 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in
<CAMkU=1zKfqGePWG+qqKthmWERBn8UAA2_9Sb+qTUUREhFkqLCA@mail.gmail.com>
> > > I now think this is not the cause of the problem I am seeing.  I made the
> > > replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> > > did not fix it.  With frequent crashes, it still accumulated a lot of
> > > frozen and empty (but full according to FSM) pages.  I also set up replica
> > > streaming and turned off crashing on the master, and the FSM of the replica
> > > stays accurate, so the WAL stream and replay logic is doing the right thing
> > > on the replica.
> > >
> > > I now think the dirtied FSM pages are somehow not getting marked as dirty,
> > > or are getting marked as dirty but somehow the checkpoint is skipping
> > > them.  It looks like MarkBufferDirtyHint does do some operations unlocked
> > > which could explain lost update, but it seems unlikely that that would
> > > happen often enough to see the amount of lost updates I am seeing.
> >
> > Hmm.. clearing dirty hint seems already protected by exclusive
> > lock. And I think it can occur without lock failure.
> >
> > Other than by FPI, FSM update is omitted when record LSN is older
> > than page LSN. If heap page is evicted but FSM page is not after
> > vacuuming and before power cut, replaying HEAP2_CLEAN skips
> > update of FSM even though FPI is not attached. Of course this
> > cannot occur on standby. One FSM page covers as many heap pages
> > as about 4k, so FSM can stay far longer than heap pages.
> >
> 
> This corresponds to action == BLK_DONE case, right?

Yes. A WAL record whose LSN is not newer than the page LSN results
in BLK_DONE. That works as long as the heap page and the FSM are
consistent, but in this situation it leaves the FSM broken after
crash recovery.
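
To make the BLK_DONE case concrete, here is a toy model in C. It is
not PostgreSQL source: the RedoAction names mirror XLogRedoAction, but
ToyHeapPage, ToyCleanRecord, toy_redo_action and toy_replay_clean are
hypothetical names, and the FSM is reduced to a single integer slot.
It assumes, as described in this thread, that replay of a clean record
updates the FSM only when the action is BLK_NEEDS_REDO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Names mirror PostgreSQL's XLogRedoAction; everything else is a toy. */
typedef enum { BLK_NEEDS_REDO, BLK_DONE, BLK_RESTORED } RedoAction;

typedef struct {
    uint64_t lsn;        /* LSN of the last change applied to the page */
    int      free_bytes; /* free space actually present on the page */
} ToyHeapPage;

typedef struct {
    uint64_t lsn;              /* LSN of this WAL record */
    bool     has_fpi;          /* record carries a full page image */
    int      free_bytes_after; /* free space once the record is applied */
} ToyCleanRecord;

/* Decide how the record relates to the page. */
static RedoAction
toy_redo_action(const ToyHeapPage *page, const ToyCleanRecord *rec)
{
    if (rec->has_fpi)
        return BLK_RESTORED;    /* page is overwritten by the image */
    if (rec->lsn <= page->lsn)
        return BLK_DONE;        /* page already contains this change */
    return BLK_NEEDS_REDO;
}

/*
 * Replay one record. *fsm_slot stands in for the FSM entry of the page.
 * fsm_on_restored = false models the behaviour discussed in the thread;
 * true models also updating the FSM on BLK_RESTORED.
 */
static RedoAction
toy_replay_clean(ToyHeapPage *page, int *fsm_slot,
                 const ToyCleanRecord *rec, bool fsm_on_restored)
{
    RedoAction action = toy_redo_action(page, rec);

    assert(fsm_slot != NULL);
    if (action != BLK_DONE)
    {
        /* Apply the change (or the full page image) to the heap page. */
        page->free_bytes = rec->free_bytes_after;
        page->lsn = rec->lsn;
    }
    if (action == BLK_NEEDS_REDO ||
        (action == BLK_RESTORED && fsm_on_restored))
        *fsm_slot = page->free_bytes;
    return action;
}
```

In this model, a heap page that reached disk before the crash (page
LSN equal to the record LSN) together with a stale FSM slot yields
BLK_DONE and the slot is left untouched. A page that is older than the
record, as on a standby, yields BLK_NEEDS_REDO and the slot is
refreshed.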

> > ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> > is already empty when entering lazy_scan_heap, or a page of
> > non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> > issued to set ALL_FROZEN.
> >
> > Perhaps the problem will be fixed by forcing heap_xlog_visible to
> > update FSM (addition to FREEZE_PAGE), or the same in
> > heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> > latter.)
> >
> 
> When I make heap_xlog_clean update FSM even on BLK_RESTORED (but not on
> BLK_DONE), it solves the problem I was seeing.  Which still leaves me
> wondering why the problem doesn't show up on the standby because, unlike
> BLK_DONE, BLK_RESTORED should have the same issue on standby as it does on
> a recovering master, shouldn't it? Maybe the difference is that the
> existence of a replication slot delays the clean up in a way that causes a
> different pattern of WAL records.

During standby recovery, all WAL records are newer than their
target pages, whereas in a crash recovery several WAL records at
the beginning can be older than the pages they target.

> > > > > /*
> > > > >  * Update the FSM as well.
> > > > >  *
> > > > >  * XXX: Don't do this if the page was restored from full page image. We
> > > > >  * don't bother to update the FSM in that case, it doesn't need to be
> > > > >  * totally accurate anyway.
> > > > >  */
> > > >
> > >
> > > What does that save us?  If we restored from FPI, we already have the block
> > > in memory (we don't need to see the old version, just the new one), so it
> > > doesn't save us a random read IO.
> >
> > Updates on random pages can cause visits to many unloaded FSM
> > pages. It may be intending to avoid that.
> 
> 
> But I think that that would be no worse for BLK_RESTORED than it is for
> BLK_NEEDS_REDO.  Why optimize only one of the cases, if it is worth
> optimizing either one?

I agree with you. An FPI increases and decreases free space just
the same as redoing a WAL record. The following is the earlier
discussion about that.

https://www.postgresql.org/message-id/49072021.7010801%40enterprisedb.com

https://www.postgresql.org/message-id/24334.1225205478%40sss.pgh.pa.us

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > One issue with this patch is that it doesn't update the FSM at all when 
> > pages are restored from full page images. It would require fetching the 
> > page and checking the free space on it, or peeking into the size of the 
> > backup block data, and I'm not sure if it's worth the extra code to do that.
> 
> I'd vote not to bother, at least not in the first cut.  As you say, 100%
> accuracy isn't required, and I think that in typical scenarios an
> insert/update that causes a page to become full would be relatively less
> likely to have a full-page image.

So, the reason seems to be that it just doesn't seem necessary.

Including another branch of this thread, the following options
are proposed.

- Let FREEZE_PAGE and VISIBLE update FSM.
  This causes an extra fetch of a heap page, summing up of free space,
  and an FSM update for every frozen page.

- Let CLEAN always update FSM.
  This causes extra counting of free space and an FSM update for every
  vacuuming of a heap page, regardless of frozen-ness.

- Let FREEZE_PAGE/VISIBLE or CLEAN records have free space.
  This doesn't need to fetch a heap page, but it breaks the policy
  (really?) that FSM is not WAL-logged, or that FSM is updated just as
  the result of heap updates.
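
The third option can be contrasted with the first in a small toy model
in C. This is not a proposed patch: ToyVisibleRecord, its free_bytes
field, ToyState and both replay functions are hypothetical names. It
only shows the difference between replay that copies a logged
free-space value into the FSM and replay that has to read the heap
page to compute it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NBLOCKS 8

/* Hypothetical WAL record that carries the page's free space. */
typedef struct {
    uint32_t blkno;
    uint16_t free_bytes;    /* assumed extra field, not in real records */
} ToyVisibleRecord;

typedef struct {
    uint16_t slot[TOY_NBLOCKS]; /* toy FSM: one entry per heap block */
    int      heap_reads;        /* heap pages fetched during replay */
} ToyState;

/* Option 3: the record itself supplies the value, no heap page fetch. */
static void
toy_replay_with_logged_space(ToyState *st, const ToyVisibleRecord *rec)
{
    assert(rec->blkno < TOY_NBLOCKS);
    st->slot[rec->blkno] = rec->free_bytes;
}

/* Option 1: replay must fetch the heap page to learn its free space. */
static void
toy_replay_by_reading_heap(ToyState *st, const uint16_t *heap_free,
                           uint32_t blkno)
{
    assert(blkno < TOY_NBLOCKS);
    st->heap_reads++;
    st->slot[blkno] = heap_free[blkno];
}
```

Both functions end with the same FSM entry; only the second one
increments heap_reads, which is the extra cost the first option pays
on every frozen page.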
 

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center