Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Msg-id 20210616192202.6q63mu66h4uyn343@alap3.anarazel.de
In response to Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
List pgsql-hackers
Hi,

On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > > So the implicit assumption in heap_page_prune that
> > > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > > we'll need to redo the condition in heap_page_prune as well.
> >
> > I don't think that this shows that the assumption within
> > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > functions agree) is wrong, with the obvious exception of cases
> > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > supposed to be conservative.
> 
> I suppose it's true that they can disagree because we call
> vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> vistest. But that only implies that a tuple that would have been
> considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> anyway. These days the only way that lazy_scan_prune() can consider a
> tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> an LP_DEAD stub line pointer.

I think it's more complicated than that: computing OldestXmin "before" the
vistest isn't a guarantee that the vistest's horizon is at least as new as
OldestXmin when the horizon can go backwards. Consider the case where a
hot_standby_feedback=on replica without a slot connects; that can result in
the xid horizon going backwards.
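
To make that failure mode concrete, here is a deliberately simplified C
model. It is not PostgreSQL code: all sketch_* names are hypothetical, XIDs
are plain 64-bit integers (no wraparound), HeapTupleSatisfiesVacuum() is
reduced to "committed xmax precedes OldestXmin", and the vistest is reduced to
"xmax precedes maybe_needed". If maybe_needed ends up older than OldestXmin
because the horizon went backwards between the two computations, pruning
keeps a tuple that the OldestXmin check then reports as DEAD, which is the
kind of disagreement behind the lazy_scan_prune() retry loop in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified model, not PostgreSQL code: 64-bit XIDs, so plain
 * comparisons are enough (no wraparound handling). */
typedef uint64_t SketchXid;

typedef enum
{
    SKETCH_RECENTLY_DEAD,       /* deleted, but may still be visible to someone */
    SKETCH_DEAD                 /* deleted and invisible to everyone */
} SketchTupleStatus;

/* Stand-in for HeapTupleSatisfiesVacuum(OldestXmin) on a tuple whose
 * deleting transaction (xmax) has committed. */
static SketchTupleStatus
sketch_htsv(SketchXid xmax, SketchXid oldest_xmin)
{
    return xmax < oldest_xmin ? SKETCH_DEAD : SKETCH_RECENTLY_DEAD;
}

/* Stand-in for the vistest check done while pruning: rows deleted by XIDs
 * older than maybe_needed are treated as removable. */
static bool
sketch_vistest_removable(SketchXid xmax, SketchXid maybe_needed)
{
    return xmax < maybe_needed;
}

/* True if pruning leaves the tuple in place, yet the OldestXmin check still
 * reports it as DEAD. */
static bool
sketch_checks_disagree(SketchXid xmax, SketchXid oldest_xmin,
                       SketchXid maybe_needed)
{
    bool        pruned = sketch_vistest_removable(xmax, maybe_needed);

    return !pruned && sketch_htsv(xmax, oldest_xmin) == SKETCH_DEAD;
}

/* OldestXmin is computed first; then a slot-less hot_standby_feedback=on
 * replica connects and pulls the horizon back before the vistest is set up.
 * An xmax that falls between the two horizons triggers the mismatch. */
static bool
sketch_horizon_went_backwards(void)
{
    SketchXid   oldest_xmin = 1000;  /* first horizon computation */
    SketchXid   maybe_needed = 900;  /* later, after the horizon moved back */
    SketchXid   xmax = 950;          /* between the two horizons */

    assert(maybe_needed < oldest_xmin);
    return sketch_checks_disagree(xmax, oldest_xmin, maybe_needed);
}
```

When maybe_needed is at least as new as OldestXmin, the two checks in this
model can never disagree in this direction, which is the assumption the
quoted text relies on.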

I think a good way to address this might be to have GlobalVisUpdateApply()
ensure that maybe_needed does not go backwards within one backend.
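
A minimal sketch of that idea, using hypothetical sketch_* names and a
64-bit stand-in for FullTransactionId rather than the real GlobalVisState and
GlobalVisUpdateApply(): when a newly computed horizon is applied, maybe_needed
only ever moves forward within the backend, and definitely_needed is kept no
older than maybe_needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a FullTransactionId: 64 bits, so a plain
 * integer comparison orders XIDs without wraparound concerns. */
typedef uint64_t SketchFullXid;

/* Hypothetical per-backend state, loosely modeled on GlobalVisState:
 * rows deleted by XIDs >= definitely_needed are definitely still needed,
 * rows deleted by XIDs < maybe_needed can definitely be removed. */
typedef struct
{
    SketchFullXid definitely_needed;
    SketchFullXid maybe_needed;
} SketchVisState;

static SketchFullXid
sketch_newer(SketchFullXid a, SketchFullXid b)
{
    return a > b ? a : b;
}

/* Apply a freshly computed horizon, but never let maybe_needed move
 * backwards within this backend, even if the shared horizon did (e.g.
 * because a hot_standby_feedback=on replica without a slot connected). */
static void
sketch_update_apply(SketchVisState *state, SketchFullXid new_horizon)
{
    SketchFullXid old = state->maybe_needed;

    state->maybe_needed = sketch_newer(old, new_horizon);

    /* Keep the invariant definitely_needed >= maybe_needed. */
    state->definitely_needed = sketch_newer(state->maybe_needed,
                                            state->definitely_needed);

    assert(state->maybe_needed >= old);
}
```

In this model, once the backend has treated rows deleted by some XID as
removable, a later horizon update cannot make it treat them as needed again,
so two checks made at different times within one backend cannot flip in the
direction shown in the previous example.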

This is *nearly* already guaranteed within vacuum, except for one case: a
catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
lead to an attempt at pruning, which could theoretically move maybe_needed
backwards if, in between those two steps, a replica connected that caused the
horizon to go backwards.

Greetings,

Andres Freund


