Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers

From: Matthias van de Meent
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date:
Msg-id: CAEze2WgWa7w5ovuB8tPioH+5msdhtFaP2KfRv1kU9uo0Egv--A@mail.gmail.com
In response to: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, 16 Jun 2021 at 21:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> > On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > > > So the implicit assumption in heap_page_prune that
> > > > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > > > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > > > we'll need to redo the condition in heap_page_prune as well.
> > >
> > > I don't think that this shows that the assumption within
> > > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > > functions agree) is wrong, with the obvious exception of cases
> > > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > > supposed to be conservative.
> >
> > I suppose it's true that they can disagree because we call
> > vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> > before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> > vistest. But that only implies that a tuple that would have been
> > considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> > being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> > stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> > anyway. These days the only way that lazy_scan_prune() can consider a
> > tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> > an LP_DEAD stub line pointer.
>
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.
>
> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could theoretically move maybe_needed
> backwards if, in between those two steps, a replica connected that causes the
> horizon to go backwards.
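
To make that disagreement concrete, this is roughly the pair of
judgements that lazy_scan_prune() assumes agree on DEAD-ness
(paraphrased from the PG14 sources, not the literal code; dead_after
stands for the tuple's xmax as extracted by the pruning code):

    /*
     * (1) heap_page_prune() only removes a tuple whose xmax is already
     *     considered removable by the GlobalVisState-based test:
     */
    if (GlobalVisTestIsRemovableXid(vacrel->vistest, dead_after))
        /* pruned away, leaving an LP_DEAD stub line pointer */ ;

    /*
     * (2) lazy_scan_prune() then re-checks each tuple that survived
     *     pruning against the OldestXmin from vacuum_set_xid_limits():
     */
    if (HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf) ==
        HEAPTUPLE_DEAD)
        goto retry;     /* assumes another prune pass will remove it */

If maybe_needed moves backwards past that OldestXmin in between the two
computations, (1) keeps the tuple while (2) keeps reporting it DEAD, and
the retry loop never terminates.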

I'm tempted to suggest "update one of GlobalVisUpdateApply /
ComputeXidHorizons to be non-decreasing". We already have the
information that any previous GlobalVis*->maybe_needed is correct, and
that if maybe_needed has ever been higher, that higher value is still
correct, so we might just as well update the code to cover that case.
There are some cases where this might be dangerous: new transactions
after a period with no active backends (in this case it should be fine
to guarantee a non-decreasing GlobalVisTestNonRemovableHorizon), and
walsender. I'm uncertain whether or not it's dangerous to _not_ roll
back maybe_needed for a new walsender backend (e.g. the backend might
want to construct a snapshot from before the then-current
GlobalVisTestNonRemovableHorizon), especially when considering this
comment in ProcessStandbyHSFeedbackMessage:

     * There is a small window for a race condition here: although we just
     * checked that feedbackXmin precedes nextXid, the nextXid could have
     * gotten advanced between our fetching it and applying the xmin below,
     * perhaps far enough to make feedbackXmin wrap around.  In that case the
     * xmin we set here would be "in the future" and have no effect.  No point
     * in worrying about this since it's too late to save the desired data
     * anyway.  Assuming that the standby sends us an increasing sequence of
     * xmins, this could only happen during the first reply cycle, else our
     * own xmin would prevent nextXid from advancing so far.

At the very least, changing GlobalVisUpdateApply/ComputeXidHorizons
would increase the potential amount of data lost in such race
conditions, if any.
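
Concretely, the non-decreasing variant could look something like the
following in GlobalVisUpdateApply() (an untested sketch against PG14's
procarray.c, showing only the GlobalVisDataRels state; the other
GlobalVis*Rels states and the definitely_needed updates would need the
same treatment):

    static void
    GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
    {
        FullTransactionId data_maybe_needed =
            FullXidRelativeTo(horizons->latest_completed,
                              horizons->data_oldest_nonremovable);

        /*
         * Only ever move maybe_needed forwards.  If ComputeXidHorizons()
         * returned an older horizon (e.g. because a standby without a
         * slot connected with hot_standby_feedback=on), keep the newer
         * value this backend already used.
         */
        if (FullTransactionIdFollows(data_maybe_needed,
                                     GlobalVisDataRels.maybe_needed))
            GlobalVisDataRels.maybe_needed = data_maybe_needed;

        /* likewise for GlobalVisSharedRels, -CatalogRels and -TempRels */
    }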

As a further note, my suggested changes in vacuumlazy.c (specifically,
the 'continue' path added to lazy_scan_prune in my recent v2 patchset)
are likely incorrect because of a potential undocumented requirement of
heap_page_prune: leave no dead tuples with xmax < vacrel->OldestXmin
behind. I realised that with my patch we would allow some of these
tuples to continue to exist IFF GlobalVisTestNonRemovableHorizon moved
backwards during the vacuum, which would violate that requirement.
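
For reference, the spot in question is the HEAPTUPLE_DEAD handling in
lazy_scan_prune() (again paraphrased from PG14, with my v2 change
described in the comment rather than shown as applied code):

    switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf))
    {
        case HEAPTUPLE_DEAD:

            /*
             * HEAD insists that pruning removes this tuple and retries
             * the prune until it does; that retry is what gets stuck
             * when the vistest horizon has moved back behind OldestXmin.
             *
             * My v2 patch turns this into "continue;" instead.  But when
             * we get here because the horizon went backwards, the
             * tuple's xmax precedes vacrel->OldestXmin, so skipping the
             * tuple leaves behind exactly the kind of dead tuple that
             * the rest of vacuum assumes pruning has already removed.
             */
            goto retry;

        /* ... other cases unchanged ... */
    }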


Kind regards,

Matthias van de Meent.


