Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers
From | Matthias van de Meent
---|---
Subject | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date | 
Msg-id | CAEze2WgWa7w5ovuB8tPioH+5msdhtFaP2KfRv1kU9uo0Egv--A@mail.gmail.com
In response to | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Andres Freund <andres@anarazel.de>)
List | pgsql-hackers

On Wed, 16 Jun 2021 at 21:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> > On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > > > So the implicit assumption in heap_page_prune that
> > > > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > > > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > > > we'll need to redo the condition in heap_page_prune as well.
> > >
> > > I don't think that this shows that the assumption within
> > > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > > functions agree) is wrong, with the obvious exception of cases
> > > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > > supposed to be conservative.
> >
> > I suppose it's true that they can disagree because we call
> > vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> > before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> > vistest. But that only implies that a tuple that would have been
> > considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> > being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> > stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> > anyway. These days the only way that lazy_scan_prune() can consider a
> > tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> > an LP_DEAD stub line pointer.
>
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.
>
> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could move maybe_needed to go backwards
> theoretically if inbetween those two steps a replica connected that causes the
> horizon to go backwards.

I'm tempted to suggest "update one of GlobalVisUpdateApply /
ComputeXidHorizons to be non-decreasing". We already have the
information that any previous GlobalVis*->maybe_needed is correct, and
that if maybe_needed has been higher, that value is still correct, so
we might just as well update the code to cover that case.
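For concreteness, a minimal (untested) sketch of the clamp I have in
mind is below. GlobalVisStateSketch and GlobalVisClampMaybeNeeded are
stand-in names (GlobalVisState's definition is private to procarray.c);
only the "never move backwards" comparison is meant literally:

```c
#include "postgres.h"
#include "access/transam.h"

/* Stand-in for the private GlobalVisState struct in procarray.c. */
typedef struct GlobalVisStateSketch
{
	FullTransactionId	definitely_needed;
	FullTransactionId	maybe_needed;
} GlobalVisStateSketch;

/*
 * Apply a freshly computed horizon, but never let maybe_needed move
 * backwards within this backend: any value we previously published was
 * correct at the time, and (per the reasoning above) a once-correct
 * maybe_needed does not become incorrect when the shared horizon
 * regresses, so keep the newer of the two.
 */
static void
GlobalVisClampMaybeNeeded(GlobalVisStateSketch *state,
						  FullTransactionId computed)
{
	if (FullTransactionIdFollows(computed, state->maybe_needed))
		state->maybe_needed = computed;
	/* else: keep the existing, newer maybe_needed */
}
```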
There are some cases where this might be dangerous: new transactions
after a period with no active backends (in this case it should be fine
to guarantee a non-decreasing GlobalVisTestNonRemovableHorizon), and
walsender. I'm uncertain whether or not it's dangerous to _not_ roll
back maybe_needed for a new walsender backend (e.g. the backend might
want to construct a snapshot of a point (then) before
GlobalVisTestNonRemovableHorizon), especially when considering the
comment in ProcessStandbyHSFeedbackMessage:

 * There is a small window for a race condition here: although we just
 * checked that feedbackXmin precedes nextXid, the nextXid could have
 * gotten advanced between our fetching it and applying the xmin below,
 * perhaps far enough to make feedbackXmin wrap around. In that case the
 * xmin we set here would be "in the future" and have no effect. No point
 * in worrying about this since it's too late to save the desired data
 * anyway. Assuming that the standby sends us an increasing sequence of
 * xmins, this could only happen during the first reply cycle, else our
 * own xmin would prevent nextXid from advancing so far.

At the very least, changing GlobalVisUpdateApply/ComputeXidHorizons
would increase the potential amount of data lost in such race
conditions, if any.

As a further note, my suggested changes in vacuumlazy.c (specifically,
the 'continue' path added in lazy_scan_prune in my recent v2 patchset)
are likely incorrect because of a potential undocumented requirement of
heap_page_prune: leave no dead tuples with xmax < vacrel->OldestXmin
behind. I realised that with my patch we would allow some of these
tuples to continue to exist IFF the GlobalVisTestNonRemovableHorizon
moved back during the vacuum, which would violate that requirement.

Kind regards,

Matthias van de Meent
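PS: To make that heap_page_prune expectation concrete, here is a
hypothetical helper (illustration only, not code from my patch or from
vacuumlazy.c) expressing the condition that would be violated: an item
that pruning left behind as a tuple even though it is DEAD under
vacrel->OldestXmin. In PG14's lazy_scan_prune() this is exactly the
disagreement that triggers the retry loop; my v2 'continue' would
instead let such a tuple survive.

```c
#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"

/*
 * Hypothetical helper: returns true when a tuple that heap_page_prune()
 * left behind is nonetheless DEAD according to the OldestXmin that VACUUM
 * established up front, i.e. the two "satisfies vacuum" checks disagree
 * and a dead tuple with xmax < OldestXmin would remain on the page if we
 * simply skipped it.
 */
static bool
UnprunedTupleViolatesOldestXmin(HeapTuple tuple, TransactionId OldestXmin,
								Buffer buf)
{
	return HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf) == HEAPTUPLE_DEAD;
}
```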