Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date | |
Msg-id | CAH2-Wzm3tnHJExZ+fPgVJX7v1Vo7jWDFtN040zeYsGyjfjP7qw@mail.gmail.com Whole thread Raw |
In response to | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
|
List | pgsql-hackers |
On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres@anarazel.de> wrote: > > > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state) > > > static void > > > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons) > > > { > > > + /* assert non-decreasing nature of horizons */ > > > > Thinking more about it, I don't think these are correct. See the > > following comment in procarray.c: > > > > * Note: despite the above, it's possible for the calculated values to move > > * backwards on repeated calls. > > So the implicit assumption in heap_page_prune that > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case, > we'll need to redo the condition in heap_page_prune as well. I don't think that this shows that the assumption within lazy_scan_prune() (the assumption that both "satisfies vacuum" functions agree) is wrong, with the obvious exception of cases involving the bug that Justin reported. GlobalVis*.maybe_needed is supposed to be conservative. > PFA my adapted patch that fixes this new-ish issue, and does not > include the (incorrect) assertions in GlobalVisUpdateApply. I've > tested this against the reproducing case, both with and without the > fix in GetOldestNonRemovableTransactionId, and it fails fall into an > infinite loop. > > I would appreciate it if someone could validate the new logic in the > HEAPTUPLE_DEAD case. Although I believe it correctly handles the case > where the vistest non-removable horizon moved backwards, a second pair > of eyes would be appreciated. If you look at the lazy_scan_prune() logic immediately prior to commit 8523492d4e3, you'll see that it used to have a HEAPTUPLE_DEAD case that didn't involve a restart -- this was the "tupgone" mechanism. Back then we actually had to remove any corresponding index tuples from indexes when in this rare case. Plus there was a huge amount of complicated mechanism to handle a very rare case, all of which was removed by commit 8523492d4e3. Things like extra recovery conflict code just for this rare case, or needing to acquire a super exclusive lock on pages during VACUUM's second heap pass. This is all cruft that I was happy to get rid of. This is a good discussion of the tupgone stuff and the problems it caused, which is good background information: https://www.postgresql.org/message-id/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de Even if it was true that heap_prune_satisfies_vacuum() won't agree with HeapTupleSatisfiesVacuum() after repeated retries within lazy_scan_prune(), it would probably best if we then made code outside vacuumlazy.c agree with the lazy_scan_prune() assumption, rather than the other way around. Have you actually been able to demonstrate a problem involving lazy_scan_prune()'s goto, except the main GetOldestNonRemovableTransactionId() bug reported by Justin? -- Peter Geoghegan
pgsql-hackers by date: