Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-WznO02LHO04mgc_U=c6y9+VBXfDDTYfOn=c4EHXoHY87HQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Sat, Mar 12, 2022 at 1:49 PM Andres Freund <andres@anarazel.de> wrote:
> GlobalVisTestFor() doesn't trawl through the procarray to compute horizons. It
> just gets already computed horizons, potentially very approximate ones that
> later will get updated when necessary to determine the visibility of a
> concrete xid.

That much is easy to understand. It's the most significant way in
which the snapshot scalability work improves scalability.

> You could call GlobalVisTestFor() and then later call call
> vacuum_set_xid_limits() and the test by GlobalVisTestFor() would still be
> accurate.

That seems kinda obvious, since all it does is get the appropriate
state for the relkind of the relation being vacuumed. Typically it
just returns a pointer to GlobalVisDataRels, which is a static
variable in procarray.c.

> The main reason to have vistest at all is that for on-access pruning computing
> accurate horizons is excruciatingly expensive - the reason behind procarray
> scaling badly were all the xmin accesses during normal snapshot computations!
> And due to the way the pruning code is used for both vacuuming and on-access
> pruning that kind of bleeds into vacuumlazy.c.  That could have been reduced,
> but it seemed beneficial to allow to update horizons as we go.

That I get too.

> > Periodically refreshing OldestXmin for freezing won't work, though. At
> > the very least it seems much less compelling than the vistest idea.
>
> I think there's a fair bit of value in using an "as aggressive as possible"
> initial OldestXmin. Which is used to set a bunch of cutoffs / determine
> aggressiveness etc.

I just don't think that it makes very much difference, since we're
only talking about freezing here -- which is not something we tend to
be too eager with anyway (FreezeLimit is very seldom the same as
OldestXmin, etc).

Maybe it could take a long time for vac_open_indexes() to return, but
that's really an edge case. I'm puzzled why you're placing so much
emphasis on this. I can see why that's valuable in a theoretical
abstract kind of way -- but that's about it. And so it feels like I'm
still missing something.

> Once we compute "measured" relfrozenxid however, there's afaict otherwise not
> much point in updating OldestXmin as we go, at least if we start to use
> vistest for the horizon determinations inside vacuumlazy.c.

> If we don't switch to using vistest for HTSV determinations in vacuumlazy.c,
> then I don't think we can refresh vistest without causing problems in
> lazy_scan_prune().

Why? Right now we could easily have a concurrent opportunistic prune
that uses a vistest that's well ahead of the one in VACUUM. And so
AFAICT it's already quite possible that any page encountered within
lazy_scan_prune was already pruned using a much more recent
vistest. So what's the problem with doing the same thing inside the
backend running VACUUM instead?

Is it something to do with the VACUUM backend's state?

We can recompute backend horizons during VACUUM after OldestXmin is
established, so I can't see why the backend's procarray.c state should
matter. If it did that would seem like a big problem in general. In
particular, my commit 9dd963ae25 (an nbtree VACUUM enhancement)
independently calls GetOldestNonRemovableTransactionId() midway
through VACUUM.

> But I think refreshing horizons is a different discussion from using
> as-recent-as-possible initial horizons...

Agreed. This is why I find your emphasis on as-recent-as-possible
initial horizons confusing. It seems as if you're giving almost equal
emphasis to both issues, even though one issue (more eager pruning
during VACUUM) is of obvious practical value, while the other
(as-recent-as-possible initial horizons) is pretty theoretical and
academic -- at best.

--
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end