Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wz=S91VGD7QgROVYM8A_Ou6FWUi+UE-OdQxVEAYEgd9R2A@mail.gmail.com
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
List pgsql-bugs
On Wed, Nov 10, 2021 at 6:16 PM Andres Freund <andres@anarazel.de> wrote:
> Hm. To me all of this is more general than vacuum[lazy].c. Or even than
> anything heap related.

Here is a sensible compromise: put most of what you want to say
wherever (I guess procarray.c), and then move the vacuumlazy.c call to
GlobalVisTestFor() back, so that it comes immediately after the
vacuum_set_xid_limits() call. Then place a few breadcrumb comments
that reference the place in procarray.c that has the real discussion.

> We need pruning to be at least as aggressive as relfrozenxid. If we did it the
> other way round, we couldn't guarantee that.

I thought that that's what it was, but the code doesn't actually say
anything about it. The distance in the code between the two closely
related calls is jarring, at least to me.
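The ordering under discussion can be modeled in a few lines of C. This is a minimal sketch, not PostgreSQL code: `Horizon`, `horizon_read()` and `setup_vacuum_horizons()` are hypothetical stand-ins for the vacuum_set_xid_limits() and GlobalVisTestFor() steps. The sketch assumes the global xmin horizon can only stay put or advance between the two reads. The XID comparison follows the usual modulo-2^32 scheme and ignores special XIDs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "a precedes b" for normal XIDs (modulo-2^32
 * comparison; special XIDs are not handled in this sketch). */
int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical model of the global xmin horizon. Between two reads it
 * can only stay put or move forward as other transactions finish. */
typedef struct
{
    TransactionId current;
} Horizon;

TransactionId horizon_read(Horizon *h, TransactionId advance)
{
    h->current += advance;      /* concurrent activity between reads */
    return h->current;
}

/* Order suggested in the thread: fix OldestXmin first (the
 * vacuum_set_xid_limits() step), then build the pruning horizon (the
 * GlobalVisTestFor() step). Returns the pruning horizon. */
TransactionId setup_vacuum_horizons(Horizon *h, TransactionId advance,
                                    TransactionId *oldest_xmin)
{
    *oldest_xmin = horizon_read(h, 0);
    TransactionId prune_horizon = horizon_read(h, advance);

    /* Pruning must be at least as aggressive as the freezing cutoff. */
    assert(!xid_precedes(prune_horizon, *oldest_xmin));
    return prune_horizon;
}
```

With the calls in this order, a horizon that advances from 1000 to 1025 in between yields OldestXmin = 1000 and a pruning horizon of 1025. Reversing the two reads would let OldestXmin end up newer than the pruning horizon, which is the case the ordering is meant to rule out.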

> I think we should work towards not actually using a statically determined
> relfrozenxid. We cause a lot of unnecessary re-vacuuming by using a static
> cutoff - instead we should check what the actually oldest xid in the table is
> and set relfrozenxid to that.

I agree, but that doesn't seem relevant to me. AFAICT the "effective"
relfrozenxid when applying this hypothetical future optimization (the
"actually oldest xid in the table", as you put it) must never end up
exceeding the original OldestXmin cutoff. And so I don't think that it
changes the fundamental invariants for either OldestXmin, or for
freezeLimit/relfrozenxid. Specifically, the "freezeLimit <=
OldestXmin" invariant.
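To make the "freezeLimit <= OldestXmin" invariant concrete, here is a hypothetical sketch of the "actually oldest xid" idea. It tracks the oldest XID left unfrozen and uses that as relfrozenxid. The tracker starts at OldestXmin, so the result can never exceed that cutoff. `effective_relfrozenxid()` is an illustrative name. The model also assumes each unfrozen tuple reduces to a single XID, so xmax and MultiXacts are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "a precedes b" for normal XIDs. */
int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical: compute an "effective" relfrozenxid as the oldest XID
 * still unfrozen after this VACUUM, instead of using freeze_limit.
 * xids[] holds one XID per tuple left unfrozen. */
TransactionId effective_relfrozenxid(const TransactionId *xids, size_t n,
                                     TransactionId freeze_limit,
                                     TransactionId oldest_xmin)
{
    /* The invariant from the thread: freezeLimit <= OldestXmin. */
    assert(!xid_precedes(oldest_xmin, freeze_limit));

    /* Start at OldestXmin so the result can never exceed it. */
    TransactionId result = oldest_xmin;

    for (size_t i = 0; i < n; i++)
    {
        /* Anything older than freeze_limit should already be frozen. */
        assert(!xid_precedes(xids[i], freeze_limit));
        if (xid_precedes(xids[i], result))
            result = xids[i];
    }
    return result;
}
```

For unfrozen XIDs {950, 1200, 980} with freezeLimit 900 and OldestXmin 1000, the result is 950. If every remaining XID were newer than 1000, the result would stay clamped at 1000.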

We could probably *also* freeze tuples opportunistically (e.g., freeze
a few tuples on a page early to be able to mark it all-frozen sooner),
since freezing is basically just an all-visible marking that applies
at the tuple level. We could perhaps even do this when the tuples
would not be visible to our original OldestXmin (they just have to be
visible to every possible MVCC snapshot, and so VACUUM's OldestXmin
itself doesn't necessarily have to be considered). This additional
optimization doesn't seem to change the invariants either, though,
since I'm pretty sure that freezing tuples early isn't compatible
with allowing those tuples to affect the final
freezeLimit/relfrozenxid (when we have both optimizations working
together).
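Here is a rough sketch of how the two optimizations might interact, with several assumptions stated up front. `Tuple`, `freeze_page()` and the `visible_to_all` horizon are hypothetical. `visible_to_all` is a cutoff that no possible MVCC snapshot can see behind, and it may be newer than VACUUM's OldestXmin. Only xmin is modeled. Tuples older than freezeLimit are always frozen. The rest of a page is frozen early only when doing so makes the whole page all-frozen. Frozen tuples, whether frozen early or not, are excluded from the relfrozenxid tracker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical tuple model: only xmin and a frozen flag. */
typedef struct
{
    TransactionId xmin;
    int         frozen;
} Tuple;

/* Wraparound-aware "a precedes b" for normal XIDs. */
int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Freeze one page. Tuples older than freeze_limit are always frozen. If
 * every unfrozen tuple's xmin precedes visible_to_all, the whole page is
 * frozen early so it can be marked all-frozen. Only tuples left unfrozen
 * update *tracker (the candidate relfrozenxid). Assumes freeze_limit does
 * not follow visible_to_all. Returns 1 if the page ends up all-frozen.
 */
int freeze_page(Tuple *tuples, size_t n, TransactionId freeze_limit,
                TransactionId visible_to_all, TransactionId *tracker)
{
    int all_visible = 1;

    for (size_t i = 0; i < n; i++)
        if (!tuples[i].frozen && !xid_precedes(tuples[i].xmin, visible_to_all))
            all_visible = 0;

    for (size_t i = 0; i < n; i++)
    {
        Tuple *t = &tuples[i];

        if (t->frozen)
            continue;
        if (all_visible || xid_precedes(t->xmin, freeze_limit))
        {
            t->frozen = 1;      /* frozen tuples no longer hold back relfrozenxid */
            continue;
        }
        if (xid_precedes(t->xmin, *tracker))
            *tracker = t->xmin;
    }
    return all_visible;
}
```

In this model a tuple with xmin 1050 can be frozen early even though OldestXmin is 1000, as long as `visible_to_all` is newer. Because the tuple is then frozen, it never pulls the candidate relfrozenxid below the tracker's starting value.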

-- 
Peter Geoghegan


