Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-WznYsUxVT156rCQ+q=YD4S4=1M37hWvvHLz-H1pwSM8-Ew@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Fri, Dec 10, 2021 at 8:57 PM Andres Freund <andres@anarazel.de> wrote:
> 0004 is a patch that tries to address your point about the GlobalVisTestFor()
>   placement, sort of

Following up on this now (will follow up on the "harden pruning" patch
separately).

>   My earlier point that moving it to earlier would be a bad idea because it'd
>   make the horizon "older" was bogus - GlobalVisTestFor() doesn't itself do
>   any horizon determination. However moving the GlobalVisTestFor() earlier
>   still seems wrong - imo the vacuum_set_xid_limits() call should be moved to
>   later. The attached patch moves both, wrapped in a new function, to just
>   before the scan in lazy_scan_heap().

Attached is (roughly speaking) a rebased version of this patch (i.e.
your v4-0004- patch from the patch series). Like your patch, this
patch documents our expectations for vistest and OldestXmin. Unlike
your patch, it makes all vacrel initialization take place before
lazy_scan_heap is called, from inside heap_vacuum_rel.

An important detail here is how OldestXmin (and vistest) are related
to rel_pages -- I also added something about that. It definitely
wouldn't be okay to (say) move the call to RelationGetNumberOfBlocks
to any place before the call to vacuum_set_xid_limits. And so it very
much seems in scope here.

I'm a bit confused about at least one detail here: you've said here
that "GlobalVisTestFor() doesn't itself do any horizon determination",
which seems to contradict comments from the v4-0004- patch that was
attached to the same email. This WIP patch claims that there is some
advantage to delaying the call to GlobalVisTestFor regardless, though
I think that there is a good chance that that detail is wrong. Please
let me know which it is.

I also updated nearby comments about the update to the target heap
rel's pg_class entry -- that work was done in passing.

Thanks
-- 
Peter Geoghegan

Attachment

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Cannot refresh materialized view concurrently if you have a column name called "mv"
Next
From: Peter Geoghegan
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum