Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wzk_VEZnPH5r6CgP92qhhK_oaXid9DwaXB0XC2wsfNqMpA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Attached is v2 revision of the less critical vacuumlazy.c vistest
patch. I will respond to your second email about the pruneheap.c patch
on a new dedicated thread, per your suggestion.

NB: I intend to commit this revision of the patch (or something pretty
close to it) in the next few days, barring any objections.

On Wed, Mar 9, 2022 at 4:25 PM Andres Freund <andres@anarazel.de> wrote:
> I think it'd be nicer if we did the horizon determination after allocating
> space for dead tuples... But it's still better than today, so...

Why would it be nicer? Do you mean that it would be neater that way,
since every single field in vacrel gets initialized before the
lazy_scan_prune call, without exception?

I agree with that, but it doesn't seem important right now.

> > An important detail here is how OldestXmin (and vistest) are related
> > to rel_pages -- I also added something about that. It definitely
> > wouldn't be okay to (say) move the call to RelationGetNumberOfBlocks
> > to any place before the call to vacuum_set_xid_limits. And so it very
> > much seems in scope here.
>
> You're right, it needs to be that way round.

I think that this is well worth documenting. But it's also nice that
*almost* all initialization takes place inside heap_vacuum_rel() now,
before we call lazy_scan_heap().

> I don't think the minor optimization does anything (which I had stated wrongly
> at some point in this thread).

Got it. Comments that said that there was value in that are removed
from v2. But v2 does at least speculate about teaching lazy_scan_prune
to advance vistest as an optimization, from time to time -- which, as
you pointed out, is likely to be added in the future.

Talking about the optimization as future work makes sense, though not
because the future work is necessarily all that important. It just
seemed to me to be the easiest way of getting the following point
across to the reader: "vistest and OldestXmin are two distinct
representations of the same cutoff for now -- though don't make the
mistake of believing that they have to be the same cutoff".

> >       /*
> >        * Call lazy_scan_heap to perform all required heap pruning, index
> > -      * vacuuming, and heap vacuuming (plus related processing)
> > +      * vacuuming, and heap vacuuming (plus related processing).
> > +      *
> > +      * Note: Resource management for parallel VACUUM is quite subtle, and so
> > +      * lazy_scan_heap must allocate (and deallocate) vacrel->dead_items space
> > +      * for itself.  Every other vacrel field is initialized by now, though.
> >        */
> >       lazy_scan_heap(vacrel, params->nworkers);
>
> What are you referring to with "quite subtle"?

There are 2 subtleties that I can think of offhand:

1. We deliberately do the initial lazy_check_wraparound_failsafe()
precheck before reaching dead_items_alloc(). Per the comments at the
start of lazy_scan_heap(), we won't use parallel VACUUM if the
failsafe kicks in during the precheck.

2. There is also deallocation to consider here: lazy_scan_heap () also
calls dead_items_cleanup(), which might have to release shared memory
-- which is actually handled by ending parallel mode.
dead_items_cleanup() must be called before update_index_statistics(),
which comes last of all -- since we can't update pg_class when in
parallel mode.

In v2 the same comment block now simply says that we have initialized
everything from vacrel except for vacrel->dead_items. It doesn't
explain why.

Thanks
--
Peter Geoghegan

Attachment

pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: Pg 15 devel crashes when fetching data from table using cursor
Next
From: Tom Lane
Date:
Subject: Re: Pg 15 devel crashes when fetching data from table using cursor