Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wzk+=kCS5SMMQ++s5vX9NmWKxJmOBfU2wqR-n78cOYFgfA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
List pgsql-bugs
On Mon, Nov 22, 2021 at 9:59 AM Andres Freund <andres@anarazel.de> wrote:
> I'm not quite sure what you referring to with "convincing me"? Whether to go
> for something comprehensive or more minimal? I'm kind of agnostic on what the
> right approach is.

I thought that you were clearly leaning in the direction of a
minimal-only fix for backpatch. I just meant that there is no point in
saying much more about it. It's not like I have a problem with your
approach. My personal opinion is that we'd be somewhat better off if
we went with my more comprehensive approach, even on 14. That's not
how you see it - which is fair enough (reasoning about these risks is
hard, and it tends to be quite subjective).

And so, as I said before, I can leave it at that.

> > Why don't you produce a minimal fix for backpatch? I'll review that,
> > just as you reviewed my patch.
>
> Here's a draft of that change.

Looks reasonable to me.

> I think it's worth to clean up the regression test I wrote and use it
> regardless of which version of the fix we end up choosing. But I'm a bit bit
> on the fence - it's quite complicated.

+1 to the idea of keeping it somewhere. Without necessarily running it
on the BF.

> OTOH, I think with the framework in place it'd not be too hard to write a few
> more tests for odd pruning scenarios...

Can we commit a test like this without running it by default, at all?
It's not like it has never been done before.

> If we were to, it'd probably best to to comment out the mon_* stuff, as that
> is what requires the amcheck / pageinspect extensions, which we'd likely not
> want to depend on. But it's pretty crucial for development.

Good idea.

The nice thing about using amcheck for something like this (compared
to pageinspect) is that we don't have to specify exact details about
what it looks like when the test passes. Because passing the test just
means the absence of any error.

How hard would it be to just add amcheck coverage on HEAD? Something
that goes just a bit further with LP_REDIRECT validation (currently
verify_heapam does practically nothing like that). We should add
comprehensive HOT chain validation too, but that can wait a little
while longer. It's a lot more complicated.

Another thing is that the new assertions themselves are part of our
overall testing strategy. I see that you've gone further with that
than I did in your v3-0003-* patch. I should adopt that code to my
more comprehensive fix for HEAD, while still keeping around the
existing heap_prune_chain() assertions (perhaps just as
documentation/comments to make the code easier to follow).

-- 
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Next
From: Andres Freund
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum