Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wz=czwtPTE_RNjBWQ5mTA1WQ6zY41RK1q5dn2kwV8YjZGg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
List pgsql-bugs
On Thu, Nov 11, 2021 at 12:05 PM Andres Freund <andres@anarazel.de> wrote:
> > ...why is that super likely? I have to admit that I have no idea what
> > you mean by that.
>
> Err, I missed a *not* in there, sorry for the confusion.

Can you at least move the GlobalVisTestFor() call back? Move it from
lazy_scan_heap() to the point right before we call lazy_scan_heap(),
from heap_vacuum_rel()? That still seems like a big improvement, while
obviously having no real side-effects.

> > Okay. Even still, I suggest that you say something about this new
> > DELETE_IN_PROGRESS pruning behavior in vacuumlazy.c. Maybe in
> > lazy_scan_prune()?
>
> That doesn't quite make sense to me. There's quite a few ways that a HTSV can
> change for the same tuple, and listing them all in vacuumlazy.c doesn't really
> help people notice that when they add a new call or such. There's plenty
> other callsites - like the one in heap_prune_chain()...

Well, the retry in lazy_scan_prune() says that it only expects to have
to retry in the event of an aborted transaction. If nothing else, it
seems like you might want to amend that. If in fact the same
"DELETE_IN_PROGRESS becomes DEAD" issue applies (can't see why it
wouldn't).

> How about changing the comment above lazy_scan_prune() to note that the
> concurrent abort case is one example of this behaviour? And then add a comment
> to HeapTupleSatisfiesVacuum() along the lines of:
>
> NB: Repeated calls to HeapTupleSatisfiesVacuum for the same tuple may change
> even while the buffer is locked. Cases of this include:
> - the transaction that inserted a tuple (version) aborted, changing the result
>   from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD
> - when used with GlobalVisTestIsRemovableXid(), HEAPTUPLE_DELETE_IN_PROGRESS
>   can change to HEAPTUPLE_DEAD if the horizon is updated
>
> However, HEAPTUPLE_DEAD will not change back to HEAPTUPLE_RECENTLY_DEAD or
> such.

Okay - sounds good.

> > Agreed. But at the very least we should be thinking about the
> > whole-page picture. If we made freezing a whole page cheap enough
> > (smaller WAL records), then maybe the separate all-visible state
> > becomes totally unnecessary (except for pg_upgrade).
>
> I doubt it. But I think getting rid of unnecessarily dirtying the page a
> second (or third) time would be a big enough win already...

Why do you doubt it? Do you suspect that the additional cost of
freezing the whole page (relative to just setting it all-visible)
cannot be reduced to the point where it can be ignored? That might be
true, but I'd like to know if that's what you meant. Because I can't
think of any other reason, and it's not like I haven't thought about
it a fair bit already.

-- 
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Next
From: Peter Geoghegan
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum