Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs
| From | Andres Freund |
|---|---|
| Subject | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
| Date | |
| Msg-id | 20211110192010.ckvfzz352hsba5xf@alap3.anarazel.de |
| In response to | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>) |
| Responses | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
| List | pgsql-bugs |
Hi,

On 2021-11-09 15:31:37 -0800, Peter Geoghegan wrote:
> I'm not sure why this seems to have become more of a problem following
> the snapshot scalability work from Andres -- Alexander mentioned that
> commit dc7420c2 looked like it was the source of the problem here, but
> I can't see any reason why that might be true (even though I accept
> that it might well *appear* to be true). I believe Andres has some
> theory on that, but I don't know the details myself. AFAICT, this is a
> live bug on all supported versions. We simply weren't being careful
> enough about breaking the invariant that an LP_REDIRECT can only point
> to a valid heap-only tuple. The really surprising thing here is that
> it took this long for it to visibly break.

The way this definitely breaks - I have been able to reproduce this in
isolation - is when one tuple is processed twice by heap_prune_chain(),
and the result of HeapTupleSatisfiesVacuum() changes from
HEAPTUPLE_DELETE_IN_PROGRESS to DEAD.

Consider a page like this:

lp 1: redirect to lp 2
lp 2: deleted by xid x, not yet committed

and a sequence of events like this:

1) heap_prune_chain(rootlp = 1)
2) commit x
3) heap_prune_chain(rootlp = 2)

In step 1) heap_prune_chain(rootlp = 1) will go to lp 2, see
HEAPTUPLE_DELETE_IN_PROGRESS, and thus not do anything. Step 3) then
could, with the snapshot scalability changes, get DEAD back from HTSV.
Due to the "fuzzy" nature of the post-snapshot-scalability xid horizons
that is possible, because we can end up rechecking the boundary
condition and seeing that the horizon now allows us to prune x / lp 2.

At that point we have a redirect tuple pointing into an unused slot,
which is "illegal", because something independent can be inserted into
that slot. (Standalone toy models of this sequence and of the fuzzy
horizon test follow the footnote below.)

What made this hard to understand (and likely hard to hit) is that we
don't recompute the xid horizons more than once per hot pruning [1]. At
first I concluded that a change from RECENTLY_DEAD to DEAD could thus
not happen - and it doesn't: we go from HEAPTUPLE_DELETE_IN_PROGRESS to
DEAD, which is possible because there was no horizon test for
HEAPTUPLE_DELETE_IN_PROGRESS.

Note that there are several paths in versions < 14 that cause HTSV()'s
answer to change for the same xid. E.g. when the transaction inserting
a tuple version aborts, we go from HEAPTUPLE_INSERT_IN_PROGRESS to
DEAD. But I haven't quite found a path to trigger problems with that,
because there won't be redirects to a tuple version that is
HEAPTUPLE_INSERT_IN_PROGRESS (though there can be redirects to a
HEAPTUPLE_DELETE_IN_PROGRESS or RECENTLY_DEAD one).

I hit a crash once in 13 with a slightly evolved version of the test
(many connections creating / dropping the partitions as in the original
scenario, using :client_id to target different tables). It's possible
that my instrumentation was the cause of that. Unfortunately it took
quite a few hours to hit the problem in 13...

Greetings,

Andres Freund

[1] It's a bit more complicated than that: we only recompute the
horizon when a) we've not done it before in the current xact, and b)
RecentXmin changed during a snapshot computation. Recomputing the
horizon is expensive-ish, so we don't want to do it constantly.
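To make the failure sequence concrete, here is a minimal standalone
sketch of the line-pointer state machine described above. It is not
PostgreSQL code and it models only the pre-fix behavior; all names in
it (prune_chain, htsv_answer, and so on) are invented for illustration.

```c
/*
 * Toy model of the race: the same tuple (lp 2) is visited twice by
 * pruning, and the HTSV-style answer changes from DELETE_IN_PROGRESS
 * to DEAD in between.  The result is a redirect pointing at an unused
 * slot -- the broken invariant from the report.
 */
#include <assert.h>
#include <stdio.h>

typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT } LpState;
typedef enum { DELETE_IN_PROGRESS, DEAD } HtsvResult;

/* The example page: lp 1 redirects to lp 2 (index 0 is unused). */
static LpState lp[3] = { LP_UNUSED, LP_REDIRECT, LP_NORMAL };
static int redirect_target = 2;

/* Stand-in for HeapTupleSatisfiesVacuum()'s answer for xid x. */
static HtsvResult htsv_answer;

static void
prune_chain(int rootlp)
{
    if (lp[rootlp] == LP_REDIRECT)
        rootlp = redirect_target;   /* follow the redirect to the member */

    if (lp[rootlp] == LP_NORMAL && htsv_answer == DEAD)
        lp[rootlp] = LP_UNUSED;     /* prune: mark the slot unused */
    /* DELETE_IN_PROGRESS: leave the tuple alone */
}

int
main(void)
{
    htsv_answer = DELETE_IN_PROGRESS;
    prune_chain(1);                 /* 1) via redirect, sees in-progress */

    htsv_answer = DEAD;             /* 2) xid x commits, horizon advances */

    prune_chain(2);                 /* 3) same tuple again, now DEAD */

    /* Broken invariant: a redirect now points at an unused slot. */
    assert(lp[1] == LP_REDIRECT && lp[redirect_target] == LP_UNUSED);
    printf("lp 1 redirects to lp %d, which is LP_UNUSED\n",
           redirect_target);
    return 0;
}
```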
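The "fuzzy" horizon test can be sketched the same way: instead of one
exact cutoff there is a definitely-removable bound and a maybe-removable
bound, and an xid falling between the two triggers a recompute against
an accurate horizon. This is only loosely modeled on the two-bound
scheme the snapshot scalability work introduced; the names and numbers
here are invented, not the actual PostgreSQL API.

```c
/*
 * Toy model of a fuzzy xid horizon: xids below definitely_removable
 * are always prunable, xids at or above maybe_removable never are, and
 * anything in between forces a recheck.  That recheck is how the
 * answer for the same xid can change within one pruning pass.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;

typedef struct
{
    xid_t definitely_removable; /* xids below this are always prunable */
    xid_t maybe_removable;      /* xids below this *might* be prunable */
} FuzzyHorizon;

/* Stand-in for recomputing an accurate horizon (expensive in reality). */
static xid_t
accurate_horizon(void)
{
    return 110;                 /* pretend xid 105 just fell behind it */
}

static bool
xid_removable(FuzzyHorizon *h, xid_t xid)
{
    if (xid < h->definitely_removable)
        return true;            /* fast path: certainly prunable */
    if (xid >= h->maybe_removable)
        return false;           /* fast path: certainly still needed */

    /*
     * In the fuzzy band: recheck against an accurate horizon and
     * tighten the cached bound so later tests take a fast path.
     */
    h->definitely_removable = accurate_horizon();
    return xid < h->definitely_removable;
}

int
main(void)
{
    FuzzyHorizon h = { .definitely_removable = 100,
                       .maybe_removable = 120 };
    xid_t x = 105;              /* the deleting xid from the example */

    /* x sits in the fuzzy band, gets rechecked, and comes back prunable. */
    printf("xid %u removable: %d\n", x, (int) xid_removable(&h, x));
    return 0;
}
```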
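Footnote [1]'s caching rule can also be shown as a toy model: the
horizon is computed at most once per transaction, and only invalidated
when a snapshot computation moves RecentXmin. Again, invented names and
a grossly simplified horizon computation, not the actual implementation.

```c
/*
 * Toy model of horizon caching: get_horizon() recomputes only when the
 * cache is invalid, and the cache is invalidated when RecentXmin moves
 * during a snapshot computation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;

static xid_t recent_xmin = 100;     /* stand-in for RecentXmin */
static xid_t cached_horizon = 0;
static bool  horizon_valid = false; /* reset at transaction start */

/* Pretend this is the expensive part we want to avoid repeating. */
static xid_t
compute_horizon_expensively(void)
{
    return recent_xmin;             /* grossly simplified */
}

static xid_t
get_horizon(void)
{
    if (!horizon_valid)
    {
        cached_horizon = compute_horizon_expensively();
        horizon_valid = true;
    }
    return cached_horizon;
}

/* Called from snapshot computation when RecentXmin moves. */
static void
recent_xmin_changed(xid_t new_xmin)
{
    recent_xmin = new_xmin;
    horizon_valid = false;          /* force a recompute on next use */
}

int
main(void)
{
    printf("horizon = %u\n", get_horizon());  /* computed: 100 */
    printf("horizon = %u\n", get_horizon());  /* cached:   100 */
    recent_xmin_changed(120);                 /* snapshot moved xmin */
    printf("horizon = %u\n", get_horizon());  /* recomputed: 120 */
    return 0;
}
```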