From: Andres Freund
Subject: Re: PANIC: wrong buffer passed to visibilitymap_clear
Msg-id: 20210412204046.p3mh7wwfs44f2qid@alap3.anarazel.de
In response to: Re: PANIC: wrong buffer passed to visibilitymap_clear (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

On 2021-04-11 13:55:30 -0400, Tom Lane wrote:
> Either way, it's hard to argue that heap_update hasn't crossed the
> complexity threshold where it's impossible to maintain safely. We
> need to simplify it.

Yea, I think we're well beyond that point.

I can see a few possible steps to wrangle the existing complexity into an
easier-to-understand shape (a rough control-flow sketch is in the PS below):

- Rename the heapam.c goto labels; as they stand they're useless for
  understanding what is happening.

- Move the HeapTupleSatisfiesUpdate() call and the related branches after
  it into their own function.

- Move the "temporarily mark it locked" branch into its own function. It's
  a minimal implementation of tuple locking, so it seems more than separate
  enough.

- Move the "store the new tuple" part into its own function (pretty much
  the critical section).

- Probably worth unifying the exit paths - there's a fair bit of
  duplication by now...

Half related:

- I think we might also need to do something about the proliferation of
  bitmaps in heap_update(). We now separately allocate 5 bitmapsets - that
  strikes me as fairly insane.

However, these steps would not really address the complexity itself, just
make it easier to manage.

ISTM that a lot of the complexity is related to needing to retry (and
avoiding doing so unnecessarily), which in turn is related to avoiding
deadlocks. We actually already know how to largely avoid needing that: the
(need_toast || newtupsize > pagefree) path locks the tuple first and
afterwards has a lot more freedom (see the second sketch in the PS). We
obviously can't just always do that, due to the WAL logging overhead.

I wonder if we could make that path avoid the WAL logging overhead. We
don't actually need a full-blown tuple lock, potentially even with its own
multixact, here. The relevant comment (in heap_lock_tuple()) says:

    /*
     * XLOG stuff.  You might think that we don't need an XLOG record because
     * there is no state change worth restoring after a crash.  You would be
     * wrong however: we have just written either a TransactionId or a
     * MultiXactId that may never have been seen on disk before, and we need
     * to make sure that there are XLOG entries covering those ID numbers.
     * Else the same IDs might be re-used after a crash, which would be
     * disastrous if this page made it to disk before the crash.  Essentially
     * we have to enforce the WAL log-before-data rule even in this case.
     * (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
     * entries for everything anyway.)
     */

But I don't really think that doing full-blown tuple-locking WAL logging is
the right solution here:

- The "next xid" concerns are at least as easily solved by WAL logging a
  distinct "highest xid assigned" record when necessary (third sketch in
  the PS). Either by having a shared memory variable saying
  "latestLoggedXid" or such, or by having end-of-recovery advance nextXid
  to beyond what ExtendCLOG() extended to. That reduces the overhead to at
  most once-per-xact (and commonly less) or to nothing, respectively.

- While there's obviously a good bit of simplicity in ensuring that a
  replica is exactly the same ("Also, in a PITR log-shipping or 2PC
  environment ..."), we don't actually enforce that strictly anyway - so I
  am not sure why it's necessary to pay the price here.

But maybe I'm all wet here, I certainly haven't had enough coffee.

Greetings,

Andres Freund
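
PS: To make the refactoring idea above a bit more concrete, a purely
illustrative control-flow sketch. Every name in it (UpdateContext, the
helper functions, the reduced TM_Result) is invented for this mail; it's
meant to show the shape, not to be a patch:

    #include <stdbool.h>

    /* stand-in for the state heap_update() currently tracks in a pile of
     * locals (buffers, old/new tuple, the five bitmapsets, ...) */
    typedef struct UpdateContext { int dummy; } UpdateContext;

    /* reduced stand-in for the real TM_Result */
    typedef enum { TM_Ok, TM_Updated } TM_Result;

    /* stubs; each would absorb one chunk of today's goto spaghetti */
    static TM_Result check_old_tuple(UpdateContext *ctx)   /* HeapTupleSatisfiesUpdate() + branches */
    { (void) ctx; return TM_Ok; }
    static bool need_tuple_lock(UpdateContext *ctx)
    { (void) ctx; return false; }
    static void lock_old_tuple(UpdateContext *ctx)         /* "temporarily mark it locked" */
    { (void) ctx; }
    static void store_new_tuple(UpdateContext *ctx)        /* the critical section */
    { (void) ctx; }
    static TM_Result finish_update(UpdateContext *ctx, TM_Result result)
    { (void) ctx; return result; }

    static TM_Result
    heap_update_sketch(UpdateContext *ctx)
    {
        TM_Result   result;

        for (;;)
        {
            result = check_old_tuple(ctx);
            if (result != TM_Ok)
                break;                  /* concurrently updated/deleted etc. */

            if (!need_tuple_lock(ctx))
            {
                store_new_tuple(ctx);   /* the actual update */
                break;
            }

            lock_old_tuple(ctx);        /* minimal tuple lock, then retry */
        }

        /* one unified exit path instead of several duplicated ones */
        return finish_update(ctx, result);
    }

The retry becomes an explicit loop with one entry and one exit, instead of
gotos criss-crossing a very long function.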
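Second, the lock-first strategy that the toast / tuple-doesn't-fit path
already uses, in the same invented vocabulary (toast_new_tuple() and
acquire_target_page() are made-up stand-ins as well):

    /* two more stand-in helpers for this sketch */
    static void toast_new_tuple(UpdateContext *ctx)     { (void) ctx; }
    static void acquire_target_page(UpdateContext *ctx) { (void) ctx; }

    /*
     * Once the tuple is locked, no concurrent updater can invalidate the
     * work below, so toasting and hunting for a target page need no retry
     * loop - that's the "freedom" mentioned above.  The cost today is
     * that the lock is WAL-logged, much like heap_lock_tuple()'s.
     */
    static void
    update_lock_first_sketch(UpdateContext *ctx)
    {
        lock_old_tuple(ctx);
        toast_new_tuple(ctx);
        acquire_target_page(ctx);
        store_new_tuple(ctx);
    }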
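Third, a minimal sketch of the "highest xid assigned" idea. Again, every
symbol here (latestLoggedXid, LogHighestAssignedXid(),
EnsureXidCoveredByWAL()) is invented for illustration; none of it exists:

    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* would live in shared memory, protected by e.g. XidGenLock */
    static TransactionId latestLoggedXid = 0;

    /* stand-in for inserting a WAL record that just says "xids up to
     * here may have been handed out"; redo would advance nextXid past it */
    static void
    LogHighestAssignedXid(TransactionId xid)
    {
        (void) xid;
    }

    /*
     * To be called on a path that writes an xid to disk without any other
     * WAL record covering it (like a hoped-for cheap, non-WAL-logged
     * tuple lock).  Fires at most once per xid, and usually not at all,
     * since most xids are covered by their first "real" WAL record.
     */
    static void
    EnsureXidCoveredByWAL(TransactionId xid)
    {
        if (xid > latestLoggedXid)  /* ignoring xid wraparound for brevity */
        {
            LogHighestAssignedXid(xid);
            latestLoggedXid = xid;
        }
    }

The end-of-recovery variant would need no new record at all: just advance
nextXid past whatever ExtendCLOG() had extended to.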