Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC - Mailing list pgsql-hackers
| From | Andy Fan |
|---|---|
| Subject | Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC |
| Date | |
| Msg-id | 874im44mi9.fsf@163.com Whole thread Raw |
| In response to | Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |
Michael Paquier <michael@paquier.xyz> writes: Hi, Thanks for showing interests for this topic! > On Wed, Mar 25, 2026 at 08:39:07AM +0800, Andy Fan wrote: >> I found a similar but not exactly same case at 2014 [1] which >> might be helpful to recall a boarder understanding on this area. >> >> [1] https://www.postgresql.org/message-id/534AF601.1030007%40vmware.com > > Incorrect shared state when an ERROR happens at an arbitrary location > is usually bad, yes. > > For this one, your suggestion of delaying the end of the critical > section started at StartPrepare() and ending in EndPrepare() is not an > acceptable solution as far as I can see, unfortunately: it would mean > doing a SyncRepWaitForLSN() while in a critical section, and I doubt > we'd want to do that. I do have a more general question: (Q1). what is critical section designed to? (Q2). What is the badness if we put more code into it except the ERROR->PANIC logic? Then (Q3) will SyncRepWaitForLSN be possible to raise ERROR? if no for now and possible in future, then my Q2 raise. > Anyway, I doubt that this one is worth caring for. The current locking > 2PC scheme means, as far as I remember, that it is not really possible > to interact with an external command in a specific session between > the EndPrepare() and the PostPrepare_Locks() > calls. Then Q3 comes. The deeper answer might be Q2 or Q1. > To put it in other words, let's imagine that we use a breakpoint > between these two calls (or a wait injection point if you automate > that). Is it possible for a second backend to mess with the state of > the first backend waiting until its locks are transfered to the dummy > PGPROC entry? That's what the 2014 thread is about: there was a race > condition reachable between two sessions. This is true, so the issue 2014 thread is more critical than the current one and which has been fixed. > If the answer to this question is yes, I'd agree that this is > something that deserves a closer lookup. Generally yes.. But I can't stop thinking the Q3 -> Q2 -> Q1 when I want to accept this asnwer. > And before you ask: attempting to interact with a 2PC > state from a second session with a first session waiting between these > two points would not work: the 2PC entry is locked, cleaned up after > EndPrepare() and PostPrepare_Locks() at PostPrepare_Twophase(). > Trying to request an access to this entry fails, as the first backend > is marked as locking it. A second backend attempting to lock it would > fail, complaining that the 2PC entry with a GXID is "busy". I can understand what you are saying now, but what does it make difference on the above case? > SyncRepWaitForLSN() would be a problematic pattern between the > EndPrepare() and the PostPrepare_Locks(), but we never ERROR there on > purpose: even if we cancel while waiting for a transaction commit we'd > just get a WARNING, meaning that we'd be able to transfer our locks > anyway. again Q2 -> Q3. > Or perhaps you have a realistic scenario where it is possible to mess > up with the shared state, outside a elog(ERROR) forced between these > two points? No really. I just inject some exception on some predefined places. I even don't know why people defined this places before. As for me, I prefer to know MORE design points for the CRITIAL SECTION besides what I side before. -- Best Regards Andy Fan
pgsql-hackers by date: