Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog |
Date | |
Msg-id | CAB7nPqRRuR6ig5c5JWNKEdifHY9-SrJ4eWwNy7DC9sAkGzwJzA@mail.gmail.com Whole thread Raw |
In response to | Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog
(Alvaro Herrera <alvherre@2ndquadrant.com>)
|
List | pgsql-bugs |
On Sun, Jun 11, 2017 at 12:24 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Alvaro Herrera wrote: > > For some reason I sent a preliminary version of the email I was writing. > Some additional thoughts that were supposed to be in there: No problem. Thanks for the input. >> I also agree that restoreTwoPhaseData doesn't need to hold the lock, >> since we don't expect anything running concurrently, but that it's okay >> to hold it and makes the whole thing simpler. > > It's okay to hold the lock for the whole duration of the function. Yup. >> We could try to apply an equivalent rationale to >> RecoverPreparedTransactions: even though we have already been running in >> HOT standby mode for a while, there's no possibility of concurrent >> activity either: since we exited recovery, master cannot write any more >> 2PC xacts, and we haven't yet set the flag that lets open sessions write >> WAL. However, it seems mildly dangerous to run the bottom sections of >> the loop with the lock held, since it acquires other lwlocks and >> generally does a lot of crap. > > Therefore, let's adopt your idea of acquiring the lock only for the > specific low-level calls that require it. But the comment atop the loop > needs a lot of work to explain why it's doing that (i.e. how come it's > reading TwoPhaseState without a lock), as in my proposed patch. > > (At first, I didn't really like this very much and wanted to add more > locking to that function, but it turned quite messy, so I back-tracked). Yes, I got the same temptation but give up as well because of the unnecessary complications. GXactLoadSubxactData() does nothing fancy, but my worries are in ProcessRecords() and PostPrepare_Twophase(), particularly if they do more complicated and fancy stuff in the future for whatever reason. The comment you have added is this one: /* - * Don't need a lock in the recovery phase. + * It's okay to access TwoPhaseState without a lock here: recovery is + * finished (so if we were a standby, there's no master that can prepare + * transactions anymore), and we haven't yet set WAL as open for writes, + * so local existing backends, if any, cannot do so either. We could use a + * coding pattern similar to restoreTwoPhaseData, i.e., run the whole loop + * with the lock held; but this loop is far more complex, so instead only + * grab the lock while calling the low-level functions that require it. */ I have reworked the comment as follows: /* - * Don't need a lock in the recovery phase. + * It is fine to access TwoPhaseState without a lock here: recovery is + * finished (so if we were a standby, there's no master that can prepare + * transactions anymore), and we haven't yet set WAL as open for writes, + * so local existing backends, if any, cannot do so either. We could use a + * coding pattern similar to restoreTwoPhaseData, i.e., run the whole loop + * with the lock held; but this loop is far more complex, so instead only + * grab the lock while calling the low-level functions working directly on + * manipulating the two-phase state data. Functions working directly on + * PGPROC entries linked with the two-phase transaction work with other + * types of locks but we don't want to complicate that more than necessary. */ The v5 attached does not have any other changes than what you did in v4 (plus typos noticed). - if (info == XLOG_XACT_COMMIT || info == XLOG_XACT_COMMIT_PREPARED) + if (info == XLOG_XACT_COMMIT) { xl_xact_commit *xlrec = (xl_xact_commit *) XLogRecGetData(record); xl_xact_parsed_commit parsed; No objections to the shuffling done here. + * An exclusive lock on TwoPhaseStateLock is taken for the duration of the + * scan of TwoPhaseState data as its may be updated. Typo noticed here => s/its/it/. -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Attachment
pgsql-bugs by date: