Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog
Date
Msg-id CAB7nPqRRuR6ig5c5JWNKEdifHY9-SrJ4eWwNy7DC9sAkGzwJzA@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On Sun, Jun 11, 2017 at 12:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera wrote:
>
> For some reason I sent a preliminary version of the email I was writing.
> Some additional thoughts that were supposed to be in there:

No problem. Thanks for the input.

>> I also agree that restoreTwoPhaseData doesn't need to hold the lock,
>> since we don't expect anything running concurrently, but that it's okay
>> to hold it and makes the whole thing simpler.
>
> It's okay to hold the lock for the whole duration of the function.

Yup.

>> We could try to apply an equivalent rationale to
>> RecoverPreparedTransactions: even though we have already been running in
>> HOT standby mode for a while, there's no possibility of concurrent
>> activity either: since we exited recovery, master cannot write any more
>> 2PC xacts, and we haven't yet set the flag that lets open sessions write
>> WAL.  However, it seems mildly dangerous to run the bottom sections of
>> the loop with the lock held, since it acquires other lwlocks and
>> generally does a lot of crap.
>
> Therefore, let's adopt your idea of acquiring the lock only for the
> specific low-level calls that require it.  But the comment atop the loop
> needs a lot of work to explain why it's doing that (i.e. how come it's
> reading TwoPhaseState without a lock), as in my proposed patch.
>
> (At first, I didn't really like this very much and wanted to add more
> locking to that function, but it turned quite messy, so I back-tracked).

Yes, I got the same temptation but give up as well because of the
unnecessary complications. GXactLoadSubxactData() does nothing fancy,
but my worries are in ProcessRecords() and PostPrepare_Twophase(),
particularly if they do more complicated and fancy stuff in the future
for whatever reason. The comment you have added is this one:
        /*
-        * Don't need a lock in the recovery phase.
+        * It's okay to access TwoPhaseState without a lock here: recovery is
+        * finished (so if we were a standby, there's no master that can prepare
+        * transactions anymore), and we haven't yet set WAL as open for writes,
+        * so local existing backends, if any, cannot do so either.
We could use a
+        * coding pattern similar to restoreTwoPhaseData, i.e., run
the whole loop
+        * with the lock held; but this loop is far more complex, so
instead only
+        * grab the lock while calling the low-level functions that require it.
         */
I have reworked the comment as follows:
    /*
-    * Don't need a lock in the recovery phase.
+    * It is fine to access TwoPhaseState without a lock here: recovery is
+    * finished (so if we were a standby, there's no master that can prepare
+    * transactions anymore), and we haven't yet set WAL as open for writes,
+    * so local existing backends, if any, cannot do so either.  We could use a
+    * coding pattern similar to restoreTwoPhaseData, i.e., run the whole loop
+    * with the lock held; but this loop is far more complex, so instead only
+    * grab the lock while calling the low-level functions working directly on
+    * manipulating the two-phase state data.  Functions working directly on
+    * PGPROC entries linked with the two-phase transaction work with other
+    * types of locks but we don't want to complicate that more than necessary.
     */
The v5 attached does not have any other changes than what you did in
v4 (plus typos noticed).

-   if (info == XLOG_XACT_COMMIT || info == XLOG_XACT_COMMIT_PREPARED)
+   if (info == XLOG_XACT_COMMIT)
    {
        xl_xact_commit *xlrec = (xl_xact_commit *) XLogRecGetData(record);
        xl_xact_parsed_commit parsed;
No objections to the shuffling done here.

+ * An exclusive lock on TwoPhaseStateLock is taken for the duration of the
+ * scan of TwoPhaseState data as its may be updated.
Typo noticed here => s/its/it/.
-- 
Michael

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #14701: pg_dump fails to dump pg_catalog schema
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #14701: pg_dump fails to dump pg_catalog schema