Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog
Date
Msg-id 20170611022536.goef4cdbmgicye5g@alvherre.pgsql
Whole thread Raw
In response to Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
Michael Paquier wrote:

> > i look carefully on the patch, because of we removed TwoPhaseStateLock
> > lwlock acquire in `RemoveGXact' and let caller held lwlock, so i think:
> > 1. xact_redo also need held lwlock before call PrepareRedoRemove
> > 2. RecoverPreparedTransactions also need held lwlock before call
> > ProcessTwoPhaseBuffer
> 
> Thanks for the input. I have reviewed all the code paths that have
> been modified, and strengthened the code with assertions using
> LWLockHeldByMeInMode() to make sure that the correct lock is always
> hold in those code paths. There is actually no point in holding the
> lock in restoreTwoPhaseData(), but as this makes the code less
> consistent with the rest I added one. Also, I have replaced the lock
> acquisition in PrepareRedoAdd() with acquisitions at higher levels,
> and added an assertion in this routine. This makes the 2PC state data
> addition and removal more consistent with each other.

Thanks for the patch -- I agree that the new coding looks better.

I also agree that restoreTwoPhaseData doesn't need to hold the lock,
since we don't expect anything running concurrently, but that it's okay
to hold it and makes the whole thing simpler.

We could try to apply an equivalent rationale to
RecoverPreparedTransactions: even though we have already been running in
HOT standby mode for a while, there's no possibility of concurrent
activity either: since we exited recovery, master cannot write any more
2PC xacts, and we haven't yet set the flag that lets open sessions write
WAL.  However, it seems mildly dangerous to run the bottom sections of
the loop with the lock held, since it acquires other lwlocks and
generally does a lot of crap.

Also, let's add an lwlock-is-held assertion to MarkAsPreparingGuts.

BTW: I think that saving one redundant line of code is not worth the
ugliness that it costs in xact_redo, so let's undo that.  The patch is a
bit bulky, but the resulting code is simpler.

In short, I propose the attached.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog
Next
From: Alvaro Herrera
Date:
Subject: Re: [BUGS] Re: BUG #14680: startup process on standby encounter adeadlock of TwoPhaseStateLock when redo 2PC xlog