Thread: [HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

On Thu, Jun 01, 2017 at 01:07:53AM -0700, Michael Paquier wrote:
> On Wed, May 31, 2017 at 12:30 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Wed, May 31, 2017 at 6:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> wangchuanting@huawei.com writes:
> >>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
> >>> redo 2PC xlog.
> >>
> >> Please provide an example of how to get into this state.
> >
> > That would help. Are you seeing in the logs something like "removing
> > future two-phase state from memory for XXX" or "removing stale
> > two-phase state from shared memory for XXX"?
> >
> > Even with that, the lightweight-lock sequence taken in those code
> > paths looks definitely wrong to me: we should not take
> > TwoPhaseStateLock twice in the same code path. I think that we should remove
> > the lock acquisitions in RemoveGXact() and PrepareRedoRemove(), and then
> > upgrade the locks of PrescanPreparedTransactions() and
> > StandbyRecoverPreparedTransactions() to be exclusive. We still need to
> > keep a lock as CheckPointTwoPhase() may still be triggered by the
> > checkpoint. Tom, what do you think?
> 
> Attached is what I was thinking about for reference. I just came back
> from a long flight and I am pretty tired, so my brain may have missed
> something. I'll take another look at this issue on Monday; an open
> item has been added for now.
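
The locking concern quoted above can be illustrated with a minimal sketch.
This is Python, not PostgreSQL code: threading.Lock stands in for
TwoPhaseStateLock because, like an LWLock, it is not reentrant, and every
function and variable name below is hypothetical. A short timeout is assumed
in place of the indefinite wait that a real second acquisition would cause.

```python
import threading

# Stand-in for TwoPhaseStateLock: like an LWLock, threading.Lock is not reentrant.
two_phase_state_lock = threading.Lock()
prepared = {"gxact_1", "gxact_2"}  # hypothetical in-memory 2PC entries


def remove_entry_locking(gid):
    """Problematic shape: the callee acquires the lock itself."""
    # The timeout stands in for "blocks forever" so that the sketch terminates.
    if not two_phase_state_lock.acquire(timeout=0.1):
        return False  # a real second acquisition would hang here
    try:
        prepared.discard(gid)
        return True
    finally:
        two_phase_state_lock.release()


def remove_entry_caller_locked(gid):
    """Proposed shape: the callee relies on the caller's lock."""
    # locked() only reports that someone holds the lock, not who holds it.
    assert two_phase_state_lock.locked(), "caller must hold the lock"
    prepared.discard(gid)


def scan_nested(stale):
    """Caller holds the lock and calls a callee that tries to lock again."""
    with two_phase_state_lock:
        return [remove_entry_locking(gid) for gid in stale]


def scan_fixed(stale):
    """Caller takes the lock once; the callee does not lock."""
    with two_phase_state_lock:
        for gid in stale:
            remove_entry_caller_locked(gid)
```

In this sketch scan_nested() cannot make progress on the nested acquisition,
while scan_fixed() takes the lock once in the caller, which is the shape the
proposal describes for the redo code paths.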

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Simon,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Mon, Jun 5, 2017 at 7:24 AM, Noah Misch <noah@leadboat.com> wrote:
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

I have just provided a patch that takes care of solving this issue and
cleans up the lock handling for all the 2PC redo code paths.
-- 
Michael



On Sun, Jun 04, 2017 at 10:24:30PM +0000, Noah Misch wrote:
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Fri, Jun 9, 2017 at 3:17 PM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I have sent an updated patch simplifying the locking here:
https://www.postgresql.org/message-id/CAB7nPqQthLP9GvD2242epHKOBkDMd+03tSuFvK3cVZsGarQyWA@mail.gmail.com
-- 
Michael



On Thu, Jun 08, 2017 at 11:17:38PM -0700, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-11 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Noah Misch wrote:

> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-06-11 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.

I volunteer to own this item.  I'll report on Wednesday 14th at the latest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/