Thread: [HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Noah Misch
Date:
On Thu, Jun 01, 2017 at 01:07:53AM -0700, Michael Paquier wrote:
> On Wed, May 31, 2017 at 12:30 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Wed, May 31, 2017 at 6:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> wangchuanting@huawei.com writes:
> >>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
> >>> redo 2PC xlog.
> >>
> >> Please provide an example of how to get into this state.
> >
> > That would help. Are you seeing in the logs something like "removing
> > future two-phase state from memory for XXX" or "removing stale
> > two-phase state from shared memory for XXX"?
> >
> > Even with that, the light-weight lock sequence taken in those code
> > paths look definitely wrong to me, we should not take twice
> > TwoPhaseStateLock in the same code path. I think that we should remove
> > the lock acquisitions in RemoveGXact() and PrepareRedoRemove, and then
> > upgrade the locks of PrescanPreparedTransactions() and
> > StandbyRecoverPreparedTransactions() to be exclusive. We still need to
> > keep a lock as CheckPointTwoPhase() may still be triggered by the
> > checkpoint. Tom, what do you think?
>
> Attached is what I was thinking about for reference. I just came back
> from a long flight and I am pretty tired, so my brain may have missed
> something. I'll take again a look at this issue on Monday, an open
> item has been added for now.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Simon,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
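
The double acquisition that Michael describes in the quoted text above can be sketched with a toy model. This is only an illustration under stated assumptions, not PostgreSQL code: ToyLock, toy_lock_acquire() and the scan_* and remove_* functions are hypothetical names, and the toy lock reports a second acquisition by returning false where a real non-reentrant lock would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-reentrant lightweight lock. */
typedef struct ToyLock
{
    bool held;              /* true while the single process holds the lock */
} ToyLock;

/*
 * Take the lock. If it is already held, return false; a real
 * non-reentrant lock would wait on itself forever at this point.
 */
bool
toy_lock_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

void
toy_lock_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Problematic shape: the callee acquires the lock on its own. */
bool
remove_entry_self_locking(ToyLock *lock, int *nentries)
{
    if (!toy_lock_acquire(lock))
        return false;       /* caller already holds it: self-deadlock */
    (*nentries)--;
    toy_lock_release(lock);
    return true;
}

/* Proposed shape: the callee only checks that the caller holds the lock. */
void
remove_entry_caller_locked(ToyLock *lock, int *nentries)
{
    assert(lock->held);
    (*nentries)--;
}

/* A scan that holds the lock and calls the self-locking callee. */
bool
scan_buggy(ToyLock *lock, int *nentries)
{
    bool ok;

    if (!toy_lock_acquire(lock))
        return false;
    ok = remove_entry_self_locking(lock, nentries);
    toy_lock_release(lock);
    return ok;
}

/* A scan that takes the lock once and keeps it across the removal. */
bool
scan_fixed(ToyLock *lock, int *nentries)
{
    if (!toy_lock_acquire(lock))
        return false;
    remove_entry_caller_locked(lock, nentries);
    toy_lock_release(lock);
    return true;
}
```

With a fresh lock and an entry count of 2, scan_buggy() returns false and leaves the count untouched, while scan_fixed() returns true and decrements it. Whether the real patch takes exactly this shape is not stated in this message; the sketch only mirrors the proposal to drop the inner acquisitions and have the outer callers hold the lock.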
[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Michael Paquier
Date:
On Mon, Jun 5, 2017 at 7:24 AM, Noah Misch <noah@leadboat.com> wrote:
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Simon,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10. Consequently, I will appreciate your efforts
> toward speedy resolution. Thanks.

I have just provided a patch that takes care of solving this issue and
cleans up the lock handling for all the 2PC redo code paths.
--
Michael
[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Noah Misch
Date:
On Sun, Jun 04, 2017 at 10:24:30PM +0000, Noah Misch wrote:
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Simon,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10. Consequently, I will appreciate your efforts
> toward speedy resolution. Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Michael Paquier
Date:
On Fri, Jun 9, 2017 at 3:17 PM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 10 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update. Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I have sent an updated patch simplifying the locking here:
https://www.postgresql.org/message-id/CAB7nPqQthLP9GvD2242epHKOBkDMd+03tSuFvK3cVZsGarQyWA@mail.gmail.com
--
Michael
[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Noah Misch
Date:
On Thu, Jun 08, 2017 at 11:17:38PM -0700, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update. Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2017-06-11 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Re: [BUGS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
From
Alvaro Herrera
Date:
Noah Misch wrote:

> IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
> for your status update. Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately. If I do not hear from you by
> 2017-06-11 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.

I volunteer to own this item. I'll report on Wednesday 14th at the latest.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services