Thread: Deadlock between backend and recovery may not be detected
Hi,

While reviewing the patch proposed at [1], I found a case where a deadlock involving a recovery conflict on a lock may not be detected. This deadlock can happen between backends and the startup process on a standby server. Please see the following procedure to reproduce the deadlock.

#1. Set up streaming replication.

#2. Set max_standby_streaming_delay to -1 on the standby.

#3. Create two tables on the primary.

[PRIMARY: SESSION1]
CREATE TABLE t1 ();
CREATE TABLE t2 ();

#4. Start a transaction and access table t1, on the standby.

[STANDBY: SESSION2]
BEGIN;
SELECT * FROM t1;

#5. Start a transaction and lock table t2 in access exclusive mode, on the primary. Also execute pg_switch_wal() so that the WAL record for the access exclusive lock reaches the standby.

[PRIMARY: SESSION1]
BEGIN;
LOCK TABLE t2 IN ACCESS EXCLUSIVE MODE;
SELECT pg_switch_wal();

#6. Access table t2 within the transaction started in #4, on the standby.

[STANDBY: SESSION2]
SELECT * FROM t2;

#7. Lock table t1 in access exclusive mode within the transaction started in #5, on the primary. Again execute pg_switch_wal() so that the WAL record for the access exclusive lock reaches the standby.

[PRIMARY: SESSION1]
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;
SELECT pg_switch_wal();

After this procedure, you can see that the startup process and the backend each wait for the other's table lock, i.e., a deadlock. But this deadlock remains even after deadlock_timeout passes. This seems like a bug to me.

> * Deadlocks involving the Startup process and an ordinary backend process
> * will be detected by the deadlock detector within the ordinary backend.

The cause of this issue seems to be that ResolveRecoveryConflictWithLock(), which the startup process calls when a recovery conflict on a lock happens, doesn't handle the deadlock case at all. You can see this from the source code comment for ResolveRecoveryConflictWithLock() quoted above.
To fix this issue, I think we should enable the STANDBY_DEADLOCK_TIMEOUT timer in ResolveRecoveryConflictWithLock() so that the startup process can send the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend. Then, when the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives, the backend should check whether the deadlock actually happens or not. Attached is a POC patch implementing this. Thoughts?

Regards,

[1] https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
Wed, 16 Dec 2020 at 13:49, Fujii Masao <masao.fujii@oss.nttdata.com>:
> After doing this procedure, you can see the startup process and backend
> wait for the table lock each other, i.e., deadlock. But this deadlock remains
> even after deadlock_timeout passes.
>
> This seems a bug to me.
>
> > * Deadlocks involving the Startup process and an ordinary backend process
> > * will be detected by the deadlock detector within the ordinary backend.
>
> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> the startup process calls when recovery conflict on lock happens doesn't
> take care of deadlock case at all. You can see this fact by reading the above
> source code comment for ResolveRecoveryConflictWithLock().
>
> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> the backend should check whether the deadlock actually happens or not.
> Attached is the POC patch implimenting this.
I agree that this is a bug.
Unfortunately, we've been hit by it in production.
Such a deadlock will eventually make all sessions wait on the startup process, rendering the streaming replica unusable. When the replica is used to balance RO queries away from the primary, this causes downtime for the project.

If I understand things right, the session will release its locks when max_standby_streaming_delay is reached. But it'd be much better if the conflict were resolved faster, around deadlock_timeout.
So — huge +1 from me for fixing it.
Victor Yegorov
Hi,
On 12/16/20 2:36 PM, Victor Yegorov wrote:
Wed, 16 Dec 2020 at 13:49, Fujii Masao <masao.fujii@oss.nttdata.com>:

> After doing this procedure, you can see the startup process and backend
> wait for the table lock each other, i.e., deadlock. But this deadlock remains
> even after deadlock_timeout passes.
>
> This seems a bug to me.
+1
> > * Deadlocks involving the Startup process and an ordinary backend process
> > * will be detected by the deadlock detector within the ordinary backend.
>
> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> the startup process calls when recovery conflict on lock happens doesn't
> take care of deadlock case at all. You can see this fact by reading the above
> source code comment for ResolveRecoveryConflictWithLock().
>
> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> the backend should check whether the deadlock actually happens or not.
> Attached is the POC patch implimenting this.
good catch!
I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock() too (it is already set in ResolveRecoveryConflictWithBufferPin()).
So +1 to considering this a bug, and for the way the patch proposes to fix it.

Bertrand
On 2020/12/16 23:28, Drouvot, Bertrand wrote:
> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock() too (it is already set in ResolveRecoveryConflictWithBufferPin()).
>
> So + 1 to consider this as a bug and for the way the patch proposes to fix it.

Thanks Victor and Bertrand for agreeing!
Attached is the updated version of the patch.
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2020/12/17 2:15, Fujii Masao wrote:
> Thanks Victor and Bertrand for agreeing!
> Attached is the updated version of the patch.
Attached is v3 of the patch. Could you review this version?

While the startup process is waiting for a recovery conflict on a buffer pin, it repeatedly sends the request for a deadlock check to all the backends, every deadlock_timeout. This may increase the workload in the startup process and backends, but since this is the original behavior, the patch doesn't change that. Also, in practice this may not be so harmful, because the period that the buffer is kept pinned is basically not so long.

On the other hand, IMO we should avoid this issue while the startup process is waiting for a recovery conflict on locks, since it can take a long time to release the locks. So the patch tries to fix it.

Or am I overthinking this? If this doesn't need to be handled, the patch can be simplified further. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
Hi,

On 12/18/20 10:35 AM, Fujii Masao wrote:
> While the startup process is waiting for recovery conflict on buffer pin,
> it repeats sending the request for deadlock check to all the backends
> every deadlock_timeout. This may increase the workload in the startup
> process and backends, but since this is the original behavior, the patch
> doesn't change that.

Agree.

IMHO that may need to be rethought (as we are already in a conflict situation, I am tempted to say that the less that is being done the better), but I think that's outside the scope of this patch.

> Also in practice this may not be so harmful because
> the period that the buffer is kept pinned is basically not so long.

Agree that the chances of being in this mode for a "long" duration are lower (as compared to the lock conflict).

> On the other hand, IMO we should avoid this issue while the startup
> process is waiting for recovery conflict on locks since it can take
> a long time to release the locks. So the patch tries to fix it.

Agree.

> Or I'm overthinking this? If this doesn't need to be handled,
> the patch can be simplified more. Thought?

I do think it's good to handle it that way for the lock conflict: the less work that is done the better (especially in a conflict situation).

The patch does look good to me.

Bertrand
On Fri, Dec 18, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Attached is v3 of the patch. Could you review this version?
>
> While the startup process is waiting for recovery conflict on buffer pin,
> it repeats sending the request for deadlock check to all the backends
> every deadlock_timeout. This may increase the workload in the startup
> process and backends, but since this is the original behavior, the patch
> doesn't change that. Also in practice this may not be so harmful because
> the period that the buffer is kept pinned is basically not so long.

@@ -529,6 +603,26 @@ ResolveRecoveryConflictWithBufferPin(void)
      */
     ProcWaitForSignal(PG_WAIT_BUFFER_PIN);

+    if (got_standby_deadlock_timeout)
+    {
+        /*
+         * Send out a request for hot-standby backends to check themselves for
+         * deadlocks.
+         *
+         * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
+         * to be signaled by UnpinBuffer() again and send a request for
+         * deadlocks check if deadlock_timeout happens. This causes the
+         * request to continue to be sent every deadlock_timeout until the
+         * buffer is unpinned or ltime is reached. This would increase the
+         * workload in the startup process and backends. In practice it may
+         * not be so harmful because the period that the buffer is kept pinned
+         * is basically no so long. But we should fix this?
+         */
+        SendRecoveryConflictWithBufferPin(
+                                          PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+        got_standby_deadlock_timeout = false;
+    }
+

Since SendRecoveryConflictWithBufferPin() sends the signal to all backends, every backend that is waiting on a lock at ProcSleep() and not holding a buffer pin blocking the startup process will end up doing a deadlock check, which seems expensive. What is worse is that the deadlock will not be detected, because a deadlock involving a buffer pin isn't detected by CheckDeadLock().
I thought we could replace PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, but that's not good because a backend that has a buffer pin blocking the startup process and is not waiting on a lock would also be canceled after deadlock_timeout. We could have the backend return from RecoveryConflictInterrupt() when it receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock, but that's also not good because then we cannot cancel, after max_standby_streaming_delay, a backend that has a buffer pin blocking the startup process. So I wonder if we can have a new signal: when the backend receives this signal, it returns from RecoveryConflictInterrupt() without a deadlock check either if it's not waiting on any lock or if it doesn't have a buffer pin blocking the startup process. Otherwise it's canceled.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 2020/12/19 1:43, Drouvot, Bertrand wrote:
>> While the startup process is waiting for recovery conflict on buffer pin,
>> it repeats sending the request for deadlock check to all the backends
>> every deadlock_timeout. This may increase the workload in the startup
>> process and backends, but since this is the original behavior, the patch
>> doesn't change that.
>
> Agree.
>
> IMHO that may need to be rethink (as we are already in a conflict situation, i am tempted to say that the less is being done the better it is), but i think that's outside the scope of this patch.

Yes, I agree that's better. I think that we should improve that as a separate patch only for the master branch, after first fixing the bug and back-patching that fix.

> I do think that's good to handle it that way for the lock conflict: the less work is done the better it is (specially in a conflict situation).
>
> The patch does look good to me.

Thanks for the review!
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/12/22 10:25, Masahiko Sawada wrote:
> Since SendRecoveryConflictWithBufferPin() sends the signal to all
> backends every backend who is waiting on a lock at ProcSleep() and not
> holding a buffer pin blocking the startup process will end up doing a
> deadlock check, which seems expensive. What is worse is that the
> deadlock will not be detected because the deadlock involving a buffer
> pin isn't detected by CheckDeadLock(). I thought we can replace
> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN but it's not good because the
> backend who has a buffer pin blocking the startup process and not
> waiting on a lock is also canceled after deadlock_timeout. We can have
> the backend return from RecoveryConflictInterrupt() when it received
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock,
> but it's also not good because we cannot cancel the backend after
> max_standby_streaming_delay that has a buffer pin blocking the startup
> process. So I wonder if we can have a new signal. When the backend
> received this signal it returns from RecoveryConflictInterrupt()
> without deadlock check either if it's not waiting on any lock or if it
> doesn't have a buffer pin blocking the startup process. Otherwise it's
> cancelled.

Thanks for pointing out that issue! Using a new signal is one idea. Another idea is to make a backend skip the deadlock check if GetStartupBufferPinWaitBufId() returns -1, i.e., the startup process is not waiting for a buffer pin. So, what I'm thinking is:

In RecoveryConflictInterrupt(), when a backend receives PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:

1. If the backend isn't waiting for a lock, it does nothing.
2. If the backend is waiting for a lock and also holding a buffer pin that delays recovery, it may be canceled.
3. If the backend is waiting for a lock and the startup process is not waiting for a buffer pin (i.e., the startup process is also waiting for a lock), it checks for deadlocks.
4. If the backend is waiting for a lock and isn't holding a buffer pin that delays recovery, though the startup process is waiting for a buffer pin, it does nothing.

Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/12/22 20:42, Fujii Masao wrote:
> In RecoveryConflictInterrupt(), when a backend receive
> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
>
> 1. If a backend isn't waiting for a lock, it does nothing.
> 2. If a backend is waiting for a lock and also holding a buffer pin that
>    delays recovery, it may be canceled.
> 3. If a backend is waiting for a lock and the startup process is not waiting
>    for buffer pin (i.e., the startup process is also waiting for a lock),
>    it checks for the deadlocks.
> 4. If a backend is waiting for a lock and isn't holding a buffer pin that
>    delays recovery though the startup process is waiting for buffer pin,
>    it does nothing.

I implemented this. Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Tue, Dec 22, 2020 at 11:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> [...]
> In RecoveryConflictInterrupt(), when a backend receives
> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
>
> 1. If a backend isn't waiting for a lock, it does nothing.
> 2. If a backend is waiting for a lock and also holding a buffer pin that
>    delays recovery, it may be canceled.
> 3. If a backend is waiting for a lock and the startup process is not waiting
>    for a buffer pin (i.e., the startup process is also waiting for a lock),
>    it checks for deadlocks.
> 4. If a backend is waiting for a lock and isn't holding a buffer pin that
>    delays recovery, though the startup process is waiting for a buffer pin,
>    it does nothing.
>
> I implemented this. Patch attached.

Good idea! It could still happen that, if the startup process sets
startupBufferPinWaitBufId to -1 after sending the signal and before
the backend checks it, the backend will end up doing a meaningless
deadlock check. But the likelihood would be low in practice.

I have two small comments on ResolveRecoveryConflictWithBufferPin() in
the v4 patch:

The code currently has three branches, as follows:

    if (ltime == 0)
    {
        enable a timeout for deadlock;
    }
    else if (GetCurrentTimestamp() >= ltime)
    {
        send recovery conflict signal;
    }
    else
    {
        enable two timeouts: ltime and deadlock;
    }

I think we can rearrange the code similar to the changes you made in
ResolveRecoveryConflictWithLock():

    if (GetCurrentTimestamp() >= ltime && ltime != 0)
    {
        resolve recovery conflict;
    }
    else
    {
        enable one or two timeouts: ltime and deadlock;
    }

It's more consistent with ResolveRecoveryConflictWithLock(). Also,
currently the patch doesn't reset got_standby_deadlock_timeout in the
(ltime == 0) case, but that will also be resolved by this rearrangement.

---
If we always reset got_standby_deadlock_timeout before waiting, it's
not strictly necessary, but we might want to clear got_standby_deadlock_timeout
also after disabling all timeouts, to ensure that it's cleared at the
end of the function. In ResolveRecoveryConflictWithLock() we clear
both got_standby_lock_timeout and got_standby_deadlock_timeout after
disabling all timeouts, but we don't do that in
ResolveRecoveryConflictWithBufferPin().

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 2020/12/23 19:28, Masahiko Sawada wrote:
> [...]
> I have two small comments on ResolveRecoveryConflictWithBufferPin() in
> the v4 patch:
> [...]
> I think we can rearrange the code similar to the changes you made in
> ResolveRecoveryConflictWithLock():
> [...]
> It's more consistent with ResolveRecoveryConflictWithLock(). Also,
> currently the patch doesn't reset got_standby_deadlock_timeout in the
> (ltime == 0) case, but that will also be resolved by this rearrangement.

I didn't want to change the code structure any more than necessary, because
the patch needs to be back-patched. But it's a good idea to make the code
structures in ResolveRecoveryConflictWithLock() and ...WithBufferPin() similar,
to make the code simpler and easier to read. So I agree with you. Attached
is the updated version of the patch. What about this version?

> ---
> If we always reset got_standby_deadlock_timeout before waiting, it's
> not strictly necessary, but we might want to clear got_standby_deadlock_timeout
> also after disabling all timeouts, to ensure that it's cleared at the
> end of the function. In ResolveRecoveryConflictWithLock() we clear
> both got_standby_lock_timeout and got_standby_deadlock_timeout after
> disabling all timeouts, but we don't do that in
> ResolveRecoveryConflictWithBufferPin().

I agree that it's better to reset got_standby_deadlock_timeout after
all the timeouts are disabled. So I changed the patch that way. OTOH,
got_standby_lock_timeout doesn't need to be reset, because it's never
enabled in ResolveRecoveryConflictWithBufferPin(). No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Wed, Dec 23, 2020 at 9:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> [...]
> I didn't want to change the code structure any more than necessary, because
> the patch needs to be back-patched. But it's a good idea to make the code
> structures in ResolveRecoveryConflictWithLock() and ...WithBufferPin() similar,
> to make the code simpler and easier to read. So I agree with you. Attached
> is the updated version of the patch. What about this version?

Thank you for updating the patch! The patch looks good to me.

> I agree that it's better to reset got_standby_deadlock_timeout after
> all the timeouts are disabled. So I changed the patch that way. OTOH,
> got_standby_lock_timeout doesn't need to be reset, because it's never
> enabled in ResolveRecoveryConflictWithBufferPin(). No?

Yes, you're right. We need to clear only got_standby_deadlock_timeout.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> [...] Attached
> is the updated version of the patch. What about this version?

The patch contains a hunk with the following structure:

    + if (got_standby_lock_timeout)
    +     goto cleanup;
    +
    + if (got_standby_deadlock_timeout)
    + {
    ...
    + }
    +
    +cleanup:

It is equivalent to:

    + if (!got_standby_lock_timeout && got_standby_deadlock_timeout)
    + {
    ...
    + }

Is there any reason for the goto?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/12/25 13:16, Kyotaro Horiguchi wrote:
> At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> [...] Attached
>> is the updated version of the patch. What about this version?
>
> The patch contains a hunk with the following structure:
> [...]
> It is equivalent to:
> [...]
> Is there any reason for the goto?

Yes. That's because we have the following code using goto:

    + /* Quick exit if there's no work to be done */
    + if (!VirtualTransactionIdIsValid(*backends))
    +     goto cleanup;

Regarding the back-patch, I was thinking of back-patching this to all the
supported branches. But I found that it's not easy to do that for v9.5,
because v9.5 doesn't have some infrastructure code that this bug fix
patch depends on. So at least commit 37c54863cf, as the infrastructure,
also needs to be back-patched to v9.5. And ISTM that the related commits
f868a8143a and 8f0de712c3 need to be back-patched too. There might be
some other changes to be back-patched as well. Unfortunately, they cannot
be applied to v9.5 cleanly, and additional changes are necessary.

This situation makes me feel inclined to skip the back-patch to v9.5,
because the next minor version release is the final one for v9.5. So if we
unexpectedly introduce a bug into v9.5 by the back-patch, there is no
chance to fix it. OTOH, of course, if we don't do the back-patch, there is
no chance to fix the deadlock detection bug, since the final minor version
for v9.5 won't include that bug fix. But I'm afraid that the back-patch
to v9.5 may give more risk than gain.

Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,

On 1/5/21 7:26 AM, Fujii Masao wrote:
> [...]
> This situation makes me feel inclined to skip the back-patch to v9.5,
> because the next minor version release is the final one for v9.5. So if we
> unexpectedly introduce a bug into v9.5 by the back-patch, there is no
> chance to fix it. OTOH, of course, if we don't do the back-patch, there is
> no chance to fix the deadlock detection bug, since the final minor version
> for v9.5 won't include that bug fix. But I'm afraid that the back-patch
> to v9.5 may give more risk than gain.
>
> Thoughts?

Reading your arguments, I am inclined to think the same.

Bertrand
On Tue, 5 Jan 2021 at 07:26, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> This situation makes me feel that I'm inclined to skip the back-patch to v9.5.
> Because the next minor version release is the final one for v9.5. So if we
> unexpectedly introduce the bug to v9.5 by the back-patch, there is no
> chance to fix that. OTOH, of course, if we don't do the back-patch, there is
> no chance to fix the deadlock detection bug since the final minor version
> for v9.5 doesn't include that bug fix. But I'm afraid that the back-patch
> to v9.5 may give more risk than gain.
>
> Thoughts?
Honestly, I was thinking that this would not be back-patched at all,
and I am really glad we're getting this fixed in the back branches as well.
Therefore I think it's fine to skip 9.5, though I would've mentioned
this in the commit message.
Victor Yegorov
At Tue, 5 Jan 2021 15:26:50 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> On 2020/12/25 13:16, Kyotaro Horiguchi wrote:
>> At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote in
>>> you. Attached is the updated version of the patch. What about this version?
>>
>> The patch contains a hunk in the following structure.
>>
>> +	if (got_standby_lock_timeout)
>> +		goto cleanup;
>> +
>> +	if (got_standby_deadlock_timeout)
>> +	{
>> ...
>> +	}
>> +
>> +cleanup:
>>
>> It is equivalent to
>>
>> +	if (!got_standby_lock_timeout && got_standby_deadlock_timeout)
>> +	{
>> ...
>> +	}
>>
>> Is there any reason for the goto?
>
> Yes. That's because we have the following code using goto.
>
> +	/* Quick exit if there's no work to be done */
> +	if (!VirtualTransactionIdIsValid(*backends))
> +		goto cleanup;

That doesn't seem to be the *cause*, though. Such straightforward logic with three levels of indentation is not something that should be avoided by using goto, even if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is too lengthy and sticks out past 80 columns.

> Regarding the back-patch, I was thinking to back-patch this to all the
> supported branches. But I found that it's not easy to do that for v9.5,
> because v9.5 doesn't have some infrastructure code that this bug fix
> depends on. So at least commit 37c54863cf would also need to be
> back-patched to v9.5 as infrastructure. And ISTM that the related commits
> f868a8143a and 8f0de712c3 would need to be back-patched too. There might
> be other changes to back-patch as well. Unfortunately they cannot be
> applied to v9.5 cleanly, and additional changes would be necessary.
>
> This situation makes me inclined to skip the back-patch to v9.5,
> because the next minor version release is the final one for v9.5. So if we
> unexpectedly introduce a bug into v9.5 by the back-patch, there is no
> chance to fix it. OTOH, of course, if we don't do the back-patch, there is
> no chance to fix the deadlock detection bug, since the final minor version
> for v9.5 won't include that fix. But I'm afraid that the back-patch
> to v9.5 may give more risk than gain.
>
> Thoughts?

It seems to me that the final minor release should get fixes only for issues that we have actually received complaints about, or critical-ish known issues such as ones that lead to a server crash in normal paths. This particular issue is neither of those. So +1 for not back-patching to 9.5.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Jan 5, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/25 13:16, Kyotaro Horiguchi wrote:
>> At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote in
>>> you. Attached is the updated version of the patch. What about this version?
>>
>> The patch contains a hunk in the following structure.
>>
>> +	if (got_standby_lock_timeout)
>> +		goto cleanup;
>> +
>> +	if (got_standby_deadlock_timeout)
>> +	{
>> ...
>> +	}
>> +
>> +cleanup:
>>
>> It is equivalent to
>>
>> +	if (!got_standby_lock_timeout && got_standby_deadlock_timeout)
>> +	{
>> ...
>> +	}
>>
>> Is there any reason for the goto?
>
> Yes. That's because we have the following code using goto.
>
> +	/* Quick exit if there's no work to be done */
> +	if (!VirtualTransactionIdIsValid(*backends))
> +		goto cleanup;
>
> Regarding the back-patch, I was thinking to back-patch this to all the
> supported branches. But I found that it's not easy to do that for v9.5,
> because v9.5 doesn't have some infrastructure code that this bug fix
> depends on. So at least commit 37c54863cf would also need to be
> back-patched to v9.5 as infrastructure. And ISTM that the related commits
> f868a8143a and 8f0de712c3 would need to be back-patched too. There might
> be other changes to back-patch as well. Unfortunately they cannot be
> applied to v9.5 cleanly, and additional changes would be necessary.
>
> This situation makes me inclined to skip the back-patch to v9.5,
> because the next minor version release is the final one for v9.5. So if we
> unexpectedly introduce a bug into v9.5 by the back-patch, there is no
> chance to fix it. OTOH, of course, if we don't do the back-patch, there is
> no chance to fix the deadlock detection bug, since the final minor version
> for v9.5 won't include that fix. But I'm afraid that the back-patch
> to v9.5 may give more risk than gain.
>
> Thoughts?

+1 for not back-patching to 9.5.

Regards,

-- 
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 2021/01/06 11:48, Masahiko Sawada wrote:
> On Tue, Jan 5, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/12/25 13:16, Kyotaro Horiguchi wrote:
>>> At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote in
>>>> you. Attached is the updated version of the patch. What about this version?
>>>
>>> The patch contains a hunk in the following structure.
>>>
>>> +	if (got_standby_lock_timeout)
>>> +		goto cleanup;
>>> +
>>> +	if (got_standby_deadlock_timeout)
>>> +	{
>>> ...
>>> +	}
>>> +
>>> +cleanup:
>>>
>>> It is equivalent to
>>>
>>> +	if (!got_standby_lock_timeout && got_standby_deadlock_timeout)
>>> +	{
>>> ...
>>> +	}
>>>
>>> Is there any reason for the goto?
>>
>> Yes. That's because we have the following code using goto.
>>
>> +	/* Quick exit if there's no work to be done */
>> +	if (!VirtualTransactionIdIsValid(*backends))
>> +		goto cleanup;
>>
>> Regarding the back-patch, I was thinking to back-patch this to all the
>> supported branches. But I found that it's not easy to do that for v9.5,
>> because v9.5 doesn't have some infrastructure code that this bug fix
>> depends on. So at least commit 37c54863cf would also need to be
>> back-patched to v9.5 as infrastructure. And ISTM that the related commits
>> f868a8143a and 8f0de712c3 would need to be back-patched too. There might
>> be other changes to back-patch as well. Unfortunately they cannot be
>> applied to v9.5 cleanly, and additional changes would be necessary.
>>
>> This situation makes me inclined to skip the back-patch to v9.5,
>> because the next minor version release is the final one for v9.5. So if we
>> unexpectedly introduce a bug into v9.5 by the back-patch, there is no
>> chance to fix it. OTOH, of course, if we don't do the back-patch, there is
>> no chance to fix the deadlock detection bug, since the final minor version
>> for v9.5 won't include that fix. But I'm afraid that the back-patch
>> to v9.5 may give more risk than gain.
>>
>> Thoughts?
>
> +1 for not back-patching to 9.5.

Thanks, all! I pushed the patch and back-patched it down to v9.6.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION