Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | e6fa6cc8-8314-a64d-7bf7-ed78dfd2d9dd@oss.nttdata.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add Information during standby recovery conflicts
|
List | pgsql-hackers |
On 2020/12/15 0:20, Fujii Masao wrote: > > > On 2020/12/14 21:31, Fujii Masao wrote: >> >> >> On 2020/12/05 12:38, Masahiko Sawada wrote: >>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >>>> >>>> Hi, >>>> >>>> On 12/4/20 2:21 AM, Fujii Masao wrote: >>>>> >>>>> On 2020/12/04 9:28, Masahiko Sawada wrote: >>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao >>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote: >>>>>>>>> CAUTION: This email originated from outside of the organization. >>>>>>>>> Do not click links or open attachments unless you can confirm the >>>>>>>>> sender and know the content is safe. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera >>>>>>>>> <alvherre@alvh.no-ip.org> wrote: >>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote: >>>>>>>>>> >>>>>>>>>>> + if (proc) >>>>>>>>>>> + { >>>>>>>>>>> + if (nprocs == 0) >>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid); >>>>>>>>>>> + else >>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid); >>>>>>>>>>> + >>>>>>>>>>> + nprocs++; >>>>>>>>>>> >>>>>>>>>>> What happens if all the backends in wait_list have gone? In >>>>>>>>>>> other words, >>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs >>>>>>>>>>> has not been >>>>>>>>>>> incrmented at all)? This would very rarely happen, but can happen. >>>>>>>>>>> In this case, since buf.data is empty, at least there seems no >>>>>>>>>>> need to log >>>>>>>>>>> the list of conflicting processes in detail message. >>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the >>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than >>>>>>>>>> wait_list being null), otherwise not print the errdetail. (You >>>>>>>>>> could >>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.) >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> Maybe we can also improve the comment of this function from: >>>>>>>>> >>>>>>>>> + * This function also reports the details about the conflicting >>>>>>>>> + * process ids if *wait_list is not NULL. >>>>>>>>> >>>>>>>>> to " This function also reports the details about the conflicting >>>>>>>>> process ids if exist" or something. >>>>>>>>> >>>>>>>> Thank you all for the review/remarks. >>>>>>>> >>>>>>>> They have been addressed in the new attached patch version. >>>>>>> >>>>>>> Thanks for updating the patch! I read through the patch again >>>>>>> and applied the following chages to it. Attached is the updated >>>>>>> version of the patch. Could you review this version? If there is >>>>>>> no issue in it, I'm thinking to commit this version. >>>>>> >>>>>> Thank you for updating the patch! I have one question. >>>>>> >>>>>>> >>>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT; >>>>>>> + timeouts[cnt].type = TMPARAM_AFTER; >>>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout; >>>>>>> >>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? >>>>>>> I changed the code that way. >>>>>> >>>>>> As the comment of ResolveRecoveryConflictWithLock() says the >>>>>> following, a deadlock is detected by the ordinary backend process: >>>>>> >>>>>> * Deadlocks involving the Startup process and an ordinary backend >>>>>> proces >>>>>> * will be detected by the deadlock detector within the ordinary >>>>>> backend. >>>>>> >>>>>> If we use STANDBY_DEADLOCK_TIMEOUT, >>>>>> SendRecoveryConflictWithBufferPin() will be called after >>>>>> DeadlockTimeout passed, but I think it's not necessary for the startup >>>>>> process in this case. >>>>> >>>>> Thanks for pointing this! You are right. >>>>> >>>>> >>>>>> If we want to just wake up the startup process >>>>>> maybe we can use STANDBY_TIMEOUT here? >>>>> >>>> Thanks for the patch updates! Except what we are still discussing below, >>>> it looks good to me. >>>> >>>>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer >>>>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there? >>>> >>>> Agree >>>> >>>>> >>>>> Or, first of all, we don't need to enable the deadlock timer at all? >>>>> Since what we'd like to do is to wake up after deadlock_timeout >>>>> passes, we can do that by changing ProcWaitForSignal() so that it can >>>>> accept the timeout and giving the deadlock_timeout to it. If we do >>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from >>>>> ResolveRecoveryConflictWithLock(). Thought? >>> >>> Where do we enable deadlock timeout in hot standby case? You meant to >>> enable it in ProcWaitForSignal() or where we set a timer for not hot >>> standby case, in ProcSleep()? >> >> No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that it does >> >> 1. calculate the "minimum" timeout from deadlock_timeout and max_standby_xxx_delay >> 2. give the calculated timeout value to ProcWaitForSignal() >> 3. wait for signal and timeout on ProcWaitForSignal() >> >> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so difficult to make ProcWaitForSignal() handle the timeout.If we do this, I was thinking that we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock(). >> >> >>> >>>> >>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers >>>> a call to StandbyLockTimeoutHandler() which does nothing, except waking >>>> up. That's what we want, right?) >>> >>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup >>> process can wake up and do nothing. Thank you for pointing out. >> >> Okay, understood! Firstly I was thinking that enabling the same type (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn'twork properly, but as far as I read the code, it works. In that case, only the shorter timeout would be activatedin enable_timeouts(). So I agree to use STANDBY_LOCK_TIMEOUT. > > So I renamed the argument "deadlock_timer" in ResolveRecoveryConflictWithLock() > because it's not the timer for deadlock and is confusing. Attached is the > updated version of the patch. Barring any objection, I will commit this version. Since the recent commit 8900b5a9d5 changed the recovery conflict code, I updated the patch. Attached is the updated version of the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
pgsql-hackers by date: