Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | d78ead3f-b7e1-74ab-7b61-6065bbf8f6c1@oss.nttdata.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add Information during standby recovery conflicts
Re: Add Information during standby recovery conflicts |
List | pgsql-hackers |
On 2020/12/14 21:31, Fujii Masao wrote: > > > On 2020/12/05 12:38, Masahiko Sawada wrote: >> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >>> >>> Hi, >>> >>> On 12/4/20 2:21 AM, Fujii Masao wrote: >>>> >>>> On 2020/12/04 9:28, Masahiko Sawada wrote: >>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao >>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote: >>>>>>>> CAUTION: This email originated from outside of the organization. >>>>>>>> Do not click links or open attachments unless you can confirm the >>>>>>>> sender and know the content is safe. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera >>>>>>>> <alvherre@alvh.no-ip.org> wrote: >>>>>>>>> On 2020-Dec-01, Fujii Masao wrote: >>>>>>>>> >>>>>>>>>> + if (proc) >>>>>>>>>> + { >>>>>>>>>> + if (nprocs == 0) >>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid); >>>>>>>>>> + else >>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid); >>>>>>>>>> + >>>>>>>>>> + nprocs++; >>>>>>>>>> >>>>>>>>>> What happens if all the backends in wait_list have gone? In >>>>>>>>>> other words, >>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs >>>>>>>>>> has not been >>>>>>>>>> incrmented at all)? This would very rarely happen, but can happen. >>>>>>>>>> In this case, since buf.data is empty, at least there seems no >>>>>>>>>> need to log >>>>>>>>>> the list of conflicting processes in detail message. >>>>>>>>> Yes, I noticed this too; this can be simplified by changing the >>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than >>>>>>>>> wait_list being null), otherwise not print the errdetail. (You >>>>>>>>> could >>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.) >>>>>>>> +1 >>>>>>>> >>>>>>>> Maybe we can also improve the comment of this function from: >>>>>>>> >>>>>>>> + * This function also reports the details about the conflicting >>>>>>>> + * process ids if *wait_list is not NULL. >>>>>>>> >>>>>>>> to " This function also reports the details about the conflicting >>>>>>>> process ids if exist" or something. >>>>>>>> >>>>>>> Thank you all for the review/remarks. >>>>>>> >>>>>>> They have been addressed in the new attached patch version. >>>>>> >>>>>> Thanks for updating the patch! I read through the patch again >>>>>> and applied the following chages to it. Attached is the updated >>>>>> version of the patch. Could you review this version? If there is >>>>>> no issue in it, I'm thinking to commit this version. >>>>> >>>>> Thank you for updating the patch! I have one question. >>>>> >>>>>> >>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT; >>>>>> + timeouts[cnt].type = TMPARAM_AFTER; >>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout; >>>>>> >>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? >>>>>> I changed the code that way. >>>>> >>>>> As the comment of ResolveRecoveryConflictWithLock() says the >>>>> following, a deadlock is detected by the ordinary backend process: >>>>> >>>>> * Deadlocks involving the Startup process and an ordinary backend >>>>> proces >>>>> * will be detected by the deadlock detector within the ordinary >>>>> backend. >>>>> >>>>> If we use STANDBY_DEADLOCK_TIMEOUT, >>>>> SendRecoveryConflictWithBufferPin() will be called after >>>>> DeadlockTimeout passed, but I think it's not necessary for the startup >>>>> process in this case. >>>> >>>> Thanks for pointing this! You are right. >>>> >>>> >>>>> If we want to just wake up the startup process >>>>> maybe we can use STANDBY_TIMEOUT here? >>>> >>> Thanks for the patch updates! Except what we are still discussing below, >>> it looks good to me. >>> >>>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer >>>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there? >>> >>> Agree >>> >>>> >>>> Or, first of all, we don't need to enable the deadlock timer at all? >>>> Since what we'd like to do is to wake up after deadlock_timeout >>>> passes, we can do that by changing ProcWaitForSignal() so that it can >>>> accept the timeout and giving the deadlock_timeout to it. If we do >>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from >>>> ResolveRecoveryConflictWithLock(). Thought? >> >> Where do we enable deadlock timeout in hot standby case? You meant to >> enable it in ProcWaitForSignal() or where we set a timer for not hot >> standby case, in ProcSleep()? > > No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that it does > > 1. calculate the "minimum" timeout from deadlock_timeout and max_standby_xxx_delay > 2. give the calculated timeout value to ProcWaitForSignal() > 3. wait for signal and timeout on ProcWaitForSignal() > > Since ProcWaitForSignal() calls WaitLatch(), seems it's not so difficult to make ProcWaitForSignal() handle the timeout.If we do this, I was thinking that we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock(). > > >> >>> >>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers >>> a call to StandbyLockTimeoutHandler() which does nothing, except waking >>> up. That's what we want, right?) >> >> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup >> process can wake up and do nothing. Thank you for pointing out. > > Okay, understood! Firstly I was thinking that enabling the same type (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn'twork properly, but as far as I read the code, it works. In that case, only the shorter timeout would be activatedin enable_timeouts(). So I agree to use STANDBY_LOCK_TIMEOUT. So I renamed the argument "deadlock_timer" in ResolveRecoveryConflictWithLock() because it's not the timer for deadlock and is confusing. Attached is the updated version of the patch. Barring any objection, I will commit this version. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
pgsql-hackers by date: