Re: Delay of standby shutdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Delay of standby shutdown
Date
Msg-id 87b9d893-101e-664f-270e-6029b1fe7a0c@oss.nttdata.com
Whole thread Raw
In response to Re: Delay of standby shutdown  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
Responses Re: Delay of standby shutdown  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
List pgsql-hackers

On 2020/11/04 9:35, Soumyadeep Chakraborty wrote:
> Hello Fujii,
> 
> On Thu, Sep 17, 2020 at 6:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
> As far as I understand after debugging, the problem is as follows:

Yes.


> 
> 1. After the primary is stopped, and the standby startup process is
> waiting inside:
> 
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> wait_time,
> WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
> 
> it receives SIGTERM on account of stopping the standby and it leads to
> the WaitLatch call returning with WL_LATCH_SET.
> 
> 2. WaitForWALToBecomeAvailable() then will return true after calling
> XLogFileReadAnyTLI() and eventually, XLogReadRecord() will return NULL
> since there is no new WAL to read, which means ReadRecord() will loop
> back and perform another XLogReadRecord().
> 
> 3. The additional XLogReadRecord() will lead to a 5s wait inside
> WaitForWALToBecomeAvailable() - a different WaitLatch() call this time:
> 
> /*
>   * Wait for more WAL to arrive. Time out after 5 seconds
>   * to react to a trigger file promptly and to check if the
>   * WAL receiver is still active.
>   */
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
> 
> 4. And then eventually, the code will handle interrupts here inside
> WaitForWALToBecomeAvailable() after the 5s wait:
> 
> /*
>   * This possibly-long loop needs to handle interrupts of startup
>   * process.
>   */
> HandleStartupProcInterrupts();
> 
>> To avoid this issue, I think that ReadRecord() should call
>> HandleStartupProcInterrupts() whenever looping back to retry.
> 
> Alternatively, perhaps we can place it inside
> WaitForWALToBecomeAvailable() (which already has a similar call),
> since it is more semantically aligned to checking for interrupts, rather
> than ReadRecord()? Like this:

Yes. Your approach looks better to me.
Attached is the updated version of the patch implementing that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: logical streaming of xacts via test_decoding is broken
Next
From: Alexander Korotkov
Date:
Subject: Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code