Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Is Recovery actually paused?
Date
Msg-id CAFiTN-tU9-+a2b-Zr1ShVkEJj5F-qYRS1MFO8UCxJS=wtvpvvA@mail.gmail.com
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Yugo NAGATA <nagata@sraoss.co.jp>)
Responses Re: Is Recovery actually paused?
List pgsql-hackers
On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Thanks for looking into this.

> On Thu, 22 Oct 2020 20:36:48 +0530
> Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > > <horikyota.ntt@gmail.com> wrote:
> > > >
> > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
> > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > > > recovery pause is requested then we can assume that the recovery is
> > > > > > paused because before processing the next wal it will always check
> > > > > > whether the recovery pause is requested or not.
> > > > ..
> > > > > However, it might be better to implement this by having the system
> > > > > absorb the pause immediately when it's in this state, rather than
> > > > > trying to detect this state and treat it specially.
> > > >
> > > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > > column.
> > >
> > > Right
> > >
> > > To make them consistent, we need to call recoveryPausesHere()
> > > > at the end of WaitForWALToBecomeAvailable() and let
> > > > pg_wal_replay_pause() call WakeupRecovery().
> > > >
> > > > I think we don't need a separate function to find the state.
> > >
> > > The idea makes sense to me.  I will try to change the patch as per the
> > > suggestion.
> >
> > Here is the patch based on this idea.
>
> I reviewd this patch.
>
> First, I made a recovery conflict situation using a table lock.
>
> Standby:
> #= begin;
> #= select * from t;
>
> Primary:
> #= begin;
> #= lock t in ;
>
> After this, WAL of the table lock cannot be replayed due to a lock acquired
> in the standby.
>
> Second, during the delay, I executed pg_wal_replay_pause() and
> pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
> max_standby_streaming_delay was expired, and eventually returned true.
>
> I can also see the same behaviour by setting recovery_min_apply_delay.
>
> So, pg_is_wal_replay_paused waits for recovery to get paused and this works
> successfully as expected.
>
> However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> although this setting may not be usual. In addition, some users may set
> recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> it could wait for a long time.
>
> At least, I think we need some descriptions on document to explain
> pg_is_wal_replay_paused could wait while a time.

Ok

> Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> control whether this waits for recovery to get paused or not? By setting its
> default value to true or false, users can use the old format for calling this
> and the backward compatibility can be maintained.

So basically, if the wait_recovery_pause flag is false then we will
immediately return true if the pause is requested?  I agree that it is
good to have an API to know whether the recovery pause is requested or
not but I am not sure is it good idea to make this API serve both the
purpose?  Anyone else have any thoughts on this?

>
> As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
>
>
> +                   errhint("Recovery control functions can only be executed during recovery.")));
>
> There are a few tabs at the end of this line.

I will fix.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching