Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Is Recovery actually paused?
Date
Msg-id CAFiTN-tTAa66hEFUAF=6_H9SFe9NJe34Yp95-vFnS4tk4F=s3A@mail.gmail.com
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Is Recovery actually paused?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > > Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > > > > > >
> > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > > Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > > > > > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > > > > > > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > > > > > > > > > > alone and add a new one with the name you suggest that returns text.
> > > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > > >
> > > > > > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > > > > > returns whether pause is requested or not? If so, it seems good to modify
> > > > > > > > the documentation of this function in order to note that this could not
> > > > > > > > return the actual pause state.
> > > > > > >
> > > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > > requested.  I am changing that in my new patch.
> > > > > >
> > > > > > I have modified the patch, changes
> > > > > >
> > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > > the pause request state
> > > > > > - Now, we are not waiting for the recovery to actually get paused so I
> > > > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > > > pause requested so I have removed that check from the
> > > > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > > > > next wal get available.
> > > > >
> > > > > I think basically the check in WaitForWalToBecomeAvailable is independent
> > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > > > > actual pause state.  This function could just return 'pause requested'
> > > > > if a pause is requested during waiting for WAL.
> > > > >
> > > > > However, I agree the change to allow recovery to transit the state to
> > > > > 'paused' during WAL waiting because 'paused' has more useful information
> > > > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > > > clearly that no more WAL are applied until recovery is resumed.  On the
> > > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > > the next WAL wiill be applied or not from this information.
> > > > >
> > > > > For the same reason, I think it is also useful to call recoveryPausesHere
> > > > > in recoveryApplyDelay.
> > > >
> > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > > > available so it can not be controlled by user so it is good to put a
> > > > check for the recovery pause,  however recoveryApplyDelay wait for the
> > > > apply delay which is configured by user and it is predictable value by
> > > > the user.  I don't have much objection to putting that check in the
> > > > recoveryApplyDelay as well but I feel it is not necessary.  Any other
> > > > thoughts on this?
> > > >
> > > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > > > > if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
> > > > > will return 'pause requested' in this case. So, I think, we should pass
> > > > > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > > > > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
> > > >
> > > > Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> > > > this, thanks for noticing this.
> > >
> > > I have changed this in the new patch.
> >
> > It seems to work well. The checkpoints seems to be placed properly.
>
> Okay
>
> > +SetRecoveryPause(RecoveryPauseState state)
> >  {
> > +       Assert(state >= RECOVERY_NOT_PAUSED && state <= RECOVERY_PAUSED);
> >
> > I'm not sure that state worth FATAL.  Isn't it enough to just ERROR
> > out like XLogFileRead?
>
> Yeah, that makes sense to me.
>
> > CheckAndSetRecovery() has only one caller.  I think it's better to
> > write the code directly.
>
> Okay, I will change.
>
> > I think the documentation of pg_wal_replay_pause needs to be a bit
> > more detailed about the difference between the two states "pause
> > requested" and "paused". Something like "A request doesn't mean that
> > recovery stops right away. If you want a guarantee that recovery is
> > actually paused, you need to check for the recovery pause state
> > returned by pg_wal_replay_pause_state(). Note that
> > pg_is_wal_repay_paused() returns whether a request is made."
>
> That seems like better idea, I will change.
>

Please find an updated patch which addresses these comments.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: a curious case of force_parallel_mode = on with jit'ing
Next
From: Amit Kapila
Date:
Subject: Re: pg_replication_origin_drop API potential race condition