Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Is Recovery actually paused?
Date
Msg-id CAFiTN-sn48p2M3weRm6e8Xaq3KX0FyJSpgZyHih0AqOLVWCKwg@mail.gmail.com
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Thu, Jan 21, 2021 at 3:29 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for reviewing Bharat.

> On Tue, Jan 19, 2021 at 9:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > In the last patch there were some local changes which I did not add to
> > the patch and it was giving compilation warning so fixed that along
> > with that I have addressed your this comment as well.
>
> Thanks for the patch. I took a look at the v5 patch, below are some
> comments. Please ignore if I'm repeating any of the comments discussed
> upthread.
>
> [1] Can we also have a wait version for pg_wal_replay_pause that waits
> until recovery is actually paused right after setting the recovery
> state to RECOVERY_PAUSE_REQUESTED? Something like this -
> pg_wal_replay_pause_and_wait(wait boolean, wait_seconds integer
> DEFAULT 60) returns boolean. It waits until either default or provided
> wait_seconds and returns true if the recovery is paused within that
> wait_seconds otherwise false. If wait_seconds is 0 or -1, then it
> waits until recovery is paused and returns true. One advantage of this
> function is that users don't need to call pg_is_wal_replay_paused().
> IMHO, the job of ensuring whether or not the recovery is actually
> paused, is better done by the one who requests
> it(pg_wal_replay_pause/pg_wal_replay_pause_and_wait).

I don't think we need wait/onwait version for all the APIs,  IMHO it
would be enough for the user to know whether the recovery is actually
paused or not
and for that, we are changing pg_is_wal_replay_paused to wait for the
pause.  However, in the next version in pg_is_wal_replay_paused I will
provide a flag so that the user can decide whether to wait for the
pause or just get the request status.

> [2] Is it intentional that RecoveryPauseRequested() returns true even
> if XLogCtl->recoveryPauseState is either RECOVERY_PAUSE_REQUESTED or
> RECOVERY_PAUSED?

Yes this is intended

> [3] Can we change IsRecoveryPaused() instead of RecoveryIsPaused() and
> IsRecoveryPauseRequested() instead of RecoveryPauseRequested()? How
> about having inline(because they have one line of code) functions like
> IsRecoveryPauseRequested(), IsRecoveryPaused() and
> IsRecoveryInProgress(), returning true when RECOVERY_PAUSE_REQUESTED,
> RECOVERY_PAUSED and RECOVERY_IN_PROGRESS respectively?

Yeah, we can do that, I am not sure whether we need
IsRecoveryInProgress function though.

> [4] Can we have at least one line of comments for each of the new
> functions, I know the function names mean everything? And also for
> existing SetRecoveryPause() and GetRecoveryPauseState()?

Will do that

> [5] Typo, it's "every time" ---> +         * this everytime.

Ok

> [6] Do we reach PG_RETURN_BOOL(true); at the end of
> pg_is_wal_replay_paused()? If not, is it there for satisfying the
> compiler?

Yes

> [7] In pg_is_wal_replay_paused(), do we need if
> (!RecoveryPauseRequested()) inside the for (;;)? If yes, can we add
> comments about why we need it there?

Yes, we need it if replay resumed during the loop,  I will add the comments.

> [8] Isn't it good to have
> pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE) and
> pgstat_report_wait_end(), just before for (;;) and at the end
> respectively? This will help users in knowing what they are waiting
> for? Alternatively we can issue notice/warnings/write to server log
> while we are waiting in for (;;) for the recovery to get paused?

I think we can do that, let me think about this and get back to you.

> [9] In pg_is_wal_replay_paused(), is 10 msec sleep time chosen
> randomly or based on some analysis that the time it takes to get to
> recovery paused state from request state or some other?

I don't think we can identify that when actually recovery can get
paused.  Though in pg_wal_replay_pause() we are sending a signal to
all the places we are waiting for WAL and right after we are checking.
But it depends upon other configuration parameters like
max_standby_streaming_delay.

> [10]  errhint("The standby was promoted while waiting for recovery to
> be paused."))); Can we know whether standby is actually promoted and
> throw this error? Because the error "recovery is not in progress" here
> is being thrown by just relying on if (!RecoveryInProgress()). IIUC,
> using pg_promote

Because before checking this it has already checked
RecoveryPauseRequested() and if that is true then the
RecoveryInProgress was in progress at some time and now it is not
anymore and that can happen due to promotion.  But I am fine with
reverting to the old error that can not execute if recovery is not in
progress.

> [11] Can we try to add tests for these functions in TAP? Currently, we
> don't have any tests for pg_is_wal_replay_paused, pg_wal_replay_resume
> or pg_wal_replay_pause, but we have tests for pg_promote in timeline
> switch.

I will work on this.

> [12] Isn't it better to change RecoveryPauseState enum
> RECOVERY_IN_PROGRESS value to start from 1 instead of 0? Because when
> XLogCtl shared memory is initialized, I think recoveryPauseState can
> be 0, so should it mean recovery in progress?

I think this state is to track the pause status, either pause
requested or actually pause,  we don't really want to trace the
RECOVERY_IN_PROGRESS.  Maybe we can change the name of that status to
RECOVERY_PAUSE_NOT_REQUESTED or RECOVERY_PAUSE_NONE?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Stronger safeguard for archive recovery not to miss data
Next
From: Dmitry Dolgov
Date:
Subject: Re: [HACKERS] [PATCH] Generic type subscripting