Re: Is Recovery actually paused? - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Is Recovery actually paused? |
Date | |
Msg-id | CALj2ACVqsCsgsjRGf4VY5V60a3zA9n-D6deT3NcqNdjgYDBLmA@mail.gmail.com |
In response to | Re: Is Recovery actually paused? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: Is Recovery actually paused? |
List | pgsql-hackers |
On Tue, Feb 9, 2021 at 11:30 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >
> > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > > > > If we are going to introduce that complexity, I'd like to re-propose
> > > > > to introduce interlocking between the recovery side and the
> > > > > pause-requestor side instead of introducing the intermediate state,
> > > > > which is the cause of the complexity.
> > > > >
> > > > > The attached PoC patch adds:
> > > > >
> > > > > - A solid checkpoint just before calling rm_redo. It doesn't add an
> > > > >   info_lck since the check is done in the existing lock section.
> > > > >
> > > > > - Interlocking between the above and SetRecoveryPause without adding a
> > > > >   shared variable.
> > > > >   (This is what I called "synchronous" before.)
> > > >
> > > > I think waiting in pg_wal_replay_pause is a possible option, but this will
> > > > also introduce other complexity to the code, such as the possibility of
> > > > waiting for a long time or forever. For example, waiting in SetRecoveryPause
> > > > as in your PoC patch appears to make recovery get stuck in
> > > > RecoveryRequiresIntParameter.
> > > >
> > >
> > > I agree with this. I think we previously discussed these approaches,
> > > where we can wait in pg_wal_replay_pause() or
> > > pg_is_wal_replay_paused(). In fact, we had an older version where we
> > > put the wait in pg_is_wal_replay_paused(). But it appeared that doing
> > > so would add extra complexity. Also, instead of waiting in these
> > > APIs, the wait logic can be implemented in the application code that
> > > is actually using these APIs, and IMHO that will give better control to
> > > the users.
> >
> > And also, having waiting logic in pg_wal_replay_pause() or
> > pg_is_wal_replay_paused() required changes to the existing API, such as
> > a timeout so that they do not wait indefinitely.
>
> I don't understand that. pg_wal_replay_pause() is defined as "pauses
> recovery", so it is the correct behavior to wait for the actual pause.
> pg_is_wal_replay_paused() doesn't wait for anything at all.

What I meant was that if we were to add waiting logic inside
pg_wal_replay_pause, we should also have a timeout with some default
value, so that pg_wal_replay_pause does not wait forever in the waiting
loop. If recovery isn't paused within that timeout, pg_wal_replay_pause
would probably emit a warning and return false (which would require
changing the return type of the existing pg_wal_replay_pause)?

To avoid changing the existing API and return type, a new function,
pg_get_wal_replay_pause_state, is introduced.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
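
To illustrate the application-side alternative discussed above, here is a minimal sketch of a bounded wait loop that polls the pause state instead of blocking inside pg_wal_replay_pause. It is only an illustration under stated assumptions: the helper name `wait_until_paused`, its parameters, and the default timeout are hypothetical, and the state strings ('not paused', 'pause requested', 'paused') are assumed to be what pg_get_wal_replay_pause_state reports. The state lookup is passed in as a callable so the sketch does not depend on any particular database driver.

```python
import time
from typing import Callable


def wait_until_paused(
    get_state: Callable[[], str],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,
) -> bool:
    """Poll the recovery pause state until it is 'paused' or the timeout expires.

    get_state is a caller-supplied function (hypothetical here) that would run
    something like `SELECT pg_get_wal_replay_pause_state()` on the standby and
    return the resulting string. The state names are an assumption.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Check the state first so an already-paused standby returns at once.
        if get_state() == "paused":
            return True
        # Give up once the deadline has passed; the caller decides what to do.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A caller would first issue `SELECT pg_wal_replay_pause()` and then call `wait_until_paused(...)` with its own lookup function; a `False` result means recovery had not reached the paused state within the timeout, and the application can retry, warn, or abort as it sees fit. This keeps the timeout policy in the application rather than in the server-side function.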