Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Is Recovery actually paused?
Date
Msg-id CALj2ACVqsCsgsjRGf4VY5V60a3zA9n-D6deT3NcqNdjgYDBLmA@mail.gmail.com
In response to Re: Is Recovery actually paused?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Is Recovery actually paused?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Tue, Feb 9, 2021 at 11:30 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >
> > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > > > > If we are going to introduce that complexity, I'd like to re-propose
> > > > > to introduce interlocking between the recovery side and the
> > > > > pause-requestor side instead of introducing the intermediate state,
> > > > > which is the cause of the complexity.
> > > > >
> > > > > The attached PoC patch adds:
> > > > >
> > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > > > >   info_lck since the check is done in the existing lock section.
> > > > >
> > > > > - Interlocking between the above and SetRecoveryPause without adding a
> > > > >   shared variable.
> > > > >   (This is what I called "synchronous" before.)
> > > >
> > > > I think waiting in pg_wal_replay_pause is a possible option, but this will
> > > > also introduce other complexity to the code, such as the possibility of
> > > > waiting for a long time or forever.  For example, waiting in SetRecoveryPause
> > > > as in your PoC patch appears to make recovery get stuck in
> > > > RecoveryRequiresIntParameter.
> > > >
> > >
> > > I agree with this.  I think we previously discussed these approaches,
> > > where we can wait in pg_wal_replay_pause() or
> > > pg_is_wal_replay_paused().  In fact, we had an older version where we
> > > put the wait in pg_is_wal_replay_paused().  But it appeared that doing
> > > so would add extra complexity. Also, instead of waiting in these
> > > APIs, the wait logic can be implemented in the application code which
> > > is actually using these APIs, and IMHO that will give better control to
> > > the users.
> >
> > And also, having waiting logic in pg_wal_replay_pause() or
> > pg_is_wal_replay_paused() would require changes to the existing API, such as
> > a timeout to keep them from waiting indefinitely.
>
> I don't understand that. pg_wal_replay_pause() is defined as "pauses
> recovery", so it is the correct behavior to wait for the actual pause.
> pg_is_wal_replay_paused() doesn't wait for anything at all.

What I meant was that if we were to add waiting logic inside
pg_wal_replay_pause, we should also have a timeout with some default
value, to keep pg_wal_replay_pause from waiting forever in its wait
loop. If recovery isn't paused within that timeout,
pg_wal_replay_pause would presumably emit a warning and return false
(which would require us to change the return type of the existing
pg_wal_replay_pause).

To avoid changing the existing API and return type, a new function
pg_get_wal_replay_pause_state is introduced.
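
As a rough illustration of keeping the wait logic in application code, here is
a minimal Python sketch of a polling loop with a timeout. It is not part of any
patch in this thread. The helper name wait_for_replay_pause, its parameters, and
the state strings "pause requested" and "paused" are assumptions made for the
example; the caller supplies get_state, which would typically run
SELECT pg_get_wal_replay_pause_state() through whatever driver the application
uses.

```python
import time
from typing import Callable


def wait_for_replay_pause(
    get_state: Callable[[], str],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,
) -> bool:
    """Poll the pause state until recovery is paused or the timeout expires.

    get_state is a caller-supplied (hypothetical) callable that returns the
    current pause state as a string. The value "paused" is assumed here to
    mean that recovery has actually stopped.

    Returns True if the paused state was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == "paused":
            return True
        if time.monotonic() >= deadline:
            # Give up instead of waiting forever; the caller decides what
            # to do next (retry, warn, or abort).
            return False
        time.sleep(poll_interval_s)
```

With this shape, the application would call pg_wal_replay_pause() once and then
call wait_for_replay_pause(), choosing its own timeout and its own reaction to
a False result, so the server-side functions never need to block.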

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


