Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Is Recovery actually paused?
Date
Msg-id 20210125.121253.1672152121572156425.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Is Recovery actually paused?  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >>
> > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >> > Please find the patch for the same.  I haven't added a test case for
> > >> > this yet.  I mean we can write a test case to pause the recovery and
> > >> > get the status.  But I am not sure that we can really write a reliable
> > >> > test case for 'pause requested' and 'paused'.
> > >>
> > >> +1 to just show the recovery pause state in the output of
> > >> pg_is_wal_replay_paused. But, should the function name
> > >> "pg_is_wal_replay_paused" be something like
> > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > >> in a function, I expect a boolean output. Others may have better
> > >> thoughts.
> > >>
> > >> IIUC the above change, ensuring the recovery is paused after it's
> > >> requested lies with the user. IMHO, the main problem we are trying to
> > >> solve is not met
> > >
> > >
> > > Basically earlier their was no way for the user yo know whether the recovery is actually paused or not because it
wasalways returning true after pause requested.  Now, we will return whether pause requested or actually paused.  So >
fortool designer who want to wait for recovery to get paused can have a loop and wait until the recovery state reaches
topaused.  That will give a better control.
 
> >
> > I get it and I agree to have that change. My point was whether we can
> > have a new function pg_wal_replay_pause_and_wait that waits until
> > recovery is actually paused ((along with pg_is_wal_replay_paused
> > returning the actual state than a true/false) so that tool developers
> > don't need to have the waiting code outside, if at all they care about
> > it? Others may have better thoughts than me.
> 
> I think the previous patch was based on that idea where we thought
> that we can pass an argument to pg_is_wal_replay_paused which can
> decide whether to wait or return without the wait.  I think this
> version looks better to me where we give the status instead of
> waiting.  I am not sure whether we want another version of
> pg_wal_replay_pause which will wait for actually it to get paused.  I
> mean there is always a scope to include the functionality in the
> database which can be achieved by the tool, but this patch was trying
> to solve the problem that there was no way to know the status so I
> think returning the correct status should be the scope of this.

I understand that the requirement here is that no record is replayed
after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
returns true, and delays taken while recovery don't delay the state
change. That requirements are really synchronous.

On the other hand the machinery is designed to be asynchronous.

>     * Note that we intentionally don't take the info_lck spinlock
>     * here.  We might therefore read a slightly stale value of
>     * the recoveryPause flag, but it can't be very stale (no
>     * worse than the last spinlock we did acquire).  Since a
>     * pause request is a pretty asynchronous thing anyway,
>     * possibly responding to it one WAL record later than we
>     * otherwise would is a minor issue, so it doesn't seem worth
>     * adding another spinlock cycle to prevent that.

As the result, this patch tries to introduce several new checkpoints
to some delaying point so that that waits can find pause request in a
timely manner. I think we had better use locking (or atomics) for the
information instead of such scattered checkpoints if we expect that
machinery to work in a such syhcnronous manner.

That would make the tri-state state variable and many checkpoints
unnecessary. Maybe.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: "Tang, Haiying"
Date:
Subject: RE: Parallel Inserts in CREATE TABLE AS
Next
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?