Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Yugo NAGATA
Subject Re: Is Recovery actually paused?
Date
Msg-id 20201130154534.c79e91281165721d2062081e@sraoss.co.jp
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Is Recovery actually paused?
List pgsql-hackers
On Thu, 22 Oct 2020 20:36:48 +0530
Dilip Kumar <dilipbalaut@gmail.com> wrote:

> On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
> > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > > recovery pause is requested then we can assume that the recovery is
> > > > > paused because before processing the next wal it will always check
> > > > > whether the recovery pause is requested or not.
> > > ..
> > > > However, it might be better to implement this by having the system
> > > > absorb the pause immediately when it's in this state, rather than
> > > > trying to detect this state and treat it specially.
> > >
> > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > column.
> >
> > Right
> >
> > To make them consistent, we need to call recoveryPausesHere()
> > > at the end of WaitForWALToBecomeAvailable() and let
> > > pg_wal_replay_pause() call WakeupRecovery().
> > >
> > > I think we don't need a separate function to find the state.
> >
> > The idea makes sense to me.  I will try to change the patch as per the
> > suggestion.
> 
> Here is the patch based on this idea.

I reviewd this patch. 

First, I made a recovery conflict situation using a table lock.

Standby:
#= begin;
#= select * from t;

Primary:
#= begin;
#= lock t in ;

After this, WAL of the table lock cannot be replayed due to a lock acquired
in the standby.

Second, during the delay, I executed pg_wal_replay_pause() and
pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
max_standby_streaming_delay was expired, and eventually returned true.

I can also see the same behaviour by setting recovery_min_apply_delay.

So, pg_is_wal_replay_paused waits for recovery to get paused and this works
successfully as expected.

However, I wonder users don't expect pg_is_wal_replay_paused to wait.
Especially, if max_standby_streaming_delay is -1, this will be blocked forever, 
although this setting may not be usual. In addition, some users may set 
recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
it could wait for a long time.

At least, I think we need some descriptions on document to explain
pg_is_wal_replay_paused could wait while a time. 

Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
control whether this waits for recovery to get paused or not? By setting its
default value to true or false, users can use the old format for calling this
and the backward compatibility can be maintained.


As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.


+                   errhint("Recovery control functions can only be executed during recovery.")));          

There are a few tabs at the end of this line.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Disable WAL logging to speed up data loading
Next
From: "Drouvot, Bertrand"
Date:
Subject: Re: Add Information during standby recovery conflicts