Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
Date
Msg-id CALj2ACWphKWBJUPhtddjcRRqtE7YZh+65hTM_htrBzrZ87QXPg@mail.gmail.com
In response to Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Sat, Oct 23, 2021 at 1:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2021-10-22 at 15:34 +0530, Bharath Rupireddy wrote:
> > If the suggestion is to have the wait and retry logic embedded into
> > the user-written restore_command, IMHO, it's not a good idea as the
> > restore_command is external to the core PG and the FATAL error
> > "recovery ended before configured recovery target was reached" is an
> > internal thing.
>
> What do you want to do after the timeout happens? If you want to issue
> a WARNING instead of failing outright, perhaps that makes sense for
> exploratory PITR cases. That could be a simple boolean GUC without
> needing to introduce the timeout logic into the server.

If the suggestion is to give the user more control over what happens
to the standby after the timeout, then two new GUCs,
recovery_target_retry_timeout (int) and
recovery_target_continue_after_timeout (bool), would let users choose
the behavior they want. I'm not sure whether adding two new GUCs is
acceptable. Let's hear what other hackers think about this.
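
To make the intended interaction of the two settings concrete, here is
a minimal sketch in Python rather than server code. It is only an
illustration of the idea, not the patch: the function name, the polling
loop, and the assumption that the timeout is in seconds are all
hypothetical.

```python
import time
from typing import Callable


def wait_for_recovery_target(
    target_reached: Callable[[], bool],
    retry_timeout_s: int,            # stands in for recovery_target_retry_timeout (unit assumed)
    continue_after_timeout: bool,    # stands in for recovery_target_continue_after_timeout
    poll_interval_s: float = 1.0,    # hypothetical; not part of the proposal
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "reached", "warning", or "fatal" for the outcome of the wait."""
    deadline = clock() + retry_timeout_s
    while True:
        # Always check at least once, even with a zero timeout.
        if target_reached():
            return "reached"
        if clock() >= deadline:
            break
        sleep(poll_interval_s)
    # Timed out: either warn and carry on, or fail as the server does today.
    return "warning" if continue_after_timeout else "fatal"
```

With a timeout of 0 and continue_after_timeout off, this degenerates to
the current behavior: one check, then the FATAL path.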

> I think it's an interesting point that it can be hard to choose a
> reasonable recovery target if the system is completely down. We could
> use some better tooling or metadata around the lsns, xids or timestamp
> ranges available in a pg_wal directory or an archive. Even better would
> be to see the available named restore points. This would make it easier
> to calculate how long recovery might take for a given restore point, or
> whether it's not going to work at all because there's not enough WAL.

I think pg_waldump can help here with exploratory analysis of the WAL
files available in a given directory. Since it is an independent C
program, it can run even when the server is down, and it can also be
run against an archive location.
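
As a rough sketch of that kind of analysis, the Python snippet below
summarizes pg_waldump's text output: the LSN range seen, the first and
last commit timestamps, and any named restore points. The helper name
and the sample records are made up for illustration, and the regular
expressions assume the usual "lsn: ..., desc: COMMIT <timestamp>" and
"desc: RESTORE_POINT <name>" line shapes, which may differ between
versions.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class WalSummary(NamedTuple):
    first_lsn: Optional[str]
    last_lsn: Optional[str]
    first_commit_ts: Optional[str]
    last_commit_ts: Optional[str]
    restore_points: List[str]


# Patterns for the assumed pg_waldump line format (see the note above).
_LSN = re.compile(r"\blsn: ([0-9A-Fa-f]+/[0-9A-Fa-f]+)")
_COMMIT = re.compile(r"desc: COMMIT (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?)")
_RESTORE_POINT = re.compile(r"desc: RESTORE_POINT (\S+)")


def summarize_wal(lines: Iterable[str]) -> WalSummary:
    """Summarize pg_waldump output; records are assumed to arrive in LSN order."""
    first_lsn = last_lsn = first_ts = last_ts = None
    restore_points: List[str] = []
    for line in lines:
        m = _LSN.search(line)
        if m:
            first_lsn = first_lsn or m.group(1)
            last_lsn = m.group(1)
        m = _COMMIT.search(line)
        if m:
            first_ts = first_ts or m.group(1)
            last_ts = m.group(1)
        m = _RESTORE_POINT.search(line)
        if m:
            restore_points.append(m.group(1))
    return WalSummary(first_lsn, last_lsn, first_ts, last_ts, restore_points)


# Made-up sample records, only to show the expected input shape.
SAMPLE = [
    "rmgr: Heap        len (rec/tot):     59/    59, tx:        725, lsn: 0/01550EE0, prev 0/01550EA8, desc: INSERT off 3 flags 0x00",
    "rmgr: Transaction len (rec/tot):     34/    34, tx:        725, lsn: 0/01550F18, prev 0/01550EE0, desc: COMMIT 2021-10-22 10:04:11.123456 UTC",
    "rmgr: XLOG        len (rec/tot):     98/    98, tx:          0, lsn: 0/01550F40, prev 0/01550F18, desc: RESTORE_POINT before_migration",
    "rmgr: Transaction len (rec/tot):     34/    34, tx:        726, lsn: 0/01550FA8, prev 0/01550F40, desc: COMMIT 2021-10-22 10:09:42.000001 UTC",
]

if __name__ == "__main__":
    print(summarize_wal(SAMPLE))
```

Against real files, the lines could come from running pg_waldump with
-p pointing at the pg_wal directory or an archive copy. Note that
pg_waldump typically reports an error once it runs past the last valid
record, so a non-zero exit status by itself need not mean the summary
is unusable.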

Regards,
Bharath Rupireddy.
