Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
Date	October 22, 2021 20:16:54
Msg-id	b334d61396e6b0657a63dc38e16d429703fe9b96.camel@j-davis.com
In response to	Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
List	pgsql-hackers

On Fri, 2021-10-22 at 15:34 +0530, Bharath Rupireddy wrote:
> If the suggestion is to have the wait and retry logic embedded into
> the user-written restore_command, IMHO, it's not a good idea as the
> restore_command is external to the core PG and the FATAL error
> "recovery ended before configured recovery target was reached" is an
> internal thing. 

It seems likely that you'd want to tweak the exact behavior for the
given system. For instance, if the files are making some progress, and
you can estimate that in 2 more minutes everything will be fine, then
you may be more willing to wait those two minutes. But if no progress
has happened since recovery began 15 minutes ago, you may want to fail
immediately.

All of this nuance would be better captured in a specialized script
than a generic timeout in the server code.
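
As a rough illustration of the kind of script meant here, the sketch below wraps a plain file copy in a wait loop that gives up once the archive has stopped making progress. It is only a sketch under stated assumptions: the script name, the archive being a local directory, the "progress" heuristic (the number of files in the archive changes), and the 15-minute and 1-hour limits are all hypothetical. Only %f and %p are actual restore_command placeholders. Requests for anything other than a WAL segment (for example timeline history files, which may legitimately not exist) fail immediately instead of waiting.

```python
import shutil
import string
import sys
import time
from pathlib import Path


def should_keep_waiting(now, started, last_progress, max_stall=900.0, max_total=3600.0):
    """Decide whether waiting longer is worthwhile (all times in seconds)."""
    if now - last_progress > max_stall:
        return False  # nothing new has arrived for too long: fail now
    return now - started <= max_total


def is_wal_segment(name):
    """WAL segment file names are exactly 24 hexadecimal digits."""
    return len(name) == 24 and all(c in string.hexdigits for c in name)


def restore(archive_dir, wal_name, dest, poll=5.0, max_stall=900.0,
            clock=time.monotonic, sleep=time.sleep):
    """Copy wal_name from archive_dir to dest; return a process exit code."""
    archive = Path(archive_dir)
    src = archive / wal_name
    started = last_progress = clock()
    seen = len(list(archive.iterdir()))
    while True:
        if src.is_file():
            shutil.copyfile(src, dest)
            return 0
        if not is_wal_segment(wal_name):
            return 1  # do not wait for history files and the like
        count = len(list(archive.iterdir()))
        now = clock()
        if count != seen:
            # Hypothetical progress signal: the archive contents changed.
            seen, last_progress = count, now
        if not should_keep_waiting(now, started, last_progress, max_stall):
            return 1
        sleep(poll)


# Hypothetical configuration:
#   restore_command = 'python3 wait_restore.py /mnt/archive %f %p'
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The decision lives in should_keep_waiting(), so a site could swap in whatever estimate fits its archive (transfer rate, time since the last segment, and so on) without touching the server.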

What do you want to do after the timeout happens? If you want to issue
a WARNING instead of failing outright, perhaps that makes sense for
exploratory PITR cases. That could be a simple boolean GUC without
needing to introduce the timeout logic into the server.

I think it's an interesting point that it can be hard to choose a
reasonable recovery target if the system is completely down. We could
use some better tooling or metadata around the LSN, XID, or timestamp
ranges available in a pg_wal directory or an archive. Even better would
be to see the available named restore points. This would make it easier
to calculate how long recovery might take for a given restore point, or
whether it's not going to work at all because there's not enough WAL.
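
For the LSN part, a small amount of this can already be derived from file names alone. The sketch below is not an existing tool; it assumes the default 16 MB WAL segment size and plain 24-hex-digit segment names (compressed, .partial, .backup and .history files are ignored), and the function names are made up for illustration. It maps each segment name to its starting LSN and groups contiguous segments into ranges per timeline. The end of each range is only an upper bound, since a segment's presence says nothing about how much valid WAL it contains, and XIDs, timestamps, and restore point names would still require reading the WAL itself.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default segment size
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segment_start_lsn(name, seg_size=WAL_SEGMENT_SIZE):
    """Return (timeline, starting LSN as an integer) for a segment file name."""
    tli = int(name[0:8], 16)    # timeline ID
    log = int(name[8:16], 16)   # high 32 bits of the LSN
    seg = int(name[16:24], 16)  # segment number within that 4 GB range
    return tli, (log << 32) + seg * seg_size


def format_lsn(lsn):
    """Render an integer LSN in the usual high/low hexadecimal form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def available_ranges(names, seg_size=WAL_SEGMENT_SIZE):
    """Group segments into contiguous (timeline, start_lsn, end_lsn) ranges."""
    starts = sorted(segment_start_lsn(n, seg_size)
                    for n in names if SEGMENT_RE.match(n))
    ranges = []
    for tli, start in starts:
        if ranges and ranges[-1][0] == tli and ranges[-1][2] == start:
            # Extends the previous range with no gap.
            ranges[-1] = (tli, ranges[-1][1], start + seg_size)
        else:
            ranges.append((tli, start, start + seg_size))
    return ranges
```

Feeding it the output of os.listdir() on an archive directory would show at a glance whether there is a gap before the LSN you hoped to reach.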

Regards,
    Jeff Davis
