add retry mechanism for achieving recovery target before emitting FATAL error "recovery ended before configured recovery target was reached" - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject add retry mechanism for achieving recovery target before emitting FATAL error "recovery ended before configured recovery target was reached"
Msg-id CALj2ACULyUY_GgCf-MSZQUsvD_Fk_F+79qz0F53b2f_KdugZhA@mail.gmail.com
Responses Re: add retry mechanism for achieving recovery target before emitting FATAL error "recovery ended before configured recovery target was reached"  (Jeff Davis <pgsql@j-davis.com>)
Re: add retry mechanism for achieving recovery target before emitting FATAL error "recovery ended before configured recovery target was reached"  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hi,

The FATAL error "recovery ended before configured recovery target was
reached", introduced in PG 14 by the commit at [1], causes the standby
to go down after it has spent a good amount of time in recovery. The
WAL required to reach the recovery target may take a while to arrive
from the archive location at the standby, and having the standby fail
with the FATAL error in the meantime isn't good.

Instead, how about we make the standby wait for a certain amount of
time (configurable with a new GUC) and keep looking for the required
WAL? If the required WAL arrives within the wait time, the standby
reaches the recovery target and no FATAL error is raised. If it
doesn't, the timeout expires and the standby fails with the FATAL
error as it does today. The GUC could be set to roughly the average
time it takes for WAL to travel from the primary to the archive
location plus the time from the archive location to the standby. The
default would be 0, i.e. the retry is disabled.
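
To make the wait-and-retry idea concrete, here is a minimal standalone
sketch in C. It is not the attached WIP patch and does not use any
PostgreSQL internals: the function name wait_for_required_wal, the
callback type, the poll interval, and the fake_archive stand-in are all
hypothetical, chosen only for illustration. A timeout of 0 means the
retry is disabled, so the check runs exactly once.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif

#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: true once the WAL needed for the target is available. */
typedef bool (*wal_available_fn) (void *arg);

/* Sleep for the given number of milliseconds. */
static void
sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Keep checking for the required WAL until it shows up or timeout_ms has
 * elapsed.  Returns true if the WAL became available, false on timeout
 * (the point where the caller would raise the FATAL error).
 * timeout_ms == 0 disables retrying: the check is made once.
 */
static bool
wait_for_required_wal(wal_available_fn wal_available, void *arg,
                      long timeout_ms, long poll_interval_ms)
{
    long        waited_ms = 0;

    if (poll_interval_ms <= 0)
        poll_interval_ms = 1;   /* avoid spinning forever without progress */

    for (;;)
    {
        long        nap_ms;

        if (wal_available(arg))
            return true;
        if (waited_ms >= timeout_ms)
            return false;

        /* Never sleep past the deadline. */
        nap_ms = poll_interval_ms;
        if (nap_ms > timeout_ms - waited_ms)
            nap_ms = timeout_ms - waited_ms;
        sleep_ms(nap_ms);
        waited_ms += nap_ms;
    }
}

/* Stand-in for the archive: the WAL "arrives" on the Nth check. */
typedef struct
{
    int         checks_done;
    int         succeed_on_check;
} fake_archive;

static bool
fake_wal_available(void *arg)
{
    fake_archive *a = arg;

    a->checks_done++;
    return a->checks_done >= a->succeed_on_check;
}
```

With a fake_archive whose WAL arrives on the third check and a nonzero
timeout, the function returns true after three checks. With a timeout
of 0 it checks once and returns false, which corresponds to today's
behaviour of failing immediately.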

I'm attaching a WIP patch. I've tested it on my dev system and the
recovery regression tests are passing with it. I will provide a better
version later, probably with a test case.

Thoughts?

[1] commit dc788668bb269b10a108e87d14fefd1b9301b793

Author: Peter Eisentraut <peter@eisentraut.org>
Date:   Wed Jan 29 15:43:32 2020 +0100

    Fail if recovery target is not reached

    Before, if a recovery target is configured, but the archive ended
    before the target was reached, recovery would end and the server would
    promote without further notice.  That was deemed to be pretty wrong.
    With this change, if the recovery target is not reached, it is a fatal
    error.

    Based-on-patch-by: Leif Gunnar Erlandsen <leif@lako.no>
    Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
    Discussion:
https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@lako.no

Regards,
Bharath Rupireddy.
