Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery - Mailing list pgsql-hackers

From Marco Nenciarini
Subject Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Msg-id CA+nrD2eJUfLq8_Ed7fv-7LrmkOoLJ28LwAHh-Rjjg4RU9KOYCg@mail.gmail.com
In response to Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery  (Marco Nenciarini <marco.nenciarini@enterprisedb.com>)
List pgsql-hackers
Here are the v4 patches implementing what I described above.

On top of Xuneng's v3 (keeping the wait_for_event and scoped log
window test improvements), the main changes are:

- The wait is now capped at one wal_segment_size.  If the gap is
  larger, we skip the wait and let START_REPLICATION fail normally
  so the startup process can fall back to archive.  This avoids
  indefinite polling when the upstream is fundamentally behind.

- The first "ahead of flush position" message is logged at LOG,
  subsequent ones at DEBUG1, to cut down on noise during a long wait.
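For readers skimming the thread, the two rules above can be sketched roughly as follows. This is only an illustration, not code from the patch: `decide_wait`, `message_level`, the enum names, and the plain `uint64_t` stand-in for an LSN are all hypothetical, and the sketch assumes "gap" means how far the requested start point is ahead of the upstream's flush position.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's LSN type; an assumption made for this sketch. */
typedef uint64_t XLogRecPtr;

/* Hypothetical outcomes of the "should the walreceiver wait?" decision. */
typedef enum
{
    WAIT_NOT_NEEDED,    /* upstream has already flushed past the start point */
    WAIT_AND_POLL,      /* gap is at most one segment: wait for upstream */
    WAIT_SKIPPED        /* gap is larger: let START_REPLICATION fail */
} WaitDecision;

/* Hypothetical log levels mirroring LOG and DEBUG1. */
typedef enum
{
    LEVEL_DEBUG1,
    LEVEL_LOG
} MessageLevel;

/*
 * Decide whether to wait, given the requested start point, the upstream's
 * flush position, and the WAL segment size in bytes.
 */
static WaitDecision
decide_wait(XLogRecPtr startpoint, XLogRecPtr upstream_flush,
            uint64_t wal_segment_size)
{
    if (startpoint <= upstream_flush)
        return WAIT_NOT_NEEDED;

    /* The wait is capped at one segment; anything larger skips the wait. */
    if (startpoint - upstream_flush > wal_segment_size)
        return WAIT_SKIPPED;

    return WAIT_AND_POLL;
}

/* The first message (attempt 0) goes out at LOG, later ones at DEBUG1. */
static MessageLevel
message_level(int attempt)
{
    return (attempt == 0) ? LEVEL_LOG : LEVEL_DEBUG1;
}
```

With a 16MB segment size, a start point 1MB ahead of the upstream's flush position would poll, while one 32MB ahead would skip the wait and fall through to the normal failure path.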

Two patches attached: v4-0001 for master (extends the
walrcv_identify_system API with an optional server_lsn output
parameter) and v4-backpatch-0001 for stable branches (uses a global
variable to preserve ABI, per Alvaro's suggestion).
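To illustrate the difference between the two variants, here is a rough sketch of the general pattern. None of these names or signatures are taken from the patches or from the real walrcv_identify_system; `identify_system_master`, `identify_system_backbranch`, `last_identified_server_lsn`, and the fixed demo LSN are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's LSN type; an assumption made for this sketch. */
typedef uint64_t XLogRecPtr;

/* Pretend value reported by the upstream server, for demonstration only. */
static const XLogRecPtr demo_server_lsn = 0x3000028;

/*
 * master-style variant: the function gains an optional output parameter.
 * Callers that do not need the server LSN pass NULL.
 */
static int
identify_system_master(XLogRecPtr *server_lsn)
{
    if (server_lsn != NULL)
        *server_lsn = demo_server_lsn;
    return 0;
}

/*
 * Back-branch-style variant: the function signature stays as it was, so
 * already-compiled callers keep working; the extra result is stored in a
 * global that interested callers read afterwards.
 */
static XLogRecPtr last_identified_server_lsn = 0;

static int
identify_system_backbranch(void)
{
    last_identified_server_lsn = demo_server_lsn;
    return 0;
}
```

The trade-off is the usual one: the output parameter is cleaner but changes the function's signature, whereas the global keeps the existing signature intact at the cost of hidden state.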

Both pass the new TAP test.

Best regards,
Marco
