Hi,
On Thu, Oct 23, 2025 at 9:25 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi hackers,
>
> I'd like to propose a new archive_mode setting to address a gap in WAL
> archiving for high availability streaming replication configurations.
>
> In HA setups using streaming replication, standbys can be
> promoted when primary has failed. Some WAL segments might be not yet
> archived. This creates gaps in the WAL archive, breaking point-in-time
> recovery:
>
> 1. Primary generates WAL, streams to standby
> 2. Standby receives WAL, marks segments as .done immediately
+1 to the idea.

If I understand correctly, the assumption is that the standby doesn't
really archive anything; it just marks segments as .done, even though
in theory it could archive them the same way the primary does and avoid
this issue. Is the concern that it would be wasted work for the primary
and the standby to both archive the same WAL, and that's what we want
to avoid?
>
> ## Implementation
>
> The patch adds two replication protocol messages:
> - 'a' (PqReplMsg_ArchiveStatusQuery): standby → primary, sends (timeline, segno) pairs
> - 'A' (PqReplMsg_ArchiveStatusResponse): primary → standby, responds with archived pairs
>
I might be missing something, but isn't it enough for the primary to
send last_archived_wal from PgStat_ArchiverStats? That way we can avoid
a full directory scan of archive_status. Or are we not comfortable
assuming that WAL files are archived in order?
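
To make the question concrete, here is a rough sketch of the check I have
in mind on the standby side. It is illustrative Python, not code from the
patch: the helper names are made up, it assumes the default 16MB
wal_segment_size, and it relies on exactly the in-order archiving
assumption I'm asking about.

```python
# Illustrative sketch only; names are hypothetical and not part of the patch.
# Assumes the default 16MB wal_segment_size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, segno)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, log * SEGMENTS_PER_XLOGID + seg


def covered_by_last_archived(segment, last_archived_wal):
    """Return True if `segment` can be treated as archived, given only the
    primary's last archived segment name.

    ASSUMPTION: segments are archived strictly in order, so everything at or
    before last_archived_wal on the same timeline is already in the archive.
    Segments on a different timeline are conservatively treated as unknown.
    """
    tli, segno = parse_wal_name(segment)
    last_tli, last_segno = parse_wal_name(last_archived_wal)
    return tli == last_tli and segno <= last_segno
```

With this, the standby would mark a segment as .done only when
covered_by_last_archived() returns True for it. If archiving can complete
out of order, the check is not safe and the per-segment query in the
patch would still be needed.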
Thanks,
--
John Hsu - Amazon Web Services