Fwd: restore_command on high-throughput cluster never switches to streaming replication - Mailing list pgsql-general

From	Kasper Føns
Subject	Fwd: restore_command on high-throughput cluster never switches to streaming replication
Date	December 1, 2025 12:49:42
Msg-id	CANOng2i6xLa-FsN1B_rZFpW807GrV3YUJVgDM3nqJEj1gCk2dg@mail.gmail.com
List	pgsql-general

Hi PostgreSQL community.

I have been debugging a case where a PostgreSQL standby does not switch to streaming replication after its `restore_command` fails.

I first posted this to the pgsql-admin mailing list, but I am trying here as I got no response.

Expectation

I expect PostgreSQL to try switching to streaming replication if the `restore_command` fails.

What happens

PostgreSQL attempts to restore the previously restored WAL segment and then retries the failed segment. However, because the primary produces WAL at a high rate, the WAL file now exists and PostgreSQL does not try to switch to streaming replication.
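
The race described above can be sketched with a toy model. This is not PostgreSQL's recovery code; it only encodes the sequence seen in the logs (failed restore, re-request of the previous segment, retry of the failed segment) plus the assumption that the standby would leave archive recovery only if that retry also failed. The names `simulate`, `archive_head` and `archive_rate` are made up for illustration.

```python
def simulate(archive_head, next_segment, archive_rate, max_attempts):
    """Toy model of the retry loop; NOT PostgreSQL's actual state machine.

    archive_head  -- highest segment number currently in the archive
    next_segment  -- next segment number the standby wants to restore
    archive_rate  -- segments the primary archives per restore attempt
    max_attempts  -- stop once at least this many attempts were made

    Returns (events, switched): events is a list of (segment, succeeded),
    switched is True if the model would have fallen back to streaming.
    """
    events = []

    def restore(segment):
        nonlocal archive_head
        ok = segment <= archive_head
        events.append((segment, ok))
        # The primary keeps producing and archiving WAL in the meantime.
        archive_head += archive_rate
        return ok

    while len(events) < max_attempts:
        if restore(next_segment):
            next_segment += 1
            continue
        # Pattern from the logs: the previous segment is requested again,
        # then the failed segment is retried.
        restore(next_segment - 1)
        if restore(next_segment):
            next_segment += 1
        else:
            # Assumption: only a failed retry leads to streaming.
            return events, True
    return events, False
```

With `simulate(2.0, 1, 0.5, 8)` (a busy primary) every failed restore is followed by a successful retry and the model never reaches streaming; with `archive_rate` set to `0.0` (an idle primary) the retry fails and it does.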

Context

Running PostgreSQL 15.7 in Kubernetes using CloudNative PostgreSQL Operator.

Logs

I configured PostgreSQL to emit DEBUG3 level logs. Newest logs first, oldest last.

got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json 000000410000A7BA00000058 pg_wal/RECOVERYXLOG"
got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json 000000410000A7BA00000057 pg_wal/RECOVERYXLOG"
could not open file "pg_wal/000000410000A7BA00000058": No such file or directory
could not restore file "000000410000A7BA00000058" from archive: child process exited with exit code 1
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json 000000410000A7BA00000058 pg_wal/RECOVERYXLOG"
got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json 000000410000A7BA00000057 pg_wal/RECOVERYXLOG"

Notice that when the restore of 000000410000A7BA00000058 failed, PostgreSQL asked for 000000410000A7BA00000057, which it had already restored. Afterwards, it asks for 000000410000A7BA00000058 once again.
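
For anyone reading the segment names in the logs, here is a small sketch that decodes them. A WAL file name is 24 hex digits: timeline, log number and segment number, 8 digits each. The start-LSN calculation assumes the default 16 MB `wal_segment_size`, which I have not verified for this cluster; the function names are made up for illustration.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size (16 MB)
SEGMENTS_PER_LOG = 0x100000000 // SEGMENT_SIZE


def parse_wal_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def segment_number(name):
    """Absolute segment number, ignoring the timeline."""
    _, log, segment = parse_wal_segment(name)
    return log * SEGMENTS_PER_LOG + segment


def start_lsn(name):
    """LSN of the first byte of the segment, in the usual hi/lo hex form."""
    offset = segment_number(name) * SEGMENT_SIZE
    return f"{offset >> 32:X}/{offset & 0xFFFFFFFF:X}"
```

Under that assumption, `segment_number` confirms that ...57 is the segment immediately before ...58 on timeline 0x41.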

Problem

This is problematic because the standby will never switch to streaming replication.

Workaround

We can get the PostgreSQL replica in sync if we change the `restore_command` to `/bin/false` once we are within `wal_keep_size`.
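
A rough way to check the "within `wal_keep_size`" condition is to compare two LSNs, for example `pg_current_wal_lsn()` from the primary and `pg_last_wal_replay_lsn()` from the standby. The sketch below only does the arithmetic; the helper names are hypothetical and the LSN values in the usage note are made up. It treats `wal_keep_size` as a value in megabytes and ignores rounding to whole segments, so it is an approximation.

```python
def lsn_to_int(lsn):
    """Convert an LSN in 'hi/lo' hex form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def within_wal_keep_size(primary_lsn, standby_lsn, wal_keep_size_mb):
    """True if the standby lags the primary by at most wal_keep_size_mb.

    Approximation only: wal_keep_size is a minimum amount of WAL kept in
    pg_wal, so the primary may well retain more than this.
    """
    lag_bytes = lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)
    return 0 <= lag_bytes <= wal_keep_size_mb * 1024 * 1024
```

For example, `within_wal_keep_size("A7BA/60000000", "A7BA/58000000", 1024)` is true, because the lag is 128 MB.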

Question

Is this the expected behaviour?

I expect the function `WaitForWALToBecomeAvailable` to switch to streaming replication as soon as a single `restore_command` invocation fails. That is what happens when `/bin/false` is used as the `restore_command` instead.

Any help would be greatly appreciated.

/Kasper Føns
