Re: Requested WAL segment xxx has already been removed - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: Requested WAL segment xxx has already been removed
Date
Msg-id CAGjGUAJcUZhzH68nh8qLTXF-OLLFyU+RF70MjDySzJ9=LFpsEg@mail.gmail.com
In response to Re: Requested WAL segment xxx has already been removed  (Alexander Kukushkin <cyberdemn@gmail.com>)
List pgsql-hackers
Hi,
> What is really painful right now, logical walsenders can only look into pg_wal, and unfortunately replication slots don't give 100% guarantee for WAL retention because of max_slot_wal_keep_size.
> That is, using restore_command for logical walsenders would be really helpful and solve some problems and pain points with logical replication.
restore_command has to be implemented on top of ssh or NFS shared storage. At most companies, security audit requirements make it impossible to set up ssh mutual trust between hosts. It would be very convenient if this feature were implemented.
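
To illustrate the shared-storage variant, here is a minimal sketch of a restore helper that copies a WAL file from an archive directory mounted on the local host, so no ssh trust is needed. The mount point /mnt/wal_archive, the script name, and the function restore_wal are assumptions made up for this example; only the %f and %p placeholders come from PostgreSQL itself.

```python
import shutil
import sys
from pathlib import Path

# Assumption: the WAL archive is mounted here (for example over NFS).
ARCHIVE_DIR = Path("/mnt/wal_archive")


def restore_wal(archive_dir, wal_name, target_path):
    """Copy one WAL file from the archive and return a shell-style exit code."""
    source = Path(archive_dir) / wal_name
    if not source.is_file():
        # A non-zero exit status tells PostgreSQL the file is not available.
        return 1
    target = Path(target_path)
    # Copy to a temporary name first, then rename, so a partial copy
    # never appears under the final file name.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(source, tmp)
    tmp.replace(target)
    return 0


# Runs only when called with exactly two arguments (%f and %p).
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(restore_wal(ARCHIVE_DIR, sys.argv[1], sys.argv[2]))
```

With a hypothetical install path, the setting would look like restore_command = 'python3 /usr/local/bin/restore_wal.py %f %p'.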


Thanks

On Tue, Jul 15, 2025 at 5:24 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi,

On Mon, 14 Jul 2025 at 11:24, Japin Li <japinli@hotmail.com> wrote:
The configuration is as expected. My test script simulates two distinct hosts
by utilizing local archive storage.

For physical replication across distinct hosts without shared WAL archive
storage, WALs are archived locally (in my test).

When the primary's walsender needs a WAL file from the archive that's not in
its pg_wal directory, manual copying is required to the primary's pg_wal or the
standby's pg_wal (or its archive directory, and use restore_command to fetch it).

What prevents us from using the primary's restore_command to retrieve the
necessary WALs?

I am just talking about the practical side of local archive storage.
Such archives will be gone along with the server in case of disaster, and therefore they bring only little value.
Equally well, a physical standby can use restore_command to copy files from the archive on the primary via ssh/rsync or similar. This approach has been used for ages and works just fine.

What is really painful right now, logical walsenders can only look into pg_wal, and unfortunately replication slots don't give 100% guarantee for WAL retention because of max_slot_wal_keep_size.
That is, using restore_command for logical walsenders would be really helpful and solve some problems and pain points with logical replication.
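
As a rough illustration of why a slot does not guarantee retention, the sketch below compares the WAL distance between a slot's restart_lsn and the current write position against max_slot_wal_keep_size. The function names are made up for this example, and the check is simplified: the server does its accounting in whole WAL segments, so treat the result as an approximation only.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'XXXXXXXX/YYYYYYYY' (hex) to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL bytes the slot still needs, measured from its restart_lsn."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def exceeds_keep_size(current_lsn: str, restart_lsn: str, max_keep_mb: int) -> bool:
    """True if the slot holds back more WAL than max_slot_wal_keep_size allows.

    max_keep_mb is given in megabytes; -1 means no limit.
    Simplified: ignores segment rounding done by the server.
    """
    if max_keep_mb < 0:
        return False
    return retained_bytes(current_lsn, restart_lsn) > max_keep_mb * 1024 * 1024


if __name__ == "__main__":
    # Hypothetical positions: the slot is 64 MB behind, the limit is 32 MB.
    print(exceeds_keep_size("0/5000000", "0/1000000", 32))
```

Once this check turns true, the WAL the slot needs may be removed at the next checkpoint, and a logical walsender has nowhere else to look for it.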

However, if we start calling restore_command for physical walsenders as well, it might increase resource usage on the primary without providing much additional value. For example, if restore_command keeps failing while the standby indefinitely retries its replication connection, the primary would run the failing command on every attempt.

I don't mind if it will also work for physical replication, but IMO there should be a possibility to opt out from it.

Regards,
--
Alexander Kukushkin
