Re: Unnecessary delay in streaming replication due to replay lag - Mailing list pgsql-hackers

From sunil s
Subject Re: Unnecessary delay in streaming replication due to replay lag
Msg-id CAOG6S48rsxPkK7wx7wkU0xqJeKO_XS7S+cLiTXpzj0a7VpsC1Q@mail.gmail.com
In response to Re: Unnecessary delay in streaming replication due to replay lag  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hello Hackers,

I recently had the opportunity to continue the effort originally led by a valued contributor.
I’ve addressed most of the previously reported feedback and issues, and would like to share the updated patch with the community.

IMHO, starting the WAL receiver eagerly offers significant advantages for the following reasons:

  1. If recovery_min_apply_delay is set high (for various operational reasons) and the primary crashes, the mirror can recover quickly, thereby improving overall high availability.

  2. For setups without archive-based recovery, restore and recovery operations complete faster.

  3. When synchronous_commit is enabled, faster mirror recovery reduces offline time and helps avoid prolonged commit/query wait times during failover/recovery.

  4. This approach also improves resilience by limiting the impact of network interruptions on replication.


> In common cases, I believe archive recovery is faster than
> replication. If a segment is available from archive, we don't need to
> prefetch it via stream.

I completely agree: restoring from the archive is significantly faster than streaming.
Attempting to stream from the last available WAL in the archive would introduce complexity and risk.
Therefore, we can limit this feature to crash recovery scenarios and skip it when archive recovery is in use.
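The gating rule described above can be sketched as a small C predicate. This is a hypothetical illustration, not code from the patch: the struct, its fields, and the function name are invented for the example, and the real patch decides this from PostgreSQL's own recovery state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of the standby's state at startup. These fields are
 * illustrative only; they do not correspond to PostgreSQL's own structures.
 */
typedef struct StandbyStartupState
{
    bool standby_mode;           /* standby.signal present */
    bool primary_conninfo_set;   /* a primary is configured to stream from */
    bool archive_recovery;       /* WAL can be restored from an archive
                                  * (e.g. restore_command is set) */
    bool crash_recovery;         /* replaying WAL already in local pg_wal */
} StandbyStartupState;

/*
 * Decide whether to start the WAL receiver before replay reaches the end of
 * local WAL: only for standby crash recovery, and never while archive
 * recovery is in use, so the walreceiver and the archive restore do not
 * both write the same segments into pg_wal.
 */
static bool
should_start_walreceiver_early(const StandbyStartupState *s)
{
    if (!s->standby_mode || !s->primary_conninfo_set)
        return false;           /* nothing to stream from */
    if (s->archive_recovery)
        return false;           /* archive restore is usually faster; avoid two writers */
    return s->crash_recovery;
}
```

Putting the archive check before the crash-recovery check makes the "skip when archive recovery is in use" rule take precedence, which matches the concern about two writers in pg_wal.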

> The "FATAL: could not open file" message from walreceiver means that
> the walreceiver was operationally prohibited to install a new wal
> segment at the time.
This was caused by an additional fix added upstream to address a race condition between the archiver and the checkpointer.
It has been resolved in the latest patch, which also includes a TAP test to verify the fix. Thanks for testing and bringing this to our attention.
For now, we skip the early start of the WAL receiver in this scenario, since giving the WAL receiver write access here would reintroduce the bug that commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 previously fixed.


I've attached the rebased patch with the necessary fix.

Thanks & Regards,
Sunil S (Broadcom)


On Tue, Jul 8, 2025 at 11:01 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote in
> Sure, that makes more sense. Fixed.

I played with this briefly: I started a standby from a backup that
has access to the archive, and steadily got the following log lines.


[139535:postmaster] LOG:  database system is ready to accept read-only connections
[139542:walreceiver] LOG:  started streaming WAL from primary at 0/2000000 on timeline 1
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139542:walreceiver] FATAL:  could not open file "pg_wal/000000010000000000000003": No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/00000002.history': No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139548:walreceiver] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
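The LSNs in these log lines relate to the file names in the cp and FATAL messages through WAL segment naming. A segment file name is 8 hex digits of timeline followed by the segment number, which is split into two 8-hex-digit halves. The following standalone C sketch shows that mapping, assuming the default 16 MB segment size. The helper names are hypothetical; PostgreSQL itself uses the XLogFileName macro for this.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse an LSN written as "HI/LO" (two hexadecimal 32-bit halves, as in
 * "0/3000000") into a single 64-bit byte position. Returns 0 on success.
 */
static int
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (strchr(text, '/') == NULL || sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Format the 24-hex-digit name of the WAL segment containing 'lsn':
 * timeline, then the high and low parts of the segment number.
 * seg_size must be a power of two no larger than 1 GB (16 MB by default).
 */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn,
              uint64_t seg_size)
{
    assert(seg_size > 0 && (seg_size & (seg_size - 1)) == 0);
    assert(seg_size <= UINT64_C(1024) * 1024 * 1024);

    uint64_t segno = lsn / seg_size;
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_xlogid),
             (uint32_t) (segno % segs_per_xlogid));
}
```

With 16 MB segments on timeline 1, 0/3000000 (where the second walreceiver started streaming) maps to 000000010000000000000003, which is the file named in the FATAL message.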

The "FATAL:  could not open file" message from walreceiver means that
the walreceiver was operationally prohibited to install a new wal
segment at the time.  Thus the walreceiver ended as soon as started.
In short, the eager replication is not working at all.


I have a comment on the behavior and objective of this feature.

In the case where archive recovery is started from a backup, this
feature lets walreceiver start while the archive recovery is ongoing.
If walreceiver (or the eager replication) worked as expected, it would
write wal files while archive recovery writes the same set of WAL
segments to the same directory. I don't think that is sane behavior;
or, to put it more modestly, it is unintended behavior.

In common cases, I believe archive recovery is faster than
replication.  If a segment is available from archive, we don't need to
prefetch it via stream.

If this feature is intended to be used only for crash recovery of a
standby, it should fire only when it is needed.

If not, that is, if it is intended to work also for archive recovery,
I think the eager replication should start from the next segment of
the last WAL in archive but that would invite more complex problems.
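As a rough illustration of the alternative above, the starting point for streaming would be the LSN at which the segment after the last archived one begins. The C sketch below computes that point from a segment file name, assuming the default 16 MB segment size. The function names are hypothetical. The sketch ignores the harder parts of the idea, such as knowing which segment is actually the last one in the archive and handling timeline switches.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the name of the last WAL segment restored from
 * the archive (e.g. "000000010000000000000002"), return its timeline and the
 * LSN where the following segment begins. Returns 0 on success, -1 on a
 * malformed name.
 */
static int
next_segment_start(const char *last_archived, uint64_t seg_size,
                   uint32_t *tli, uint64_t *start_lsn)
{
    unsigned int t, log, seg;

    assert(seg_size > 0 && (seg_size & (seg_size - 1)) == 0);
    if (strlen(last_archived) != 24 ||
        sscanf(last_archived, "%8X%8X%8X", &t, &log, &seg) != 3)
        return -1;

    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / seg_size;
    if (seg >= segs_per_xlogid)
        return -1;              /* low part out of range for this segment size */

    uint64_t segno = (uint64_t) log * segs_per_xlogid + seg;

    *tli = t;
    *start_lsn = (segno + 1) * seg_size;
    return 0;
}

/* Print an LSN in the usual "HI/LO" hexadecimal form. */
static void
format_lsn(char *buf, size_t len, uint64_t lsn)
{
    snprintf(buf, len, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}
```

For example, if 000000010000000000000002 were the last archived segment, streaming would begin at 0/3000000 on timeline 1. After 0000000100000000000000FF, the next segment starts at 1/0.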

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center




