Hello Hackers,
I recently had the opportunity to continue the effort originally led by a valued contributor.
I’ve addressed most of the previously reported feedback and issues, and would like to share the updated patch with the community.
IMHO, starting the WAL receiver eagerly offers significant advantages, for the following reasons:
- If recovery_min_apply_delay is set high (for various operational reasons) and the primary crashes, the mirror can recover quickly, thereby improving overall high availability.
- For setups without archive-based recovery, restore and recovery operations complete faster.
- When synchronous_commit is enabled, faster mirror recovery reduces offline time and helps avoid prolonged commit/query wait times during failover/recovery.
- This approach also improves resilience by limiting the impact of network interruptions on replication.
> In common cases, I believe archive recovery is faster than
> replication. If a segment is available from archive, we don't need to
> prefetch it via stream.
I completely agree: restoring from the archive is significantly faster than streaming.
Attempting to stream from the last available WAL in the archive would introduce complexity and risk.
Therefore, we can limit this feature to crash recovery scenarios and skip it when archiving is enabled.
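As a rough sketch of that condition (illustrative only, not code from the patch): the struct and function names below are hypothetical, and I am assuming that "archiving is enabled" corresponds to restore_command being set and that streaming requires primary_conninfo.

```c
#include <stdbool.h>

/*
 * Illustration only: this struct and function are hypothetical and do not
 * exist in PostgreSQL or in the attached patch.
 */
typedef struct StartupConfig
{
    bool streaming_configured;        /* assumed: primary_conninfo is set */
    bool archive_restore_configured;  /* assumed: restore_command is set */
} StartupConfig;

/* Start the WAL receiver early only when streaming is the sole WAL source. */
static bool
should_start_walreceiver_early(const StartupConfig *cfg)
{
    if (!cfg->streaming_configured)
        return false;   /* nothing to stream from */
    if (cfg->archive_restore_configured)
        return false;   /* archive restore is available; skip the early start */
    return true;
}
```

With this shape, a mirror that has only streaming configured starts the WAL receiver early, while any setup with archive restore configured keeps the existing behaviour.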
> The "FATAL: could not open file" message from walreceiver means that
> the walreceiver was operationally prohibited to install a new wal
> segment at the time.
This was caused by an additional fix added upstream to address a race condition between the archiver and checkpointer.
It has been resolved in the latest patch, which also includes a TAP test to verify the fix. Thanks for testing and bringing this to our attention.
For now, we will skip the early WAL receiver start, since enabling write access for the WAL receiver would reintroduce the bug that commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 fixed previously.
I've attached the rebased patch with the necessary fix.