Re: Unnecessary delay in streaming replication due to replay lag - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Unnecessary delay in streaming replication due to replay lag
Date
Msg-id 20211216.190519.572575190462409312.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Unnecessary delay in streaming replication due to replay lag  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
List pgsql-hackers
At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote in 
> Sure, that makes more sense. Fixed.

As I played with this briefly.  I started a standby from a backup that
has an access to archive.  I had the following log lines steadily.


[139535:postmaster] LOG:  database system is ready to accept read-only connections
[139542:walreceiver] LOG:  started streaming WAL from primary at 0/2000000 on timeline 1
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139542:walreceiver] FATAL:  could not open file "pg_wal/000000010000000000000003": No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/00000002.history': No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139548:walreceiver] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

The "FATAL:  could not open file" message from walreceiver means that
the walreceiver was operationally prohibited to install a new wal
segment at the time.  Thus the walreceiver ended as soon as started.
In short, the eager replication is not working at all.


I have a comment on the behavior and objective of this feature.

In the case where archive recovery is started from a backup, this
feature lets walreceiver start while the archive recovery is ongoing.
If walreceiver (or the eager replication) worked as expected, it would
write wal files while archive recovery writes the same set of WAL
segments to the same directory. I don't think that is a sane behavior.
Or, if putting more modestly, an unintended behavior.

In common cases, I believe archive recovery is faster than
replication.  If a segment is available from archive, we don't need to
prefetch it via stream.

If this feature is intended to use only for crash recovery of a
standby, it should fire only when it is needed.

If not, that is, if it is intended to work also for archive recovery,
I think the eager replication should start from the next segment of
the last WAL in archive but that would invite more complex problems.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: pg_dump: Refactor getIndexes()
Next
From: Shay Rojansky
Date:
Subject: Re: Privilege required for IF EXISTS event if the object already exists