Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: speed up a logical replica setup
Msg-id 2e86cf53-c806-49d2-9a10-1a1257776d1f@app.fastmail.com
In response to Re: speed up a logical replica setup  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: speed up a logical replica setup
List pgsql-hackers
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (on the connection that keeps
> streaming the WAL) from the recovery timeout. NUM_CONN_ATTEMPTS guarantees that if
> the primary is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check the primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we check
that first? Once we ensure that the required WAL has been copied, just
checking server_is_in_recovery() should be sufficient. I feel that
would be a more direct way of ensuring what is required than verifying
it indirectly (by checking pg_stat_wal_receiver), as we are doing
currently.

How would you check it? By inspecting the WAL files? During recovery, you are
not allowed to use pg_current_wal_lsn().

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery: wait for recovery_timeout without additional checks
(which means removing the pg_stat_wal_receiver logic). When we have additional
information that we can reliably use in this function, we can add it. This also
makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to keep the tests stable.
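For illustration only, a minimal C sketch of that simplification: the wait loop polls just one condition (is the server still in recovery?) and bails out only when recovery_timeout expires, with no pg_stat_wal_receiver check and no connection-attempt counter. The callback type, WAIT_INTERVAL_MS, the function name and the fake server are hypothetical; a real tool would run SELECT pg_catalog.pg_is_in_recovery() over libpq instead of the callback. Treating a timeout of 0 as "no limit" is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe: returns true while the server is still in recovery.
 * A real implementation would query pg_is_in_recovery() via libpq. */
typedef bool (*recovery_probe)(void *ctx);

/* Hypothetical polling interval, in milliseconds. */
#define WAIT_INTERVAL_MS 100

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Poll until recovery ends or recovery_timeout_s seconds have elapsed.
 * The timeout is the only bail-out condition (assumption: 0 = wait forever).
 * Returns true if recovery ended, false on timeout.
 */
static bool wait_for_end_recovery_sketch(recovery_probe in_recovery, void *ctx,
                                         int recovery_timeout_s)
{
    long waited_ms = 0;
    long limit_ms = (long) recovery_timeout_s * 1000;

    assert(in_recovery != NULL);
    for (;;)
    {
        if (!in_recovery(ctx))
            return true;        /* server was promoted: recovery is over */
        if (recovery_timeout_s > 0 && waited_ms >= limit_ms)
            return false;       /* gave up after recovery_timeout */
        sleep_ms(WAIT_INTERVAL_MS);
        waited_ms += WAIT_INTERVAL_MS;
    }
}

/* Hypothetical fake server for testing: stays in recovery for a fixed
 * number of polls, then reports that recovery has ended. */
struct fake_server
{
    int polls_until_promoted;
    int polls;
};

static bool fake_in_recovery(void *ctx)
{
    struct fake_server *s = ctx;

    s->polls++;
    return s->polls <= s->polls_until_promoted;
}
```

The design trade-off discussed above shows up directly: if the primary disappears, this loop cannot tell, so the only way out is the timeout, which is why the test timeout needs to be tuned accordingly.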


--
Euler Taveira
