Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: speed up a logical replica setup
Date
Msg-id b1f0f8c7-8f01-4950-af77-339df3dc4684@app.fastmail.com
In response to Re: speed up a logical replica setup  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: speed up a logical replica setup
RE: speed up a logical replica setup
List pgsql-hackers
On Thu, Jul 11, 2024, at 2:00 PM, Alexander Lakhin wrote:
Hello Amit and Hou-San,

11.07.2024 13:21, Amit Kapila wrote:
>
> We don't wait for the xmin to catch up corresponding to this insert
> and I don't know if there is a way to do that. So, we should move this
> Insert to after the call to pg_sync_replication_slots(). It won't
> impact the general test of pg_createsubscriber.
>
> Thanks to Hou-San for helping me in the analysis of this BF failure.

Thank you for investigating that issue!

May I ask you to look at another failure of the test that occurred today [1]?


Thanks for the report!

You are observing the same issue that Amit explained in [1].
pg_create_logical_replication_slot() returns the EndRecPtr (see
slot->data.confirmed_flush in DecodingContextFindStartpoint()). EndRecPtr points
to the next record, which is a future position on an idle server. That's why
recovery takes some time to finish: it is waiting for some activity to advance
the LSN position. Since you modified LOG_SNAPSHOT_INTERVAL_MS to create
additional WAL records sooner, the EndRecPtr position is reached rapidly and
recovery ends quickly.
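
To make the timing argument above concrete, here is a toy model in Python. It
is not PostgreSQL code: the function name, the tick unit, the record size and
the rule "the target is reached once WAL extends past it" are all assumptions
made for illustration only.

```python
def ticks_until_reached(target_lsn, insert_lsn, interval, record_size=8,
                        max_ticks=100):
    """Return how many ticks pass before the WAL extends past target_lsn.

    Toy rule (an assumption of this sketch, not PostgreSQL's exact logic):
    recovery can stop at target_lsn once insert_lsn > target_lsn. The only
    activity on the otherwise idle server is one record of record_size
    bytes written every `interval` ticks.
    """
    ticks = 0
    while insert_lsn <= target_lsn:
        if ticks >= max_ticks:
            raise RuntimeError("target not reached within max_ticks")
        ticks += 1
        # Periodic background activity is the only thing advancing the LSN.
        if ticks % interval == 0:
            insert_lsn += record_size
    return ticks


# Idle server: the target equals the position of the *next* record.
slow = ticks_until_reached(target_lsn=1000, insert_lsn=1000, interval=15)
# Same target, but background records are written every tick.
fast = ticks_until_reached(target_lsn=1000, insert_lsn=1000, interval=1)
# A target that already lies behind the insert position needs no waiting.
none = ticks_until_reached(target_lsn=992, insert_lsn=1000, interval=15)
print(slow, fast, none)
```

In this model, shortening the interval only shortens the wait, while a target
that has already been written avoids the wait entirely.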

Hayato proposes a patch [2] that creates an additional WAL record, which has the
same effect as your little hack: it advances the LSN position so that recovery
finishes sooner. I don't like that solution, although it seems simple to
implement. As Amit said, if we knew the ReadRecPtr, we could use it as the
consistent LSN. The problem is that it is used by logical decoding but is not
exposed. [reading the code...] When the logical replication slot is created,
restart_lsn points to lastReplayedEndRecPtr (see ReplicationSlotReserveWal()),
which is the last record replayed. Since the replication slots aren't in use
yet, we could use the restart_lsn from the last replication slot as the
consistent LSN.
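
A minimal sketch of that selection, in Python, assuming the slots are given in
creation order as (slot_name, restart_lsn, confirmed_flush_lsn) tuples such as
one might read from pg_replication_slots. This is not the attached patch; the
function names, slot names and LSN values are hypothetical.

```python
def parse_lsn(text):
    """Convert a textual LSN such as '0/3000098' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value):
    """Convert a 64-bit integer back to the 'X/X' textual LSN form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def pick_consistent_lsn(slots):
    """Return the restart_lsn of the most recently created slot.

    `slots` is a list of (slot_name, restart_lsn, confirmed_flush_lsn)
    tuples in creation order (hypothetical input shape).
    """
    if not slots:
        raise ValueError("no replication slots given")
    _name, restart_lsn, confirmed_flush_lsn = slots[-1]
    # Sanity check for this sketch: restart_lsn must not be ahead of the
    # slot's confirmed_flush_lsn.
    if parse_lsn(restart_lsn) > parse_lsn(confirmed_flush_lsn):
        raise ValueError("restart_lsn is ahead of confirmed_flush_lsn")
    return restart_lsn


# Made-up values for two slots created one after the other.
slots = [
    ("sub_a", "0/3000060", "0/3000098"),
    ("sub_b", "0/3000098", "0/30000D0"),
]
print(pick_consistent_lsn(slots))
```

The point of taking restart_lsn rather than the returned EndRecPtr is that it
refers to WAL that already exists, so nothing has to be written before the
target can be reached.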

I'm attaching a patch that implements it. It runs in 6s instead of 26s.




--
Euler Taveira

