Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CALj2ACVvM8omB6FsukxGJeRT0x8zH9VQO-3hHz+WiMCAZzFJeQ@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > 2. All candidate standbys will start one slot sync worker per logical
> > slot which might not be scalable.
>
> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
> is using one worker per database (for all slots corresponding to a
> database).

Right. It's based on one worker for each database.

> > Is having one (or a few more - not
> > necessarily one for each logical slot) worker for all logical slots
> > enough?
>
> I guess for a large number of slots there is a possibility of a large
> gap in syncing the slots which probably means we need to retain
> corresponding WAL for a much longer time on the primary. If we can
> prove that the gap won't be large enough to matter then this would be
> probably worth considering otherwise, I think we should find a way to
> scale the number of workers to avoid the large gap.

I think the gap is largely determined by the time taken to advance
each slot and the amount of WAL that each logical slot moves ahead on
the primary. I've measured the time it takes for
pg_logical_replication_slot_advance with different amounts of WAL on
my system. It took 2595ms/5091ms/31238ms to advance the slot by
3.7GB/7.3GB/13GB respectively. To put things into perspective,
imagine there are 3 logical slots to sync for a single slot sync
worker, and they need to be advanced by 3.7GB/7.3GB/13GB of WAL
respectively. The slot sync worker gets to slot 1 again after
2595ms+5091ms+31238ms (~40sec). It gets to slot 2 again only after
advancing slot 1 by the amount of WAL that slot has moved ahead on
the primary during those ~40sec, gets to slot 3 again only after
advancing slot 1 and slot 2 by the amount of WAL those slots have
moved ahead on the primary, and so on. If WAL generation on the
primary is fast, and if the logical slots move ahead quickly on the
primary, the time it takes for a single sync worker to sync a slot
can increase.
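
To make the arithmetic above concrete, here is a rough sketch in
Python. The first function just sums the measured advance times for
one pass over the 3 slots. The second is a toy model, not something
taken from the patch: the function names, the constant WAL rate per
slot (wal_rate_gb_per_s) and the constant advance speed
(advance_gb_per_s) are all assumptions for illustration. The measured
times above are clearly not linear in the amount of WAL, so treat the
model as showing the shape of the problem only.

```python
# Measured on the author's system: (WAL advanced in GB, time taken in ms).
MEASURED = [(3.7, 2595), (7.3, 5091), (13.0, 31238)]


def cycle_time_ms(advance_times_ms):
    """Time for one worker to visit every slot once, one after another."""
    return sum(advance_times_ms)


def simulate_cycles(initial_lag_gb, wal_rate_gb_per_s, advance_gb_per_s, rounds):
    """Return the duration in seconds of each full pass over all slots.

    Hypothetical model: every slot moves ahead on the primary at a
    constant wal_rate_gb_per_s, and the worker advances a slot at a
    constant advance_gb_per_s. Both are assumptions, not measurements.
    """
    lags = list(initial_lag_gb)
    durations = []
    for _ in range(rounds):
        elapsed = 0.0
        for i in range(len(lags)):
            # Advance slot i up to where the primary was when we started.
            spent = lags[i] / advance_gb_per_s
            lags[i] = 0.0
            elapsed += spent
            # Meanwhile every slot (including slot i) fell behind further.
            for j in range(len(lags)):
                lags[j] += wal_rate_gb_per_s * spent
        durations.append(elapsed)
    return durations


if __name__ == "__main__":
    # First pass with the measured numbers: 38924 ms, i.e. roughly 40sec.
    print(cycle_time_ms(t for _, t in MEASURED))
    # Later passes under the toy model, with made-up rates.
    for n, secs in enumerate(simulate_cycles([3.7, 7.3, 13.0], 0.1, 1.0, 5), 1):
        print(f"pass {n}: {secs:.2f}s")
```

In this toy model the pass duration shrinks over time only if the
combined WAL rate of all slots stays below the advance speed.
Otherwise each pass takes longer than the previous one, which is the
case where more than one worker would be needed.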

Now, let's think about what happens if there's a large gap, IOW, a
logical slot on the standby is behind the corresponding logical slot
on the primary by X amount of WAL. The standby needs to retain more
WAL for sure. IIUC, the primary doesn't need to retain the WAL
required for a logical slot on the standby, no?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


