Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: speed up a logical replica setup
Date
Msg-id CAA4eK1JpKMRGhju=spNmZM_yTNGM12EWqNjaHy6=2t6LaQs2hg@mail.gmail.com
Whole thread Raw
In response to speed up a logical replica setup  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Mon, Feb 21, 2022 at 5:41 PM Euler Taveira <euler@eulerto.com> wrote:
>
> Logical replication has been used to migration with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) increases. Unless you
> have a generous amount of resources and can wait for long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.
>
> The general idea is to create and convert a physical replica or a base backup
> (archived WAL files available) into a logical replica. The initial data copy
> and catchup tends to be faster on a physical replica. This technique has been
> successfully used in pglogical_create_subscriber [1].
>

Sounds like a promising idea.

> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.
>
> DESIGN
>
> The conversion requires 8 steps.
>
> 1. Check if the target data directory has the same system identifier than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
>

What is the need to create an extra slot other than the slot for each
database? Can't we use the largest LSN returned by slots as the
recovery-target-lsn and starting point for subscriptions?

How, these additional slots will get freed or reused when say the
server has crashed/stopped after creating the slots but before
creating the subscriptions? Users won't even know the names of such
slots as they are internally created.

> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.
>
> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.
>

The backup will take the backup of all the databases present on the
source server. Do we need to provide the way/recommendation to remove
the databases that are not required?

Can we see some numbers with various sizes of databases (cluster) to
see how it impacts the time for small to large size databases as
compared to the traditional method? This might help giving users
advice on when to use this tool?


-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Sadhuprasad Patro
Date:
Subject: Re: Per-table storage parameters for TableAM/IndexAM extensions
Next
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: RE: Design of pg_stat_subscription_workers vs pgstats