Thread: Logical Replication speed-up initial data

Logical Replication speed-up initial data

From
Nikhil Shetty
Date:
Hi Team,

We have a highly transactional system as the source of logical replication, and the database size is 500GB+. We are replicating all tables from the source using logical replication.

For two tables, the initial data load is very slow and never completes, even after 24+ hours. The table size is under 100GB and the index size is around 400GB.

How can we increase the speed of the initial data load without dropping the indexes on the destination?

We increased max_sync_workers_per_subscription to 3, but it didn't help much for individual tables.

Thanks,
Nikhil

Re: Logical Replication speed-up initial data

From
Hüseyin Demir
Date:
Hello,

I also faced a similar issue. If possible, try removing the indexes on the destination first; you can re-add them once the initial load completes.
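
A minimal sketch of that approach (the table and index names here are hypothetical):

    -- On the destination, before the initial sync starts:
    DROP INDEX IF EXISTS idx_orders_created_at;

    -- After the table reaches the 'r' (ready) state in pg_subscription_rel,
    -- rebuild without blocking the apply worker:
    CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);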

Regards.


On Wed, Aug 4, 2021 at 18:07, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,

We have a highly transactional system as the source of logical replication, and the database size is 500GB+. We are replicating all tables from the source using logical replication.

For two tables, the initial data load is very slow and never completes, even after 24+ hours. The table size is under 100GB and the index size is around 400GB.

How can we increase the speed of the initial data load without dropping the indexes on the destination?

We increased max_sync_workers_per_subscription to 3, but it didn't help much for individual tables.

Thanks,
Nikhil


--
Hüseyin Demir

Senior Database Platform Engineer

Linkedin: hseyindemir

Re: Logical Replication speed-up initial data

From
Stefano Amoroso
Date:
Hello,
in my experience, to speed up the initial load, I had to drop unique keys (UKs) and foreign keys (FKs).
Unfortunately, the initial load doesn't run in parallel: for each table there is only one sync worker.
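
For example, a rough sketch of that on the destination (the table and constraint names are hypothetical):

    -- Before the initial load:
    ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;
    ALTER TABLE orders DROP CONSTRAINT orders_external_ref_key;

    -- Once the table is in sync, re-add the FK as NOT VALID (cheap),
    -- then validate it in a separate step:
    ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey
        FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;
    ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_fkey;
    ALTER TABLE orders ADD CONSTRAINT orders_external_ref_key UNIQUE (external_ref);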

Regards

Stefano Amoroso

On Wed, Aug 4, 2021 at 17:24, Hüseyin Demir <demirhuseyinn.94@gmail.com> wrote:
Hello,

I also faced a similar issue. If possible, try removing the indexes on the destination first; you can re-add them once the initial load completes.

Regards.


On Wed, Aug 4, 2021 at 18:07, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,

We have a highly transactional system as the source of logical replication, and the database size is 500GB+. We are replicating all tables from the source using logical replication.

For two tables, the initial data load is very slow and never completes, even after 24+ hours. The table size is under 100GB and the index size is around 400GB.

How can we increase the speed of the initial data load without dropping the indexes on the destination?

We increased max_sync_workers_per_subscription to 3, but it didn't help much for individual tables.

Thanks,
Nikhil


--
Hüseyin Demir

Senior Database Platform Engineer

Linkedin: hseyindemir

Re: Logical Replication speed-up initial data

From
Christophe Pettus
Date:

> On Aug 4, 2021, at 08:06, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> How can we increase the speed of the initial data load without dropping the indexes on the destination?

You can do the usual steps of increasing checkpoint_timeout and max_wal_size (since incoming logical replication
changes are WAL-logged) and setting synchronous_commit = off, but those will be modest improvements.  You will get an
enormous benefit from dropping indexes and foreign key constraints, and those aren't much use during the initial sync
anyway.
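
For what it's worth, a sketch of those settings on the subscriber; the values are purely illustrative, not recommendations:

    -- postgresql.conf on the destination, for the duration of the sync:
    checkpoint_timeout = 30min      -- spread checkpoints out during the bulk load
    max_wal_size = 32GB             -- avoid checkpoints forced by WAL volume
    synchronous_commit = off        -- no corruption risk; at most the last
                                    -- transactions are re-sent after a crash

    -- All three can be applied with a reload; no restart needed:
    SELECT pg_reload_conf();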


Re: Logical Replication speed-up initial data

From
Nikhil Shetty
Date:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.

Thanks,
Nikhil

On Wed, Aug 4, 2021 at 8:54 PM Hüseyin Demir <demirhuseyinn.94@gmail.com> wrote:
Hello,

I also faced a similar issue. If possible, try removing the indexes on the destination first; you can re-add them once the initial load completes.

Regards.


On Wed, Aug 4, 2021 at 18:07, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,

We have a highly transactional system as the source of logical replication, and the database size is 500GB+. We are replicating all tables from the source using logical replication.

For two tables, the initial data load is very slow and never completes, even after 24+ hours. The table size is under 100GB and the index size is around 400GB.

How can we increase the speed of the initial data load without dropping the indexes on the destination?

We increased max_sync_workers_per_subscription to 3, but it didn't help much for individual tables.

Thanks,
Nikhil


--
Hüseyin Demir

Senior Database Platform Engineer

Linkedin: hseyindemir

Re: Logical Replication speed-up initial data

From
Nikhil Shetty
Date:
Hi Stefano,

Thank you for the information.

Regards,
Nikhil

On Wed, Aug 4, 2021 at 9:25 PM Stefano Amoroso <stefano.amoroso@gmail.com> wrote:
Hello,
in my experience, to speed up the initial load, I had to drop unique keys (UKs) and foreign keys (FKs).
Unfortunately, the initial load doesn't run in parallel: for each table there is only one sync worker.

Regards

Stefano Amoroso

On Wed, Aug 4, 2021 at 17:24, Hüseyin Demir <demirhuseyinn.94@gmail.com> wrote:
Hello,

I also faced a similar issue. If possible, try removing the indexes on the destination first; you can re-add them once the initial load completes.

Regards.


On Wed, Aug 4, 2021 at 18:07, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,

We have a highly transactional system as the source of logical replication, and the database size is 500GB+. We are replicating all tables from the source using logical replication.

For two tables, the initial data load is very slow and never completes, even after 24+ hours. The table size is under 100GB and the index size is around 400GB.

How can we increase the speed of the initial data load without dropping the indexes on the destination?

We increased max_sync_workers_per_subscription to 3, but it didn't help much for individual tables.

Thanks,
Nikhil


--
Hüseyin Demir

Senior Database Platform Engineer

Linkedin: hseyindemir

Re: Logical Replication speed-up initial data

From
Rick Otten
Date:

On Thu, Aug 5, 2021 at 12:57 AM Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.

It would be kind of cool if the database could just "know" that it was an initial load and automatically suppress FK checks and index updates until the load is done.  Once complete, it would go back and concurrently rebuild the indexes and validate the FKs.  Then you wouldn't have to manually drop all of your indexes, add them back, and hope you got them all, and got them right.

 

Re: Logical Replication speed-up initial data

From
Vijaykumar Jain
Date:
On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.


PS: I have not tested this at production-level loads; it was just an experiment I ran on my laptop.

One option would be to use the pglogical extension (this was shared by Dharmendra in one of the previous mails; sharing the same),
and then use the pglogical_create_subscriber CLI to create the initial copy via pg_basebackup and carry on from there.
I ran a test case similar to the one below in my local environment, and it seems to work fine. Of course, I don't have TBs worth of data to test with, but it looks promising,
especially since this functionality was introduced into core.
Once you attain a reasonable sync state, you can drop the pglogical extension and check whether things continue fine.
I did something similar when upgrading from 9.6 to 11 using pglogical and then dropping the extension, and it was smooth;
maybe you can try this out and share whether it works for you.
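
For reference, a hedged sketch of the pglogical_create_subscriber invocation (the node name, DSNs, and data directory are placeholders; check the pglogical docs for the exact flags in your version):

    $ pglogical_create_subscriber \
        -D /var/lib/postgresql/13/subscriber \
        --subscriber-name=sub_node \
        --subscriber-dsn="host=dest dbname=appdb user=postgres" \
        --provider-dsn="host=source dbname=appdb user=postgres"

This takes a pg_basebackup of the provider, converts the copy into a pglogical subscriber, and streams changes from there, so the bulk of the data moves at physical-copy speed rather than row by row.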

Re: Logical Replication speed-up initial data

From
Avinash Kumar
Date:
Hi, 

On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.


You could leverage pg_basebackup, or pg_dump with parallel jobs,
taken from a standby (preferably with replication paused if using pg_dump;
pg_basebackup should be straightforward anyway) or even from the
primary, for the purpose of the initial data load.

As you are able to drop indexes and make some schema changes, I would
assume that you could pause your app temporarily. If that's the case,
you may look into the simple steps I am posting here, which demonstrate
pg_dump/pg_restore instead.

If you cannot pause the app, see the note on pg_replication_origin_advance after the steps below.

Step 1 : Pause the app.

Step 2 : Create the publication on the primary:
    CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;

Step 3 : Create the logical replication slot on the primary:
    SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');

Step 4 : Create the subscription, but do not enable it:
    CREATE SUBSCRIPTION <some_sub_name> CONNECTION
        'host=<some_host> dbname=<some_db> user=postgres
         password=secret port=5432' PUBLICATION <some_pub_name>
    WITH (copy_data = false, create_slot = false, enabled = false,
          slot_name = '<some_slot_name>');

Step 5 : Initiate pg_dump. A parallel dump allows a faster restore:
    $ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
    -- If it's several hundreds of GBs or TBs, you may rather utilize one of your
    -- standbys, with replay paused using: SELECT pg_wal_replay_pause();

Step 6 : You don't need to wait until pg_dump completes; you may start the app.
    -- Hopefully the app does not perform changes that impact the pg_dump or
    -- get blocked by it.

Step 7 : Restore the dump:
    $ pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>

Step 8 : Enable the subscription:
    ALTER SUBSCRIPTION <some_sub_name> ENABLE;

If you have not stopped your app, then you must advance the LSN using
pg_replication_origin_advance.

These are all hand-written steps drafted while composing this email, so
please test on your end; some typos or adjustments are definitely expected.
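
As a hedged illustration of that last point (the origin name and LSN below are placeholders; on the subscriber, a subscription's origin is named pg_<subscription OID>, and the target LSN should be the slot position from when the dump was taken):

    -- On the subscriber, list the replication origins to find the subscription's:
    SELECT * FROM pg_replication_origin;

    -- Advance it past the changes that are already contained in the dump:
    SELECT pg_replication_origin_advance('pg_16400', '0/5A000060');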

PS: I have not tested this at production-level loads; it was just an experiment I ran on my laptop.

One option would be to use the pglogical extension (this was shared by Dharmendra in one of the previous mails; sharing the same),
and then use the pglogical_create_subscriber CLI to create the initial copy via pg_basebackup and carry on from there.
I ran a test case similar to the one below in my local environment, and it seems to work fine. Of course, I don't have TBs worth of data to test with, but it looks promising,
especially since this functionality was introduced into core.
Once you attain a reasonable sync state, you can drop the pglogical extension and check whether things continue fine.
I did something similar when upgrading from 9.6 to 11 using pglogical and then dropping the extension, and it was smooth;
maybe you can try this out and share whether it works for you.



--
Regards,
Avinash Vallarapu (Avi)
CEO,
MigOps, Inc. 

Re: Logical Replication speed-up initial data

From
Nikhil Shetty
Date:
Hi Avinash,

Thank you for the detailed explanation.

Indexes were dropped on the destination to increase the initial data load speed. We cannot stop the app on the source, and it is highly transactional.
I had thought about this method, but I am not sure where logical replication will start from after the pg_restore; we cannot afford to lose any data.

I will give this method a test, though, and check how it works.

Thanks,
Nikhil

On Thu, Aug 5, 2021 at 8:42 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:
Hi, 

On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.


You could leverage pg_basebackup, or pg_dump with parallel jobs,
taken from a standby (preferably with replication paused if using pg_dump;
pg_basebackup should be straightforward anyway) or even from the
primary, for the purpose of the initial data load.

As you are able to drop indexes and make some schema changes, I would
assume that you could pause your app temporarily. If that's the case,
you may look into the simple steps I am posting here, which demonstrate
pg_dump/pg_restore instead.

If you cannot pause the app, see the note on pg_replication_origin_advance after the steps below.

Step 1 : Pause the app.

Step 2 : Create the publication on the primary:
    CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;

Step 3 : Create the logical replication slot on the primary:
    SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');

Step 4 : Create the subscription, but do not enable it:
    CREATE SUBSCRIPTION <some_sub_name> CONNECTION
        'host=<some_host> dbname=<some_db> user=postgres
         password=secret port=5432' PUBLICATION <some_pub_name>
    WITH (copy_data = false, create_slot = false, enabled = false,
          slot_name = '<some_slot_name>');

Step 5 : Initiate pg_dump. A parallel dump allows a faster restore:
    $ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
    -- If it's several hundreds of GBs or TBs, you may rather utilize one of your
    -- standbys, with replay paused using: SELECT pg_wal_replay_pause();

Step 6 : You don't need to wait until pg_dump completes; you may start the app.
    -- Hopefully the app does not perform changes that impact the pg_dump or
    -- get blocked by it.

Step 7 : Restore the dump:
    $ pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>

Step 8 : Enable the subscription:
    ALTER SUBSCRIPTION <some_sub_name> ENABLE;

If you have not stopped your app, then you must advance the LSN using
pg_replication_origin_advance.

These are all hand-written steps drafted while composing this email, so
please test on your end; some typos or adjustments are definitely expected.

PS: I have not tested this at production-level loads; it was just an experiment I ran on my laptop.

One option would be to use the pglogical extension (this was shared by Dharmendra in one of the previous mails; sharing the same),
and then use the pglogical_create_subscriber CLI to create the initial copy via pg_basebackup and carry on from there.
I ran a test case similar to the one below in my local environment, and it seems to work fine. Of course, I don't have TBs worth of data to test with, but it looks promising,
especially since this functionality was introduced into core.
Once you attain a reasonable sync state, you can drop the pglogical extension and check whether things continue fine.
I did something similar when upgrading from 9.6 to 11 using pglogical and then dropping the extension, and it was smooth;
maybe you can try this out and share whether it works for you.



--
Regards,
Avinash Vallarapu (Avi)
CEO,
MigOps, Inc. 

Re: Logical Replication speed-up initial data

From
Nikhil Shetty
Date:
Hi Vijaykumar,

Thanks for the details.
In this method, are you saying that pg_basebackup will make the initial load faster?
We intend to bring over only a few tables, and pg_basebackup will clone an entire instance.

Thanks,
Nikhil



On Thu, Aug 5, 2021 at 7:57 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.


PS: I have not tested this at production-level loads; it was just an experiment I ran on my laptop.

One option would be to use the pglogical extension (this was shared by Dharmendra in one of the previous mails; sharing the same),
and then use the pglogical_create_subscriber CLI to create the initial copy via pg_basebackup and carry on from there.
I ran a test case similar to the one below in my local environment, and it seems to work fine. Of course, I don't have TBs worth of data to test with, but it looks promising,
especially since this functionality was introduced into core.
Once you attain a reasonable sync state, you can drop the pglogical extension and check whether things continue fine.
I did something similar when upgrading from 9.6 to 11 using pglogical and then dropping the extension, and it was smooth;
maybe you can try this out and share whether it works for you.

Re: Logical Replication speed-up initial data

From
Vijaykumar Jain
Date:
On Fri, 6 Aug 2021 at 00:15, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Vijaykumar,

Thanks for the details.
In this method, are you saying that pg_basebackup will make the initial load faster?
We intend to bring over only a few tables, and pg_basebackup will clone an entire instance.

Yeah, in that case this will not be useful. I assumed you wanted all tables.



Re: Logical Replication speed-up initial data

From
Jeff Janes
Date:
On Thu, Aug 5, 2021 at 12:57 AM Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi,

Thank you for the suggestion.

We tried dropping the indexes, and the load ran faster than what we saw earlier. We wanted to know whether anybody has made any other changes that help speed up the initial data load without dropping indexes.

If index maintenance is the bottleneck, nothing but dropping the indexes is likely to be very effective.  Just make sure not to drop the replica identity index.  If you do that, then the entire sync will abort and roll back once it gets to the end, if the master had had any UPDATE or DELETE activity on that table during the sync period.  (v14 removes that problem: replication still won't proceed until you have the index, but the previously synced work will not be lost while it waits for you to build the index.)
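
To identify the replica identity before dropping anything, a quick hedged check (the table name is a placeholder):

    -- d = default (primary key), i = explicit index, f = full, n = nothing:
    SELECT relreplident FROM pg_class WHERE oid = 'some_table'::regclass;

    -- If it is 'i', this shows the designated index:
    SELECT indexrelid::regclass FROM pg_index
    WHERE indrelid = 'some_table'::regclass AND indisreplident;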

Syncing with the indexes still in place might go faster if shared_buffers is large enough to hold the entire incipient index(es) simultaneously.  It might be worthwhile to make shared_buffers a large fraction of RAM (like 90%) if doing so will let the entire index fit into shared_buffers and nothing else significant is running on the server.  You probably wouldn't want that as a permanent setting, though.
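
A sketch of that temporary change (the value is purely illustrative; note that changing shared_buffers requires a restart):

    ALTER SYSTEM SET shared_buffers = '96GB';   -- e.g., ~90% of RAM, for the sync only
    -- restart PostgreSQL, run the sync, then revert:
    ALTER SYSTEM RESET shared_buffers;          -- and restart again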

Cheers,

Jeff