The following bug has been logged on the website:
Bug reference: 18433
Logged by: Kostiantyn
Email address: tomahkvt@gmail.com
PostgreSQL version: 13.14
Operating system: AWS RDS
Description:
On Postgresql 10 we used the following approach for the big tables:
1) Download schema from the source database instance
2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
schema
3)Upload the schema to the destination DB.
4) Configure Logical replication between source and destination DB.
5) When last table logical replication table synchronization worker for
subscription "db_name_public_subscription", table "table_name" has
finished
6) we created all the necessary PK, FK, and Indexes.
This approach allowed to us upload data more quickly. This approach was
working great on PostgreSQL 10.
We tried the same approach for Postgresql13, but we got an error.
1) Download schema from the source database instance
2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
schema
3)Upload the schema to the destination DB.
4) configurated identity replication full at source DB for tables bigger
than 100Gb
5) Configured Logical replication between source and destination DB.
6) During catchup on this big table process we got the following
messages:
Source DB
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:LOG: terminating
walsender process due to replication timeout
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:CONTEXT: slot
"db_name_public_subscription", output plugin "pgoutput", in the begin
callback, associated LSN 13705/2E913050
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:STATEMENT:
START_REPLICATION SLOT "db_name_public_subscription" LOGICAL 13702/C2C8FB30
(proto_version '1', publication_names '"db_name_public_publication"')
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:LOG: terminating
walsender process due to replication timeout
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:CONTEXT: slot
"db_name_public_subscription_18989108_sync_17127", output plugin "pgoutput",
in the begin callback, associated LSN 13703/27942B70
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:STATEMENT:
START_REPLICATION SLOT "db_name_public_subscription_18989108_sync_17127"
LOGICAL 13703/17622B58 (proto_version '1', publication_names
'"db_name_public_publication"')
One important point. If there is no request to source DB logical replication
works fine for big tables.
I saw the messages in PostgreSQL bugs like
https://www.postgresql.org/message-id/flat/718213.1601410160%40sss.pgh.pa.us#7e61dd07661901b505bcbd74ce5f5f28
But I also did some tests and increased wal_keep_size
and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but
without success. The setup works in PG 13 only with a small amount of data.