BUG #18433: Logical replication timeout - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18433: Logical replication timeout
Date
Msg-id 18433-538407560ac0c70b@postgresql.org
Whole thread Raw
Responses Re: BUG #18433: Logical replication timeout
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18433
Logged by:          Kostiantyn
Email address:      tomahkvt@gmail.com
PostgreSQL version: 13.14
Operating system:   AWS RDS
Description:

On Postgresql 10 we used the following approach for the big tables:
1) Download schema from the source database instance
2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
schema
3)Upload the schema to the destination DB.
4) Configure Logical replication between source and destination DB.
5) When last table logical replication table synchronization worker for
subscription "db_name_public_subscription", table "table_name" has
finished
6) we created all the necessary PK, FK, and Indexes.
This approach allowed to us upload data more quickly. This approach was
working  great on PostgreSQL 10.

We tried the same approach for Postgresql13, but we got an error.
1) Download schema from the source database instance
2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
schema
3)Upload the schema to the destination DB.
4) configurated identity replication full at source DB for tables bigger
than 100Gb
5) Configured Logical replication between source and destination DB.
6) During catchup on this big table  process we got the following
messages:
Source DB
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:LOG: terminating
walsender process due to replication timeout
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:CONTEXT: slot
"db_name_public_subscription", output plugin "pgoutput", in the begin
callback, associated LSN 13705/2E913050
2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:STATEMENT:
START_REPLICATION SLOT "db_name_public_subscription" LOGICAL 13702/C2C8FB30
(proto_version '1', publication_names '"db_name_public_publication"')
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:LOG: terminating
walsender process due to replication timeout
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:CONTEXT: slot
"db_name_public_subscription_18989108_sync_17127", output plugin "pgoutput",
in the begin callback, associated LSN 13703/27942B70
2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:STATEMENT:
START_REPLICATION SLOT "db_name_public_subscription_18989108_sync_17127"
LOGICAL 13703/17622B58 (proto_version '1', publication_names
'"db_name_public_publication"')

One important point. If there is no request to source DB logical replication
works fine for big tables.
I saw the messages in PostgreSQL bugs like
https://www.postgresql.org/message-id/flat/718213.1601410160%40sss.pgh.pa.us#7e61dd07661901b505bcbd74ce5f5f28
But I also did some tests and increased wal_keep_size
and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but
without success. The setup works in PG 13 only with a small amount of data.


pgsql-bugs by date:

Previous
From: Richard Guo
Date:
Subject: Re: BUG #18429: Inconsistent results on similar queries with join lateral
Next
From: Tom Lane
Date:
Subject: Re: BUG #18429: Inconsistent results on similar queries with join lateral