BUG #17438: Logical replication hangs on master after huge DB load - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17438: Logical replication hangs on master after huge DB load
Msg-id 17438-2d4d4d7c6d1e8ec4@postgresql.org
Responses Re: BUG #17438: Logical replication hangs on master after huge DB load  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17438
Logged by:          Sergey Belyashov
Email address:      sergey.belyashov@gmail.com
PostgreSQL version: 14.2
Operating system:   Debian 11, GNU/Linux x86_64
Description:

The master DB has a few tables: A (a few inserts per second, about 200
updates per second, ~100 deletes every 5 minutes), B (~100 inserts every 5
minutes), and C (~200 inserts and ~200 updates per second). B and C are
large range-partitioned tables (36 and 12 partitions, respectively). A is a
small, frequently updated table of about 10K rows. Table A has a publication
for inserts and deletes. Table B has a publication for all operations except
truncate, published via the partition root.
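
For reference, the publications described above would look roughly like the
sketch below. The exact statements were not part of the report: the
publication names are taken from the logs further down, while the table
names a and b are placeholders for the real ones.

```sql
-- Sketch only; run on the publisher. Table names a and b are placeholders.

-- Table A: publish inserts and deletes only (updates are not replicated).
CREATE PUBLICATION "A_pub" FOR TABLE a
    WITH (publish = 'insert, delete');

-- Table B (range-partitioned): publish everything except truncate, and
-- report changes under the root table rather than the individual partitions.
CREATE PUBLICATION "B_pub" FOR TABLE b
    WITH (publish = 'insert, update, delete',
          publish_via_partition_root = true);
```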

During maintenance I stop the production load on the DB and run some heavy
operations on table C (for example: "insert into D select * from C"). After
they complete, replication for A and B freezes and uses 50-99% CPU without
actually transmitting any data. I tried disabling, enabling, and refreshing
the subscriptions, with no effect. Restarting the master did not help
either. Only dropping and re-creating the subscriptions helps.
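
The disable/enable/refresh attempts mentioned above correspond roughly to the
following statements on the subscriber. This is a sketch: the subscription
name is assumed from the logs below, and the exact commands used were not
recorded in the report.

```sql
-- Sketch only; run on the subscriber. Repeat for "B_sub".
ALTER SUBSCRIPTION "A_sub" DISABLE;              -- stop the apply worker
ALTER SUBSCRIPTION "A_sub" ENABLE;               -- start it again
ALTER SUBSCRIPTION "A_sub" REFRESH PUBLICATION;  -- re-read the publication's table list
```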

Publisher logs many messages like following:
2022-03-14 19:57:02.907 MSK [1771976] user@DB ERROR:  replication slot
"A_sub" is active for PID 1766849
2022-03-14 19:57:02.907 MSK [1771976] user@DB STATEMENT:  START_REPLICATION
SLOT "A_sub" LOGICAL 28C/60150F50 (proto_version '2', publication_names
'"A_pub"')
2022-03-14 19:57:02.909 MSK [1771977] user@DB ERROR:  replication slot
"B_sub" is active for PID 1766828
2022-03-14 19:57:02.909 MSK [1771977] user@DB STATEMENT:  START_REPLICATION
SLOT "B_sub" LOGICAL 28C/AE2B7D8 (proto_version '2', 
publication_names '"B_pub"')
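
The PIDs named in these errors can be inspected on the publisher with
queries along the following lines. This is a diagnostic sketch that uses only
the standard pg_replication_slots and pg_stat_activity views; no such output
was included in the report.

```sql
-- Sketch only; run on the publisher.

-- Which backend holds each slot, and how far the slot has progressed.
SELECT slot_name, active, active_pid, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name IN ('A_sub', 'B_sub');

-- What the backends holding those slots are currently doing.
SELECT pid, backend_type, state, wait_event_type, wait_event, backend_start
FROM pg_stat_activity
WHERE pid IN (SELECT active_pid
              FROM pg_replication_slots
              WHERE slot_name IN ('A_sub', 'B_sub'));
```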

Subscriber logs many messages like following:
2022-03-14 19:56:52.709 MSK [3266082] LOG:  logical replication apply worker
for subscription "B_sub" has started
2022-03-14 19:56:52.710 MSK [993] LOG:  background worker "logical
replication worker" (PID 3266080) exited with exit code 1
2022-03-14 19:56:52.814 MSK [3266081] ERROR:  could not start WAL streaming:
ERROR:  replication slot "A_sub" is active for PID 1766849
2022-03-14 19:56:52.815 MSK [993] LOG:  background worker "logical
replication worker" (PID 3266081) exited with exit code 1
2022-03-14 19:56:52.818 MSK [3266082] ERROR:  could not start WAL streaming:
ERROR:  replication slot "B_sub" is active for PID 1766828
2022-03-14 19:56:52.819 MSK [993] LOG:  background worker "logical
replication worker" (PID 3266082) exited with exit code 1

