Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs
From | hubert depesz lubaczewski |
---|---|
Subject | Re: Excessive number of replication slots for 12->14 logical replication |
Date | |
Msg-id | 20220718094330.GA24063@depesz.com Whole thread Raw |
In response to | Re: Excessive number of replication slots for 12->14 logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Excessive number of replication slots for 12->14 logical replication
|
List | pgsql-bugs |
On Mon, Jul 18, 2022 at 09:07:35AM +0530, Amit Kapila wrote: > As per my understanding, each subscription will use three slots (one > for apply worker and one for each of the sync workers) in your case in > the sync phase. So, in your case, if you have 10 subscriptions then > ideally it should be no more than 30. Can you please check and share > the subscriber logs to see if there are any errors in the initial sync > phase? Also, please confirm the number of subscriptions you create on > the subscriber? Ok. In next test I did: @09:21:16 AM UTC, on source All the "create publication" for focal14_1 to focal14_10 sets. @09:21:53 (immediately after), on recipient: create subscription focal14_1 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_1; create subscription focal14_2 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_2; create subscription focal14_3 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_3; create subscription focal14_4 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_4; create subscription focal14_5 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_5; create subscription focal14_6 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_6; create subscription focal14_7 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_7; create subscription focal14_8 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_8; create subscription focal14_9 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_9; create subscription focal14_10 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication focal14_10; This script has ened at 09:21:56 First error: #v+ 2022-07-18 09:22:07.046 UTC,,,4145917,,62d5263f.3f42fd,2,,2022-07-18 09:22:07 UTC,28/21641,1219146,ERROR,53400,"could notfind free replication state slot for replication origin with OID 51",,"Increase max_replication_slots and try again.",,,,,,,"","logicalreplication worker",,0 #v- Nothing else errored out before, no warning, no fatals. from the first ERROR I was getting them in the range of 40-70 per minute. At the same time I was logging data from `select now(), * from pg_replication_slots`, every 2 seconds. Then, i grouped it into 4 groups: slots with name focal* with active = true slots with name focal* with active = false slots with name *sync* with active = true slots with name *sync* with active = false How it looked in time: 09:21:54: 5 total; focal: 1 active, 2 inactive; sync: 1 active, 1 inactive 09:21:56: 11 total; focal: 1 active, 9 inactive; sync: 1 active, 0 inactive 09:21:58: 11 total; focal: 1 active, 8 inactive; sync: 1 active, 1 inactive 09:22:00: 23 total; focal: 10 active, 0 inactive; sync: 0 active, 13 inactive 09:22:03: 28 total; focal: 10 active, 0 inactive; sync: 9 active, 9 inactive 09:22:05: 21 total; focal: 10 active, 0 inactive; sync: 7 active, 4 inactive 09:22:07: 20 total; focal: 10 active, 0 inactive; sync: 7 active, 3 inactive 09:22:09: 27 total; focal: 10 active, 0 inactive; sync: 9 active, 8 inactive 09:22:11: 30 total; focal: 10 active, 0 inactive; sync: 10 active, 10 inactive 09:22:13: 49 total; focal: 10 active, 0 inactive; sync: 5 active, 34 inactive 09:22:15: 50 total; focal: 10 active, 0 inactive; sync: 0 active, 40 inactive 09:22:17: 49 total; focal: 10 active, 0 inactive; sync: 0 active, 39 inactive 09:22:19: 44 total; focal: 10 active, 0 inactive; sync: 1 active, 33 inactive 09:22:21: 43 total; focal: 10 active, 0 inactive; sync: 2 active, 31 inactive 09:22:23: 44 total; focal: 10 active, 0 inactive; sync: 1 active, 33 inactive 09:22:25: 43 total; focal: 10 active, 0 inactive; sync: 1 active, 32 inactive So, it looks that there are up to 10 focal slots, all active, and then there are sync slots with weirdly high counts forinactive ones. At most, I had 11 active sync slots. Looks like some kind of timing issue, which would be inline with what Kyotaro Horiguchi wrote initially. The thing is that after some time, the data *gets* replicated, and it seems to be full on publication side, but the errors are unnerving :) depesz
pgsql-bugs by date: