Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs

From hubert depesz lubaczewski
Subject Re: Excessive number of replication slots for 12->14 logical replication
Date
Msg-id 20220718094330.GA24063@depesz.com
Whole thread Raw
In response to Re: Excessive number of replication slots for 12->14 logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Excessive number of replication slots for 12->14 logical replication
List pgsql-bugs
On Mon, Jul 18, 2022 at 09:07:35AM +0530, Amit Kapila wrote:
> As per my understanding, each subscription will use three slots (one
> for apply worker and one for each of the sync workers) in your case in
> the sync phase. So, in your case, if you have 10 subscriptions then
> ideally it should be no more than 30. Can you please check and share
> the subscriber logs to see if there are any errors in the initial sync
> phase? Also, please confirm the number of subscriptions you create on
> the subscriber?

Ok. In next test I did:

@09:21:16 AM UTC, on source
All the "create publication" for focal14_1 to focal14_10 sets.

@09:21:53 (immediately after), on recipient:
create subscription focal14_1 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_1;
create subscription focal14_2 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_2;
create subscription focal14_3 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_3;
create subscription focal14_4 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_4;
create subscription focal14_5 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_5;
create subscription focal14_6 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_6;
create subscription focal14_7 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_7;
create subscription focal14_8 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_8;
create subscription focal14_9 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_9;
create subscription focal14_10 connection 'host=10.10.10.10 port=5432 user=upgrayedd dbname=dbname' publication
focal14_10;

This script has ened at 09:21:56

First error:
#v+
2022-07-18 09:22:07.046 UTC,,,4145917,,62d5263f.3f42fd,2,,2022-07-18 09:22:07 UTC,28/21641,1219146,ERROR,53400,"could
notfind free replication state slot for replication origin with OID 51",,"Increase max_replication_slots and try
again.",,,,,,,"","logicalreplication worker",,0
 
#v-

Nothing else errored out before, no warning, no fatals.

from the first ERROR I was getting them in the range of 40-70 per minute.

At the same time I was logging data from `select now(), * from pg_replication_slots`, every 2 seconds.

Then, i grouped it into 4 groups:
slots with name focal* with active = true
slots with name focal* with active = false
slots with name *sync* with active = true
slots with name *sync* with active = false

How it looked in time:
09:21:54:  5 total; focal:  1 active,  2 inactive; sync:  1 active,  1 inactive
09:21:56: 11 total; focal:  1 active,  9 inactive; sync:  1 active,  0 inactive
09:21:58: 11 total; focal:  1 active,  8 inactive; sync:  1 active,  1 inactive
09:22:00: 23 total; focal: 10 active,  0 inactive; sync:  0 active, 13 inactive
09:22:03: 28 total; focal: 10 active,  0 inactive; sync:  9 active,  9 inactive
09:22:05: 21 total; focal: 10 active,  0 inactive; sync:  7 active,  4 inactive
09:22:07: 20 total; focal: 10 active,  0 inactive; sync:  7 active,  3 inactive
09:22:09: 27 total; focal: 10 active,  0 inactive; sync:  9 active,  8 inactive
09:22:11: 30 total; focal: 10 active,  0 inactive; sync: 10 active, 10 inactive
09:22:13: 49 total; focal: 10 active,  0 inactive; sync:  5 active, 34 inactive
09:22:15: 50 total; focal: 10 active,  0 inactive; sync:  0 active, 40 inactive
09:22:17: 49 total; focal: 10 active,  0 inactive; sync:  0 active, 39 inactive
09:22:19: 44 total; focal: 10 active,  0 inactive; sync:  1 active, 33 inactive
09:22:21: 43 total; focal: 10 active,  0 inactive; sync:  2 active, 31 inactive
09:22:23: 44 total; focal: 10 active,  0 inactive; sync:  1 active, 33 inactive
09:22:25: 43 total; focal: 10 active,  0 inactive; sync:  1 active, 32 inactive

So, it looks that there are up to 10 focal slots, all active, and then there are sync slots with weirdly high counts
forinactive ones.
 

At most, I had 11 active sync slots.

Looks like some kind of timing issue, which would be inline with what
Kyotaro Horiguchi wrote initially.

The thing is that after some time, the data *gets* replicated, and it
seems to be full on publication side, but the errors are unnerving :)

depesz



pgsql-bugs by date:

Previous
From: hubert depesz lubaczewski
Date:
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Next
From: PG Bug reporting form
Date:
Subject: BUG #17554: when i use rule on table which have serial column, the nextval exec twice.