Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: Excessive number of replication slots for 12->14 logical replication
Date
Msg-id CAA4eK1+RL43Qty=Rb+RJhw0+0sm-f2T=7ST=9u0R+vmnDDVZaA@mail.gmail.com
In response to Re: Excessive number of replication slots for 12->14 logical replication  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses Re: Excessive number of replication slots for 12->14 logical replication
List pgsql-bugs
On Mon, Jul 18, 2022 at 3:13 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Mon, Jul 18, 2022 at 09:07:35AM +0530, Amit Kapila wrote:
>
> First error:
> #v+
> 2022-07-18 09:22:07.046 UTC,,,4145917,,62d5263f.3f42fd,2,,2022-07-18 09:22:07 UTC,28/21641,1219146,ERROR,53400,"could not find free replication state slot for replication origin with OID 51",,"Increase max_replication_slots and try again.",,,,,,,"","logical replication worker",,0
> #v-
>
> Nothing else errored out before, no warning, no fatals.
>
> From the first ERROR onward, I was getting them at a rate of 40-70 per minute.
>
> At the same time I was logging data from `select now(), * from pg_replication_slots`, every 2 seconds.
>
...
>
> So, it looks like there are up to 10 focal slots, all active, and then there are sync slots with weirdly high counts for inactive ones.
>
> At most, I had 11 active sync slots.
>
> Looks like some kind of timing issue, which would be in line with what
> Kyotaro Horiguchi wrote initially.
>
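Tallying the `pg_replication_slots` snapshots described above (active vs. inactive, sync vs. other slots) can be sketched as below. This is a hypothetical helper, not part of PostgreSQL: `tally_slots` and its input format (a list of `(slot_name, active)` pairs, e.g. the rows of `select slot_name, active from pg_replication_slots`) are assumptions, and the regular expression assumes the PostgreSQL 14 naming scheme for table-sync slots, `pg_<subscription oid>_sync_<relation oid>_<system identifier>`.

```python
import re
from collections import Counter

# Assumed PostgreSQL 14 table-sync slot naming: pg_<suboid>_sync_<relid>_<sysid>
SYNC_SLOT_RE = re.compile(r"^pg_\d+_sync_\d+_\d+$")


def tally_slots(rows):
    """Count slots by (kind, active) in one snapshot of pg_replication_slots.

    `rows` is an iterable of (slot_name, active) pairs; kind is "sync" for
    names matching the assumed table-sync pattern, otherwise "other".
    """
    counts = Counter()
    for slot_name, active in rows:
        kind = "sync" if SYNC_SLOT_RE.match(slot_name) else "other"
        counts[(kind, bool(active))] += 1
    return counts
```

Running it over each 2-second snapshot and plotting `counts[("sync", False)]` over time would show whether inactive sync slots keep piling up while the active count stays bounded.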

I think this is a timing issue similar to what Horiguchi-San has
pointed out, but caused by replication origins. We drop the replication
origin after the sync worker that used it has finished. This is
done by the apply worker because we don't allow dropping an origin
while the process owning it is still alive. I am not sure of the
repercussions, but maybe we can allow the process that owns the
origin to drop it.
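The timing issue can be illustrated with a toy simulation. This is a hypothetical tick-based model, not PostgreSQL's implementation: `simulate`, its parameters, and the scheduling are invented for illustration. Each table-sync worker takes one origin state slot from a fixed pool (the error message ties the pool size to `max_replication_slots`); after the worker exits, its slot stays occupied until the apply worker drops the origin some time later. With a non-zero drop delay, finished workers' origins accumulate and new workers fail to find a free slot.

```python
from collections import deque


def simulate(num_tables, capacity, max_workers, sync_ticks, drop_delay):
    """Hypothetical tick-based model (not PostgreSQL code).

    Each sync worker needs one origin state slot out of `capacity` and holds
    it for `sync_ticks` ticks while copying its table. After it exits, the
    slot stays occupied until the apply worker drops the origin `drop_delay`
    ticks later. Returns how many launches failed for lack of a free slot
    (each models one "could not find free replication state slot" ERROR,
    retried on a later tick). Assumes capacity >= 1 and max_workers >= 1.
    """
    pending = deque(range(num_tables))  # tables not yet handed to a worker
    running = []        # ticks left for each live sync worker
    awaiting_drop = []  # ticks left before each exited worker's origin is dropped
    failures = 0
    while pending or running or awaiting_drop:
        # Apply worker: drop origins whose delay has elapsed, freeing slots.
        awaiting_drop = [t - 1 for t in awaiting_drop if t > 1]
        # Sync workers progress; a finished worker exits but its origin lingers.
        still_running = []
        for t in running:
            if t > 1:
                still_running.append(t - 1)
            elif drop_delay > 0:
                awaiting_drop.append(drop_delay)
        running = still_running
        # Launch workers for pending tables, up to the concurrency limit.
        while pending and len(running) < max_workers:
            if len(running) + len(awaiting_drop) >= capacity:
                failures += 1  # no free origin slot: this launch errors out
                break
            pending.popleft()
            running.append(sync_ticks)
    return failures
```

In this model, `simulate(20, 4, 4, 3, 0)` (origins dropped immediately) reports no failures, while `simulate(20, 4, 4, 3, 5)` reports repeated failures even though at most four workers are ever alive, which mirrors the reported symptom. Letting the owning process drop its own origin corresponds to `drop_delay=0`.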

I think this will also be addressed once we start reusing
workers/slots/origins to copy multiple tables in the initial sync phase,
as is being discussed in the thread [1].

[1] - https://www.postgresql.org/message-id/CAGPVpCTq%3DrUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.


