On Thu, 18 Jan 2024 at 13:00, Bowen Shi <zxwsbg12138@gmail.com> wrote:
>
> Dears,
>
> I encountered a similar problem when I used logical replication to replicate databases from pg 16 to pg 16.
>
> I started 3 subscription in parallel, and subscriber's postgresql.conf is following:
> max_replication_slots = 10
> max_sync_workers_per_subscription = 2
>
> However, after 3 minutes, I found three COPY errors in subscriber:
> "error while shutting down streaming COPY: ERROR: could not find record while sending logically-decoded data: missing contrecord at xxxx/xxxxxxxxx"
> Then, the subscriber began to print a large number of errors: "could not find free replication state slot for replication origin with ID 11, Increase max_replication_slots and try again."
>
> And the publisher was full of pg_xxx_sync_xxxxxxx slots, printing lots of "all replication slots are in use, Free one or increase max_replication_slots."
>
> This question is very similar to https://www.postgresql.org/message-id/flat/20220714115155.GA5439%40depesz.com . When the table sync worker encounters an error and exits while copying a table, the replication origin will not be deleted. And new table sync workers would create sync slot in the publisher and then exit without dropping them.

I tried various tests with the suggested configuration, but I could
not reproduce this scenario. I was able to simulate the problem with a
smaller max_replication_slots value, but the behavior is as expected
in that case.
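
As a rough sanity check on the quoted settings, the worst-case number of
replication origins active on the subscriber can be estimated as below.
This is only a sketch: the function name is hypothetical, and the model
(one origin per apply worker plus one per concurrently running table
sync worker) is an assumption that ignores origins left behind by
workers that exited with an error.

```python
def origins_needed(subscriptions: int, sync_workers_per_subscription: int) -> int:
    """Estimate worst-case active replication origins on the subscriber.

    Assumed model: each subscription has one apply worker origin plus one
    origin per concurrently running table sync worker.
    """
    return subscriptions * (1 + sync_workers_per_subscription)


# Values taken from the quoted report: 3 subscriptions,
# max_sync_workers_per_subscription = 2, max_replication_slots = 10.
needed = origins_needed(subscriptions=3, sync_workers_per_subscription=2)
max_replication_slots = 10
print(needed, needed <= max_replication_slots)  # prints: 9 True
```

Under this model the reported configuration leaves one spare slot, so
running out would require origins or slots that are not being released.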

If you have a test case or logs for this, could you please share them?
That would make it easier to reconstruct the sequence of events and to
get a clear picture of what is happening.

Regards,
Vignesh