Re: Local replication "slot does not exist" after initial sync - Mailing list pgsql-general

From Mike Lissner
Subject Re: Local replication "slot does not exist" after initial sync
Date
Msg-id CAMp9=EwDFHpz_p+QPeEq3kGP=0sSDwrV0-z_QxZgYaP-B3a_ug@mail.gmail.com
In response to Local replication "slot does not exist" after initial sync  (Mike Lissner <mlissner@michaeljaylissner.com>)
Responses Re: Local replication "slot does not exist" after initial sync
List pgsql-general
Sorry, two more little things here. The publisher logs don't add much, but here's what we see:

STATEMENT: START_REPLICATION SLOT "pg_20031_sync_17418_7324846428853951375" LOGICAL F1D0/346C6508 (proto_version '2', publication_names '"compass_publication2"')
ERROR: replication slot "pg_20031_sync_17402_7324846428853951375" does not exist
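
The two slot names in these lines are not identical, which is easier to see once they are split into fields. The sketch below assumes the table synchronization slot naming scheme described in the PostgreSQL logical replication documentation for recent releases, pg_<subscription oid>_sync_<relation oid>_<system identifier>. The column aliases are made up for illustration.

```sql
-- Split the two slot names from the publisher log into their fields.
-- Assumption: names follow pg_<subscription oid>_sync_<relation oid>_<system identifier>.
-- This reads no tables, so it can be run on either server.
SELECT slot_name,
       split_part(slot_name, '_', 2)::oid AS subscription_oid,
       split_part(slot_name, '_', 4)::oid AS relation_oid,
       split_part(slot_name, '_', 5)      AS system_identifier
FROM (VALUES
        ('pg_20031_sync_17418_7324846428853951375'),
        ('pg_20031_sync_17402_7324846428853951375')
     ) AS s(slot_name);
```

The names differ only in the relation OID field (17418 versus 17402), so the STATEMENT and ERROR lines may belong to sync workers for two different tables. If the naming assumption holds, running SELECT 17418::regclass, 17402::regclass; on the subscriber should show which tables those are.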

And I thought that maybe there'd be some magic in the REFRESH command on the subscriber, so I tried that:

alter subscription xyz refresh publication;

To nobody's surprise, that didn't help. :)


On Sun, Feb 25, 2024 at 10:00 AM Mike Lissner <mlissner@michaeljaylissner.com> wrote:
Hi, I set up logical replication a few days ago, but it's throwing some weird log lines that have me worried. Does anybody have experience with lines like the following on a subscriber:

LOG: logical replication table synchronization worker for subscription "compass_subscription", table "search_opinionscitedbyrecapdocument" has started
ERROR: could not start WAL streaming: ERROR: replication slot "pg_20031_sync_17418_7324846428853951375" does not exist
LOG: background worker "logical replication worker" (PID 1014) exited with exit code 1

Slots with this kind of name (pg_xyz_sync_*) are created during the initial sync, but it seems like the subscription is working based on a quick look in a few tables.
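
One way to check whether the initial sync really finished for every table is to look at the per-table state on the subscriber. This is only a sketch: it assumes a PostgreSQL version where pg_subscription_rel uses the state codes i (initialize), d (data is being copied), f (finished table copy), s (synchronized) and r (ready).

```sql
-- Run on the subscriber: list tables whose synchronization has not reached 'r' (ready).
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate,
       sr.srsublsn
FROM pg_subscription_rel AS sr
JOIN pg_subscription AS s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r'
ORDER BY s.subname, table_name;
```

An empty result would suggest every table is in the ready state. Any rows returned are the tables whose sync workers are likely still being restarted.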

I thought this might be related to running out of slots on the publisher, so I increased both max_replication_slots and max_wal_senders to 50 and rebooted so those would take effect. No luck.
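
To confirm that the new limits are actually in effect after the restart, something like the following can be run on the publisher. It is a minimal sketch using the standard pg_settings and pg_replication_slots views.

```sql
-- Run on the publisher: current values, and whether a restart is still pending.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('max_replication_slots', 'max_wal_senders');

-- How many slots exist right now, for comparison with max_replication_slots.
SELECT count(*) AS slots_in_use
FROM pg_replication_slots;
```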

I thought rebooting the subscriber might help. No luck.

When I look in the publisher to see the slots we have...

SELECT * FROM pg_replication_slots;

...I do not see the one that's missing according to the log lines.
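
A narrower version of that query makes it easier to spot leftover table synchronization slots. This sketch assumes that 20031 in the logged slot name is the subscription's OID, so the prefix would need adjusting for another subscription.

```sql
-- Run on the publisher: show only table sync slots for subscription OID 20031.
SELECT slot_name,
       slot_type,
       active,
       active_pid,
       restart_lsn,
       confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name ~ '^pg_20031_sync_'
ORDER BY slot_name;
```

If this returns no rows while the subscriber keeps asking for such a slot, that matches the "does not exist" error in the log.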

So it seems like the initial sync might have worked properly (the tables have content), but I have an errant process on the subscriber that might be stuck in a retry loop.

I haven't been able to fix this, and I think my last resort might be to create a new subscription with copy_data=false, but I'd rather avoid that if I can.

Is there a way to fix or understand this so that I don't get the log lines forever and so that I can be confident the replication is in good shape?

Thank you!


Mike
