Hi, I set up logical replication a few days ago, but it's throwing some weird log lines that have me worried. Does anybody have experience with lines like the following on a subscriber:
LOG: logical replication table synchronization worker for subscription "compass_subscription", table "search_opinionscitedbyrecapdocument" has started
ERROR: could not start WAL streaming: ERROR: replication slot "pg_20031_sync_17418_7324846428853951375" does not exist
LOG: background worker "logical replication worker" (PID 1014) exited with exit code 1
As far as I know, slots with this kind of name (pg_xyz_sync_*) are created during the initial table sync. Even so, a quick look at a few tables suggests the subscription is working.
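In case it helps whoever answers: my understanding (an assumption based on reading about PostgreSQL 14 and later, not something I've verified on this setup) is that these slot names follow the pattern pg_<subscription OID>_sync_<table OID>_<subscriber system identifier>. If that's right, these queries on the subscriber should map the name back to the objects involved:

```sql
-- Run on the subscriber.
-- Assumption: slot name = pg_<subscription OID>_sync_<table OID>_<system identifier>,
-- so 20031, 17418 and 7324846428853951375 are taken from the slot name in the error.

-- Which subscription has OID 20031?
SELECT oid, subname, subenabled
FROM pg_subscription
WHERE oid = 20031;

-- Which table has OID 17418? (Prints the bare number if no such relation exists.)
SELECT 17418::regclass AS table_name;

-- Compare with the last number in the slot name.
SELECT system_identifier
FROM pg_control_system();
```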
I thought this might be related to running out of slots on the publisher, so I increased both max_replication_slots and max_wal_senders to 50 and restarted the publisher so those would take effect. No luck.
I also tried rebooting the subscriber. No luck.
When I list the slots on the publisher with...
SELECT * FROM pg_replication_slots;
...I don't see the one the log lines say is missing.
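A narrower version of that check, in case the full output is useful to anyone (the LIKE pattern is just my guess at matching the tablesync-style names):

```sql
-- Run on the publisher: only slots named like pg_<n>_sync_<n>_<n>,
-- plus whether a walsender is currently using each one.
SELECT slot_name, plugin, active, active_pid, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%';

-- How many slots exist versus the configured limit.
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots')::int AS max_slots
FROM pg_replication_slots;
```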
So it looks like the initial sync may have worked (the tables have content), but there seems to be an errant worker on the subscriber stuck in a retry loop.
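To confirm that, I believe the per-table sync state on the subscriber would show whether any table never reached the ready state. This is a sketch assuming PostgreSQL 14 or later (where the 'f' state exists) and that 20031 is the subscription OID from the slot name:

```sql
-- Run on the subscriber.
-- srsubstate: i = initialize, d = data is being copied, f = finished table copy,
--             s = synchronized, r = ready (normal replication).
SELECT srrelid::regclass AS table_name, srsubstate, srsublsn
FROM pg_subscription_rel
WHERE srsubid = 20031
  AND srsubstate <> 'r';

-- Workers currently running for the subscription; a non-NULL relid
-- means that row is a table synchronization worker.
SELECT pid, relid::regclass AS syncing_table, received_lsn, latest_end_lsn
FROM pg_stat_subscription
WHERE subname = 'compass_subscription';
```

If the first query returns search_opinionscitedbyrecapdocument in a state other than 'r', that would at least explain why a sync worker keeps being started for it.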
I haven't been able to fix this. My last resort would be a new subscription with copy_data=false, but I'd rather avoid that if I can.
Is there a way to fix or understand this so that these log lines stop and I can be confident the replication is in good shape?
Thank you!
Mike