Hi,
On 2022-02-07 13:38:38 +0530, Ashutosh Sharma wrote:
> Are you talking about this scenario - what if the logical replication
> slot on the publisher is dropped, but is being referenced by the
> standby where the slot is synchronized?
It's a bit hard to say, because I haven't found a clear description, either in
this thread or in the patch, of what the syncing needs to & tries to
guarantee. That may have been discussed in one of the precursor threads,
but...
Generally I don't think we can permit scenarios where a slot can be in a
"corrupt" state, i.e. missing required catalog entries, after "normal"
administrative commands (i.e. not mucking around in catalog entries / on-disk
files), even if the sequence of commands is a bit weird. All such cases need
to be either prevented or detected.
As far as I can tell, the way this patch keeps slots on physical replicas
"valid" is solely by reorderbuffer.c blocking during replay via
wait_for_standby_confirmation().
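
Roughly, the shape of that, as I read the patch (a sketch only;
all_standby_slots_have_confirmed() is a made-up helper standing in for the
patch's actual bookkeeping):

    #include "postgres.h"
    #include "access/xlogdefs.h"    /* XLogRecPtr */
    #include "miscadmin.h"          /* CHECK_FOR_INTERRUPTS, pg_usleep */

    /*
     * Block decoding until every physical slot listed in standby_slot_names
     * has confirmed flushing past commit_lsn.
     */
    static void
    wait_for_standby_confirmation(XLogRecPtr commit_lsn)
    {
        for (;;)
        {
            /* hypothetical helper: checks the standby_slot_names slots */
            if (all_standby_slots_have_confirmed(commit_lsn))
                break;

            CHECK_FOR_INTERRUPTS();
            pg_usleep(100 * 1000L);     /* 100ms, then recheck */
        }
    }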
That means that if, e.g., the standby_slot_names GUC differs from
synchronize_slot_names on the physical replica, or if the primary drops its
logical slots, the slots synchronized on the physical replica are not going
to be valid.
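
For instance (GUC names per the patch; slot names invented):

    # on the primary: logical decoding waits for these physical standbys
    standby_slot_names = ''            # nothing listed: decoding never waits

    # on the physical replica: which logical slots to mirror from the primary
    synchronize_slot_names = 'sub1'    # primary can advance sub1 and vacuum
                                       # away catalog rows the synced copy
                                       # still needs
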
> Should the redo function for the drop replication slot have the capability
> to drop it on standby and its subscribers (if any) as well?
Slots are not WAL-logged (and shouldn't be), so there is no drop-slot record
for a redo function to hook into.
I think you pretty much need the recovery conflict handling infrastructure I
referenced upthread, which recognizes during replay whether a record
conflicts with a slot on a standby. And then on top of that you can build
something like this patch.
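
For illustration, the shape of such a conflict check might be something like
this (a sketch, not existing code; slot locking is elided and
InvalidateSlot() is made up):

    #include "postgres.h"
    #include "access/transam.h"     /* TransactionIdPrecedes */
    #include "replication/slot.h"   /* ReplicationSlotCtl */

    /*
     * Called during replay of a record that removes catalog tuples with
     * xids below catalog_xmin_horizon.
     */
    static void
    check_slot_conflict(TransactionId catalog_xmin_horizon)
    {
        for (int i = 0; i < max_replication_slots; i++)
        {
            ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

            if (!s->in_use)
                continue;

            /* does the slot still need catalog rows this record removes? */
            if (TransactionIdIsValid(s->data.catalog_xmin) &&
                TransactionIdPrecedes(s->data.catalog_xmin,
                                      catalog_xmin_horizon))
                InvalidateSlot(s);  /* made-up: mark invalid, signal users */
        }
    }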
Greetings,
Andres Freund