On Fri, Oct 31, 2025 at 2:58 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Instead of dropping such slots, what we actually need is a way to safely set synced=false->true and continue
operating.
>
> Operating logical replication setups is already extremely complex and error-prone — this is not theoretical, it’s
something many of us face daily.
> So rather than adding more speculative features or workarounds, I think we should focus on addressing real
operational pain points and the inconsistencies in the current design.
>
> A slot created on the primary (which later becomes a standby) with failover=true has a very clear purpose. The
failover flag already indicates that purpose; synced shouldn’t override it.
>
I think this is not as clear as you suggest when compared to WAL. In
failover cases, we bump the WAL timeline on the new primary and also
have facilities like pg_rewind to ensure that the old primary can
follow the new primary after divergence. For slots, there is no such
facility. Now, there is an argument that for slots it is sufficient to
match the name and the failover property to say that it is okay to
overwrite the slot on the old primary. However, it is not clear whether
it is always safe to do so. For example, if the old primary ran for
some time after divergence and someone re-created the slot with the
same name and failover property, it will no longer be the same slot.
Unlike WAL, we don't maintain the slot's history, so it is not equally
clear that we can overwrite the old primary's slots as they are.
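
The concern can be illustrated with a minimal sketch. This is a toy
Python model, not PostgreSQL code: the Slot class, the slot name, and
the "incarnation" field are all hypothetical. The incarnation field
stands in for exactly the history that real slots do not record. The
sketch only shows that a check on name and failover cannot tell a
re-created slot from the original one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    failover: bool
    # Hypothetical identity marker; real slots keep no such history.
    incarnation: int


def matches_by_name_and_failover(a: Slot, b: Slot) -> bool:
    """The proposed check: compare only name and failover property."""
    return a.name == b.name and a.failover == b.failover


# Slot as it was synced to the standby that later became the new primary.
synced_on_new_primary = Slot("sub1_slot", True, incarnation=1)

# Case 1: the old primary still has the original slot.
original_on_old_primary = Slot("sub1_slot", True, incarnation=1)

# Case 2: after divergence, the slot was dropped and re-created on the
# old primary with the same name and failover property.
recreated_on_old_primary = Slot("sub1_slot", True, incarnation=2)

same_slot_ok = matches_by_name_and_failover(
    original_on_old_primary, synced_on_new_primary)
recreated_ok = matches_by_name_and_failover(
    recreated_on_old_primary, synced_on_new_primary)
truly_same = (recreated_on_old_primary.incarnation
              == synced_on_new_primary.incarnation)

print(same_slot_ok, recreated_ok, truly_same)
```

Both checks pass, although in the second case the slot on the old
primary is a different slot from the one that was synced.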
--
With Regards,
Amit Kapila.