Hi all,
I recently wrote some notes on the interaction between physical
replication failover/promotion and logical replication publishers
and/or subscribers.
As you probably all know, right now we don't support physical failover
for logical replication publishers at all, either for in-core logical
replication or for third-party solutions. And while we support physical
failover and promotion of subscribers, there turn out to be some
issues with that too.
I'm not trying to solve those issues right now. But since there are
various out-of-tree replication tools trying to work around these
limitations, I wanted to share my knowledge of the various hazards and
challenges involved in doing so, so I've written a wiki article on it:
https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover
I tried to address many of these issues with failover slots, but I am
not trying to beat that dead horse now. I know that at least some
people here are of the opinion that effort shouldn't go into
logical/physical replication interoperation anyway - that we should
instead address the remaining limitations in logical replication so
that it can provide complete HA capabilities without use of physical
replication. So for now I'm just trying to save others who go looking
into these issues some time and warn them about some of the less
obvious booby-traps.
I do want to add some info to the logical decoding docs around slot
fast-forward behaviour and how to write clients to avoid missing or
double-processing transactions. I'd welcome advice on the best way to
do that in a manner that would be accepted by this community.
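
To make the client-side concern concrete, here is a minimal sketch of the
sort of bookkeeping I have in mind. It is an illustration under stated
assumptions, not a proposal for the docs text: the names parse_lsn and
ApplyTracker are hypothetical, and it assumes the client durably records
the position of the last transaction it applied (ideally in the same
local transaction as the applied changes) and receives transactions in
commit order. The idea is that the client skips anything at or before
its own recorded position rather than trusting the slot's position, and
never reports a flush position beyond what it has durably applied, since
a slot that has advanced past a transaction will not replay it.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'X/Y' hex text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


class ApplyTracker:
    """Hypothetical helper: tracks the last durably applied commit position."""

    def __init__(self, last_applied: str = "0/0"):
        # In a real client this value would be loaded from durable local
        # storage, not held only in memory (assumption of this sketch).
        self.last_applied = parse_lsn(last_applied)

    def should_apply(self, commit_lsn: str) -> bool:
        # Skip transactions already applied, so a replay after reconnect
        # does not double-process them.
        return parse_lsn(commit_lsn) > self.last_applied

    def mark_applied(self, commit_lsn: str) -> None:
        # Call only after the transaction's changes are durably stored.
        self.last_applied = max(self.last_applied, parse_lsn(commit_lsn))

    def safe_flush_lsn(self) -> int:
        # The highest position the client may confirm as flushed in its
        # feedback. Confirming anything later risks the slot advancing
        # past transactions the client has not actually stored.
        return self.last_applied
```

A client would call should_apply() for each incoming transaction,
mark_applied() once it is durably stored, and use safe_flush_lsn() as
the upper bound for the flush position it reports back to the server.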