> For the publisher nodes, that may be something nice to support (I'm assuming it
> could be useful for more complex replication setups) but I'm not interested in
> that at the moment as my goal is to reduce downtime for major upgrade of
> physical replica, thus *not* doing pg_upgrade of the primary node, whether
> physical or logical. I don't see why it couldn't be done later on, if/when
> someone has a use case for it.
>
I thought there is value if we provide a way to upgrade both publisher
and subscriber.
It's still unclear to me whether it's actually achievable on the publisher side, as running pg_upgrade leaves a "hole" in the WAL stream and resets the timeline, among other possible difficulties. I don't know much about logical replication internals, though, so I'm clearly not the best person to answer those questions.
Now, you came up with a use case linking it to a
physical replica where allowing an upgrade of only subscriber nodes is
useful. It is possible that users find your steps easy to perform and
didn't find them error-prone but it may be better to get some
authentication of the same. I haven't yet analyzed all the steps in
detail but let's see what others think.
It's been quite some time since then, and no one has chimed in or objected. IMO doing a major version upgrade with limited downtime (so something faster than stopping postgres and running pg_upgrade) has always been difficult, and that has never prevented anyone from doing it, so I don't think that it should be a blocker for what I'm suggesting here, especially since the current behavior of pg_upgrade on a subscriber node is IMHO broken.
Is there something that can be done for pg16? I was thinking that having a fix for the normal and easy case could be acceptable: only allowing pg_upgrade to optionally, and not by default, preserve the subscription relations IFF all subscriptions only have tables in ready state. Other states should be transient, and it's easy for a user to check beforehand and also easy to check during pg_upgrade, so it seems like an acceptable limitation (which I personally see as a good sanity check, but YMMV). It could be lifted in later releases if wanted anyway.
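
To illustrate the kind of check I have in mind, here is a rough sketch (not actual pg_upgrade code). The query relies on the pg_subscription_rel catalog, where srsubstate is 'r' for a table in ready state; the function names and the shape of the input rows are hypothetical, and the rows are assumed to be the result of running the query in each database of the subscriber.

```python
# Query a user (or pg_upgrade) could run in each database of the subscriber.
# It lists every subscription relation that is NOT in ready state ('r').
NON_READY_QUERY = """
SELECT s.subname, sr.srrelid::regclass AS relname, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r'
"""

READY_STATE = "r"


def non_ready_relations(rows):
    """Return the (subname, relname, state) rows that are not in ready state."""
    return [row for row in rows if row[2] != READY_STATE]


def can_preserve_subscription_state(rows):
    """Hypothetical pre-flight check: True only if every relation is ready."""
    return not non_ready_relations(rows)
```

With something like this, the option would simply be refused whenever non_ready_relations() returns anything, and the user could run the same query beforehand to see which tables are still syncing.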
It's unclear to me whether this limited scope would also require preserving the replication origins, but having looked at the code I don't think it would be much of a problem, as the local LSN doesn't have to be preserved. In both cases I would prefer a single option (e.g. --preserve-logical-subscription-state or something like that) to avoid too many complications. Similarly, I still don't see any sensible use case for allowing such an option in a normal pg_dump, so I'd rather not expose that.