Re: pg_upgrade and logical replication - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: pg_upgrade and logical replication |
Date | |
Msg-id | 20230226030518.q6tggrid2vacmvlb@jrouhaud Whole thread Raw |
In response to | Re: pg_upgrade and logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: pg_upgrade and logical replication
|
List | pgsql-hackers |
On Sat, Feb 25, 2023 at 11:24:17AM +0530, Amit Kapila wrote: > On Wed, Feb 22, 2023 at 12:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > Is there really a use case for dumping the content of pg_subscription_rel > > > outside of pg_upgrade? > > I think the users who want to take a dump and restore the entire > cluster may need it there for the same reason as pg_upgrade needs it. > TBH, I have not seen such a request but this is what I imagine one > would expect if we provide this functionality via pg_upgrade. But the pg_subscription_rel data are only needed if you want to resume logical replication from the exact previous state, otherwise you can always refresh the subscription and it will retrieve the list of relations automatically (dealing with initial sync and so on). It's hard to see how it could be happening with a plain pg_dump. The only usable scenario I can see would be to disable all subscriptions on the logical replica, maybe make sure that no one does any write those tables if you want to eventually switch over on the restored node, do a pg_dump(all), restore it and then resume the logical replication / subscription(s) on the restored server. That's a lot of constraints for something that pg_upgrade deals with so much more efficiently. Maybe one plausible use case would be to split a single logical replica to N servers, one per database / publication or something like that. In that case pg_upgrade won't be that useful and if each target subset is small enough a pg_dump/pg_restore may be a viable option. But if that's a viable option then surely creating the logical replica from scratch using normal logical table sync should be an even better option. I'm really worried that it's going to be a giant foot-gun that any user should really avoid. > > > For the pg_upgrade use-case, do you see any reason to not restore the > > > pg_subscription_rel by default? > > As I said earlier, one can very well say that giving more flexibility > (in terms of where the publications will be after restore) after a > restore is a better idea. Also, we are doing the same till now without > any major complaints about the same, so it makes sense to keep the > current behavior as default. I'm a bit dubious that anyone actually tried to run pg_upgrade on a logical replica and then kept using logical replication, as it's currently impossible to safely resume replication without truncating all target relations. As I mentioned before, if we keep the current behavior as a default there should be an explicit warning in the documentation stating that you need to truncate all target relations before resuming logical replication as otherwise you have a guarantee that you will lose data. > > > Maybe having an option to not restore it would > > > make sense if it indeed add noticeable overhead when publications have a lot of > > > tables? > > Yeah, that could be another reason to not do it default. I will do some benchmark with various number of relations, from high to unreasonable. > > > > Since I didn't hear any objection I worked on a POC patch with this approach. > > > > For now when pg_dump is invoked with --binary, it will always emit extra > > commands to restore the relation list. This command is only allowed when the > > server is started in binary upgrade mode. > > > > The new command is of the form > > > > ALTER SUBSCRIPTION name ADD TABLE (relid = X, state = 'Y', lsn = 'Z/Z') > > > > with the lsn part being optional. > > > > BTW, do we restore the origin and its LSN after the upgrade? Because > without that this won't be sufficient as that is required for apply > worker to ensure that it is in sync with table sync workers. We currently don't, which is yet another sign that no one actually tried to resume logical replication after a pg_upgrade. That being said, trying to pg_upgrade a node that's currently syncing relations seems like a bad idea (I didn't even think to try), but I guess it should also be supported. I will work on that too. Assuming we add a new option for controlling either plain pg_dump and/or pg_upgrade behavior, should this option control both pg_subscription_rel and replication origins and their data or do we need more granularity?
pgsql-hackers by date: