Re: pg_upgrade and logical replication - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: pg_upgrade and logical replication |
Date | |
Msg-id | 20230302105056.cgu2lb5pryazdnxm@jrouhaud Whole thread Raw |
In response to | Re: pg_upgrade and logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: pg_upgrade and logical replication
|
List | pgsql-hackers |
On Thu, Mar 02, 2023 at 03:47:53PM +0530, Amit Kapila wrote: > On Wed, Mar 1, 2023 at 12:25 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > 1) setup a normal physical replication cluster (pg_basebackup, restoring PITR, > > whatever), let's call the primary node "A" and replica node "B" > > 2) ensure WAL level is "logical" on the primary node A > > 3) create a logical replication slot on every (connectable) database (or just > > the one you're interested in if you don't want to preserve everything) on A > > 4) create a FOR ALL TABLE publication (again for every databases or just the > > one you're interested in) > > 5) wait for replication to be reasonably if not entirely up to date > > 6) promote the standby node B > > 7) retrieve the promotion LSN (from the XXXXXXXX.history file, > > pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()...) > > 8) call pg_replication_slot_advance() with that LSN for all previously created > > logical replication slots on A > > > > How are these slots used? Do subscriptions use these slots? Yes, as this is the only way to make sure that you replicate everything since the promotion, and only once. To be more precise, something like that: CREATE SUBSCRIPTION db_xxx_subscription CONNECTION 'dbname=db_xxx user=...' PUBLICATION sub_for_db_xxx WITH (create_slot = false, slot_name = 'slot_for_db_xxx', copy_data = false); > > 9) create a normal subscription on all wanted databases on the promoted node > > 10) wait for it to catchup if needed on B > > 12) stop the node B > > 13) run pg_upgrade on B, creating the new node C > > 14) start C, run the global ANALYZE and any sanity check needed (hopefully you > > would have validated that your application is compatible with that new > > version before this point) > > 15) re-enable the subscription on C. This is currently not possible without > > losing data, the patch fixes that > > 16) wait for it to catchup if needed > > 17) create any missing relation and do the ALTER SUBSCRIPTION ... REFRESH if > > needed > > 18) trash B > > 19) create new nodes D, E... as physical replica from C if needed, possibly > > using cheaper approach like pg_start_backup() / rsync / pg_stop_backup if > > needed > > 20) switchover to C and trash A (or convert it to another replica if you want) > > 21) trash the publications on C on all databases > > > > As noted the step 15 is currently problematic, and is also problematic in any > > variation of that scenario that doesn't require you to entirely recreate the > > node C from scratch using logical replication, which is what I want to avoid. > > > > This isn't terribly complicated but requires to be really careful if you don't > > want to end up with an incorrect node C. This approach is also currently not > > entirely ideal, but hopefully logical replication of sequences and DDL will > > remove the main sources of downtime when upgrading using logical replication. > > > > I think there are good chances that one can make mistakes following > all the above steps unless she is an expert. Assuming we do fix pg_upgrade behavior with subscriptions, there isn't much room for error compared to other scenario: - pg_upgrade has been there for ages and contains a lot of sanity checks. People already use it and AFAIK it's not a major pain point, apart from the cases where it can be slow - ALTER SUBSCRIPTIOn ... REFRESH will complain if tables are missing locally - similarly, the logical replica will complain if you're missing some other DDL locally - you only create replica if you had some in the first place, so it's something you should already know how to do. If not, you didn't have any before the upgrade and you still won't have after > > My ultimate goal is to provide some tooling to do that in a much simpler way. > > Maybe a new "promote to logical" action that would take care of steps 2 to 9. > > Users would therefore only have to do this "promotion to logical", and then run > > pg_upgrade and create a new physical replication cluster if they want. > > > > Why don't we try to support the direct upgrade of logical replication > nodes? Have you tried to analyze what are the obstacles and whether we > can have solutions for those? For example, one of the challenges is to > support the upgrade of slots, can we copy (from the old cluster) and > recreate them in the new cluster by resetting LSNs? We can also reset > origins during the upgrade of subscribers and recommend to first > upgrade the subscriber node. I'm not sure I get your question. This whole thread is about direct upgrade of logical replication nodes, at least the subscribers, and what is currently preventing it. For the publisher nodes, that may be something nice to support (I'm assuming it could be useful for more complex replication setups) but I'm not interested in that at the moment as my goal is to reduce downtime for major upgrade of physical replica, thus *not* doing pg_upgrade of the primary node, whether physical or logical. I don't see why it couldn't be done later on, if/when someone has a use case for it.
pgsql-hackers by date: