Re: pg_upgrade and logical replication - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: pg_upgrade and logical replication
Date
Msg-id 20230301065527.gk7pdtz5kcjy7rc5@jrouhaud
Whole thread Raw
In response to Re: pg_upgrade and logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: pg_upgrade and logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: pg_upgrade and logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Mar 01, 2023 at 11:51:49AM +0530, Amit Kapila wrote:
> On Tue, Feb 28, 2023 at 10:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Well, as I mentioned I'm *not* interested in a logical-replication-only
> > scenario.  Logical replication is nice but it will always be less efficient
> > than physical replication, and some workloads also don't really play well with
> > it.  So while it can be a huge asset in some cases I'm for now looking at
> > leveraging logical replication for the purpose of major upgrade only for a
> > physical replication cluster, so the publications and subscriptions are only
> > temporary and trashed after use.
> >
> > That being said I was only saying that if I had to do a major upgrade of a
> > logical replication cluster this is probably how I would try to do it, to
> > minimize downtime, even if there are probably *a lot* difficulties to
> > overcome.
> >
>
> Okay, but it would be better if you list out your detailed steps. It
> would be useful to support the new mechanism in this area if others
> also find your steps to upgrade useful.

Sure.  Here are the overly detailed steps:

 1) setup a normal physical replication cluster (pg_basebackup, restoring PITR,
    whatever), let's call the primary node "A" and replica node "B"
 2) ensure WAL level is "logical" on the primary node A
 3) create a logical replication slot on every (connectable) database (or just
    the one you're interested in if you don't want to preserve everything) on A
 4) create a FOR ALL TABLE publication (again for every databases or just the
    one you're interested in)
 5) wait for replication to be reasonably if not entirely up to date
 6) promote the standby node B
 7) retrieve the promotion LSN (from the XXXXXXXX.history file,
    pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()...)
 8) call pg_replication_slot_advance() with that LSN for all previously created
    logical replication slots on A
 9) create a normal subscription on all wanted databases on the promoted node
10) wait for it to catchup if needed on B
12) stop the node B
13) run pg_upgrade on B, creating the new node C
14) start C, run the global ANALYZE and any sanity check needed (hopefully you
    would have validated that your application is compatible with that new
    version before this point)
15) re-enable the subscription on C.  This is currently not possible without
    losing data, the patch fixes that
16) wait for it to catchup if needed
17) create any missing relation and do the ALTER SUBSCRIPTION ... REFRESH if
    needed
18) trash B
19) create new nodes D, E... as physical replica from C if needed, possibly
using cheaper approach like pg_start_backup() / rsync / pg_stop_backup if
needed
20) switchover to C and trash A (or convert it to another replica if you want)
21) trash the publications on C on all databases

As noted the step 15 is currently problematic, and is also problematic in any
variation of that scenario that doesn't require you to entirely recreate the
node C from scratch using logical replication, which is what I want to avoid.

This isn't terribly complicated but requires to be really careful if you don't
want to end up with an incorrect node C.  This approach is also currently not
entirely ideal, but hopefully logical replication of sequences and DDL will
remove the main sources of downtime when upgrading using logical replication.

My ultimate goal is to provide some tooling to do that in a much simpler way.
Maybe a new "promote to logical" action that would take care of steps 2 to 9.
Users would therefore only have to do this "promotion to logical", and then run
pg_upgrade and create a new physical replication cluster if they want.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Commitfest 2023-03 starting tomorrow!
Next
From: Bharath Rupireddy
Date:
Subject: Re: Track Oldest Initialized WAL Buffer Page