On Thu, Sep 28, 2023 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > No, without that commit, there is a very high possibility that even if
> > > we have sent the WAL to the subscriber and got the acknowledgment of
> > > the same, we would miss updating it before shutdown. This would lead
> > > to upgrade failures because upgrades have no way to later identify
> > > whether the remaining WAL records are sent to the subscriber.
> >
> > Thanks for clarifying. I'm trying understand what happens without
> > commit e0b2eed0 with an illustration:
> >
> > step 1: publisher - confirmed_flush LSN in replication slot on disk
> > structure is 80
> > step 2: publisher - sends WAL at LSN 100
> > step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
> > step 4: publisher - shuts down without writing the new confirmed_flush
> > LSN as 100 to disk, note that commit e0b2eed0 is not in place
> > step 5: publisher - restarts
> > step 6: subscriber - upon publisher restart, the subscriber requests
> > WAL from publisher from LSN 100 as it tracks the last applied LSN in
> > replication origin
> >
> > Now, if the pg_upgrade with the patch in this thread is run on
> > publisher after step 4, it complains with "The slot \"%s\" has not
> > consumed the WAL yet".
> >
> > Is my above understanding right?
> >
>
> Yes.
Thanks. Trying things with replication lag - when there's a lag, the
pg_upgrade can't proceed further and it complains "The slot "mysub"
has not consumed the WAL yet".
I think the best way to upgrade a postgres instance with logical
replication slots is: 1) ensure no replication lag for the logical
slots; 2) perform pg_upgrade --check first; 3) perform pg_upgrade if
there are no complaints.
With the above understanding, it looks to me that the commit e0b2eed0
isn't necessary for back branches. Because, without it the pg_upgrade
complains "The slot "mysub" has not consumed the WAL yet", and then
the user has to restart the instance to ensure the WAL is consumed
(IOW, to get the correct confirmed_flush LSN to the disk).
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com