RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id OS0PR01MB5716CE2A338136DD1129D2AC9414A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
On Tuesday, August 15, 2023 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > > > Another idea is (which might have already discussed thoguh) that we
> check if the latest shutdown checkpoint LSN in the control file matches the
> confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that
> the slot has consumed all WAL records before the last shutdown. We don't
> need to worry about WAL records generated after starting the old cluster
> during the upgrade, at least for logical replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what Patch is already doing. But
> > > remember in this case we need to remember and use the latest
> > > checkpoint from the control file before the old cluster is started
> > > because otherwise the latest checkpoint location could be even
> > > updated during the upgrade. So, instead of reading from WAL, we need
> > > to change so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
> 
> > > I would prefer this
> > > idea than to invent a new API/tool like pg_replslotdata.
> >
> > +1

Changed the check to compare the latest checkpoint lsn from pg_controldata
with the confirmed_flush_lsn in pg_replication_slots view.

> >
> > >
> > > The other point you and Bruce seem to be favoring is that instead of
> > > dumping/restoring slots via pg_dump, we remember the required
> > > information of slots retrieved during their validation in pg_upgrade
> > > itself and use that to create the slots in the new cluster. Though I
> > > am not aware of doing similar treatment for other objects we restore
> > > in this case it seems reasonable especially because slots are not
> > > stored in the catalog and we anyway already need to retrieve the
> > > required information to validate them, so trying to again retrieve
> > > it via pg_dump doesn't seem useful unless I am missing something.
> > > Does this match your understanding?
> >
> > If there are use cases for --logical-replication-slots-only option
> > other than pg_upgrade, it would be good to have it in pg_dump. I was
> > just not sure of other use cases.
> >
> 
> It was primarily for upgrade purposes only. So, as we can't see a good reason to
> go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified the pg_upgrade
directly use the slot info to restore the slot in new cluster.

> 
> > >
> > > Yet another thing I am trying to consider is whether we can allow to
> > > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > + /* Check whether we should dump or not */ if (fout->remoteVersion
> > > + < 170000) return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can
> > > we consider upgrading slots from the prior version to 17? Now, if we
> > > want to invent any new API similar to pg_replslotdata then we can't
> > > do this because it won't exist in prior versions but OTOH using
> > > existing view pg_replication_slots can allow us to fetch slot info
> > > from older versions as well. So, I think it is worth considering.
> >
> > I think that without 0001 patch the replication slots will not be able
> > to pass the confirmed_flush_lsn check.
> >
> 
> Right, but we can think of backpatching the same. Anyway, we can do that as a
> separate work by starting a new thread to see if there is a broader agreement
> for backpatching such a change. For now, we can focus on >=v17.
> 

Here is the new version patch which addressed above points.
The new version patch also removes the --exclude-logical-replication-slots
option due to recent comment. 
Thanks Kuroda-san for addressing most of the points. 

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: [PATCH] Add function to_oct
Next
From: Masahiro Ikeda
Date:
Subject: Re: Fix pg_stat_reset_single_table_counters function