Re: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Msg-id: CAD21AoB0dbtOpaKn419py5RkEiX1WnoGe=rmw0ahJjr4DEuQmA@mail.gmail.com
In response to: Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Aug 7, 2023 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > WAL records for hint bit updates could be generated even in upgrading mode?
> > > >
> > >
> > > Do you mean these records can be generated during reading catalog tables?
> >
> > Yes.
> >
>
> BTW, Kuroda-San has verified and found that three types of records
> (including XLOG_FPI_FOR_HINT) can be generated by the system during
> the upgrade. See email [1].
>
> > >
> > > > > I feel if there is a chance of any WAL activity during the
> > > > > upgrade, we need to either change the check to ensure such WAL records
> > > > > are expected or document the same in some way.
> > > >
> > > > Yes, but how does it work with the above idea of limiting the number
> > > > of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
> > > > the upgrade mode, we cannot predict how many such records are
> > > > generated after the latest CHECKPOINT_SHUTDOWN.
> > > >
> > >
> > > Right, as said earlier, in that case, we need to rely on the type of records.
> >
> > Another idea would be that before starting the old cluster we check if
> > the slot's confirmed_flush_lsn in the slot state file matches the
> > latest checkpoint LSN got by pg_controlfile. We need another tool to
> > dump the slot state file, though.
> >
>
> I feel it would be a good idea to provide such a tool for users to
> avoid getting errors during the upgrade, but I think the upgrade code
> still needs to ensure that there are no WAL records between
> confirmed_flush_lsn and SHUTDOWN_CHECKPOINT other than the expected
> ones. Or, do you want to say that we don't do any verification check
> during the upgrade and let data loss happen if the user didn't ensure
> that by running such a tool?

I meant that if we can check the slot state file while the old cluster
is stopped, we can ensure there are no WAL records between the slot's
confirmed_flush_lsn (in the state file) and the latest checkpoint (in
the control file).
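One way to sketch that comparison (a hypothetical helper; it assumes the
confirmed_flush_lsn values have already been extracted from the slot state
files and the latest checkpoint location from pg_controldata, both in the
usual 'X/X' LSN text form):

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/1A2B3C4D' into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slots_behind_checkpoint(latest_checkpoint_lsn: str, slot_lsns: dict) -> list:
    """Return the names of slots whose confirmed_flush_lsn is older than
    the latest checkpoint LSN, i.e. slots that would fail the proposed check."""
    target = parse_lsn(latest_checkpoint_lsn)
    return [name for name, lsn in slot_lsns.items() if parse_lsn(lsn) < target]
```

For example, with the latest checkpoint at 0/3000060, a slot whose
confirmed_flush_lsn is 0/2000100 would be reported as lagging, while a slot
exactly at 0/3000060 would pass.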

>
> > >
> > > > I'm not really sure we should always perform the slot's
> > > > confirmed_flush_lsn check by default in the first place. With this
> > > > check, the upgrade won't be able to proceed if there is any logical
> > > > slot that is not used by logical replication (or something streaming
> > > > the changes using walsender), right? For example, if a user uses a
> > > > program that periodically consumes the changes from the logical slot,
> > > > the slot would not be able to pass the check even if the user executed
> > > > pg_logical_slot_get_changes() just before shutdown. The backend
> > > > process who consumes the changes is always terminated before the
> > > > shutdown checkpoint. On the other hand, I think there are cases where
> > > > the user can ensure that no meaningful WAL records are generated after
> > > > the last pg_logical_slot_get_changes(). I'm concerned that this check
> > > > might make upgrading such cases cumbersome unnecessarily.
> > > >
> > >
> > > You are right and I have mentioned the same case today in my response
> > > to Jonathan but do you have better ideas to deal with such slots than
> > > to give an ERROR?
> >
> > It makes sense to me to give an ERROR for such slots but does it also
> > make sense to make the check optional?
> >
>
> We can do that if we think so. We have two ways to make this check
> optional (a) have a switch like --include-logical-replication-slots as
> the proposed patch has which means by default we won't try to upgrade
> slots; (b) have a switch like --exclude-logical-replication-slots as
> Jonathan proposed which means we will exclude slots only if specified
> by user. Now, one thing to note is that we don't seem to have any
> include/exclude switch in the upgrade which I think indicates users by
> default prefer to upgrade everything. Now, even if we decide not to
> give any switch initially but do it only if there is a user demand for
> it then also users will have a way to proceed with an upgrade which is
> by dropping such slots. Do you have any preference?

TBH I'm not sure there is a use case where the user wants to exclude
replication slots during the upgrade. Including replication slots by
default seems better to me, at least for now. I initially thought that
asking users to drop replication slots that have possibly not consumed
all WAL records would not be a good idea, but since we already do such
things in check.c I now think it would not be a problem. It would be
great if we could check the WAL records between slots'
confirmed_flush_lsn and the latest LSN, and if there are no meaningful
WAL records there we can upgrade the replication slots.
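A rough sketch of such a "meaningful records" filter (illustrative only:
the allow-list of record types and the idea of classifying records from
pg_waldump-style text output are my assumptions; a real implementation
would read the WAL records directly in the server):

```python
# Record types assumed harmless after confirmed_flush_lsn, such as the
# XLOG_FPI_FOR_HINT records mentioned upthread. The exact set is illustrative.
HARMLESS_RECORD_TYPES = {"CHECKPOINT_SHUTDOWN", "RUNNING_XACTS", "FPI_FOR_HINT"}

def has_meaningful_records(waldump_lines) -> bool:
    """Scan pg_waldump-style output lines and return True if any record's
    type (the first token after 'desc:') is outside the allow-list."""
    for line in waldump_lines:
        if "desc:" not in line:
            continue
        rectype = line.split("desc:")[1].strip().split()[0].rstrip(",")
        if rectype not in HARMLESS_RECORD_TYPES:
            return True
    return False
```

Under this sketch, a range containing only RUNNING_XACTS and
CHECKPOINT_SHUTDOWN records would let the upgrade proceed, while any
data-modifying record (e.g. a heap INSERT) would trigger the error.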

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


