RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id OS0PR01MB5716506A1A1B20EFBFA7B52994C1A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

> 
> On Mon, Sep 25, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > 4.
> > > +        /*
> > > +         * There is a possibility that following records may be generated
> > > +         * during the upgrade.
> > > +         */
> > > +        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > > XLOG_CHECKPOINT_SHUTDOWN) ||
> > > +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > > XLOG_CHECKPOINT_ONLINE) ||
...
> > >
> > > What if we missed to capture the WAL records that may be generated
> > > during upgrade?
> >
> > If such records are generated before calling
> > binary_upgrade_validate_wal_logical_end(),
> > the upgrading would fail. Otherwise it would be succeeded. Anyway, we
> > don't care such records because those aren't required to be
> > replicated. The main thing we want to detect is that we don't miss any record
> generated before server shutdown.
> 
> I read this
> https://www.postgresql.org/message-id/20230725170319.h423jbthfohwgnf7@a
> work3.anarazel.de
> and understand that the current patch implements the approach suggested
> there - "scan the end of the WAL for records that should have been streamed
> out". I think the WAL records that should have been streamed out are all WAL
> record types in XXXX_decode functions except the ones that have a no-op or an
> op unrelated to logical decoding. For instance,
> - for xlog_decode, if the records of type {XLOG_CHECKPOINT_ONLINE,
> XLOG_PARAMETER_CHANGE, XLOG_NOOP, XLOG_NEXTOID, XLOG_SWITCH,
> XLOG_BACKUP_END, XLOG_RESTORE_POINT, XLOG_FPW_CHANGE,
> XLOG_FPI_FOR_HINT, XLOG_FPI, XLOG_OVERWRITE_CONTRECORD} are found
> after confirmed_flush LSN, it is fine.
> - for xact_decode, if the records of type {XLOG_XACT_ASSIGNMENT} are found
> after confirmed_flush LSN, it is fine.
> - for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
> XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
> - for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
> XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
> - for heap2_decode, if the records of type {XLOG_HEAP2_REWRITE,
> XLOG_HEAP2_FREEZE_PAGE, XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
> XLOG_HEAP2_VISIBLE, XLOG_HEAP2_LOCK_UPDATED} are found after
> confirmed_flush LSN, it is fine.
> - for heap_decode, if the records of type {XLOG_HEAP_LOCK} are found after
> confirmed_flush LSN, it is fine.
> 
> I think all of the above WAL records are okay to be present after cofirmed_flush
> LSN. If any WAL records other than the above are found after confirmed_flush
> LSN, those are the one that should have been streamed out and the pg_upgrade
> must complain with "The slot "foo" has not consumed the WAL yet" for all such
> slots, right? But, the function binary_upgrade_validate_wal_logical_end checks
> for only a handful of the above record types. I know that the list is arrived at
> based on testing, but it may happen that any of the above WAL records may be
> generated and present before/during/after pg_upgrade for which pg_upgrade
> failure isn't wanted.
> 
> Perhaps, a function in logical/decode.c returning the WAL record as valid if the
> record type is any of the above. A note in replication/decode.h and/or
> access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
> new function based on its decoding operation might help with future new WAL
> record type additions.
> 
> Thoughts?

I think this approach can work, but I am not sure if it's better than other
approaches. Mainly because it has almost the same maintaince burden as the
current approach, i.e. we need to verify and update the check function each
time we add a new WAL record type.

Apart from the WAL scan approach, we also considered alternative approach that
do not impose an additional maintenance burden and could potentially be less
complex.  For example, we can add a new field in pg_controldata to record the
last checkpoint that happens in non-upgrade mode, so that we can compare the
slot's confirmed_flush_lsn with this value, If they are the same, the WAL
should have been consumed otherwise we disallow upgrading this slot. I would
appreciate if you can share your thought about this approach.

And if we decided to use WAL scan approach, instead of checking each record, we
could directly check if the WAL record can be decoded into meaningful results
by use test_decoding to decode them. This approach also doesn't add new
maintenance burden as we anyway need to update the test_decoding if any decode
logic for new record changes. This was also mentioned [1].

What do you think ?

[1]
https://www.postgresql.org/message-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Next
From: Melanie Plageman
Date:
Subject: Re: Eliminate redundant tuple visibility check in vacuum