RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id TYAPR01MB5866EF7398CB13FFDBF230E7F5F0A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: [PoC] pg_upgrade: allow to upgrade publisher node
Re: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
Dear Amit,

Thank you for reviewing! Before making a patch I can reply the important point.

> 1. One thing to note is that if user checks whether the old cluster is
> upgradable with --check option and then try to upgrade, that will also
> fail. Because during the --check run there would at least one
> additional shutdown checkpoint WAL and then in the next run the slots
> position won't match. Note, I am saying this in context of using
> --check option with not-running old cluster. Won't that be surprising
> to users? One possibility is that we document such a behaviour and
> other is that we go back to WAL reading design where we can ignore
> known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.

Here are other approaches we came up with:

1. adds WARNING message when the --check is executed and slots are checked.
   We can say like: 

```
...
Checking for valid logical replication slots                  
WARNING: this check generated WALs
Next pg_uprade would fail.
Please ensure again that all WALs are replicated.
...
```


2. adds hint message in the FATAL error when the confirmed_flush is not same as
   the latest checkpoint:

```
...
Checking for valid logical replication slots                  fatal

Your installation contains invalid logical replication slots.
These slots can't be copied, so this cluster cannot be upgraded.
Consider removing such slots or consuming the pending WAL if any,
and then restart the upgrade.
If you did pg_upgrade --check before this run, it may be a cause.
Please start clusters and confirm again that all changes are
replicated.
A list of invalid logical replication slots is in the file:
```

3. requests users to do pg_upgrade --check on backup database, if old cluster
   has logical slots. Basically they save a whole of cluster before doing pg_uprade,
   so it may be acceptable. This is not a modification of codes.

How do others think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting
Next
From: Maxim Orlov
Date:
Subject: Re: Add 'worker_type' to pg_stat_subscription