Dear Amit,
Thank you for reviewing! Before making a patch I can reply the important point.
> 1. One thing to note is that if user checks whether the old cluster is
> upgradable with --check option and then try to upgrade, that will also
> fail. Because during the --check run there would at least one
> additional shutdown checkpoint WAL and then in the next run the slots
> position won't match. Note, I am saying this in context of using
> --check option with not-running old cluster. Won't that be surprising
> to users? One possibility is that we document such a behaviour and
> other is that we go back to WAL reading design where we can ignore
> known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.
Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.
Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.
Here are other approaches we came up with:
1. adds WARNING message when the --check is executed and slots are checked.
We can say like:
```
...
Checking for valid logical replication slots
WARNING: this check generated WALs
Next pg_uprade would fail.
Please ensure again that all WALs are replicated.
...
```
2. adds hint message in the FATAL error when the confirmed_flush is not same as
the latest checkpoint:
```
...
Checking for valid logical replication slots fatal
Your installation contains invalid logical replication slots.
These slots can't be copied, so this cluster cannot be upgraded.
Consider removing such slots or consuming the pending WAL if any,
and then restart the upgrade.
If you did pg_upgrade --check before this run, it may be a cause.
Please start clusters and confirm again that all changes are
replicated.
A list of invalid logical replication slots is in the file:
```
3. requests users to do pg_upgrade --check on backup database, if old cluster
has logical slots. Basically they save a whole of cluster before doing pg_uprade,
so it may be acceptable. This is not a modification of codes.
How do others think?
Best Regards,
Hayato Kuroda
FUJITSU LIMITED