RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | OS0PR01MB571640E1B58741979A5E586594F7A@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | RE: [PoC] pg_upgrade: allow to upgrade publisher node ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
|
List | pgsql-hackers |
On Wednesday, September 13, 2023 9:52 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote: > > Dear Amit, > > Thank you for reviewing! Before making a patch I can reply the important point. > > > 1. One thing to note is that if user checks whether the old cluster is > > upgradable with --check option and then try to upgrade, that will also > > fail. Because during the --check run there would at least one > > additional shutdown checkpoint WAL and then in the next run the slots > > position won't match. Note, I am saying this in context of using > > --check option with not-running old cluster. Won't that be surprising > > to users? One possibility is that we document such a behaviour and > > other is that we go back to WAL reading design where we can ignore > > known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc. > > Good catch, we have never considered the case that --check is executed for > stopped cluster. You are right, the old cluster is turned on/off during the > check and it generates SHUTDOWN_CHECKPOINT. This leads that > confirmed_flush is > behind the latest checkpoint lsn. > > Here are other approaches we came up with: > > 1. adds WARNING message when the --check is executed and slots are > checked. > We can say like: > > ``` > ... > Checking for valid logical replication slots > WARNING: this check generated WALs > Next pg_uprade would fail. > Please ensure again that all WALs are replicated. > ... > ``` > > > 2. adds hint message in the FATAL error when the confirmed_flush is not same > as > the latest checkpoint: > > ``` > ... > Checking for valid logical replication slots fatal > > Your installation contains invalid logical replication slots. > These slots can't be copied, so this cluster cannot be upgraded. > Consider removing such slots or consuming the pending WAL if any, > and then restart the upgrade. > If you did pg_upgrade --check before this run, it may be a cause. > Please start clusters and confirm again that all changes are > replicated. > A list of invalid logical replication slots is in the file: > ``` > > 3. requests users to do pg_upgrade --check on backup database, if old cluster > has logical slots. Basically they save a whole of cluster before doing > pg_uprade, > so it may be acceptable. This is not a modification of codes. > Here are some more ideas about the issue for reference. 1) Extending the controlfile. We can dd a new field (e.g. non_upgrade_checkPoint) to record the last check point ptr happened in non-upgrade mode. The new field won't be updated due to "pg_upgrade --check", so pg_upgrade can use this LSN to compare with the slot's confirmed_flush_lsn. Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade --check" in advance. Cons: Not sure if this is a enough reason to introduce new field in controlfile. ----------- 2) Advance the slot's confirmed_flush_lsn in pg_upgrade if the check passes. Introducing an upgrade support SQL function (binary_upgrade_advance_logical_slot_lsn()) to set a flag(catch_confirmed_lsn_up) on server side. On server side, when trying to flush the slot in shutdown checkpoint(CheckPointReplicationSlots), we update the slot's confirmed_flush_lsn to the lsn of the current checkpoint if catch_confirmed_lsn_up is set. Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade --check" in advance. Cons: Although we have some examples for using functions (binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade , but not sure if it's a standard behavior to change the slot's lsn during upgrade. ----------- 3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user already did the upgrade check for stopped server, they can use this option when trying to upgrade later. Pros: Can save some efforts for user to advance each slot's lsn. Cons: I didn't see similar options in pg_upgrade, might need some agreement. Best Regards, Hou zj
pgsql-hackers by date: