Re: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | CAD21AoCzY5MeP8VaqctGcMPeZra4R-SQA+-_Zy5N6XtvTpjPGw@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Re: [PoC] pg_upgrade: allow to upgrade publisher node |
List | pgsql-hackers |
On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 10, 2023 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I feel it would be a good idea to provide such a tool for users to > > > avoid getting errors during upgrade but I think the upgrade code still > > > needs to ensure that there are no WAL records between > > > confirm_flush_lsn and SHUTDOWN_CHECKPOINT than required. Or, do you > > > want to say that we don't do any verification check during the upgrade > > > and let the data loss happens if the user didn't ensure that by > > > running such a tool? > > > > I meant that if we can check the slot state file while the old cluster > > stops, we can ensure there are no WAL records between slot's > > confirmed_fluhs_lsn (in the state file) and the latest checkpoint (in > > the control file). > > > > Are you suggesting doing this before we start the old cluster or after > we stop the old cluster? I was thinking about the pros and cons of > doing this check when the server is 'on' (along with other upgrade > checks something like the patch is doing now) versus when the server > is 'off'. I think the advantage of doing it when the server is 'off' > (after check_and_dump_old_cluster()) is that it will be ensured that > there is no extra WAL that could be generated during the upgrade and > has not been verified against confirmed_flush_lsn location. But OTOH, > to retrieve slot information when the server is 'off', we need a > separate utility or probably a functionality for the same in > pg_upgrade and also some WAL reading stuff which sounds to me like a > larger change that may not be warranted here. I think anyway the extra > WAL (if any got generated during the upgrade) won't be required after > the upgrade so not convinced to make such a check while the server is > 'off'. Are there reasons which make it better to do this while the old > cluster is 'off'? What I imagined is that we do this check before check_and_dump_old_cluster() while the server is 'off'. Reading the slot state file would be simple and I guess we would not need a tool or cli program for that. We need to expose RepliactionSlotOnDisk, though. After reading the control file and the slots' state files we check if slot's confirmed_flush_lsn matches the latest checkpoint LSN in the control file (BTW maybe we can get slot name and plugin name here instead of using pg_dump?). Extra WAL records could be generated only after this check, so we wouldn't need to worry about that for slots for logical replication. As for non-logical replication slots, we would need some WAL reading stuff, but I'm not sure we need it for the first commit. Or another idea would be to allow users to mark replication slots "upgradable" so that pg_upgrade skips the confirmed_flush_lsn check. BTW this check would not be able to support live-check but I think it's not a problem as this check with a running server will never be able to pass. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: