Re: Impact of checkpointer during pg_upgrade - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Impact of checkpointer during pg_upgrade |
Date | |
Msg-id | CAA4eK1J8YGQCD9Z3665XPWqpQUmrHTNaPXJhmW0VbvGhtfUScg@mail.gmail.com Whole thread Raw |
In response to | Re: Impact of checkpointer during pg_upgrade (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Impact of checkpointer during pg_upgrade
|
List | pgsql-hackers |
On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > > case could be (a) Before starting the old cluster, we fetch the slots > > > > directly from the disk using some tool like [2] and make the decisions > > > > based on that state; > > > > > > Okay, so IIUC along with dumping the slot data we also need to dump > > > the latest checkpoint LSN because during upgrade we do check that the > > > confirmed flush lsn for all the slots should be the same as the latest > > > checkpoint. Yeah but I think we could work this out. > > > > > We already have the latest checkpoint LSN information from > > pg_controldata. I think we can use that as the patch proposed in the > > thread [1] is doing now. Do you have something else in mind? > > I think I did not understood the complete proposal. And what I meant > is that if we dump the slot before we start the cluster thats fine. > But then later after starting the old cluster if we run some query > like we do in check_old_cluster_for_valid_slots() then thats not > right, because there is gap between the status of the slots what we > dumped before starting the cluster and what we are checking after the > cluster, so there is not point of that check right? > That's right but if we do read slots from disk, we preserve those in the memory and use that information instead of querying it again in check_old_cluster_for_valid_slots(). > > > (b) During the upgrade, we don't allow WAL to be > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > > as well but for that, we need to expose an API to invalidate the > > > > slots; > > > > > > (d) somehow distinguish the slots that are invalidated during > > > > an upgrade and then simply copy such slots because anyway we ensure > > > > that all the WAL required by slot is sent before shutdown. > > > > > > Yeah this could also be an option, although we need to think the > > > mechanism of distinguishing those slots looks clean and fit well with > > > other architecture. > > > > > > > If we want to do this we probably need to maintain a flag in the slot > > indicating that it was invalidated during an upgrade and then use the > > same flag in the upgrade to check the validity of slots. I think such > > a flag needs to be maintained at the same level as > > ReplicationSlotInvalidationCause to avoid any inconsistency among > > those. > > I think we can do better, like we can just read the latest > checkpoint's LSN before starting the old cluster. And now while > checking the slot can't we check if the the slot is invalidated then > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > before starting the cluster because if so then those slot might have > got invalidated during the upgrade no? > Isn't that possible only if we update confirmend_flush LSN while invalidating? Otherwise, how the check you are proposing can succeed? > > > > > Alternatively can't we just ignore all the invalidated slots and do > > > not migrate them at all. Because such scenarios are very rare that > > > some of the segments are getting dropped just during the upgrade time > > > and that too from the old cluster so in such cases not migrating the > > > slots which are invalidated should be fine no? > > > > > > > I also think that such a scenario would be very rare but are you > > suggesting to ignore all invalidated slots or just the slots that got > > invalidated during an upgrade? BTW, if we simply ignore invalidated > > slots then users won't be able to drop corresponding subscriptions > > after an upgrade. They need to first use the Alter Subscription > > command to disassociate the slot (by using the command ALTER > > SUBSCRIPTION ... SET (slot_name = NONE)) and then drop the > > subscription similar to what we suggest in other cases as described in > > the Replication Slot Management section in docs [2]. > > Yeah I think thats not the best thing to do. > Right, but OTOH, there is an argument to just document it instead of having additional complexities in the code as such cases should be rare. -- With Regards, Amit Kapila.
pgsql-hackers by date: