Thread: Impact of checkpointer during pg_upgrade
During pg_upgrade, we start the server for the old cluster, which can allow the checkpointer to remove the WAL files. It has been noticed that we do generate certain types of WAL records (e.g., XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) even during pg_upgrade for the old cluster, so these additional WAL records could let the checkpointer decide that certain WAL segments can be removed (e.g., say the WAL size crosses max_slot_wal_keep_size_mb) and invalidate the slots. Currently, I can't see any problem with this, but for future work where we want to migrate logical slots during an upgrade [1], we need to decide what to do in such cases. The initial idea we had was that if the old cluster has some invalid slots, we won't allow an upgrade unless the user removes such slots or uses some option like --exclude-slots. It is quite possible that slots get invalidated during pg_upgrade due to no user activity. Even though the possibility of this is low, I think it is worth considering what the behavior should be. The other possibilities, apart from not allowing an upgrade in such a case, could be: (a) Before starting the old cluster, we fetch the slots directly from disk using some tool like [2] and make the decisions based on that state; (b) During the upgrade, we don't allow WAL to be removed if that can invalidate slots; (c) Copy/Migrate the invalid slots as well, but for that, we need to expose an API to invalidate the slots; (d) Somehow distinguish the slots that are invalidated during an upgrade and then simply copy such slots, because anyway we ensure that all the WAL required by a slot is sent before shutdown. Thoughts? [1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com -- With Regards, Amit Kapila.
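For illustration, the "disallow the upgrade if the old cluster already has invalid slots" idea could take roughly the following shape in pg_upgrade's check.c style. The function name, the --exclude-slots option, and the exact query are assumptions for the sketch (wal_status = 'lost' is one way WAL-removal invalidation shows up in pg_replication_slots), not an actual patch:

    #include "pg_upgrade.h"

    /*
     * Sketch only: refuse the upgrade if the old cluster already has logical
     * slots whose required WAL has been removed.
     */
    static void
    check_old_cluster_for_invalid_slots(ClusterInfo *cluster)
    {
        PGconn     *conn = connectToServer(cluster, "template1");
        PGresult   *res;

        prep_status("Checking for invalid logical replication slots");

        res = executeQueryOrDie(conn,
                                "SELECT slot_name "
                                "FROM pg_catalog.pg_replication_slots "
                                "WHERE slot_type = 'logical' AND "
                                "      temporary IS FALSE AND "
                                "      wal_status = 'lost'");

        if (PQntuples(res) > 0)
            pg_fatal("The old cluster has invalid logical replication slots; "
                     "drop them (or use a hypothetical --exclude-slots option) and retry.");

        PQclear(res);
        PQfinish(conn);
        check_ok();
    }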
On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > During pg_upgrade, we start the server for the old cluster which can > allow the checkpointer to remove the WAL files. It has been noticed > that we do generate certain types of WAL records (e.g > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) > even during pg_upgrade for old cluster, so additional WAL records > could let checkpointer decide that certain WAL segments can be removed > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate > the slots. Currently, I can't see any problem with this but for future > work where we want to migrate logical slots during an upgrade[1], we > need to decide what to do for such cases. The initial idea we had was > that if the old cluster has some invalid slots, we won't allow an > upgrade unless the user removes such slots or uses some option like > --exclude-slots. It is quite possible that slots got invalidated > during pg_upgrade due to no user activity. Now, even though the > possibility of the same is less I think it is worth considering what > should be the behavior. Right > The other possibilities apart from not allowing an upgrade in such a > case could be (a) Before starting the old cluster, we fetch the slots > directly from the disk using some tool like [2] and make the decisions > based on that state; Okay, so IIUC, along with dumping the slot data we also need to dump the latest checkpoint LSN, because during the upgrade we check that the confirmed flush LSN for all the slots should be the same as the latest checkpoint. Yeah, but I think we could work this out. > (b) During the upgrade, we don't allow WAL to be > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > as well but for that, we need to expose an API to invalidate the > slots; (d) somehow distinguish the slots that are invalidated during > an upgrade and then simply copy such slots because anyway we ensure > that all the WAL required by slot is sent before shutdown. Yeah, this could also be an option, although we need to make sure the mechanism for distinguishing those slots looks clean and fits well with the rest of the architecture. Alternatively, can't we just ignore all the invalidated slots and not migrate them at all? Such scenarios, where some of the segments get dropped right at upgrade time, and that too on the old cluster, are very rare, so in such cases not migrating the slots which are invalidated should be fine, no? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > The other possibilities apart from not allowing an upgrade in such a > > case could be (a) Before starting the old cluster, we fetch the slots > > directly from the disk using some tool like [2] and make the decisions > > based on that state; > > Okay, so IIUC along with dumping the slot data we also need to dump > the latest checkpoint LSN because during upgrade we do check that the > confirmed flush lsn for all the slots should be the same as the latest > checkpoint. Yeah but I think we could work this out. > We already have the latest checkpoint LSN information from pg_controldata. I think we can use that as the patch proposed in the thread [1] is doing now. Do you have something else in mind? > (b) During the upgrade, we don't allow WAL to be > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > as well but for that, we need to expose an API to invalidate the > > slots; > > (d) somehow distinguish the slots that are invalidated during > > an upgrade and then simply copy such slots because anyway we ensure > > that all the WAL required by slot is sent before shutdown. > > Yeah this could also be an option, although we need to think the > mechanism of distinguishing those slots looks clean and fit well with > other architecture. > If we want to do this we probably need to maintain a flag in the slot indicating that it was invalidated during an upgrade and then use the same flag in the upgrade to check the validity of slots. I think such a flag needs to be maintained at the same level as ReplicationSlotInvalidationCause to avoid any inconsistency among those. > Alternatively can't we just ignore all the invalidated slots and do > not migrate them at all. Because such scenarios are very rare that > some of the segments are getting dropped just during the upgrade time > and that too from the old cluster so in such cases not migrating the > slots which are invalidated should be fine no? > I also think that such a scenario would be very rare but are you suggesting to ignore all invalidated slots or just the slots that got invalidated during an upgrade? BTW, if we simply ignore invalidated slots then users won't be able to drop corresponding subscriptions after an upgrade. They need to first use the Alter Subscription command to disassociate the slot (by using the command ALTER SUBSCRIPTION ... SET (slot_name = NONE)) and then drop the subscription similar to what we suggest in other cases as described in the Replication Slot Management section in docs [2]. Also, if users really want to continue that subscription by syncing corresponding tables then they can recreate the slots manually and then continue with replication. So, if we want to do this then we will just rely on the current state (at the time we query for them in the old cluster) of slots, and even if they later got invalidated during the upgrade, we will just ignore such invalidations as anyway the required WAL is already copied. [1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html -- With Regards, Amit Kapila.
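To make the "same level as ReplicationSlotInvalidationCause" idea a bit more concrete, here is a rough sketch. Whether it ends up as a separate flag or, as below, an extra cause value is an open question, and the new member and the helper are purely assumptions for illustration:

    typedef enum ReplicationSlotInvalidationCause
    {
        RS_INVAL_NONE,
        RS_INVAL_WAL_REMOVED,       /* required WAL has been removed */
        RS_INVAL_HORIZON,           /* required rows have been removed */
        RS_INVAL_WAL_LEVEL,         /* wal_level insufficient for the slot */
        RS_INVAL_WAL_REMOVED_DURING_UPGRADE /* hypothetical: WAL removed while
                                             * the server ran with IsBinaryUpgrade */
    } ReplicationSlotInvalidationCause;

    /*
     * Sketch of how pg_upgrade could then treat a slot: only a slot that was
     * never invalidated, or that was invalidated while in binary-upgrade mode
     * (all of its WAL had already been confirmed before shutdown), is copied.
     */
    static bool
    slot_is_migratable(ReplicationSlotInvalidationCause cause)
    {
        return cause == RS_INVAL_NONE ||
               cause == RS_INVAL_WAL_REMOVED_DURING_UPGRADE;
    }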
On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > The other possibilities apart from not allowing an upgrade in such a > > > case could be (a) Before starting the old cluster, we fetch the slots > > > directly from the disk using some tool like [2] and make the decisions > > > based on that state; > > > > Okay, so IIUC along with dumping the slot data we also need to dump > > the latest checkpoint LSN because during upgrade we do check that the > > confirmed flush lsn for all the slots should be the same as the latest > > checkpoint. Yeah but I think we could work this out. > > > We already have the latest checkpoint LSN information from > pg_controldata. I think we can use that as the patch proposed in the > thread [1] is doing now. Do you have something else in mind? I think I did not understand the complete proposal. What I meant is that if we dump the slots before we start the cluster, that's fine. But then, after starting the old cluster, if we run some query like we do in check_old_cluster_for_valid_slots(), that's not right, because there is a gap between the status of the slots we dumped before starting the cluster and what we are checking after starting it, so there is no point in that check, right? > > (b) During the upgrade, we don't allow WAL to be > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > as well but for that, we need to expose an API to invalidate the > > > slots; > > > > (d) somehow distinguish the slots that are invalidated during > > > an upgrade and then simply copy such slots because anyway we ensure > > > that all the WAL required by slot is sent before shutdown. > > > > Yeah this could also be an option, although we need to think the > > mechanism of distinguishing those slots looks clean and fit well with > > other architecture. > > > > If we want to do this we probably need to maintain a flag in the slot > indicating that it was invalidated during an upgrade and then use the > same flag in the upgrade to check the validity of slots. I think such > a flag needs to be maintained at the same level as > ReplicationSlotInvalidationCause to avoid any inconsistency among > those. I think we can do better: we can just read the latest checkpoint's LSN before starting the old cluster. And then, while checking a slot, can't we check that if the slot is invalidated, its confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved before starting the cluster? Because if so, then those slots might have been invalidated during the upgrade, no? > > > Alternatively can't we just ignore all the invalidated slots and do > > not migrate them at all. Because such scenarios are very rare that > > some of the segments are getting dropped just during the upgrade time > > and that too from the old cluster so in such cases not migrating the > > slots which are invalidated should be fine no? > > > > I also think that such a scenario would be very rare but are you > suggesting to ignore all invalidated slots or just the slots that got > invalidated during an upgrade? BTW, if we simply ignore invalidated > slots then users won't be able to drop corresponding subscriptions > after an upgrade. They need to first use the Alter Subscription > command to disassociate the slot (by using the command ALTER > SUBSCRIPTION ...
> SET (slot_name = NONE)) and then drop the > subscription similar to what we suggest in other cases as described in > the Replication Slot Management section in docs [2]. Yeah, I think that's not the best thing to do. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > > case could be (a) Before starting the old cluster, we fetch the slots > > > > directly from the disk using some tool like [2] and make the decisions > > > > based on that state; > > > > > > Okay, so IIUC along with dumping the slot data we also need to dump > > > the latest checkpoint LSN because during upgrade we do check that the > > > confirmed flush lsn for all the slots should be the same as the latest > > > checkpoint. Yeah but I think we could work this out. > > > > > We already have the latest checkpoint LSN information from > > pg_controldata. I think we can use that as the patch proposed in the > > thread [1] is doing now. Do you have something else in mind? > > I think I did not understood the complete proposal. And what I meant > is that if we dump the slot before we start the cluster thats fine. > But then later after starting the old cluster if we run some query > like we do in check_old_cluster_for_valid_slots() then thats not > right, because there is gap between the status of the slots what we > dumped before starting the cluster and what we are checking after the > cluster, so there is not point of that check right? > That's right but if we do read slots from disk, we preserve those in the memory and use that information instead of querying it again in check_old_cluster_for_valid_slots(). > > > (b) During the upgrade, we don't allow WAL to be > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > > as well but for that, we need to expose an API to invalidate the > > > > slots; > > > > > > (d) somehow distinguish the slots that are invalidated during > > > > an upgrade and then simply copy such slots because anyway we ensure > > > > that all the WAL required by slot is sent before shutdown. > > > > > > Yeah this could also be an option, although we need to think the > > > mechanism of distinguishing those slots looks clean and fit well with > > > other architecture. > > > > > > > If we want to do this we probably need to maintain a flag in the slot > > indicating that it was invalidated during an upgrade and then use the > > same flag in the upgrade to check the validity of slots. I think such > > a flag needs to be maintained at the same level as > > ReplicationSlotInvalidationCause to avoid any inconsistency among > > those. > > I think we can do better, like we can just read the latest > checkpoint's LSN before starting the old cluster. And now while > checking the slot can't we check if the the slot is invalidated then > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > before starting the cluster because if so then those slot might have > got invalidated during the upgrade no? > Isn't that possible only if we update confirmend_flush LSN while invalidating? Otherwise, how the check you are proposing can succeed? > > > > > Alternatively can't we just ignore all the invalidated slots and do > > > not migrate them at all. 
Because such scenarios are very rare that > > > some of the segments are getting dropped just during the upgrade time > > > and that too from the old cluster so in such cases not migrating the > > > slots which are invalidated should be fine no? > > > > > > > I also think that such a scenario would be very rare but are you > > suggesting to ignore all invalidated slots or just the slots that got > > invalidated during an upgrade? BTW, if we simply ignore invalidated > > slots then users won't be able to drop corresponding subscriptions > > after an upgrade. They need to first use the Alter Subscription > > command to disassociate the slot (by using the command ALTER > > SUBSCRIPTION ... SET (slot_name = NONE)) and then drop the > > subscription similar to what we suggest in other cases as described in > > the Replication Slot Management section in docs [2]. > > Yeah I think thats not the best thing to do. > Right, but OTOH, there is an argument to just document it instead of having additional complexities in the code as such cases should be rare. -- With Regards, Amit Kapila.
On Mon, Sep 4, 2023 at 11:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > > > case could be (a) Before starting the old cluster, we fetch the slots > > > > > directly from the disk using some tool like [2] and make the decisions > > > > > based on that state; > > > > > > > > Okay, so IIUC along with dumping the slot data we also need to dump > > > > the latest checkpoint LSN because during upgrade we do check that the > > > > confirmed flush lsn for all the slots should be the same as the latest > > > > checkpoint. Yeah but I think we could work this out. > > > > > > > We already have the latest checkpoint LSN information from > > > pg_controldata. I think we can use that as the patch proposed in the > > > thread [1] is doing now. Do you have something else in mind? > > > > I think I did not understood the complete proposal. And what I meant > > is that if we dump the slot before we start the cluster thats fine. > > But then later after starting the old cluster if we run some query > > like we do in check_old_cluster_for_valid_slots() then thats not > > right, because there is gap between the status of the slots what we > > dumped before starting the cluster and what we are checking after the > > cluster, so there is not point of that check right? > > > > That's right but if we do read slots from disk, we preserve those in > the memory and use that information instead of querying it again in > check_old_cluster_for_valid_slots(). > > > > > (b) During the upgrade, we don't allow WAL to be > > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > > > as well but for that, we need to expose an API to invalidate the > > > > > slots; > > > > > > > > (d) somehow distinguish the slots that are invalidated during > > > > > an upgrade and then simply copy such slots because anyway we ensure > > > > > that all the WAL required by slot is sent before shutdown. > > > > > > > > Yeah this could also be an option, although we need to think the > > > > mechanism of distinguishing those slots looks clean and fit well with > > > > other architecture. > > > > > > > > > > If we want to do this we probably need to maintain a flag in the slot > > > indicating that it was invalidated during an upgrade and then use the > > > same flag in the upgrade to check the validity of slots. I think such > > > a flag needs to be maintained at the same level as > > > ReplicationSlotInvalidationCause to avoid any inconsistency among > > > those. > > > > I think we can do better, like we can just read the latest > > checkpoint's LSN before starting the old cluster. And now while > > checking the slot can't we check if the the slot is invalidated then > > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > > before starting the cluster because if so then those slot might have > > got invalidated during the upgrade no? > > > > Isn't that possible only if we update confirmend_flush LSN while > invalidating? Otherwise, how the check you are proposing can succeed? 
I am not suggesting comparing the confirmed_flush_lsn to the latest checkpoint LSN; instead, I am suggesting that before starting the cluster we get the location of the latest checkpoint LSN, which should be the shutdown checkpoint LSN. Even now, in [1], we check that the confirmed flush LSN should be equal to the latest checkpoint LSN. So the only problem is that after we restart the cluster during the upgrade we might invalidate some of the slots which are perfectly fine to migrate, and we want to identify those slots. So, if we know the LSN of the shutdown checkpoint from before the cluster started, then we can perform an additional check on all the invalidated slots: if their confirmed flush LSN >= the shutdown checkpoint LSN we preserved before restarting the cluster (not the latest checkpoint LSN), then those slots got invalidated only after we started the cluster for the upgrade. Is there any loophole in this theory? This theory is based on the assumption that the confirmed flush LSN does not move forward for already-invalidated slots, which means a slot that got invalidated before we shut down for the upgrade will have a confirmed flush LSN value < the shutdown checkpoint, and the slots which got invalidated during the upgrade will have a confirmed flush LSN at least equal to the shutdown checkpoint. [1] https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
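In code form, the theory above would look something like this; the function and variable names are assumptions for the sketch, and the shutdown checkpoint LSN would come from pg_controldata, captured before the old cluster is started:

    #include "access/xlogdefs.h"

    /*
     * Sketch only: per the theory above, a slot that was still valid at
     * shutdown must have confirmed all WAL up to the shutdown checkpoint,
     * so confirmed_flush_lsn >= that checkpoint's LSN would mean the
     * invalidation happened only while pg_upgrade had the old cluster
     * running.
     */
    static bool
    slot_invalidated_during_upgrade(XLogRecPtr confirmed_flush_lsn,
                                    XLogRecPtr shutdown_chkpt_lsn)
    {
        return confirmed_flush_lsn >= shutdown_chkpt_lsn;
    }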
On Mon, Sep 4, 2023 at 1:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I think we can do better, like we can just read the latest > > > checkpoint's LSN before starting the old cluster. And now while > > > checking the slot can't we check if the the slot is invalidated then > > > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > > > before starting the cluster because if so then those slot might have > > > got invalidated during the upgrade no? > > > > > > > Isn't that possible only if we update confirmend_flush LSN while > > invalidating? Otherwise, how the check you are proposing can succeed? > > I am not suggesting to compare the confirmend_flush_lsn to the latest > checkpoint LSN instead I am suggesting that before starting the > cluster we get the location of the latest checkpoint LSN that should > be the shutdown checkpoint LSN. So now also in [1] we check that > confirmed flush lsn should be equal to the latest checkpoint lsn. So > the only problem is that after we restart the cluster during the > upgrade we might invalidate some of the slots which are perfectly fine > to migrate and we want to identify those slots. So if we know the the > LSN of the shutdown checkpoint before the cluster started then we can > perform a additional checks on all the invalidated slots that their > confirmed lsn >= shutdown checkpoint lsn we preserved before > restarting the cluster (not the latest checkpoint lsn) then those > slots got invalidated only after we started the cluster for upgrade? > Is there any loophole in this theory? This theory is based on the > assumption that the confirmed flush lsn are not moving forward for the > already invalidated slots that means the slot which got invalidated > before we shutdown for upgrade will have confirm flush lsn value < > shutdown checkpoint and the slots which got invalidated during the > upgrade will have confirm flush lsn at least equal to the shutdown > checkpoint. Said that there is a possibility that some of the slots which got invalidated even on the previous checkpoint might get the same LSN as the slot which got invalidated later if there is no activity between these two checkpoints. So if we go with this approach then there is some risk of migrating some of the slots which were already invalidated even before the shutdown checkpoint. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Said that there is a possibility that some of the slots which got > invalidated even on the previous checkpoint might get the same LSN as > the slot which got invalidated later if there is no activity between > these two checkpoints. So if we go with this approach then there is > some risk of migrating some of the slots which were already > invalidated even before the shutdown checkpoint. > I think even during the shutdown checkpoint, after writing the shutdown checkpoint WAL, we can invalidate some slots that in theory are safe to migrate/copy because all the WAL for those slots would also have been sent. So, those would be similar to what we invalidate during the upgrade, no? If so, I think it is better to have the same behavior for invalidated slots irrespective of when they get invalidated. We can either give an error for such slots during the upgrade (which means disallowing the upgrade) or simply ignore such slots during the upgrade. I would prefer ERROR, but if we want to ignore such slots, we can probably inform the user in some way about the ignored slots, so that she can later drop the corresponding subscriptions or recreate such slots and do the required sync-up to continue the replication. -- With Regards, Amit Kapila.
On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > Said that there is a possibility that some of the slots which got > > invalidated even on the previous checkpoint might get the same LSN as > > the slot which got invalidated later if there is no activity between > > these two checkpoints. So if we go with this approach then there is > > some risk of migrating some of the slots which were already > > invalidated even before the shutdown checkpoint. > > > > I think even during the shutdown > checkpoint, after writing shutdown > checkpoint WAL, we can invalidate some slots that in theory are safe > to migrate/copy because all the WAL for those slots would also have > been sent. So, those would be similar to what we invalidate during the > upgrade, no? That's correct. > If so, I think it is better to have the same behavior for > invalidated slots irrespective of the time it gets invalidated. We can > either give an error for such slots during the upgrade (which means > disallow the upgrade) or simply ignore such slots during the upgrade. > I would prefer ERROR but if we want to ignore such slots, we can > probably inform the user in some way about ignored slots, so that she > can later drop corresponding subscritions or recreate such slots and > do the required sync-up to continue the replication. Earlier I was thinking that ERRORing out is better, so that the user can take the necessary action for the invalidated slots and then retry the upgrade. But thinking about it again, I could not find the advantages of this: if we error out, users also need to restart the old cluster and have to drop the corresponding subscriptions; OTOH, if we allow the upgrade by ignoring the slots, the user has to take similar actions on the new cluster. So what's the advantage of erroring out over upgrading? I see a clear advantage of upgrading: the user wants to upgrade, and that succeeds without reattempting. If erroring out gave the user some option to salvage those invalidated slots and somehow migrate them as well by retrying the upgrade, then it would make sense to error out and let the user take some action on the old cluster; but if all they have to do is drop the subscription or recreate the slot in both cases, then letting the upgrade pass is the better option, at least IMHO. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 5, 2023 at 10:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > Said that there is a possibility that some of the slots which got > > > invalidated even on the previous checkpoint might get the same LSN as > > > the slot which got invalidated later if there is no activity between > > > these two checkpoints. So if we go with this approach then there is > > > some risk of migrating some of the slots which were already > > > invalidated even before the shutdown checkpoint. > > > > > > > I think even during the shutdown checkpoint, after writing shutdown > > checkpoint WAL, we can invalidate some slots that in theory are safe > > to migrate/copy because all the WAL for those slots would also have > > been sent. So, those would be similar to what we invalidate during the > > upgrade, no? > > Thats correct > > If so, I think it is better to have the same behavior for > > invalidated slots irrespective of the time it gets invalidated. We can > > either give an error for such slots during the upgrade (which means > > disallow the upgrade) or simply ignore such slots during the upgrade. > > I would prefer ERROR but if we want to ignore such slots, we can > > probably inform the user in some way about ignored slots, so that she > > can later drop corresponding subscritions or recreate such slots and > > do the required sync-up to continue the replication. > > Earlier I was thinking that ERRORing out is better so that the user > can take necessary action for the invalidated slots and then retry > upgrade. But thinking again I could not find what are the advantages > of this because if we error out then also users need to restart the > old cluster again and have to drop the corresponding subscriptions > OTOH if we allow the upgrade by ignoring the slots then also the user > has to take similar actions on the new cluster? So what's the > advantage of erroring out over upgrading? > The advantage is that we avoid inconvenience caused to users because Drop Subscription will be unsuccessful as the corresponding slots are not present. So users first need to disassociate slots for the subscription and then drop the subscription. Also, I am not sure leaving behind some slots doesn't have any other impact, otherwise, why don't we drop such slots from time to time after they are marked invalidated during normal operation? If users really want to leave behind such invalidated slots after upgrade, we can even think of providing some option like "exclude_invalid_logical_slots". -- With Regards, Amit Kapila.
On Tue, Sep 5, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Earlier I was thinking that ERRORing out is better so that the user > > can take necessary action for the invalidated slots and then retry > > upgrade. But thinking again I could not find what are the advantages > > of this because if we error out then also users need to restart the > > old cluster again and have to drop the corresponding subscriptions > > OTOH if we allow the upgrade by ignoring the slots then also the user > > has to take similar actions on the new cluster? So what's the > > advantage of erroring out over upgrading? > > > > The advantage is that we avoid inconvenience caused to users because > Drop Subscription will be unsuccessful as the corresponding slots are > not present. So users first need to disassociate slots for the > subscription and then drop the subscription. Yeah that's a valid argument for erroring out. Also, I am not sure > leaving behind some slots doesn't have any other impact, otherwise, > why don't we drop such slots from time to time after they are marked > invalidated during normal operation? Okay, I am also not sure of that. If users really want to leave > behind such invalidated slots after upgrade, we can even think of > providing some option like "exclude_invalid_logical_slots". +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
(Just dumping what I have in mind while reading the thread.) On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote: > During pg_upgrade, we start the server for the old cluster which can > allow the checkpointer to remove the WAL files. It has been noticed > that we do generate certain types of WAL records (e.g > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) > even during pg_upgrade for old cluster, so additional WAL records > could let checkpointer decide that certain WAL segments can be removed > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate > the slots. Currently, I can't see any problem with this but for future > work where we want to migrate logical slots during an upgrade[1], we > need to decide what to do for such cases. The initial idea we had was > that if the old cluster has some invalid slots, we won't allow an > upgrade unless the user removes such slots or uses some option like > --exclude-slots. It is quite possible that slots got invalidated > during pg_upgrade due to no user activity. Now, even though the > possibility of the same is less I think it is worth considering what > should be the behavior. > > The other possibilities apart from not allowing an upgrade in such a > case could be (a) Before starting the old cluster, we fetch the slots > directly from the disk using some tool like [2] and make the decisions > based on that state; (b) During the upgrade, we don't allow WAL to be > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > as well but for that, we need to expose an API to invalidate the > slots; (d) somehow distinguish the slots that are invalidated during > an upgrade and then simply copy such slots because anyway we ensure > that all the WAL required by slot is sent before shutdown. Checking for any invalid slots at the beginning of the upgrade and complain sounds like a good thing to do, because these are not expected to begin with, no? That's similar to a pre-check requirement that the slots should have fed the subscribers with all the data available up to the shutdown checkpoint when the publisher was stopped before running pg_upgrade. So (a) is a good idea to prevent an upgrade based on a state we don't expect from the start, as something in check.c, I assume. On top of that, (b) sounds like a good idea to me anyway to be more defensive. But couldn't we do that just like we do for autovacuum and force the GUCs that could remove the slot's WAL to not do their work instead? An upgrade is a special state of the cluster, and I'm not much into painting more checks based on IsBinaryUpgrade to prevent WAL segments to be removed while we can have a full control of what we want with just the GUCs that force the hand of the slots. That just seems simpler to me, and the WAL recycling logic is already complex particularly with all the GUCs popping lately to force some conditions to do the WAL recycling and/or removal. During the upgrade, if some segments are removed and some of the slots we expect to still be valid once the upgrade is done are marked as invalid and become unusable, I think that we should just copy these slots, but also update their state data so as they can still be used with what we expect, as these could be wanted by the subscribers. That's what you mean with (d), I assume. Do you think that it would be possible to force the slot's data on the publishers so as they use a local LSN based on the new WAL we are resetting at? 
At least that seems more friendly to me as this limits the set of manipulations to do on the slots for the end-user. The protection from (b) ought to be enough, in itself. (c) overlaps with (a), especially if we want to be able to read or write some of the slot's data offline. -- Michael
On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier <michael@paquier.xyz> wrote: > > (Just dumping what I have in mind while reading the thread.) > > On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote: > > During pg_upgrade, we start the server for the old cluster which can > > allow the checkpointer to remove the WAL files. It has been noticed > > that we do generate certain types of WAL records (e.g > > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) > > even during pg_upgrade for old cluster, so additional WAL records > > could let checkpointer decide that certain WAL segments can be removed > > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate > > the slots. Currently, I can't see any problem with this but for future > > work where we want to migrate logical slots during an upgrade[1], we > > need to decide what to do for such cases. The initial idea we had was > > that if the old cluster has some invalid slots, we won't allow an > > upgrade unless the user removes such slots or uses some option like > > --exclude-slots. It is quite possible that slots got invalidated > > during pg_upgrade due to no user activity. Now, even though the > > possibility of the same is less I think it is worth considering what > > should be the behavior. > > > > The other possibilities apart from not allowing an upgrade in such a > > case could be (a) Before starting the old cluster, we fetch the slots > > directly from the disk using some tool like [2] and make the decisions > > based on that state; (b) During the upgrade, we don't allow WAL to be > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > as well but for that, we need to expose an API to invalidate the > > slots; (d) somehow distinguish the slots that are invalidated during > > an upgrade and then simply copy such slots because anyway we ensure > > that all the WAL required by slot is sent before shutdown. > > Checking for any invalid slots at the beginning of the upgrade and > complain sounds like a good thing to do, because these are not > expected to begin with, no? That's similar to a pre-check requirement > that the slots should have fed the subscribers with all the data > available up to the shutdown checkpoint when the publisher was stopped > before running pg_upgrade. So (a) is a good idea to prevent an > upgrade based on a state we don't expect from the start, as something > in check.c, I assume. > > On top of that, (b) sounds like a good idea to me anyway to be more > defensive. But couldn't we do that just like we do for autovacuum and > force the GUCs that could remove the slot's WAL to not do their work > instead? > I think if we just make max_slot_wal_keep_size to -1 that should be sufficient to not let any slots get invalidated during upgrade. Do you have anything else in mind? An upgrade is a special state of the cluster, and I'm not > much into painting more checks based on IsBinaryUpgrade to prevent WAL > segments to be removed while we can have a full control of what we > want with just the GUCs that force the hand of the slots. That just > seems simpler to me, and the WAL recycling logic is already complex > particularly with all the GUCs popping lately to force some conditions > to do the WAL recycling and/or removal. 
> > During the upgrade, if some segments are removed and some of the slots > we expect to still be valid once the upgrade is done are marked as > invalid and become unusable, I think that we should just copy these > slots, but also update their state data so as they can still be used > with what we expect, as these could be wanted by the subscribers. > That's what you mean with (d), I assume. Do you think that it would > be possible to force the slot's data on the publishers so as they use > a local LSN based on the new WAL we are resetting at? > Yes. > > At least that > seems more friendly to me as this limits the set of manipulations to > do on the slots for the end-user. The protection from (b) ought to be > enough, in itself. (c) overlaps with (a), especially if we want to be > able to read or write some of the slot's data offline. > If we do (b), either via GUCs or an IsBinaryUpgrade check, we don't need to do any of (a), (c), or (d). I feel that would be a minimal and sufficient fix to prevent any side impact of the checkpointer on slots during an upgrade. -- With Regards, Amit Kapila.
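A sketch of what that could look like when pg_upgrade starts the old cluster, simplified from the command pg_upgrade builds in start_postmaster() in server.c; the exact option string here is an assumption, not a committed change:

    char        cmd[MAXPGPATH * 4];

    /*
     * Sketch only: append -c max_slot_wal_keep_size=-1 when starting the
     * old cluster, so the checkpointer can never remove WAL that is still
     * needed by a replication slot during the upgrade.
     */
    snprintf(cmd, sizeof(cmd),
             "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d -b%s %s\" start",
             cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
             (cluster == &old_cluster) ? " -c max_slot_wal_keep_size=-1" : "",
             cluster->pgopts ? cluster->pgopts : "");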
On Thu, Sep 7, 2023 at 3:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > (Just dumping what I have in mind while reading the thread.) > > > > On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote: > > > During pg_upgrade, we start the server for the old cluster which can > > > allow the checkpointer to remove the WAL files. It has been noticed > > > that we do generate certain types of WAL records (e.g > > > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) > > > even during pg_upgrade for old cluster, so additional WAL records > > > could let checkpointer decide that certain WAL segments can be removed > > > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate > > > the slots. Currently, I can't see any problem with this but for future > > > work where we want to migrate logical slots during an upgrade[1], we > > > need to decide what to do for such cases. The initial idea we had was > > > that if the old cluster has some invalid slots, we won't allow an > > > upgrade unless the user removes such slots or uses some option like > > > --exclude-slots. It is quite possible that slots got invalidated > > > during pg_upgrade due to no user activity. Now, even though the > > > possibility of the same is less I think it is worth considering what > > > should be the behavior. > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > case could be (a) Before starting the old cluster, we fetch the slots > > > directly from the disk using some tool like [2] and make the decisions > > > based on that state; (b) During the upgrade, we don't allow WAL to be > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > as well but for that, we need to expose an API to invalidate the > > > slots; (d) somehow distinguish the slots that are invalidated during > > > an upgrade and then simply copy such slots because anyway we ensure > > > that all the WAL required by slot is sent before shutdown. > > > > Checking for any invalid slots at the beginning of the upgrade and > > complain sounds like a good thing to do, because these are not > > expected to begin with, no? That's similar to a pre-check requirement > > that the slots should have fed the subscribers with all the data > > available up to the shutdown checkpoint when the publisher was stopped > > before running pg_upgrade. So (a) is a good idea to prevent an > > upgrade based on a state we don't expect from the start, as something > > in check.c, I assume. > > > > On top of that, (b) sounds like a good idea to me anyway to be more > > defensive. But couldn't we do that just like we do for autovacuum and > > force the GUCs that could remove the slot's WAL to not do their work > > instead? > > > > I think if we just make max_slot_wal_keep_size to -1 that should be > sufficient to not let any slots get invalidated during upgrade. Do you > have anything else in mind? This seems like a good solution to the problem. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Sep 07, 2023 at 03:33:52PM +0530, Amit Kapila wrote: > I think if we just make max_slot_wal_keep_size to -1 that should be > sufficient to not let any slots get invalidated during upgrade. Do you > have anything else in mind? Forcing wal_keep_size while on it may be a good thing. > If we do (b) either via GUCs or IsBinaryUpgrade check we don't need to > do any of (a), (b), or (d). I feel that would be a minimal and > sufficient fix to prevent any side impact of checkpointer on slots > during an upgrade. I could get into the addition of a post-upgrade check to make sure that nothing got invalidated while the upgrade was running, FWIW. -- Michael
On Fri, Sep 8, 2023 at 5:37 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 07, 2023 at 03:33:52PM +0530, Amit Kapila wrote: > > I think if we just make max_slot_wal_keep_size to -1 that should be > > sufficient to not let any slots get invalidated during upgrade. Do you > > have anything else in mind? > > Forcing wal_keep_size while on it may be a good thing. > I had thought about it but couldn't come up with a reason to force wal_keep_size for this purpose. > > If we do (b) either via GUCs or IsBinaryUpgrade check we don't need to > > do any of (a), (b), or (d). I feel that would be a minimal and > > sufficient fix to prevent any side impact of checkpointer on slots > > during an upgrade. > > I could get into the addition of a post-upgrade check to make sure > that nothing got invalidated while the upgrade was running, FWIW. > This validation tries to ensure that we don't have any bugs/issues in our patch. It may be a candidate for assert but I feel even if we encounter any bug it is better to fix the bug. -- With Regards, Amit Kapila.
On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote: > This validation tries to ensure that we don't have any bugs/issues in > our patch. It may be a candidate for assert but I feel even if we > encounter any bug it is better to fix the bug. My guess is that an elog-like error is better suited, so that we are able to detect problems in more cases, but perhaps an assert may be enough for the buildfarm. If there is anything in the backend that causes slots to become invalidated, I agree that any issue causing that should be fixed, but isn't the point different here? Having a check at the end of an upgrade is a means to improve the detection rate of bugs where slots get invalidated, so it is actually helpful to have one anyway? I am not sure what your strategy is here: do you mean to keep a check at the end of pg_upgrade only in the patch to validate it, or do you mean to add something in pg_upgrade as part of the feature? I mean that doing the latter is beneficial for the sake of any patch committed and as a long-term method to rely on. -- Michael
On Friday, September 8, 2023 10:58 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote: > > This validation tries to ensure that we don't have any bugs/issues in > > our patch. It may be a candidate for assert but I feel even if we > > encounter any bug it is better to fix the bug. > > My guess is that an elog-like error is more adapted so as we are able to detect > problems in more cases, but perhaps an assert may be enough for the > buildfarm. If there is anything in the backend that causes slots to become > invalidated, I agree that any issue causing that should be fixed, but isn't the > point different here? Having a check at the end of an upgrade is a mean to > improve the detection rate of bugs where slots get invalidated, so it is actually > helpful to have one anyway? I am not sure what is your strategy here, do you > mean to keep a check at the end of pg_upgrade only in the patch to validate it? > Or do you mean to add something in pg_upgrade as part of the feature? I > mean that doing the latter is benefitial for the sake of any patch committed and > as a long-term method to rely on. I feel adding a check in pg_upgrade may not detect the slot invalidation 100% of the time if we check by querying the old cluster to get the slot info, because the invalidation can happen before the first time we fetch the info or after the last time we fetch it (e.g., a shutdown checkpoint could also invalidate slots). Personally, I think if we really want to add a check, it might be better to put it on the server side, like reporting an ERROR on the server side when invalidating a slot (InvalidatePossiblyObsoleteSlot) if in upgrade mode. Having said that, I feel it's fine if we don't add this check, as setting max_slot_wal_keep_size to -1 looks sufficient. Best Regards, Hou zj
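A minimal sketch of that server-side guard; the exact placement inside InvalidatePossiblyObsoleteSlot(), the errcode, and the message wording are assumptions:

    /*
     * Sketch only: if a slot is about to be invalidated while the server is
     * running in binary-upgrade mode, treat that as an unexpected state and
     * error out rather than letting the checkpointer invalidate the slot.
     */
    if (IsBinaryUpgrade)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("replication slot \"%s\" must not be invalidated during binary upgrade",
                        NameStr(s->data.name))));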
On Fri, Sep 08, 2023 at 03:30:23AM +0000, Zhijie Hou (Fujitsu) wrote: > I feel adding a check at pg_upgrade may not 100% detect the slot invalidation > if we check by querying the old cluster to get the slot info, because the > invalidation can happen before the first time we fetch the info or after the > last time we fetch the info(e.g. shutdown checkpoint could also invalidate > slots) > > Personally, I think if we really want to add a check, it might be better to put > it at server side, Like: reporting an ERROR at server side when invalidating > the slot(InvalidatePossiblyObsoleteSlot) if in upgrade mode. Yeah, that may be enough to add one isBinaryUpgrade check in the invalidation path of the backend, with an elog(ERROR), as that would be an unexpected state. > Having said that I feel it's fine if we don't add this check as setting > max_slot_wal_keep_size to -1 looks sufficient. I would do both, FWIW, to stay on the safe side. And both are non-invasive. -- Michael
On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, September 8, 2023 10:58 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote: > > > This validation tries to ensure that we don't have any bugs/issues in > > > our patch. It may be a candidate for assert but I feel even if we > > > encounter any bug it is better to fix the bug. > > > > My guess is that an elog-like error is more adapted so as we are able to detect > > problems in more cases, but perhaps an assert may be enough for the > > buildfarm. If there is anything in the backend that causes slots to become > > invalidated, I agree that any issue causing that should be fixed, but isn't the > > point different here? Having a check at the end of an upgrade is a mean to > > improve the detection rate of bugs where slots get invalidated, so it is actually > > helpful to have one anyway? I am not sure what is your strategy here, do you > > mean to keep a check at the end of pg_upgrade only in the patch to validate it? > > Or do you mean to add something in pg_upgrade as part of the feature? > > We can do whatever the consensus is but I feel such an end-check to some extent is only helpful for the testing of a patch before the commit but not otherwise. > > I > > mean that doing the latter is benefitial for the sake of any patch committed and > > as a long-term method to rely on. > What is your worry here? Are you worried that unknowingly in the future we could add some other way to invalidate slots during upgrades that we won't be able to detect? > I feel adding a check at pg_upgrade may not 100% detect the slot invalidation > if we check by querying the old cluster to get the slot info, because the > invalidation can happen before the first time we fetch the info or after the > last time we fetch the info(e.g. shutdown checkpoint could also invalidate > slots) > > Personally, I think if we really want to add a check, it might be better to put > it at server side, Like: reporting an ERROR at server side when invalidating > the slot(InvalidatePossiblyObsoleteSlot) if in upgrade mode. > I don't know whether we really need to be extra careful in this case, the same could be said about other consistency checks on the old cluster. -- With Regards, Amit Kapila.
On Fri, Sep 08, 2023 at 09:14:59AM +0530, Amit Kapila wrote: > On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: >>> I >>> mean that doing the latter is benefitial for the sake of any patch committed and >>> as a long-term method to rely on. > > What is your worry here? Are you worried that unknowingly in the > future we could add some other way to invalidate slots during upgrades > that we won't be able to detect? Exactly. A safety belt would not hurt, especially if the belt added is simple. The idea of a backend side elog(ERROR) with isBinaryUpgrade is tempting in the invalidation slot path. -- Michael
On Fri, Sep 8, 2023 at 10:10 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 08, 2023 at 09:14:59AM +0530, Amit Kapila wrote: > > On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > >>> I > >>> mean that doing the latter is benefitial for the sake of any patch committed and > >>> as a long-term method to rely on. > > > > What is your worry here? Are you worried that unknowingly in the > > future we could add some other way to invalidate slots during upgrades > > that we won't be able to detect? > > Exactly. A safety belt would not hurt, especially if the belt added > is simple. The idea of a backend side elog(ERROR) with > isBinaryUpgrade is tempting in the invalidation slot path. > I agree with doing something simple. So, to conclude, we agree on two things in this thread: (a) set max_slot_wal_keep_size to -1 when starting the postmaster for the old cluster during the upgrade; (b) have an elog(ERROR) to avoid invalidating slots during the upgrade. -- With Regards, Amit Kapila.
On Fri, Sep 08, 2023 at 11:59:41AM +0530, Amit Kapila wrote: > I agree with doing something simple. So, to conclude, we agree on two > things in this thread (a) Use max_slot_wal_keep_size to -1 to start > postmaster for the old cluster during the upgrade; (b) Have an > elog(ERROR) to avoid invalidating slots during the upgrade. +1. -- Michael
On Fri, Sep 8, 2023 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Sep 8, 2023 at 10:10 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Fri, Sep 08, 2023 at 09:14:59AM +0530, Amit Kapila wrote: > > > On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > >>> I > > >>> mean that doing the latter is benefitial for the sake of any patch committed and > > >>> as a long-term method to rely on. > > > > > > What is your worry here? Are you worried that unknowingly in the > > > future we could add some other way to invalidate slots during upgrades > > > that we won't be able to detect? > > > > Exactly. A safety belt would not hurt, especially if the belt added > > is simple. The idea of a backend side elog(ERROR) with > > isBinaryUpgrade is tempting in the invalidation slot path. > > > > I agree with doing something simple. So, to conclude, we agree on two > things in this thread (a) Use max_slot_wal_keep_size to -1 to start > postmaster for the old cluster during the upgrade; (b) Have an > elog(ERROR) to avoid invalidating slots during the upgrade. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com