Thread: A recent message added to pg_upgade
Hello. Some messages recently introduced by commit 29d0a77fa6 seem to deviate slightly from our standards. + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) + { + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("replication slots must not be invalidated during the upgrade"), + errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); The message for errhint is not a complete sentence. And errmsg is not in telegraph style. The first attached makes minimum changes. However, if allowed, I'd like to propose an alternative set of messages as follows: + errmsg("replication slot is invalidated during upgrade"), + errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation.")); The second attached does this. What do you think about those? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Hello. > > Some messages recently introduced by commit 29d0a77fa6 seem to deviate > slightly from our standards. > > + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) > + { > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("replication slots must not be invalidated during the upgrade"), > + errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > The message for errhint is not a complete sentence. Yeah, the hint message should have ended with a period - https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION. > The second attached does this. > > What do you think about those? + errmsg("replication slot is invalidated during upgrade"), + errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation.")); } The above errhint LGTM. How about a slightly different errmsg, like the following? + errmsg("cannot invalidate replication slots when in binary upgrade mode"), + errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation.")); ".... when in binary upgrade mode" is being used in many places. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Oct 27, 2023 at 8:52 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi: > The above errhint LGTM. How about a slightly different errmsg, like > the following? > > + errmsg("cannot invalidate replication slots when > in binary upgrade mode"), > + errhint("Set \"max_slot_wal_keep_size\" to -1 to > avoid invalidation.")); > > ".... when in binary upgrade mode" is being used in many places. > By this time slot may be already invalidated, so how about: "replication slot was invalidated when in binary upgrade mode"? -- With Regards, Amit Kapila.
On Fri, Oct 27, 2023 at 1:58 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Hello. > > Some messages recently introduced by commit 29d0a77fa6 seem to deviate > slightly from our standards. > > + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) > + { > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("replication slots must not be invalidated during the upgrade"), > + errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > The message for errhint is not a complete sentence. And errmsg is not > in telegraph style. The first attached makes minimum changes. > > However, if allowed, I'd like to propose an alternative set of > messages as follows: > > + errmsg("replication slot is invalidated during upgrade"), > + errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation.")); > > The second attached does this. > > What do you think about those? > IIUC the only possible way to reach this error (according to the comment preceding it) is by the user overriding the GUC value (which was defaulted -1) on the command line. + /* + * The logical replication slots shouldn't be invalidated as + * max_slot_wal_keep_size GUC is set to -1 during the upgrade. + * + * The following is just a sanity check. + */ Given that, I felt a more relevant msg/hint might be like: errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"), errhint("Do not override \"max_slot_wal_keep_size\" using command line options.")); ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Oct 27, 2023 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 1:58 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > Hello. > > > > Some messages recently introduced by commit 29d0a77fa6 seem to deviate > > slightly from our standards. > > > > + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) > > + { > > + ereport(ERROR, > > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("replication slots must not be invalidated during the upgrade"), > > + errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > > > The message for errhint is not a complete sentence. And errmsg is not > > in telegraph style. The first attached makes minimum changes. > > > > However, if allowed, I'd like to propose an alternative set of > > messages as follows: > > > > + errmsg("replication slot is invalidated during upgrade"), > > + errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation.")); > > > > The second attached does this. > > > > What do you think about those? > > > > IIUC the only possible way to reach this error (according to the > comment preceding it) is by the user overriding the GUC value (which > was defaulted -1) on the command line. > Yeah, this is my understanding as well. > + /* > + * The logical replication slots shouldn't be invalidated as > + * max_slot_wal_keep_size GUC is set to -1 during the upgrade. > + * > + * The following is just a sanity check. > + */ > > Given that, I felt a more relevant msg/hint might be like: > > errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"), > errhint("Do not override \"max_slot_wal_keep_size\" using command line > options.")); > But OTOH, we don't have a value of user-passed options to ensure that. So, how about a slightly different message: "This can be caused by overriding \"max_slot_wal_keep_size\" using command line options." or something along those lines? I see a somewhat similar message in the existing code (errhint("This can be caused ...")). -- With Regards, Amit Kapila.
On Fri, Oct 27, 2023 at 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 8:52 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi: > > The above errhint LGTM. How about a slightly different errmsg, like > > the following? > > > > + errmsg("cannot invalidate replication slots when > > in binary upgrade mode"), > > + errhint("Set \"max_slot_wal_keep_size\" to -1 to > > avoid invalidation.")); > > > > ".... when in binary upgrade mode" is being used in many places. > > > > By this time slot may be already invalidated, so how about: > "replication slot was invalidated when in binary upgrade mode"? In this error spot, the is invalidated in memory but the invalidated state is not persisted to disk which happens after somewhere later: else { /* * We hold the slot now and have already invalidated it; flush it * to ensure that state persists. * -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Oct 27, 2023 at 9:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"), > > errhint("Do not override \"max_slot_wal_keep_size\" using command line > > options.")); > > > > But OTOH, we don't have a value of user-passed options to ensure that. > So, how about a slightly different message: "This can be caused by > overriding \"max_slot_wal_keep_size\" using command line options." or > something along those lines? I see a somewhat similar message in the > existing code (errhint("This can be caused ...")). I get it. I think having errdetail explaining the possible cause of the error is wanted here, something like: errmsg("cannot invalidate replication slots when in binary upgrade mode"), errdetail("This can be caused by overriding \"max_slot_wal_keep_size\" using command line options.")); errhint("Do not override or set \"max_slot_wal_keep_size\" to -1 .")); Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
At Fri, 27 Oct 2023 09:51:43 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Fri, Oct 27, 2023 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote: > > IIUC the only possible way to reach this error (according to the > > comment preceding it) is by the user overriding the GUC value (which > > was defaulted -1) on the command line. > > > > Yeah, this is my understanding as well. > > > + /* > > + * The logical replication slots shouldn't be invalidated as > > + * max_slot_wal_keep_size GUC is set to -1 during the upgrade. > > + * > > + * The following is just a sanity check. > > + */ > > > > Given that, I felt a more relevant msg/hint might be like: > > > > errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"), > > errhint("Do not override \"max_slot_wal_keep_size\" using command line > > options.")); > > > > But OTOH, we don't have a value of user-passed options to ensure that. > So, how about a slightly different message: "This can be caused by > overriding \"max_slot_wal_keep_size\" using command line options." or > something along those lines? I see a somewhat similar message in the > existing code (errhint("This can be caused ...")). The suggested error message looks to me like that of the GUC mechanism. While I don't have the wider picture about the feature, might we consider rejecting the parameter setting? With that modification, this message can be changed to elog one. I believe it's somewhat inconsiderate to point out what shouldn't have been done only after a problem has occurred. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2023-Oct-27, Kyotaro Horiguchi wrote: > @@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > { > ereport(ERROR, > errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("replication slots must not be invalidated during the upgrade"), > - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); Hmm, if I read this code right, this error is going to be thrown by the checkpointer while finishing a checkpoint. Fortunately, the checkpoint record has already been written, but I don't know what would happen if this is thrown while trying to write the shutdown checkpoint. Probably nothing terribly good. I don't think this is useful. If the setting is invalid during binary upgrade, let's prevent it from being set at all right from the start of the upgrade process. In InvalidatePossiblyObsoleteSlot() we could have just an Assert() or elog(PANIC). -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Oct-27, Kyotaro Horiguchi wrote: > > > @@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > > { > > ereport(ERROR, > > errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > - errmsg("replication slots must not be invalidated during the upgrade"), > > - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > Hmm, if I read this code right, this error is going to be thrown by the > checkpointer while finishing a checkpoint. Fortunately, the checkpoint > record has already been written, but I don't know what would happen if > this is thrown while trying to write the shutdown checkpoint. Probably > nothing terribly good. > > I don't think this is useful. If the setting is invalid during binary > upgrade, let's prevent it from being set at all right from the start of > the upgrade process. We are already forcing the required setting "max_slot_wal_keep_size=-1" during the upgrade similar to some of the other settings like "full_page_writes". However, the user can provide an option for "max_slot_wal_keep_size" in which case it will be overridden. Now, I think (a) we can ensure that our setting always takes precedence in this case. The other idea is (b) to parse the user-provided options and check if "max_slot_wal_keep_size" has a value different than expected and raise an error accordingly. Or we can simply (c) document the usage of max_slot_wal_keep_size in the upgrade. I am not sure whether it is worth complicating the code for this as the user shouldn't be using such an option during the upgrade. So, I think doing (a) and (c) could be simpler. > > In InvalidatePossiblyObsoleteSlot() we could have > just an Assert() or elog(PANIC). > Yeah, we can change to either of those. -- With Regards, Amit Kapila.
At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2023-Oct-27, Kyotaro Horiguchi wrote: > > > > > @@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > > > { > > > ereport(ERROR, > > > errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > - errmsg("replication slots must not be invalidated during the upgrade"), > > > - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > > > Hmm, if I read this code right, this error is going to be thrown by the > > checkpointer while finishing a checkpoint. Fortunately, the checkpoint > > record has already been written, but I don't know what would happen if > > this is thrown while trying to write the shutdown checkpoint. Probably > > nothing terribly good. > > > > I don't think this is useful. If the setting is invalid during binary > > upgrade, let's prevent it from being set at all right from the start of > > the upgrade process. > > We are already forcing the required setting > "max_slot_wal_keep_size=-1" during the upgrade similar to some of the > other settings like "full_page_writes". However, the user can provide > an option for "max_slot_wal_keep_size" in which case it will be > overridden. Now, I think (a) we can ensure that our setting always > takes precedence in this case. The other idea is (b) to parse the > user-provided options and check if "max_slot_wal_keep_size" has a > value different than expected and raise an error accordingly. Or we > can simply (c) document the usage of max_slot_wal_keep_size in the > upgrade. I am not sure whether it is worth complicating the code for > this as the user shouldn't be using such an option during the upgrade. > So, I think doing (a) and (c) could be simpler. > > > > In InvalidatePossiblyObsoleteSlot() we could have > > just an Assert() or elog(PANIC). > > > > Yeah, we can change to either of those. This discussion seems like a bit off from my point. I suggested adding a check for that setting when IsBinaryUpgraded is true at the GUC level as shown in the attached patch. I believe Álvaro made a similar suggestion. While the error message is somewhat succinct, I think it is sufficient given the low possilibility of the scenario and the fact that it cannot occur inadvertently. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Mon, Oct 30, 2023 at 7:58 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > > > On 2023-Oct-27, Kyotaro Horiguchi wrote: > > > > > > > @@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > > > > { > > > > ereport(ERROR, > > > > errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > > - errmsg("replication slots must not be invalidated during the upgrade"), > > > > - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > > > > > Hmm, if I read this code right, this error is going to be thrown by the > > > checkpointer while finishing a checkpoint. Fortunately, the checkpoint > > > record has already been written, but I don't know what would happen if > > > this is thrown while trying to write the shutdown checkpoint. Probably > > > nothing terribly good. > > > > > > I don't think this is useful. If the setting is invalid during binary > > > upgrade, let's prevent it from being set at all right from the start of > > > the upgrade process. > > > > We are already forcing the required setting > > "max_slot_wal_keep_size=-1" during the upgrade similar to some of the > > other settings like "full_page_writes". However, the user can provide > > an option for "max_slot_wal_keep_size" in which case it will be > > overridden. Now, I think (a) we can ensure that our setting always > > takes precedence in this case. The other idea is (b) to parse the > > user-provided options and check if "max_slot_wal_keep_size" has a > > value different than expected and raise an error accordingly. Or we > > can simply (c) document the usage of max_slot_wal_keep_size in the > > upgrade. I am not sure whether it is worth complicating the code for > > this as the user shouldn't be using such an option during the upgrade. > > So, I think doing (a) and (c) could be simpler. > > > > > > In InvalidatePossiblyObsoleteSlot() we could have > > > just an Assert() or elog(PANIC). > > > > > > > Yeah, we can change to either of those. > > This discussion seems like a bit off from my point. I suggested adding > a check for that setting when IsBinaryUpgraded is true at the GUC > level as shown in the attached patch. I believe Álvaro made a similar > suggestion. While the error message is somewhat succinct, I think it > is sufficient given the low possilibility of the scenario and the fact > that it cannot occur inadvertently. > I think we can simply change that error message to assert if we want to go with the check hook idea of yours. BTW, can we add GUC_check_errdetail() with a better message as some of the other check function uses? Also, I guess we can add some comments or at least refer to the existing comments to explain the reason of such a check. -- With Regards, Amit Kapila.
On Monday, October 30, 2023 10:29 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com> > wrote in > > On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> > wrote: > > > > > > On 2023-Oct-27, Kyotaro Horiguchi wrote: > > > > > > > @@ -1433,8 +1433,8 @@ > InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > > > > { > > > > ereport(ERROR, > > > > > errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > > - errmsg("replication slots must not > be invalidated during the upgrade"), > > > > - > errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > > > > > > Hmm, if I read this code right, this error is going to be thrown by > > > the checkpointer while finishing a checkpoint. Fortunately, the > > > checkpoint record has already been written, but I don't know what > > > would happen if this is thrown while trying to write the shutdown > > > checkpoint. Probably nothing terribly good. > > > > > > I don't think this is useful. If the setting is invalid during > > > binary upgrade, let's prevent it from being set at all right from > > > the start of the upgrade process. > > > > We are already forcing the required setting > > "max_slot_wal_keep_size=-1" during the upgrade similar to some of the > > other settings like "full_page_writes". However, the user can provide > > an option for "max_slot_wal_keep_size" in which case it will be > > overridden. Now, I think (a) we can ensure that our setting always > > takes precedence in this case. The other idea is (b) to parse the > > user-provided options and check if "max_slot_wal_keep_size" has a > > value different than expected and raise an error accordingly. Or we > > can simply (c) document the usage of max_slot_wal_keep_size in the > > upgrade. I am not sure whether it is worth complicating the code for > > this as the user shouldn't be using such an option during the upgrade. > > So, I think doing (a) and (c) could be simpler. > > > > > > In InvalidatePossiblyObsoleteSlot() we could have just an Assert() > > > or elog(PANIC). > > > > > > > Yeah, we can change to either of those. > > This discussion seems like a bit off from my point. I suggested adding a check > for that setting when IsBinaryUpgraded is true at the GUC level as shown in the > attached patch. I believe Álvaro made a similar suggestion. While the error > message is somewhat succinct, I think it is sufficient given the low possilibility > of the scenario and the fact that it cannot occur inadvertently. Thanks for the diff and I think the approach basically works. One notable behavior of this approach it will reject the GUC setting even if there are no slots on old cluster or user set the value to a big enough value which doesn't cause invalidation. The behavior doesn't look bad to me but just mention it for reference. Best Regards, Hou zj
On Mon, Oct 30, 2023 at 8:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > This discussion seems like a bit off from my point. I suggested adding > > a check for that setting when IsBinaryUpgraded is true at the GUC > > level as shown in the attached patch. I believe Álvaro made a similar > > suggestion. While the error message is somewhat succinct, I think it > > is sufficient given the low possilibility of the scenario and the fact > > that it cannot occur inadvertently. > > > > I think we can simply change that error message to assert if we want > to go with the check hook idea of yours. BTW, can we add > GUC_check_errdetail() with a better message as some of the other check > function uses? Also, I guess we can add some comments or at least > refer to the existing comments to explain the reason of such a check. Will the check_hook approach work correctly? I haven't checked that by myself, but I see InitializeGUCOptions() getting called before IsBinaryUpgrade is set to true and the passed-in config options ('c') are parsed. If the check_hook approach works correctly, I think we must add a test hitting the error in check_max_slot_wal_keep_size for the IsBinaryUpgrade case. And, I agree with Amit to have a detailed messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV, leaving the error message in InvalidatePossiblyObsoleteSlot() there (if required with a better wording as discussed initially in this thread) does no harm. Actually, it acts as another safety net given that max_slot_wal_keep_size GUC is reloadable via SIGHUP. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
At Mon, 30 Oct 2023 08:51:18 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > I think we can simply change that error message to assert if we want > to go with the check hook idea of yours. BTW, can we add > GUC_check_errdetail() with a better message as some of the other check > function uses? Also, I guess we can add some comments or at least > refer to the existing comments to explain the reason of such a check. Definitely. I've attached the following changes. 1. Added GUC_check_errdetail() to the check function. 2. Added a comment to the check function (based on my knowledge about the feature). 3. Altered the ereport() into Assert() in InvalidatePossiblyObsoleteSlot(). I considered removing the !SlotIsLogical() condition since pg_upgrade always sets max_slot_wal_keep_size to -1, but I left it unchanged. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
At Mon, 30 Oct 2023 03:36:41 +0000, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com> wrote in > Thanks for the diff and I think the approach basically works. > > One notable behavior of this approach it will reject the GUC setting even if there > are no slots on old cluster or user set the value to a big enough value which > doesn't cause invalidation. The behavior doesn't look bad to me but just mention it > for reference. Indeed. pg_upgrade anyway sets the variable to -1 irrespective of the slot's existence, and I see no justification for allowing users to forcibly change it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Will the check_hook approach work correctly? I haven't checked that by > myself, but I see InitializeGUCOptions() getting called before > IsBinaryUpgrade is set to true and the passed-in config options ('c') > are parsed. I'm not sure about the wanted behavior exactly, but the fast you pointed doesn't matter because the check is required after parsing the command line options. On the other hand, I'm not sure about the behavior that a setting in postgresql.conf is rejected. > If the check_hook approach works correctly, I think we must add a test > hitting the error in check_max_slot_wal_keep_size for the > IsBinaryUpgrade case. And, I agree with Amit to have a detailed > messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV, > leaving the error message in InvalidatePossiblyObsoleteSlot() there > (if required with a better wording as discussed initially in this > thread) does no harm. Actually, it acts as another safety net given > that max_slot_wal_keep_size GUC is reloadable via SIGHUP. The error message, which is deemed impossible, adds an additional message translation. In another thread, we are discussing the reduction of translatable messages. Therefore, I suggest using elog() for the condition at the very least. Whether it should be elog() or Assert() remains open for discussion, as I don't have a firm stance on it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Oct 30, 2023 at 1:42 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > Will the check_hook approach work correctly? I haven't checked that by > > myself, but I see InitializeGUCOptions() getting called before > > IsBinaryUpgrade is set to true and the passed-in config options ('c') > > are parsed. > > I'm not sure about the wanted behavior exactly, but the fast you > pointed doesn't matter because the check is required after parsing the > command line options. On the other hand, I'm not sure about the > behavior that a setting in postgresql.conf is rejected. Yeah. The check_hook is called even after the param is specified in postgresql.conf during the upgrade, so I see no problem there. > > If the check_hook approach works correctly, I think we must add a test > > hitting the error in check_max_slot_wal_keep_size for the > > IsBinaryUpgrade case. And, I agree with Amit to have a detailed > > messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV, > > leaving the error message in InvalidatePossiblyObsoleteSlot() there > > (if required with a better wording as discussed initially in this > > thread) does no harm. Actually, it acts as another safety net given > > that max_slot_wal_keep_size GUC is reloadable via SIGHUP. > > The error message, which is deemed impossible, adds an additional > message translation. In another thread, we are discussing the > reduction of translatable messages. Therefore, I suggest using elog() > for the condition at the very least. Whether it should be elog() or > Assert() remains open for discussion, as I don't have a firm stance on > it. I get it. I agree to go with just the assert because the GUC check_hook kinda tightens the screws against setting max_slot_wal_keep_size to a value other than -1 during the binary upgrade, A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt: 1. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Dear Bharath, > Will the check_hook approach work correctly? I tested by using the first version and worked well (rejected). Please see the log which recorded the output and log. Below lines were copied from server log and found that max_slot_wal_keep_size must not be >= 0. ``` waiting for server to start....2023-10-30 08:53:32.529 GMT [6903] FATAL: invalid value for parameter "max_slot_wal_keep_size":1 stopped waiting pg_ctl: could not start serve ``` > I haven't checked that by > myself, but I see InitializeGUCOptions() getting called before > IsBinaryUpgrade is set to true and the passed-in config options ('c') > are parsed. I thought the key point was that user-defined options are aligned after the "-b". User-defined options are set after the '-b' option, so check_hook could work as we expected. Thought? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On Mon, Oct 30, 2023 at 2:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Never mind. Previous message was accidentally sent before I finished writing my comments. > Yeah. The check_hook is called even after the param is specified in > postgresql.conf during the upgrade, so I see no problem there. > > > The error message, which is deemed impossible, adds an additional > > message translation. In another thread, we are discussing the > > reduction of translatable messages. Therefore, I suggest using elog() > > for the condition at the very least. Whether it should be elog() or > > Assert() remains open for discussion, as I don't have a firm stance on > > it. > > I get it. I agree to go with just the assert because the GUC > check_hook kinda tightens the screws against setting > max_slot_wal_keep_size to a value other than -1 during the binary > upgrade, A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt: 1. + * All WAL files on the publisher node must be retained during an upgrade to + * maintain connections from downstreams. While pg_upgrade explicitly sets It's not just the publisher, anyone using logical slots. Also, no downstream please. If you want, you can repurpose the comment that's added by 29d0a77f. /* * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the * checkpointer process. If WALs required by logical replication slots * are removed, the slots are unusable. This setting prevents the * invalidation of slots during the upgrade. We set this option when * cluster is PG17 or later because logical replication slots can only be * migrated since then. Besides, max_slot_wal_keep_size is added in PG13. */ 2. At present, only logical slots really require + * this. Can we remove the above comment as the code with SlotIsLogical(s) explains it all? 3. + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade."); + return false; How about we be explicit like the following which helps users reason about this restriction instead of them looking at the comments/docs? GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE); GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set to -1 when in binary upgrade mode"); GUC_check_errdetail("A value of -1 prevents the removal of WAL required for logical slots upgrade."); return false; 4. I think a test case to hit the error in the check hook in 003_logical_slots.pl will help a lot here - not only covers the code but also helps demonstrate how one can reach the error. 5. I think the check_hook is better defined in xlog.c the place where it's actually being declared and in action. IMV, there's no reason for it to be in slot.c as it doesn't deal with any slot related variables/structs. This also avoids an unnecessary "utils/guc_hooks.h" inclusion in slot.c. +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ 5. A possible problem with this check_hook approach is that it doesn't let anyone setting max_slot_wal_keep_size to a value other than -1 during pg_ugprade even if someone doesn't have logical slots or doesn't want to upgrade logical slots in which case the WAL file growth during pg_upgrade may be huge (transiently) unless the pg_resetwal step of pg_upgrade removes it at the end. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
At Mon, 30 Oct 2023 14:55:01 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > I get it. I agree to go with just the assert because the GUC > > check_hook kinda tightens the screws against setting > > max_slot_wal_keep_size to a value other than -1 during the binary > > upgrade, Thanks for being on the same page. > A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt: > > 1. > + * All WAL files on the publisher node must be retained during an upgrade to > + * maintain connections from downstreams. While pg_upgrade explicitly sets > > It's not just the publisher, anyone using logical slots. Also, no > downstream please. If you want, you can repurpose the comment that's > added by 29d0a77f. > > /* > * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the > * checkpointer process. If WALs required by logical replication slots > * are removed, the slots are unusable. This setting prevents the > * invalidation of slots during the upgrade. We set this option when > * cluster is PG17 or later because logical replication slots can only be > * migrated since then. Besides, max_slot_wal_keep_size is added in PG13. > */ It is helpful. Thanks! > 2. > At present, only logical slots really require > + * this. > > Can we remove the above comment as the code with SlotIsLogical(s) > explains it all? max_slot_wal_keep_size affects both logical and physical slots. Therefore, if we are interested only one of the two types of slots, it's important to clarify the rationale. Regardless of any potential exntension to physical slots, I believe it's essential to clarify the rationale. I couldn't determine from that extensive thread whether there's a possible extension to physical slots. Could you inform me if such an extension can happen and, if not, provide the reason? > 3. > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set > to -1 during the upgrade."); > + return false; > > How about we be explicit like the following which helps users reason > about this restriction instead of them looking at the comments/docs? > > GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE); > GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set > to -1 when in binary upgrade mode"); > GUC_check_errdetail("A value of -1 prevents the removal of > WAL required for logical slots upgrade."); > return false; I don't quite see the reason to provide such a detailed explanation just for this error. Additionally, since this check is performed regardless of the presence or absense of logical slots, I think the errdetail message might potentially confuse those whosee it. Adding "binary" looks fine as is and done in the attached. > 4. I think a test case to hit the error in the check hook in > 003_logical_slots.pl will help a lot here - not only covers the code > but also helps demonstrate how one can reach the error. Yeah, of course. I was planning to add tests once the direction of the discussion became clear. I will add them in the next version. > 5. I think the check_hook is better defined in xlog.c the place where > it's actually being declared and in action. IMV, there's no reason for > it to be in slot.c as it doesn't deal with any slot related > variables/structs. This also avoids an unnecessary "utils/guc_hooks.h" > inclusion in slot.c. > +bool > +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) > +{ Sounds reasonable. Moved. I simply moved it to xlog.c, but the function comment was thoroughly written only for this moved function, making it somewhat stand out.. > 5. A possible problem with this check_hook approach is that it doesn't > let anyone setting max_slot_wal_keep_size to a value other than -1 > during pg_ugprade even if someone doesn't have logical slots or > doesn't want to upgrade logical slots in which case the WAL file > growth during pg_upgrade may be huge (transiently) unless the > pg_resetwal step of pg_upgrade removes it at the end. While I doubt anyone wishes to set the variable to a specific value during upgrade, think there are individuals who might be reluctant to edit the config file due to unclear reasons. While we could consider an alternative - checking for logical slots during binary upgrade- it's debatable if the effort is justified. (I haven't verified its feasibility, however.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Tue, Oct 31, 2023 at 2:19 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE); > > GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set > > to -1 when in binary upgrade mode"); > > GUC_check_errdetail("A value of -1 prevents the removal of > > WAL required for logical slots upgrade."); > > return false; > > I don't quite see the reason to provide such a detailed explanation > just for this error. Additionally, since this check is performed > regardless of the presence or absense of logical slots. Okay, I get it. > > 4. I think a test case to hit the error in the check hook in > > 003_logical_slots.pl will help a lot here - not only covers the code > > but also helps demonstrate how one can reach the error. > > Yeah, of course. I was planning to add tests once the direction of the > discussion became clear. I will add them in the next version. Yes, please. The test case to hit the ERROR in InvalidatePossiblyObsoleteSlot() is important even if the check_hook approach isn't going anywhere. > function comment was thoroughly written only for this moved function, > making it somewhat stand out.. I think that's fine. > > 5. A possible problem with this check_hook approach is that it doesn't > > let anyone setting max_slot_wal_keep_size to a value other than -1 > > during pg_ugprade even if someone doesn't have logical slots or > > doesn't want to upgrade logical slots in which case the WAL file > > growth during pg_upgrade may be huge (transiently) unless the > > pg_resetwal step of pg_upgrade removes it at the end. > > While I doubt anyone wishes to set the variable to a specific value > during upgrade, think there are individuals who might be reluctant to > edit the config file due to unclear reasons. While we could consider > an alternative - checking for logical slots during binary upgrade- > it's debatable if the effort is justified. (I haven't verified its > feasibility, however.) Checking for logical slots during binary upgrade doesn't help - what if there are logical slots present but no upgrade is wanted (via a new pg_uprade option)? Basically, how will the postgres server know whether someone wants pg_upgrade of logical slots or not? Can we check if someone is overriding max_slot_wal_keep_size in pg_upgrade itself (via pg_settings query from the server)? If yes, if logical slots exist and upgrade is wanted, then disallow the upgrade if GUC is set to value other than -1. I believe disallowing setting max_slot_wal_keep_size to a value other than -1 during binary upgrade may have serious consequences as it impacts WAL retention before the pg_resetwal comes into picture as part of pg_upgrade. Or what if we just live with what we have right now? I mean with ERROR in InvalidatePossiblyObsoleteSlot(). Or what if we just remove ERROR in InvalidatePossiblyObsoleteSlot or make it an Assert and say do not override max_slot_wal_keep_size in docs? Even if someone did override, let the pg_upgrade report the slot as invalidated and let the user delete the slot or decide what to do with it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Oct 31, 2023 at 4:00 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > 5. A possible problem with this check_hook approach is that it doesn't > > > let anyone setting max_slot_wal_keep_size to a value other than -1 > > > during pg_ugprade even if someone doesn't have logical slots or > > > doesn't want to upgrade logical slots in which case the WAL file > > > growth during pg_upgrade may be huge (transiently) unless the > > > pg_resetwal step of pg_upgrade removes it at the end. > > > > While I doubt anyone wishes to set the variable to a specific value > > during upgrade, think there are individuals who might be reluctant to > > edit the config file due to unclear reasons. While we could consider > > an alternative - checking for logical slots during binary upgrade- > > it's debatable if the effort is justified. (I haven't verified its > > feasibility, however.) > > Checking for logical slots during binary upgrade doesn't help - what > if there are logical slots present but no upgrade is wanted (via a new > pg_uprade option)? Basically, how will the postgres server know > whether someone wants pg_upgrade of logical slots or not? Can we check > if someone is overriding max_slot_wal_keep_size in pg_upgrade itself > (via pg_settings query from the server)? If yes, if logical slots > exist and upgrade is wanted, then disallow the upgrade if GUC is set > to value other than -1. > I feel we can try to extend the functionality if we really see some user demand. It is not that we can't do it now but it doesn't seem prudent to make the functionality/code more complex than really required. > I believe disallowing setting max_slot_wal_keep_size to a value other > than -1 during binary upgrade may have serious consequences as it > impacts WAL retention before the pg_resetwal comes into picture as > part of pg_upgrade. > I don't think this is completely true because this setting will only impact if there are active slots and those slots need some WAL which we want to remove. This setting shouldn't be used as often as you are imagining. > Or what if we just live with what we have right now? I mean with ERROR > in InvalidatePossiblyObsoleteSlot(). > > Or what if we just remove ERROR in InvalidatePossiblyObsoleteSlot or > make it an Assert and say do not override max_slot_wal_keep_size in > docs? Even if someone did override, let the pg_upgrade report the slot > as invalidated and let the user delete the slot or decide what to do > with it. > The problem is this can happen in the background so it can happen at the time of shutdown when all the upgrade is complete. -- With Regards, Amit Kapila.
Dear Horiguchi-san, Thanks for making the patch! > > 4. I think a test case to hit the error in the check hook in > > 003_logical_slots.pl will help a lot here - not only covers the code > > but also helps demonstrate how one can reach the error. > > Yeah, of course. I was planning to add tests once the direction of the > discussion became clear. I will add them in the next version. I tried to make the part. Feel free to include it if not yet. We can check the server log, but I think it may be overkill. Also, I have one comment. ``` +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ + if (IsBinaryUpgrade && *newval != -1) + { + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 during binary upgrade mode."); + return false; + } + return true; +} ``` Just to confirm - should we check the GucSource? Based on ur requirement, it might be enough we avoid overwriting while starting the server. Personally current code is OK because it is simpler, but I want to hear your opinion. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Hi, here are some minor review comments for the v3 patch. ====== src/backend/access/transam/xlog.c 1. check_max_slot_wal_keep_size +/* + * GUC check_hook for max_slot_wal_keep_size + * + * If WALs required by logical replication slots are removed, the slots are + * unusable. While pg_upgrade sets this variable to -1 via the command line to + * attempt to prevent such removal during binary upgrade, there are ways for + * users to override it. For the sake of completing the objective, ensure that + * this variable remains unchanged. See InvalidatePossiblyObsoleteSlot() and + * start_postmaster() in pg_upgrade for more details. + */ I asked ChatGPT to suggest alternative wording for that comment, and it came up with something that I felt was a slight improvement. SUGGESTION ... If WALs needed by logical replication slots are deleted, these slots become inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via the command line in an attempt to prevent such deletions, but users have ways to override it. To ensure the successful completion of the upgrade, it's essential to keep this variable unaltered. ... ~~~ 2. +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ + if (IsBinaryUpgrade && *newval != -1) + { + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 during binary upgrade mode."); + return false; + } + return true; +} Some of the other GUC_check_errdetail()'s do not include the GUC name in the translatable message text. Isn't that a preferred style? SUGGESTION GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", "max_slot_wal_keep_size"); ====== src/backend/replication/slot.c 3. InvalidatePossiblyObsoleteSlot - if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) - { - ereport(ERROR, - errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("replication slots must not be invalidated during the upgrade"), - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); - } + Assert (!*invalidated || !SlotIsLogical(s) || !IsBinaryUpgrade); IMO new Assert became trickier to understand than the original condition. YMMV. SUGGESTION Assert(!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)); ====== Kind Regards, Peter Smith. Fujitsu Australia
Thanks you for the comments! At Wed, 1 Nov 2023 18:08:19 +1100, Peter Smith <smithpb2250@gmail.com> wrote in > Hi, here are some minor review comments for the v3 patch. > > ====== > src/backend/access/transam/xlog.c > I asked ChatGPT to suggest alternative wording for that comment, and > it came up with something that I felt was a slight improvement. > > SUGGESTION > ... > If WALs needed by logical replication slots are deleted, these slots > become inoperable. During a binary upgrade, pg_upgrade sets this > variable to -1 via the command line in an attempt to prevent such > deletions, but users have ways to override it. To ensure the > successful completion of the upgrade, it's essential to keep this > variable unaltered. > ... > > ~~~ ChatGPT seems to tend to generate sentences in a slightly different from our usual writing. While I tried to retain the original phrasing in the patch, I don't mind using the suggested version. Used as is. > 2. > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 > during binary upgrade mode."); > Some of the other GUC_check_errdetail()'s do not include the GUC name > in the translatable message text. Isn't that a preferred style? > SUGGESTION > GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > "max_slot_wal_keep_size"); I believe that that style was adopted to minimize translatable messages by consolidting identical ones that only differ in variable names. I see both versions in the tree. I didn't find necessity to adopt this approach for this specific message, especially since I'm skeptical about adding new messages that end with "must be set to -1 during binary upgrade mode". (pg_upgrade sets synchronous_commit, fsync and full_page_writes to "off".) However, some unique messages are in this style, so I'm fine with using that style. Revised accordingly. > ====== > src/backend/replication/slot.c > > 3. InvalidatePossiblyObsoleteSlot > + Assert (!*invalidated || !SlotIsLogical(s) || !IsBinaryUpgrade); > > IMO new Assert became trickier to understand than the original condition. YMMV. > > SUGGESTION > Assert(!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)); Yeah, I also liked that style and considered using it, but I didn't feel it was too hard to read in this particular case, so I ended up using the current way. Just like with the point of other comments, I'm not particularly attached to this style. Thus if someone find it difficult to read, I have no issue with changing it. Revised as suggested. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Nov 2, 2023 at 1:58 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Thanks you for the comments! > > At Wed, 1 Nov 2023 18:08:19 +1100, Peter Smith <smithpb2250@gmail.com> wrote in > > Hi, here are some minor review comments for the v3 patch. > > > > ====== > > src/backend/access/transam/xlog.c > ... > > 2. > > > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 > > during binary upgrade mode."); > > > Some of the other GUC_check_errdetail()'s do not include the GUC name > > in the translatable message text. Isn't that a preferred style? > > > SUGGESTION > > GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > > "max_slot_wal_keep_size"); > > I believe that that style was adopted to minimize translatable > messages by consolidting identical ones that only differ in variable > names. I see both versions in the tree. I didn't find necessity to > adopt this approach for this specific message, especially since I'm > skeptical about adding new messages that end with "must be set to -1 > during binary upgrade mode". (pg_upgrade sets synchronous_commit, > fsync and full_page_writes to "off".) > > However, some unique messages are in this style, so I'm fine with > using that style. Revised accordingly. > Checking this patch yesterday prompted me to create a new thread questioning the inconsistencies of the "GUC names in messages". In that thread, Tom Lane replied and gave some background information [1] about the GUC name embedding versus substitution. In hindsight, I think your original message was fine as-is, but there seem to be examples of every kind of style, so whatever you do would have some precedent. The patch v4 LGTM. ====== [1] https://www.postgresql.org/message-id/2758485.1698848717%40sss.pgh.pa.us Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Nov 2, 2023 at 2:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Nov 2, 2023 at 1:58 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > Thanks you for the comments! > > > > At Wed, 1 Nov 2023 18:08:19 +1100, Peter Smith <smithpb2250@gmail.com> wrote in > > > Hi, here are some minor review comments for the v3 patch. > > > > > > ====== > > > src/backend/access/transam/xlog.c > > > ... > > > 2. > > > > > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 > > > during binary upgrade mode."); > > > > > Some of the other GUC_check_errdetail()'s do not include the GUC name > > > in the translatable message text. Isn't that a preferred style? > > > > > SUGGESTION > > > GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > > > "max_slot_wal_keep_size"); > > > > I believe that that style was adopted to minimize translatable > > messages by consolidting identical ones that only differ in variable > > names. I see both versions in the tree. I didn't find necessity to > > adopt this approach for this specific message, especially since I'm > > skeptical about adding new messages that end with "must be set to -1 > > during binary upgrade mode". (pg_upgrade sets synchronous_commit, > > fsync and full_page_writes to "off".) > > > > However, some unique messages are in this style, so I'm fine with > > using that style. Revised accordingly. > > > > Checking this patch yesterday prompted me to create a new thread > questioning the inconsistencies of the "GUC names in messages". In > that thread, Tom Lane replied and gave some background information [1] > about the GUC name embedding versus substitution. In hindsight, I > think your original message was fine as-is, but there seem to be > examples of every kind of style, so whatever you do would have some > precedent. > > The patch v4 LGTM. > To clarify, all the current code LGTM, but the patch is still missing a guc_hook test case, right? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote: > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith <smithpb2250@gmail.com> wrote: >> Checking this patch yesterday prompted me to create a new thread >> questioning the inconsistencies of the "GUC names in messages". In >> that thread, Tom Lane replied and gave some background information [1] >> about the GUC name embedding versus substitution. In hindsight, I >> think your original message was fine as-is, but there seem to be >> examples of every kind of style, so whatever you do would have some >> precedent. >> >> The patch v4 LGTM. > > To clarify, all the current code LGTM, but the patch is still missing > a guc_hook test case, right? - NULL, NULL, NULL + check_max_slot_wal_keep_size, NULL, NULL FWIW, I am +-0 with what you are proposing here. I don't quite get why one may want to enforce this specific GUC at upgrade. Anyway, if they do, I'd be curious to hear why this is required and this patch would prevent them to do so. Actually, this could be a good reason for making the logical slot handling during pg_upgrade an option rather than a mandatory thing. -- Michael
Attachment
On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote: > > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > >> Checking this patch yesterday prompted me to create a new thread > >> questioning the inconsistencies of the "GUC names in messages". In > >> that thread, Tom Lane replied and gave some background information [1] > >> about the GUC name embedding versus substitution. In hindsight, I > >> think your original message was fine as-is, but there seem to be > >> examples of every kind of style, so whatever you do would have some > >> precedent. > >> > >> The patch v4 LGTM. > > > > To clarify, all the current code LGTM, but the patch is still missing > > a guc_hook test case, right? > > - NULL, NULL, NULL > + check_max_slot_wal_keep_size, NULL, NULL > > FWIW, I am +-0 with what you are proposing here. I don't quite get > why one may want to enforce this specific GUC at upgrade. > I also can't think of a good reason to do so but OTOH, I can't imagine all possible scenarios. As this setting is invalid or can cause problems, it seems people favor preventing it. Alvaro also voted in favor of preventing it, so we are considering to proceed with it unless more people think otherwise. -- With Regards, Amit Kapila.
On Thu, Nov 2, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote: > > > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > >> Checking this patch yesterday prompted me to create a new thread > > >> questioning the inconsistencies of the "GUC names in messages". In > > >> that thread, Tom Lane replied and gave some background information [1] > > >> about the GUC name embedding versus substitution. In hindsight, I > > >> think your original message was fine as-is, but there seem to be > > >> examples of every kind of style, so whatever you do would have some > > >> precedent. > > >> > > >> The patch v4 LGTM. > > > > > > To clarify, all the current code LGTM, but the patch is still missing > > > a guc_hook test case, right? > > > > - NULL, NULL, NULL > > + check_max_slot_wal_keep_size, NULL, NULL > > > > FWIW, I am +-0 with what you are proposing here. I don't quite get > > why one may want to enforce this specific GUC at upgrade. > > > > I also can't think of a good reason to do so but OTOH, I can't imagine > all possible scenarios. As this setting is invalid or can cause > problems, it seems people favor preventing it. Alvaro also voted in > favor of preventing it, so we are considering to proceed with it > unless more people think otherwise. > Now, that Michael also committed another similar change in commit 7021d3b176, it is better to be consistent in both cases. So, either we should have check hooks for both parameters or follow another route where we always forcibly override these parameters (which means the user-provided values for these parameters will be ignored) in pg_upgrade and document it. Yet another simple way is to simply document the current behavior. In the future, if we see users complain about this or have use cases to use these parameters during an upgrade, we can accordingly try to adapt the behavior. Thoughts? -- With Regards, Amit Kapila.
On Fri, Nov 3, 2023 at 1:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 2, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > > > On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote: > > > > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > >> Checking this patch yesterday prompted me to create a new thread > > > >> questioning the inconsistencies of the "GUC names in messages". In > > > >> that thread, Tom Lane replied and gave some background information [1] > > > >> about the GUC name embedding versus substitution. In hindsight, I > > > >> think your original message was fine as-is, but there seem to be > > > >> examples of every kind of style, so whatever you do would have some > > > >> precedent. > > > >> > > > >> The patch v4 LGTM. > > > > > > > > To clarify, all the current code LGTM, but the patch is still missing > > > > a guc_hook test case, right? > > > > > > - NULL, NULL, NULL > > > + check_max_slot_wal_keep_size, NULL, NULL > > > > > > FWIW, I am +-0 with what you are proposing here. I don't quite get > > > why one may want to enforce this specific GUC at upgrade. > > > > > > > I also can't think of a good reason to do so but OTOH, I can't imagine > > all possible scenarios. As this setting is invalid or can cause > > problems, it seems people favor preventing it. Alvaro also voted in > > favor of preventing it, so we are considering to proceed with it > > unless more people think otherwise. > > > > Now, that Michael also committed another similar change in commit > 7021d3b176, it is better to be consistent in both cases. So, either we I agree. Both patches are setting a special GUC value at the command line, and both of them don't want the user to somehow override that. Since the requirements are the same, I felt the implementations (regardless if they use a guc hook or something else) should also be done the same way. Yesterday I posted a review comment on the other thread [1] (#2c) trying to express the same point about consistency. ====== [1] https://www.postgresql.org/message-id/CAHut%2BPsCzt%3DO3_xkyrskaZ3SMxaXoN4L5Z5CgvaGPNx3mXXxOQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Nov 03, 2023 at 01:33:26PM +1100, Peter Smith wrote: > On Fri, Nov 3, 2023 at 1:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> Now, that Michael also committed another similar change in commit >> 7021d3b176, it is better to be consistent in both cases. So, either we > > I agree. Both patches are setting a special GUC value at the command > line, and both of them don't want the user to somehow override that. > Since the requirements are the same, I felt the implementations > (regardless if they use a guc hook or something else) should also be > done the same way. Yesterday I posted a review comment on the other > thread [1] (#2c) trying to express the same point about consistency. Yeah, I certainly agree about consistency in the implementation for both sides of the coin. Nevertheless, I'm still +-0 on the GUC hook addition as I am wondering if there could be a case where one would be interested in enforcing the state of the GUCs anyway, and we'd prevent entirely that. Another thing that we can do for max_logical_replication_workers, rather than a GUC hook, is to add a check on IsBinaryUpgrade in ApplyLauncherRegister(). At least that would be consistent with what we do for autovacuum as the apply worker is just a bgworker. -- Michael
Attachment
On Sun, Nov 5, 2023 at 5:33 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Nov 03, 2023 at 01:33:26PM +1100, Peter Smith wrote: > > On Fri, Nov 3, 2023 at 1:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> Now, that Michael also committed another similar change in commit > >> 7021d3b176, it is better to be consistent in both cases. So, either we > > > > I agree. Both patches are setting a special GUC value at the command > > line, and both of them don't want the user to somehow override that. > > Since the requirements are the same, I felt the implementations > > (regardless if they use a guc hook or something else) should also be > > done the same way. Yesterday I posted a review comment on the other > > thread [1] (#2c) trying to express the same point about consistency. > > Yeah, I certainly agree about consistency in the implementation for > both sides of the coin. > > Nevertheless, I'm still +-0 on the GUC hook addition as I am wondering > if there could be a case where one would be interested in enforcing > the state of the GUCs anyway, and we'd prevent entirely that. Another > thing that we can do for max_logical_replication_workers, rather than > a GUC hook, is to add a check on IsBinaryUpgrade in > ApplyLauncherRegister(). > Do you mean to say that if 'IsBinaryUpgrade' is true then let's not allow to launch launcher or apply worker? If so, I guess this won't be any better than prohibiting at an early stage or explicitly overriding those with internal values and documenting it, at least that way we can be consistent for both variables (max_logical_replication_workers and max_slot_wal_keep_size). -- With Regards, Amit Kapila.
On Tue, Nov 07, 2023 at 07:59:46AM +0530, Amit Kapila wrote: > Do you mean to say that if 'IsBinaryUpgrade' is true then let's not > allow to launch launcher or apply worker? If so, I guess this won't be > any better than prohibiting at an early stage or explicitly overriding > those with internal values and documenting it, at least that way we > can be consistent for both variables (max_logical_replication_workers > and max_slot_wal_keep_size). Yes, I mean to paint an extra IsBinaryUpgrade before registering the apply worker launcher. That would be consistent with what we do for autovacuum in the postmaster. -- Michael
Attachment
On Tue, Nov 7, 2023 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Nov 07, 2023 at 07:59:46AM +0530, Amit Kapila wrote: > > Do you mean to say that if 'IsBinaryUpgrade' is true then let's not > > allow to launch launcher or apply worker? If so, I guess this won't be > > any better than prohibiting at an early stage or explicitly overriding > > those with internal values and documenting it, at least that way we > > can be consistent for both variables (max_logical_replication_workers > > and max_slot_wal_keep_size). > > Yes, I mean to paint an extra IsBinaryUpgrade before registering the > apply worker launcher. That would be consistent with what we do for > autovacuum in the postmaster. > But then we don't need the hardcoded value of max_logical_replication_workers as zero by pg_upgrade. I think doing IsBinaryUpgrade for slots won't be neat, so we anyway need to keep using the special value of max_slot_wal_keep_size GUC. Though the handling for both won't be the same but I guess given the situation, that seems like a reasonable thing to do. If we follow that then we can have this special GUC hook only for max_slot_wal_keep_size GUC. -- With Regards, Amit Kapila.
On Tue, Nov 7, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 7, 2023 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Tue, Nov 07, 2023 at 07:59:46AM +0530, Amit Kapila wrote: > > > Do you mean to say that if 'IsBinaryUpgrade' is true then let's not > > > allow to launch launcher or apply worker? If so, I guess this won't be > > > any better than prohibiting at an early stage or explicitly overriding > > > those with internal values and documenting it, at least that way we > > > can be consistent for both variables (max_logical_replication_workers > > > and max_slot_wal_keep_size). > > > > Yes, I mean to paint an extra IsBinaryUpgrade before registering the > > apply worker launcher. That would be consistent with what we do for > > autovacuum in the postmaster. > > > > But then we don't need the hardcoded value of > max_logical_replication_workers as zero by pg_upgrade. I think doing > IsBinaryUpgrade for slots won't be neat, so we anyway need to keep > using the special value of max_slot_wal_keep_size GUC. Though the > handling for both won't be the same but I guess given the situation, > that seems like a reasonable thing to do. If we follow that then we > can have this special GUC hook only for max_slot_wal_keep_size GUC. > Michael, Horiguchi-San, and others, do you have any thoughts on what is the best way to proceed? -- With Regards, Amit Kapila.
On Thu, Nov 09, 2023 at 09:53:07AM +0530, Amit Kapila wrote: > On Tue, Nov 7, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> But then we don't need the hardcoded value of >> max_logical_replication_workers as zero by pg_upgrade. I think doing >> IsBinaryUpgrade for slots won't be neat, so we anyway need to keep >> using the special value of max_slot_wal_keep_size GUC. Though the >> handling for both won't be the same but I guess given the situation, >> that seems like a reasonable thing to do. If we follow that then we >> can have this special GUC hook only for max_slot_wal_keep_size GUC. > > Michael, Horiguchi-San, and others, do you have any thoughts on what > is the best way to proceed? No problem for me to use a GUC hook for the WAL retention GUCs if you feel strongly about it at the end, but I'd rather use an approach based on IsBinaryUpgrade for the logical worker launcher to be consistent with autovacuum (where there's also an argument to refactor it to use a bgworker registration, centralizing the checks on IsBinaryUpgrade for all bgworkers, but that would be material for a different thread, if there's interest in doing that). The two situations we are trying to prevent (slot invalidation and bgworker launch) can be triggered under different contexts, so they don't have to use the same mechanisms to prevent what should not happen during an upgrade. -- Michael
Attachment
On Thu, Nov 9, 2023 at 3:55 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Nov 09, 2023 at 09:53:07AM +0530, Amit Kapila wrote: > > On Tue, Nov 7, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> But then we don't need the hardcoded value of > >> max_logical_replication_workers as zero by pg_upgrade. I think doing > >> IsBinaryUpgrade for slots won't be neat, so we anyway need to keep > >> using the special value of max_slot_wal_keep_size GUC. Though the > >> handling for both won't be the same but I guess given the situation, > >> that seems like a reasonable thing to do. If we follow that then we > >> can have this special GUC hook only for max_slot_wal_keep_size GUC. > > > > Michael, Horiguchi-San, and others, do you have any thoughts on what > > is the best way to proceed? > > No problem for me to use a GUC hook for the WAL retention GUCs if you > feel strongly about it at the end, but I'd rather use an approach > based on IsBinaryUpgrade for the logical worker launcher to be > consistent with autovacuum (where there's also an argument to refactor > it to use a bgworker registration, centralizing the checks on > IsBinaryUpgrade for all bgworkers, but that would be material for a > different thread, if there's interest in doing that). > > The two situations we are trying to prevent (slot invalidation and > bgworker launch) can be triggered under different contexts, so they > don't have to use the same mechanisms to prevent what should not > happen during an upgrade. > -- Having a GUC hook for the "max_slot_wal_keep_size" seemed OK to me. If the user overrides a GUC value (admittedly, maybe there is no reason why they would want to) then at least the hook will give an error, rather than us silently overwriting the user's value with -1. So, patch v4 LGTM, except it is better to include a test case. ~ Meanwhile, if preventing the apply worker launch is considered better to be implemented differently in ApplyLauncherRegister, then so be it. ====== Kind Regards, Peter Smith. Fujitsu Australia
At Thu, 9 Nov 2023 09:53:07 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > Michael, Horiguchi-San, and others, do you have any thoughts on what > is the best way to proceed? As I previously mentioned, I believe that if rejection is to be the course of action, it would be best to proceed with it sooner rather than later. On the other hand, I am concerned about the need for users to perform extra steps depending on the source cluster conrfiguration. Therefore, another possible approach could be to simply ignore the given settings in the assignment hook rather than rejecting by the check hook, and forcibuly apply -1. What do you think about this third approach? I haven't checked this with pg_upgrade, but a standalone postmaster would emit the following messages. > postgres -b -c max_slot_wal_keep_size=-1 > LOG: "max_slot_wal_keep_size" is foced to set to -1 during binary upgrade mode. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Nov 9, 2023 at 11:40 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Thu, 9 Nov 2023 09:53:07 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > Michael, Horiguchi-San, and others, do you have any thoughts on what > > is the best way to proceed? > > As I previously mentioned, I believe that if rejection is to be the > course of action, it would be best to proceed with it sooner rather > than later. On the other hand, I am concerned about the need for users > to perform extra steps depending on the source cluster > conrfiguration. Therefore, another possible approach could be to > simply ignore the given settings in the assignment hook rather than > rejecting by the check hook, and forcibuly apply -1. > > What do you think about this third approach? > I have also proposed that as one of the alternatives but didn't get many votes. And, I think if the user is passing a special value of max_slot_wal_keep_size during the upgrade, it has to be a special case, and rejecting it upfront doesn't seem unreasonable to me. -- With Regards, Amit Kapila.
At Thu, 9 Nov 2023 12:00:59 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > I have also proposed that as one of the alternatives but didn't get > many votes. And, I think if the user is passing a special value of > max_slot_wal_keep_size during the upgrade, it has to be a special > case, and rejecting it upfront doesn't seem unreasonable to me. Oops. Sorry, and understood. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Nov 09, 2023 at 05:04:28PM +1100, Peter Smith wrote: > Having a GUC hook for the "max_slot_wal_keep_size" seemed OK to me. If > the user overrides a GUC value (admittedly, maybe there is no reason > why they would want to) then at least the hook will give an error, > rather than us silently overwriting the user's value with -1. > > So, patch v4 LGTM, except it is better to include a test case. Where's this v4? I may be missing, but it does not seem to be attached to this thread.. -- Michael
Attachment
On Thu, Nov 9, 2023 at 12:38 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Nov 09, 2023 at 05:04:28PM +1100, Peter Smith wrote: > > Having a GUC hook for the "max_slot_wal_keep_size" seemed OK to me. If > > the user overrides a GUC value (admittedly, maybe there is no reason > > why they would want to) then at least the hook will give an error, > > rather than us silently overwriting the user's value with -1. > > > > So, patch v4 LGTM, except it is better to include a test case. > > Where's this v4? > I think it is in an email[1]. I can take care of this unless we see some opposition to this idea. [1] - https://www.postgresql.org/message-id/20231102.115834.1012152975995247837.horikyota.ntt%40gmail.com -- With Regards, Amit Kapila.
On Thu, Nov 09, 2023 at 01:12:54PM +0530, Amit Kapila wrote: > I think it is in an email[1]. Noted. > I can take care of this unless we see some opposition to this idea. Thanks! -- Michael
Attachment
Dear Horiguchi-san, hackers, > Thanks you for the comments! Thanks for updating the patch! I'm not sure it is intentional, but you might miss my post...I suggested to add a testcase. I attached the updated version which is almost the same as Horiguchi-san's one, but has a test. How do you think? Do you have other idea for testing? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2023-Nov-02, Kyotaro Horiguchi wrote: > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index b541be8eec..46833f6ecd 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -2063,6 +2063,29 @@ check_wal_segment_size(int *newval, void **extra, GucSource source) > return true; > } > > +/* > + * GUC check_hook for max_slot_wal_keep_size > + * > + * If WALs needed by logical replication slots are deleted, these slots become > + * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via > + * the command line in an attempt to prevent such deletions, but users have > + * ways to override it. To ensure the successful completion of the upgrade, > + * it's essential to keep this variable unaltered. See > + * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for > + * more details. > + */ > +bool > +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) > +{ > + if (IsBinaryUpgrade && *newval != -1) > + { > + GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > + "max_slot_wal_keep_size"); > + return false; > + } > + return true; > +} One sentence in that comment reads weird. I'd do this: s/To ensure the ... unaltered/This check callback ensures the value is not overridden by the user/ > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c > index 99823df3c7..5c3d2b1082 100644 > --- a/src/backend/replication/slot.c > +++ b/src/backend/replication/slot.c > @@ -1424,18 +1424,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, > SpinLockRelease(&s->mutex); > > /* > - * The logical replication slots shouldn't be invalidated as > - * max_slot_wal_keep_size GUC is set to -1 during the upgrade. > - * > - * The following is just a sanity check. > + * check_max_slot_wal_keep_size() ensures max_slot_wal_keep_size is set > + * to -1, so, slot invalidation for logical slots shouldn't happen > + * during an upgrade. At present, only logical slots really require > + * this. > */ > - if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) > - { > - ereport(ERROR, > - errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("replication slots must not be invalidated during the upgrade"), > - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); > - } > + Assert (!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)); I think it's worth adding a comment here, pointing to check_old_cluster_for_valid_slots() verifying that no already-invalidated slots exist before the upgrade starts. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Thu, Nov 9, 2023 at 4:09 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Nov-02, Kyotaro Horiguchi wrote: > > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > > index b541be8eec..46833f6ecd 100644 > > --- a/src/backend/access/transam/xlog.c > > +++ b/src/backend/access/transam/xlog.c > > @@ -2063,6 +2063,29 @@ check_wal_segment_size(int *newval, void **extra, GucSource source) > > return true; > > } > > > > +/* > > + * GUC check_hook for max_slot_wal_keep_size > > + * > > + * If WALs needed by logical replication slots are deleted, these slots become > > + * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via > > + * the command line in an attempt to prevent such deletions, but users have > > + * ways to override it. To ensure the successful completion of the upgrade, > > + * it's essential to keep this variable unaltered. See > > + * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for > > + * more details. > > + */ > > +bool > > +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) > > +{ > > + if (IsBinaryUpgrade && *newval != -1) > > + { > > + GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > > + "max_slot_wal_keep_size"); > > + return false; > > + } > > + return true; > > +} > > One sentence in that comment reads weird. I'd do this: > > s/To ensure the ... unaltered/This check callback ensures the value is > not overridden by the user/ > These comments appear mostly repetitive to what is already mentioned in start_postmaster(). So, I have changed those referred to already written comments, and slightly adjusted the comments at another place. See attached. Personally, I don't see the need for a test for this, so removed the same but can add it back if you or others think so. -- With Regards, Amit Kapila.
Attachment
On 2023-Nov-09, Amit Kapila wrote: > These comments appear mostly repetitive to what is already mentioned > in start_postmaster(). So, I have changed those referred to already > written comments, and slightly adjusted the comments at another place. > See attached. I'd still rather mention check_old_cluster_for_valid_slots() just above the Assert() in InvalidatePossiblyObsoleteSlot(). It looks too bare to me otherwise. > Personally, I don't see the need for a test for this, so removed the > same but can add it back if you or others think so. I'm neutral on having a test for this. I'm not sure this is easy to break unintentionally. OTOH the test is cheap, since it only has to run pg_upgrade itself and not, say, another initdb. On the (as Robert says) third hand, would we have tests for each possible GUC that we'd like not to be changed during pg_upgrade? I suspect not, which suggests we don't want this one either. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte" (Ijon Tichy en Viajes, Stanislaw Lem)
On Thu, Nov 09, 2023 at 04:52:32PM +0900, Michael Paquier wrote: > Thanks! Also, please see also a patch about switching the logirep launcher to rely on IsBinaryUpgrade to prevent its startup. Any thoughts about that? -- Michael
Attachment
On Fri, Nov 10, 2023 at 7:50 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Nov 09, 2023 at 04:52:32PM +0900, Michael Paquier wrote: > > Thanks! > > Also, please see also a patch about switching the logirep launcher to > rely on IsBinaryUpgrade to prevent its startup. Any thoughts about > that? > Preventing these + * processes from starting while upgrading avoids any activity on the new + * cluster before the physical files are put in place, which could cause + * corruption on the new cluster upgrading to. I don't think this comment is correct because there won't be any apply activity on the new cluster as after restoration subscriptions should be disabled. On the old cluster, I think one problem is that the origins may move forward after we copy them which can cause data inconsistency issues. The other is that we may not prefer to generate additional data and WAL during the upgrade. Also, I am not completely sure about using the word 'corruption' in this context. -- With Regards, Amit Kapila.
On Fri, Nov 10, 2023 at 03:27:25PM +0530, Amit Kapila wrote: > I don't think this comment is correct because there won't be any apply > activity on the new cluster as after restoration subscriptions should > be disabled. On the old cluster, I think one problem is that the > origins may move forward after we copy them which can cause data > inconsistency issues. The other is that we may not prefer to generate > additional data and WAL during the upgrade. Also, I am not completely > sure about using the word 'corruption' in this context. What is your suggestion here? Would it be better to just aim for simplicity and just say that we don't want it to run because "it can lead to some inconsistent behaviors"? -- Michael
Attachment
On Sat, Nov 11, 2023 at 5:46 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Nov 10, 2023 at 03:27:25PM +0530, Amit Kapila wrote: > > I don't think this comment is correct because there won't be any apply > > activity on the new cluster as after restoration subscriptions should > > be disabled. On the old cluster, I think one problem is that the > > origins may move forward after we copy them which can cause data > > inconsistency issues. The other is that we may not prefer to generate > > additional data and WAL during the upgrade. Also, I am not completely > > sure about using the word 'corruption' in this context. > > What is your suggestion here? Would it be better to just aim for > simplicity and just say that we don't want it to run because "it can > lead to some inconsistent behaviors"? > I think we can be specific about logical replication stuff. I have not done any study on autovacuum behavior related to this, so we can update about it separately if required. I could think of something like the following: - /* Use -b to disable autovacuum. */ + /* + * Use -b to disable autovacuum and logical replication launcher + * (effective in PG17 or later for the latter). + * + * Logical replication workers can stream data during the upgrade which can + * cause replication origins to move forward after we have copied them. + * It can cause the system to request the data which is already present + * in the new cluster. + */ Now, ideally, such a comment change makes more sense along with the main patch, so either we can go without this comment change or probably wait till the main patch is ready and merge just before it or along with it. I am fine either way. BTW, it is not clear to me another part of the comment "... for the latter" in the proposed wording. Is there any typo there or am I missing something? -- With Regards, Amit Kapila.
On Mon, Nov 13, 2023 at 08:45:12AM +0530, Amit Kapila wrote: > I think we can be specific about logical replication stuff. I have not > done any study on autovacuum behavior related to this, so we can > update about it separately if required. Autovacuum, as far as I recall, could decide to do some work before files are physically copied from the old to the new cluster, corrupting the new cluster. Per 76dd09bbec89: + * If we have lost the autovacuum launcher, try to start a new one. + * We don't want autovacuum to run in binary upgrade mode because + * autovacuum might update relfrozenxid for empty tables before + * the physical files are put in place. > - /* Use -b to disable autovacuum. */ > + /* > + * Use -b to disable autovacuum and logical replication launcher > + * (effective in PG17 or later for the latter). > + * > + * Logical replication workers can stream data during the > upgrade which can > + * cause replication origins to move forward after we have copied them. > + * It can cause the system to request the data which is already present > + * in the new cluster. > + */ > > Now, ideally, such a comment change makes more sense along with the > main patch, so either we can go without this comment change or > probably wait till the main patch is ready and merge just before it or > along with it. I am fine either way. Another location would be to document that stuff directly in launcher.c where the check for IsBinaryUpgrade would be added. You are right that it makes little sense to document that now, so how about: 1) keeping pg_upgrade.c minimal, say: - /* Use -b to disable autovacuum. */ + /* + * Use -b to disable autovacuum and logical replication + * launcher (in 17~). + */ With a removal of the comment block related to max_logical_replication_workers=0? 2) Document that in ApplyLauncherRegister() as part of the main patch for the subscribers? > BTW, it is not clear to me another part of the comment "... for the > latter" in the proposed wording. Is there any typo there or am I > missing something? The "latter" refers to the logirep launcher here, as -b would affect it only in 17~ with the patch I sent previously. -- Michael
Attachment
On Mon, Nov 13, 2023 at 10:19 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Nov 13, 2023 at 08:45:12AM +0530, Amit Kapila wrote: > > I think we can be specific about logical replication stuff. I have not > > done any study on autovacuum behavior related to this, so we can > > update about it separately if required. > > Autovacuum, as far as I recall, could decide to do some work before > files are physically copied from the old to the new cluster, > corrupting the new cluster. Per 76dd09bbec89: > > + * If we have lost the autovacuum launcher, try to start a new one. > + * We don't want autovacuum to run in binary upgrade mode because > + * autovacuum might update relfrozenxid for empty tables before > + * the physical files are put in place. > > > - /* Use -b to disable autovacuum. */ > > + /* > > + * Use -b to disable autovacuum and logical replication launcher > > + * (effective in PG17 or later for the latter). > > + * > > + * Logical replication workers can stream data during the > > upgrade which can > > + * cause replication origins to move forward after we have copied them. > > + * It can cause the system to request the data which is already present > > + * in the new cluster. > > + */ > > > > Now, ideally, such a comment change makes more sense along with the > > main patch, so either we can go without this comment change or > > probably wait till the main patch is ready and merge just before it or > > along with it. I am fine either way. > > Another location would be to document that stuff directly in > launcher.c where the check for IsBinaryUpgrade would be added. You > are right that it makes little sense to document that now, so how > about: > 1) keeping pg_upgrade.c minimal, say: > - /* Use -b to disable autovacuum. */ > + /* > + * Use -b to disable autovacuum and logical replication > + * launcher (in 17~). > + */ > With a removal of the comment block related to > max_logical_replication_workers=0? > 2) Document that in ApplyLauncherRegister() as part of the main patch > for the subscribers? > I am fine with this but there is no harm in doing this before or along with the main patch. As of now, I don't see any problem but as the main patch is still under review, so thought we could even wait for the patch to become "Ready For Committer". -- With Regards, Amit Kapila.
On Wed, Nov 15, 2023 at 07:58:06AM +0530, Amit Kapila wrote: > I am fine with this but there is no harm in doing this before or along > with the main patch. As of now, I don't see any problem but as the > main patch is still under review, so thought we could even wait for > the patch to become "Ready For Committer". WFM to wait until the other patch is ready before doing something here. -- Michael
Attachment
On Wed, Nov 15, 2023 at 07:58:06AM +0530, Amit Kapila wrote: > I am fine with this but there is no harm in doing this before or along > with the main patch. As of now, I don't see any problem but as the > main patch is still under review, so thought we could even wait for > the patch to become "Ready For Committer". My apologies for the delay. Now that 9a17be1e244a is in the tree, please find attached a patch to restrict the startup of the launcher using IsBinaryUpgrade in ApplyLauncherRegister(), with adjustments to the surrounding comments. Was there anything else you wanted to be covered and/or updated? -- Michael
Attachment
On Wed, Jan 10, 2024 at 10:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Nov 15, 2023 at 07:58:06AM +0530, Amit Kapila wrote: > > I am fine with this but there is no harm in doing this before or along > > with the main patch. As of now, I don't see any problem but as the > > main patch is still under review, so thought we could even wait for > > the patch to become "Ready For Committer". > > My apologies for the delay. > > Now that 9a17be1e244a is in the tree, please find attached a patch to > restrict the startup of the launcher using IsBinaryUpgrade in > ApplyLauncherRegister(), with adjustments to the surrounding comments. > - if (max_logical_replication_workers == 0) + /* + * The logical replication launcher is disabled during binary upgrades, + * as logical replication workers can stream data during the upgrade + * which can cause replication origins to move forward after we have + * copied them. It can cause the system to request the data which is + * already present in the new cluster. + */ + if (max_logical_replication_workers == 0 || IsBinaryUpgrade) This comment is not very clear to me. The first part of the sentence can't apply to the new cluster as after the upgrade, subscriptions will be disabled and the second part talks about requesting the wrong data in the new cluster. As per my understanding, the problem here is that, on the old cluster, the origins may move forward after we copy them and then we copy physical files. Now, in the new cluster when we try to request the data, it will be already present. > Was there anything else you wanted to be covered and/or updated? > No, only this patch. -- With Regards, Amit Kapila.
On Wed, Jan 10, 2024 at 06:02:12PM +0530, Amit Kapila wrote: > - if (max_logical_replication_workers == 0) > + /* > + * The logical replication launcher is disabled during binary upgrades, > + * as logical replication workers can stream data during the upgrade > + * which can cause replication origins to move forward after we have > + * copied them. It can cause the system to request the data which is > + * already present in the new cluster. > + */ > + if (max_logical_replication_workers == 0 || IsBinaryUpgrade) > > This comment is not very clear to me. The first part of the sentence > can't apply to the new cluster as after the upgrade, subscriptions > will be disabled and the second part talks about requesting the wrong > data in the new cluster. As per my understanding, the problem here is > that, on the old cluster, the origins may move forward after we copy > them and then we copy physical files. Now, in the new cluster when we > try to request the data, it will be already present. As far as I understand your complaint is about being more precise about where the workers could run when we do an upgrade. My patch covers the reason why it would be a problem, and I agree that it could be more detailed. Hence, how about something like that: "The logical replication launcher is disabled during binary upgrades, as a logical replication workers running on the cluster upgrading from could cause replication origins to move forward after they are copied to the cluster upgrading to, creating potentially conflicts with the physical files copied over." If you have a better suggestion, feel free. -- Michael
Attachment
On Thu, Jan 11, 2024 at 9:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 10, 2024 at 06:02:12PM +0530, Amit Kapila wrote: > > - if (max_logical_replication_workers == 0) > > + /* > > + * The logical replication launcher is disabled during binary upgrades, > > + * as logical replication workers can stream data during the upgrade > > + * which can cause replication origins to move forward after we have > > + * copied them. It can cause the system to request the data which is > > + * already present in the new cluster. > > + */ > > + if (max_logical_replication_workers == 0 || IsBinaryUpgrade) > > > > This comment is not very clear to me. The first part of the sentence > > can't apply to the new cluster as after the upgrade, subscriptions > > will be disabled and the second part talks about requesting the wrong > > data in the new cluster. As per my understanding, the problem here is > > that, on the old cluster, the origins may move forward after we copy > > them and then we copy physical files. Now, in the new cluster when we > > try to request the data, it will be already present. > > As far as I understand your complaint is about being more precise > about where the workers could run when we do an upgrade. My patch > covers the reason why it would be a problem, and I agree that it could > be more detailed. > > Hence, how about something like that: > "The logical replication launcher is disabled during binary upgrades, > as a logical replication workers running on the cluster upgrading from > could cause replication origins to move forward after they are copied > to the cluster upgrading to, creating potentially conflicts with the > physical files copied over." > Looks better. One minor nitpick: /potentially/potential -- With Regards, Amit Kapila.
On Thu, Jan 11, 2024 at 11:25:44AM +0530, Amit Kapila wrote: > On Thu, Jan 11, 2024 at 9:08 AM Michael Paquier <michael@paquier.xyz> wrote: >> Hence, how about something like that: >> "The logical replication launcher is disabled during binary upgrades, >> as a logical replication workers running on the cluster upgrading from >> could cause replication origins to move forward after they are copied >> to the cluster upgrading to, creating potentially conflicts with the >> physical files copied over." > > Looks better. One minor nitpick: /potentially/potential Sure, WFM. Let's wait a bit and see if others have more comments to offer. -- Michael
Attachment
On 2024-Jan-11, Michael Paquier wrote: > Hence, how about something like that: > "The logical replication launcher is disabled during binary upgrades, > as a logical replication workers running on the cluster upgrading from > could cause replication origins to move forward after they are copied > to the cluster upgrading to, creating potentially conflicts with the > physical files copied over." "The logical replication launcher is disabled during binary upgrades, to avoid logical replication workers running on the source cluster. That would cause replication origins to move forward after having been copied to the target cluster, potentially creating conflicts with the copied data files." -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Linux transformó mi computadora, de una `máquina para hacer cosas', en un aparato realmente entretenido, sobre el cual cada día aprendo algo nuevo" (Jaime Salinas)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > "The logical replication launcher is disabled during binary upgrades, to > avoid logical replication workers running on the source cluster. That > would cause replication origins to move forward after having been copied > to the target cluster, potentially creating conflicts with the copied > data files." "avoid logical replication workers running" still seems like shaky grammar. Perhaps s/avoid/avoid having/, or write "to prevent logical replication workers from running ...". Also perhaps s/would/could/. Otherwise +1 regards, tom lane
On Thu, Jan 11, 2024 at 10:01:16AM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> "The logical replication launcher is disabled during binary upgrades, to >> avoid logical replication workers running on the source cluster. That >> would cause replication origins to move forward after having been copied >> to the target cluster, potentially creating conflicts with the copied >> data files." > > "avoid logical replication workers running" still seems like shaky > grammar. Perhaps s/avoid/avoid having/, or write "to prevent logical > replication workers from running ...". After sleeping on it, your last suggestion sounds better to me, so I've incorporated that with Alvaro's wording (also cleaner than what I have posted), and applied the patch on HEAD. > Also perhaps s/would/could/. Yep. -- Michael