Thread: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Markus Wanner
Date:
On 24.03.22 20:32, Robert Haas wrote: > Fix possible recovery trouble if TRUNCATE overlaps a checkpoint. This patch changed the delayChkpt field of struct PGPROC from bool to int. Back-porting this change could be considered an API breaking change for extensions using this field. I'm not certain about padding behavior of compilers in general (or standards requirements around that), but at least on my machine, it seems sizeof(PGPROC) did not change, so padding led to subsequent fields still having the same offset. Nonetheless, the meaning of the field itself changed. And the additional assert now also triggers for the following pseudo-code of the extension I'm concerned about: /* * Prevent checkpoints being emitted in between additional * information in the logical message and the following * prepare record. */ MyProc->delayChkpt = true; LogLogicalMessage(...); /* Note that this will also reset the delayChkpt flag. */ PrepareTransaction(...); Now, I'm well aware this is not an official API, it just happens to be accessible for extensions. So I guess the underlying question is: What can extension developers expect? Which parts are okay to change even in stable branches and which can be relied upon to remain stable? And for this specific case: Is it worth reverting this change and applying a fully backwards compatible fix, instead? Regards Markus Wanner
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Tue, Apr 5, 2022 at 9:02 AM Markus Wanner <markus.wanner@enterprisedb.com> wrote: > And for this specific case: Is it worth reverting this change and > applying a fully backwards compatible fix, instead? I think it's normally our policy to avoid changing definitions of accessible structs in back branches, except that we allow ourselves the indulgence of adding new members at the end or in padding space. So what would probably be best is if, in the back-branches, we changed "delayChkpt" back to a boolean, renamed it to delayChkptStart, and added a separate Boolean called delayChkptEnd. Maybe that could be added just after statusFlags, where I think it would fall into padding space. I think as the person who committed that patch I'm on the hook to fix this if nobody else would like to do it, but let me ask whether Kyotaro Horiguchi would like to propose a patch, since the original patch did, and/or whether you would like to propose a patch, as the person reporting the issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Apr 5, 2022 at 9:02 AM Markus Wanner > <markus.wanner@enterprisedb.com> wrote: >> And for this specific case: Is it worth reverting this change and >> applying a fully backwards compatible fix, instead? > I think it's normally our policy to avoid changing definitions of > accessible structs in back branches, except that we allow ourselves > the indulgence of adding new members at the end or in padding space. > So what would probably be best is if, in the back-branches, we changed > "delayChkpt" back to a boolean, renamed it to delayChkptStart, and > added a separate Boolean called delayChkptEnd. Maybe that could be > added just after statusFlags, where I think it would fall into padding > space. Renaming it would constitute an API break, which is if anything worse than an ABI break. While we're complaining at you, let me point out that changing a field's content and semantics while not changing its name is a time bomb waiting to break any third-party code that looks at (or modifies...) the field. What I think you need to do is: 1. In the back branches, revert delayChkpt to its previous type and semantics. Squeeze a separate delayChkptEnd bool in somewhere (you can't change the struct size either ...). 2. In HEAD, rename the field to something like delayChkptFlags, to ensure that any code touching it has to be inspected and updated. In other words, this is already an API break in HEAD, and that's fine, but it didn't break it hard enough to draw anyone's attention, which is not fine. regards, tom lane
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Tue, Apr 5, 2022 at 10:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Renaming it would constitute an API break, which is if anything worse > than an ABI break. I don't think so, because an API break will cause a compilation failure, which an extension author can easily fix. > While we're complaining at you, let me point out that changing a field's > content and semantics while not changing its name is a time bomb waiting > to break any third-party code that looks at (or modifies...) the field. > > What I think you need to do is: > > 1. In the back branches, revert delayChkpt to its previous type and > semantics. Squeeze a separate delayChkptEnd bool in somewhere > (you can't change the struct size either ...). > > 2. In HEAD, rename the field to something like delayChkptFlags, > to ensure that any code touching it has to be inspected and updated. Well, we can do it that way, I suppose. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Apr 5, 2022 at 10:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Renaming it would constitute an API break, which is if anything worse >> than an ABI break. > I don't think so, because an API break will cause a compilation > failure, which an extension author can easily fix. My point is that we want that to happen in HEAD, but it's not okay for it to happen in a minor release of a stable branch. regards, tom lane
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Tue, Apr 5, 2022 at 10:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Apr 5, 2022 at 10:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Renaming it would constitute an API break, which is if anything worse > >> than an ABI break. > > > I don't think so, because an API break will cause a compilation > > failure, which an extension author can easily fix. > > My point is that we want that to happen in HEAD, but it's not okay > for it to happen in a minor release of a stable branch. I understand, but I am not sure that I agree. I think that if an extension stops compiling against a back-branch, someone will notice the next time they try to compile it and will fix it. Maybe that's not amazing, but I don't think it's a huge deal either. On the other hand, if existing builds that someone has already shipped stop working with a new server release, that's a much larger issue. The extension packager can't go back and retroactively add a dependency on the server version to the already-shipped package. A new package can be shipped and specify a minimum minor version with which it will work, but an old package is what it is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Apr 5, 2022 at 10:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My point is that we want that to happen in HEAD, but it's not okay >> for it to happen in a minor release of a stable branch. > I understand, but I am not sure that I agree. I think that if an > extension stops compiling against a back-branch, someone will notice > the next time they try to compile it and will fix it. Maybe that's not > amazing, but I don't think it's a huge deal either. Well, perhaps it's not the end of the world, but it's still a large PITA for the maintainer of such an extension. They can't "just fix it" because some percentage of their userbase will still need to compile against older minor releases. Nor have you provided any way to handle that requirement via conditional compilation. regards, tom lane
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Michael Paquier
Date:
On Tue, Apr 05, 2022 at 03:16:20PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Apr 5, 2022 at 10:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> My point is that we want that to happen in HEAD, but it's not okay >>> for it to happen in a minor release of a stable branch. > >> I understand, but I am not sure that I agree. I think that if an >> extension stops compiling against a back-branch, someone will notice >> the next time they try to compile it and will fix it. Maybe that's not >> amazing, but I don't think it's a huge deal either. I agree with Tom's argument. The internals of this structure should not have changed in a stable branch. > Well, perhaps it's not the end of the world, but it's still a large > PITA for the maintainer of such an extension. They can't "just fix it" > because some percentage of their userbase will still need to compile > against older minor releases. Nor have you provided any way to handle > that requirement via conditional compilation. For example, I recall that some external extensions make use of sizeof(PGPROC) for their own business. Isn't 412ad7a going to be a problem to change this structure's internals for already-compiled code on stable branches? -- Michael
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Thu, Apr 7, 2022 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote: > > Well, perhaps it's not the end of the world, but it's still a large > > PITA for the maintainer of such an extension. They can't "just fix it" > > because some percentage of their userbase will still need to compile > > against older minor releases. Nor have you provided any way to handle > > that requirement via conditional compilation. > > For example, I recall that some external extensions make use of > sizeof(PGPROC) for their own business. Isn't 412ad7a going to be a > problem to change this structure's internals for already-compiled code > on stable branches? I don't think that commit changed sizeof(PGPROC), but it did affect the position of the delayChkpt and statusFlags members within the struct, which is what we now need to fix. Since I don't hear anyone else volunteering to take care of that, I'll go work on it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Tue, Apr 5, 2022 at 10:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > What I think you need to do is: > > 1. In the back branches, revert delayChkpt to its previous type and > semantics. Squeeze a separate delayChkptEnd bool in somewhere > (you can't change the struct size either ...). > > 2. In HEAD, rename the field to something like delayChkptFlags, > to ensure that any code touching it has to be inspected and updated. Here are patches for master and v14 to do things this way. Comments? -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > Here are patches for master and v14 to do things this way. Comments? WFM. regards, tom lane
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Petr Jelinek
Date:
> On 7. 4. 2022, at 17:19, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Apr 5, 2022 at 10:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What I think you need to do is: >> >> 1. In the back branches, revert delayChkpt to its previous type and >> semantics. Squeeze a separate delayChkptEnd bool in somewhere >> (you can't change the struct size either ...). >> >> 2. In HEAD, rename the field to something like delayChkptFlags, >> to ensure that any code touching it has to be inspected and updated. > > Here are patches for master and v14 to do things this way. Comments? Yeah I think this should do it (compilers should warn on master even without the rename, but who notices that right? :) ) Thanks, Petr
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Michael Paquier
Date:
On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote: > Here are patches for master and v14 to do things this way. Comments? Thanks for the patches. They look correct. For ~14, I'd rather avoid the code duplication done by GetVirtualXIDsDelayingChkptEnd() and HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra bool argument to the existing routine. The same kind of duplication happens with GetVirtualXIDsDelayingChkpt() and GetVirtualXIDsDelayingChkptEnd(). -- Michael
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote: >> Here are patches for master and v14 to do things this way. Comments? > Thanks for the patches. They look correct. For ~14, I'd rather avoid > the code duplication done by GetVirtualXIDsDelayingChkptEnd() and > HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra > bool argument to the existing routine. Isn't adding another argument an API break? (If there's any outside code calling GetVirtualXIDsDelayingChkpt, which it seems like there might be.) regards, tom lane
At Fri, 8 Apr 2022 08:47:42 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote: > > Here are patches for master and v14 to do things this way. Comments? > > Thanks for the patches. They look correct. For ~14, I'd rather avoid > the code duplication done by GetVirtualXIDsDelayingChkptEnd() and > HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra > bool argument to the existing routine. The same kind of duplication > happens with GetVirtualXIDsDelayingChkpt() and > GetVirtualXIDsDelayingChkptEnd(). FWIW, it is done in [1]. [1] https://www.postgresql.org/message-id/20220406.164521.17171257901083417.horikyota.ntt%40gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
(Mmm. My mailer automatically teared off the [was: ..] part from the subject..) -- Kyotaro Horiguchi NTT Open Source Software Center
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Thu, Apr 7, 2022 at 7:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote: > >> Here are patches for master and v14 to do things this way. Comments? > > > Thanks for the patches. They look correct. For ~14, I'd rather avoid > > the code duplication done by GetVirtualXIDsDelayingChkptEnd() and > > HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra > > bool argument to the existing routine. > > Isn't adding another argument an API break? (If there's any outside > code calling GetVirtualXIDsDelayingChkpt, which it seems like there > might be.) Yeah, that's exactly why I didn't do what Michael proposes. If we're going to go to this trouble to avoid changing the layout of a PGPROC, we must be doing that on the theory that extension code cares about delayChkpt. And if that is so, it seems reasonable to suppose that it might also want to call the associated functions. Honestly, I wouldn't have thought that this mattered, because I wouldn't have guessed that any non-core code cared about delayChkpt. But I would have been wrong. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Michael Paquier
Date:
On Thu, Apr 07, 2022 at 10:19:35PM -0400, Robert Haas wrote: > Yeah, that's exactly why I didn't do what Michael proposes. If we're > going to go to this trouble to avoid changing the layout of a PGPROC, > we must be doing that on the theory that extension code cares about > delayChkpt. And if that is so, it seems reasonable to suppose that it > might also want to call the associated functions. Compatibility does not strike me as a problem with two static inline functions used as wrappers of their common logic. > Honestly, I wouldn't have thought that this mattered, because I > wouldn't have guessed that any non-core code cared about delayChkpt. > But I would have been wrong. That's a minor point. If you wish to keep this code as you are proposing, that's fine as well by me. -- Michael
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Markus Wanner
Date:
On Fri, 2022-04-08 at 08:47 +0900, Michael Paquier wrote: > On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote: > > Here are patches for master and v14 to do things this way. > > Comments? > > Thanks for the patches. They look correct. +1, looks good to me and addresses my specific original concern. > For ~14, I'd rather avoid > the code duplication done by GetVirtualXIDsDelayingChkptEnd() and > HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra > bool argument to the existing routine. The same kind of duplication > happens with GetVirtualXIDsDelayingChkpt() and > GetVirtualXIDsDelayingChkptEnd(). I agree with Michael, it would be nice to not duplicate the code, but use a common underlying method. A modified patch is attached. Best Regards Markus
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner <markus.wanner@enterprisedb.com> wrote: > I agree with Michael, it would be nice to not duplicate the code, but > use a common underlying method. A modified patch is attached. I don't think this is better, but I don't think it's worth arguing about, either, so I'll do it this way if nobody objects. Meanwhile, I've committed the patch for master to master. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Fri, Apr 8, 2022 at 11:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner > <markus.wanner@enterprisedb.com> wrote: > > I agree with Michael, it would be nice to not duplicate the code, but > > use a common underlying method. A modified patch is attached. > > I don't think this is better, but I don't think it's worth arguing > about, either, so I'll do it this way if nobody objects. > > Meanwhile, I've committed the patch for master to master. Well, I've just realized that Kyotaro Horiguchi volunteered to fix this on an email thread I did not see because of the way Gmail breaks the thread if you change the subject line. And he developed a very similar patch to what we have here. I'm going to use this one as the basis for going forward because I've already studied it in detail and it's less work for me to stick with what I know than to go study something else. But, he also noticed something which we didn't notice here, which is that before v13, the commit in question actually changed the size of PGXACT, which is really quite bad -- it needs to be 12 bytes for performance reasons. And there's no spare bytes available, so I think we should follow one of the suggestions that he had over in that email thread, and put delayChkptEnd in PGPROC even though delayChkpt is in PGXACT. Comments? -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Markus Wanner
Date:
On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote: > ... before v13, the commit in question actually > changed the size of PGXACT, which is really quite bad -- it needs to > be 12 bytes for performance reasons. And there's no spare bytes > available, so I think we should follow one of the suggestions that he > had over in that email thread, and put delayChkptEnd in PGPROC even > though delayChkpt is in PGXACT. This makes sense to me. Kudos to Kyotaro for considering this. At first read, this sounded like a trade-off between compatibility and performance for PG 12 and older. But I realize leaving delayChkpt in PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves PGXACT at a size of 12 bytes. So this sounds like a good approach to me. Best Regards Markus
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Kyotaro Horiguchi
Date:
(My mailer has been fixed.) At Mon, 11 Apr 2022 21:45:59 +0200, Markus Wanner <markus.wanner@enterprisedb.com> wrote in > On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote: > > ... before v13, the commit in question actually > > changed the size of PGXACT, which is really quite bad -- it needs to > > be 12 bytes for performance reasons. And there's no spare bytes > > available, so I think we should follow one of the suggestions that he > > had over in that email thread, and put delayChkptEnd in PGPROC even > > though delayChkpt is in PGXACT. > > This makes sense to me. Kudos to Kyotaro for considering this. > > At first read, this sounded like a trade-off between compatibility and > performance for PG 12 and older. But I realize leaving delayChkpt in > PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves > PGXACT at a size of 12 bytes. So this sounds like a good approach to > me. Thanks! So, I created the patches for back-patching from 10 to 14. (With fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and HaveVirtualXIDsDelayingChkptEnd are inverted..) They revert delayChkpt-related changes made by the patch and add delayChkptEnd stuff. I compared among every pair of the patches for neighbouring versions, to make sure not making the same change in different way and they have the same set of hunks. This version takes the way sharing the common static function (*ChkptGuts) between the functions *Chkpt() and *ChkptEnd(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > (My mailer has been fixed.) Cool. > So, I created the patches for back-patching from 10 to 14. (With > fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and > HaveVirtualXIDsDelayingChkptEnd are inverted..) I am very sorry not to use these, but as I said in my previous post on this thread, I prefer to commit what I wrote and Markus revised rather than these versions. I have done that now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
From
Kyotaro Horiguchi
Date:
At Thu, 14 Apr 2022 11:13:01 -0400, Robert Haas <robertmhaas@gmail.com> wrote in > On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > (My mailer has been fixed.) > > Cool. > > > So, I created the patches for back-patching from 10 to 14. (With > > fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and > > HaveVirtualXIDsDelayingChkptEnd are inverted..) > > I am very sorry not to use these, but as I said in my previous post on > this thread, I prefer to commit what I wrote and Markus revised rather > than these versions. I have done that now. Not at all. It's just an unfortunate crossing. Thanks for fixing this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center