Thread: [BUG] Failed Assertion in ReorderBufferChangeMemoryUpdate()
Hi hackers,
While working on [1], it appears that (on master) the issue reproduction (of toast rewrite not resetting the toast_hash) is triggering a failed assertion:
#2 0x0000000000b29fab in ExceptionalCondition (conditionName=0xce6850
"(rb->size >= sz) && (txn->size >= sz)", errorType=0xce5f84
"FailedAssertion", fileName=0xce5fd0 "reorderbuffer.c", lineNumber=3141)
at assert.c:69
#3 0x00000000008ff1fb in ReorderBufferChangeMemoryUpdate (rb=0x11a7a40,
change=0x11c94b8, addition=false) at reorderbuffer.c:3141
#4 0x00000000008fab27 in ReorderBufferReturnChange (rb=0x11a7a40,
change=0x11c94b8, upd_mem=true) at reorderbuffer.c:477
#5 0x0000000000902ec1 in ReorderBufferToastReset (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:4799
#6 0x00000000008faaa2 in ReorderBufferReturnTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:448
#7 0x00000000008fc95b in ReorderBufferCleanupTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:1540
while on 12.5 for example, we would get (with the exact same repro):
ERROR: could not open relation with OID 0
The failed assertion is happening in the PG_CATCH() section of ReorderBufferProcessTXN().
We entered PG_CATCH() because elog(ERROR, "could not open relation with OID %u",...) has been triggered in ReorderBufferToastReplace().
But this elog(ERROR,) is being called after ReorderBufferChangeMemoryUpdate() being triggered with "addition" set to false.
As a consequence of elog(ERROR,) then ReorderBufferChangeMemoryUpdate() with "addition" set to true is not called at the end of ReorderBufferToastReplace().
That leads to a subsequent call to ReorderBufferChangeMemoryUpdate() (being triggered by 4daa140a2f adding ReorderBufferToastReset() calls to ReorderBufferReturnTXN()) triggering the failed assertion.
Please find attached a patch proposal to avoid the failed assertion (by ensuring that ReorderBufferChangeMemoryUpdate() being triggered with "addition" set to false in ReorderBufferToastReplace() is done after the elog(ERROR,)).
Adding Amit and Dilip as they are also aware of [1] and have worked on 4daa140a2f.
Also adding this patch in the commitfest.
Thanks
Bertrand
[1]: https://www.postgresql.org/message-id/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
Attachment
On Fri, Aug 13, 2021 at 3:15 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > > > Please find attached a patch proposal to avoid the failed assertion (by ensuring that ReorderBufferChangeMemoryUpdate()being triggered with "addition" set to false in ReorderBufferToastReplace() is done afterthe elog(ERROR,)). > The error can occur at multiple places (like via palloc or various other places) between the first time we subtract the change_size and add it back after the change is re-computed. I think the correct fix would be that in the beginning we just compute the change_size by ReorderBufferChangeSize and then after re-computing the change, we just subtract the old change_size and add the new change_size. What do you think? -- With Regards, Amit Kapila.
On Fri, Aug 13, 2021 at 3:15 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
>
> Please find attached a patch proposal to avoid the failed assertion (by ensuring that ReorderBufferChangeMemoryUpdate() being triggered with "addition" set to false in ReorderBufferToastReplace() is done after the elog(ERROR,)).
>
The error can occur at multiple places (like via palloc or various
other places) between the first time we subtract the change_size and
add it back after the change is re-computed. I think the correct fix
would be that in the beginning we just compute the change_size by
ReorderBufferChangeSize and then after re-computing the change, we
just subtract the old change_size and add the new change_size. What do
you think?
Hi,
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On Mon, Sep 6, 2021 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:On Fri, Aug 13, 2021 at 3:15 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
>
> Please find attached a patch proposal to avoid the failed assertion (by ensuring that ReorderBufferChangeMemoryUpdate() being triggered with "addition" set to false in ReorderBufferToastReplace() is done after the elog(ERROR,)).
>
The error can occur at multiple places (like via palloc or various
other places) between the first time we subtract the change_size and
add it back after the change is re-computed. I think the correct fix
would be that in the beginning we just compute the change_size by
ReorderBufferChangeSize and then after re-computing the change, we
just subtract the old change_size and add the new change_size. What do
you think?Yeah, that seems more logical to me.
Thanks for your feedback!
That seems indeed more logical, so I see 3 options to do so:
1) Add a new API say ReorderBufferChangeMemorySubstractSize() (with a Size as one parameter) and make use of it in ReorderBufferToastReplace()
2) Add a new "Size" parameter to ReorderBufferChangeMemoryUpdate(), so that if this parameter is > 0 then it would be used instead of "sz = ReorderBufferChangeSize(change)"
3) Do the substraction directly into ReorderBufferToastReplace() without any API
I'm inclined to go for option 2), what do you think?
Thanks
Bertrand
Thanks for your feedback!
That seems indeed more logical, so I see 3 options to do so:
1) Add a new API say ReorderBufferChangeMemorySubstractSize() (with a Size as one parameter) and make use of it in ReorderBufferToastReplace()
2) Add a new "Size" parameter to ReorderBufferChangeMemoryUpdate(), so that if this parameter is > 0 then it would be used instead of "sz = ReorderBufferChangeSize(change)"
3) Do the substraction directly into ReorderBufferToastReplace() without any API
I'm inclined to go for option 2), what do you think?
On Mon, Sep 6, 2021 at 9:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Sep 6, 2021 at 8:54 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >> >> Thanks for your feedback! >> >> That seems indeed more logical, so I see 3 options to do so: >> >> 1) Add a new API say ReorderBufferChangeMemorySubstractSize() (with a Size as one parameter) and make use of it in ReorderBufferToastReplace() >> >> 2) Add a new "Size" parameter to ReorderBufferChangeMemoryUpdate(), so that if this parameter is > 0 then it would beused instead of "sz = ReorderBufferChangeSize(change)" >> >> 3) Do the substraction directly into ReorderBufferToastReplace() without any API >> >> I'm inclined to go for option 2), what do you think? > Isn't it better if we use option 2) at all places as then we won't need any special check inside ReorderBufferChangeMemoryUpdate()? > Yet another option could be to create a new API say ReorderBufferReplaceChangeMemoryUpdate(), which takes, 2 parameters,oldchange, and newchange as inputs, it will compute the difference and add/subtract that size. Logically, thatis what we are actually trying to do right? i.e. replacing old change with the new change. > Note that in ReorderBufferToastReplace(), the new tuple is replaced in change so by the time we want to do this computation oldchange won't be preserved. -- With Regards, Amit Kapila.
On Mon, Sep 6, 2021 at 9:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Sep 6, 2021 at 8:54 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>
>> Thanks for your feedback!
>>
>> That seems indeed more logical, so I see 3 options to do so:
>>
>> 1) Add a new API say ReorderBufferChangeMemorySubstractSize() (with a Size as one parameter) and make use of it in ReorderBufferToastReplace()
>>
>> 2) Add a new "Size" parameter to ReorderBufferChangeMemoryUpdate(), so that if this parameter is > 0 then it would be used instead of "sz = ReorderBufferChangeSize(change)"
>>
>> 3) Do the substraction directly into ReorderBufferToastReplace() without any API
>>
>> I'm inclined to go for option 2), what do you think?
>
Isn't it better if we use option 2) at all places as then we won't
need any special check inside ReorderBufferChangeMemoryUpdate()?
> Yet another option could be to create a new API say ReorderBufferReplaceChangeMemoryUpdate(), which takes, 2 parameters, oldchange, and newchange as inputs, it will compute the difference and add/subtract that size. Logically, that is what we are actually trying to do right? i.e. replacing old change with the new change.
>
Note that in ReorderBufferToastReplace(), the new tuple is replaced in
change so by the time we want to do this computation oldchange won't
be preserved.
On Tue, Sep 7, 2021 at 11:08 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Sep 7, 2021 at 8:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Mon, Sep 6, 2021 at 9:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> > >> > On Mon, Sep 6, 2021 at 8:54 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >> >> >> >> Thanks for your feedback! >> >> >> >> That seems indeed more logical, so I see 3 options to do so: >> >> >> >> 1) Add a new API say ReorderBufferChangeMemorySubstractSize() (with a Size as one parameter) and make use of it inReorderBufferToastReplace() >> >> >> >> 2) Add a new "Size" parameter to ReorderBufferChangeMemoryUpdate(), so that if this parameter is > 0 then it wouldbe used instead of "sz = ReorderBufferChangeSize(change)" >> >> >> >> 3) Do the substraction directly into ReorderBufferToastReplace() without any API >> >> >> >> I'm inclined to go for option 2), what do you think? >> > >> >> Isn't it better if we use option 2) at all places as then we won't >> need any special check inside ReorderBufferChangeMemoryUpdate()? > > > If we want to do this then be careful about REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, ReorderBufferChangeMemoryUpdate()ignores this type of change whereas ReorderBufferChangeSize(), consider at least sizeof(ReorderBufferChange)bytes to this change. So if we compute the size using ReorderBufferChangeSize() outside of ReorderBufferChangeMemoryUpdate(),then total size will be different from what we have now. Logically, we should be ignoring/assertingREORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in ReorderBufferChangeSize(), because ReorderBufferChangeMemoryUpdate()is the only caller for this function. > Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as we are doing now? -- With Regards, Amit Kapila.
>> Isn't it better if we use option 2) at all places as then we won't
>> need any special check inside ReorderBufferChangeMemoryUpdate()?
>
>
> If we want to do this then be careful about REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, ReorderBufferChangeMemoryUpdate() ignores this type of change whereas ReorderBufferChangeSize(), consider at least sizeof(ReorderBufferChange) bytes to this change. So if we compute the size using ReorderBufferChangeSize() outside of ReorderBufferChangeMemoryUpdate(), then total size will be different from what we have now. Logically, we should be ignoring/asserting REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in ReorderBufferChangeSize(), because ReorderBufferChangeMemoryUpdate() is the only caller for this function.
>
Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as
we are doing now?
Hi,
On Tue, Sep 7, 2021 at 11:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:>> Isn't it better if we use option 2) at all places as then we won't
>> need any special check inside ReorderBufferChangeMemoryUpdate()?
>
>
> If we want to do this then be careful about REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, ReorderBufferChangeMemoryUpdate() ignores this type of change whereas ReorderBufferChangeSize(), consider at least sizeof(ReorderBufferChange) bytes to this change. So if we compute the size using ReorderBufferChangeSize() outside of ReorderBufferChangeMemoryUpdate(), then total size will be different from what we have now. Logically, we should be ignoring/asserting REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in ReorderBufferChangeSize(), because ReorderBufferChangeMemoryUpdate() is the only caller for this function.
>
Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as
we are doing now?Yeah right, we can actually do that, it doesn't matter even if we are passing the size from outside.
Agree, if no objections, I'll prepare a patch with the modified approach of option 2) proposed by Amit (means passing the size from the outside in all the cases).
Thanks
Bertrand
On Tue, Sep 7, 2021 at 11:33 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > > Hi, > > On 9/7/21 7:58 AM, Dilip Kumar wrote: > > > On Tue, Sep 7, 2021 at 11:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> >> Isn't it better if we use option 2) at all places as then we won't >> >> need any special check inside ReorderBufferChangeMemoryUpdate()? >> > >> > >> > If we want to do this then be careful about REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, ReorderBufferChangeMemoryUpdate()ignores this type of change whereas ReorderBufferChangeSize(), consider at least sizeof(ReorderBufferChange)bytes to this change. So if we compute the size using ReorderBufferChangeSize() outside of ReorderBufferChangeMemoryUpdate(),then total size will be different from what we have now. Logically, we should be ignoring/assertingREORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in ReorderBufferChangeSize(), because ReorderBufferChangeMemoryUpdate()is the only caller for this function. >> > >> >> Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as >> we are doing now? > > > Yeah right, we can actually do that, it doesn't matter even if we are passing the size from outside. > > Agree, if no objections, I'll prepare a patch with the modified approach of option 2) proposed by Amit (means passing thesize from the outside in all the cases). > Sounds reasonable. Another point that needs some thought is do we want to backpatch this change till v13? I am not sure if there is any user-visible bug here but maybe it is still good to fix this in back branches. What do you think? -- With Regards, Amit Kapila.
On 9/7/21 8:51 AM, Amit Kapila wrote: > On Tue, Sep 7, 2021 at 11:33 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >> Hi, >> >> On 9/7/21 7:58 AM, Dilip Kumar wrote: >> >> >> On Tue, Sep 7, 2021 at 11:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >>>>> Isn't it better if we use option 2) at all places as then we won't >>>>> need any special check inside ReorderBufferChangeMemoryUpdate()? >>>> >>>> If we want to do this then be careful about REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, ReorderBufferChangeMemoryUpdate()ignores this type of change whereas ReorderBufferChangeSize(), consider at least sizeof(ReorderBufferChange)bytes to this change. So if we compute the size using ReorderBufferChangeSize() outside of ReorderBufferChangeMemoryUpdate(),then total size will be different from what we have now. Logically, we should be ignoring/assertingREORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in ReorderBufferChangeSize(), because ReorderBufferChangeMemoryUpdate()is the only caller for this function. >>>> >>> Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as >>> we are doing now? >> >> Yeah right, we can actually do that, it doesn't matter even if we are passing the size from outside. >> >> Agree, if no objections, I'll prepare a patch with the modified approach of option 2) proposed by Amit (means passingthe size from the outside in all the cases). >> > Sounds reasonable. Another point that needs some thought is do we want > to backpatch this change till v13? I am not sure if there is any > user-visible bug here but maybe it is still good to fix this in back > branches. What do you think? Yes, +1 to backpatch till v13 "per precaution". I will first come back with a proposed version for master and once we agree on a final/polished version then I'll do the backpatch ones (if needed). Thanks Bertrand
Re: [UNVERIFIED SENDER] Re: [BUG] Failed Assertion in ReorderBufferChangeMemoryUpdate()
On 9/7/21 9:11 AM, Drouvot, Bertrand wrote: > > On 9/7/21 8:51 AM, Amit Kapila wrote: >> On Tue, Sep 7, 2021 at 11:33 AM Drouvot, Bertrand >> <bdrouvot@amazon.com> wrote: >>> Hi, >>> >>> On 9/7/21 7:58 AM, Dilip Kumar wrote: >>> >>> >>> On Tue, Sep 7, 2021 at 11:10 AM Amit Kapila >>> <amit.kapila16@gmail.com> wrote: >>> >>>>>> Isn't it better if we use option 2) at all places as then we won't >>>>>> need any special check inside ReorderBufferChangeMemoryUpdate()? >>>>> >>>>> If we want to do this then be careful about >>>>> REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID change. Basically, >>>>> ReorderBufferChangeMemoryUpdate() ignores this type of change >>>>> whereas ReorderBufferChangeSize(), consider at least >>>>> sizeof(ReorderBufferChange) bytes to this change. So if we >>>>> compute the size using ReorderBufferChangeSize() outside of >>>>> ReorderBufferChangeMemoryUpdate(), then total size will be >>>>> different from what we have now. Logically, we should be >>>>> ignoring/asserting REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID in >>>>> ReorderBufferChangeSize(), because >>>>> ReorderBufferChangeMemoryUpdate() is the only caller for this >>>>> function. >>>>> >>>> Why can't we simply ignore it in ReorderBufferChangeMemoryUpdate() as >>>> we are doing now? >>> >>> Yeah right, we can actually do that, it doesn't matter even if we >>> are passing the size from outside. >>> >>> Agree, if no objections, I'll prepare a patch with the modified >>> approach of option 2) proposed by Amit (means passing the size from >>> the outside in all the cases). >>> >> Sounds reasonable. Another point that needs some thought is do we want >> to backpatch this change till v13? I am not sure if there is any >> user-visible bug here but maybe it is still good to fix this in back >> branches. What do you think? > > Yes, +1 to backpatch till v13 "per precaution". > > I will first come back with a proposed version for master and once we > agree on a final/polished version then I'll do the backpatch ones (if > needed). Please find enclosed patch v2 (for the master branch) implementing the modified approach of option 2) proposed by Amit. Thanks Bertrand
Attachment
Re: [UNVERIFIED SENDER] Re: [BUG] Failed Assertion in ReorderBufferChangeMemoryUpdate()
On Tue, Sep 7, 2021 at 2:02 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > > On 9/7/21 9:11 AM, Drouvot, Bertrand wrote: > > > > Please find enclosed patch v2 (for the master branch) implementing the > modified approach of option 2) proposed by Amit. > The patch looks good to me. I have made a minor modification in the comment to make it explicit why we are not subtracting the size immediately and ran pgindent. Kindly prepare back-branch patches and test the same. -- With Regards, Amit Kapila.
Attachment
Hi, On 9/9/21 1:16 PM, Amit Kapila wrote: > On Tue, Sep 7, 2021 at 2:02 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >> On 9/7/21 9:11 AM, Drouvot, Bertrand wrote: >> Please find enclosed patch v2 (for the master branch) implementing the >> modified approach of option 2) proposed by Amit. >> > The patch looks good to me. I have made a minor modification in the > comment to make it explicit why we are not subtracting the size > immediately Thanks! Indeed, good point to also mention in the comment why it has been done that way. > Kindly prepare back-branch patches and > test the same. Please find enclosed the modified (and tested) version for the REL_13_STABLE branch (the master patch version also applied on REL_14_STABLE). Thanks Bertrand
Attachment
Hi, On 9/9/21 3:16 PM, Drouvot, Bertrand wrote: > Hi, > > On 9/9/21 1:16 PM, Amit Kapila wrote: >> On Tue, Sep 7, 2021 at 2:02 PM Drouvot, Bertrand >> <bdrouvot@amazon.com> wrote: >>> On 9/7/21 9:11 AM, Drouvot, Bertrand wrote: >>> Please find enclosed patch v2 (for the master branch) implementing the >>> modified approach of option 2) proposed by Amit. >>> >> The patch looks good to me. I have made a minor modification in the >> comment to make it explicit why we are not subtracting the size >> immediately > > Thanks! > > Indeed, good point to also mention in the comment why it has been done > that way. > >> Kindly prepare back-branch patches and >> test the same. > > Please find enclosed the modified (and tested) version for the > REL_13_STABLE branch (the master patch version also applied on > REL_14_STABLE). > I've seen that the patches have been committed, thanks! I'm marking the corresponding CF entry as committed. Thanks Bertrand
On 2021-Sep-06, Amit Kapila wrote: > The error can occur at multiple places (like via palloc or various > other places) between the first time we subtract the change_size and > add it back after the change is re-computed. I think the correct fix > would be that in the beginning we just compute the change_size by > ReorderBufferChangeSize and then after re-computing the change, we > just subtract the old change_size and add the new change_size. What do > you think? Am I the only that that thinks this code is doing far too much in a PG_CATCH block? -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Those who use electric razors are infidels destined to burn in hell while we drink from rivers of beer, download free vids and mingle with naked well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Am I the only that that thinks this code is doing far too much in a > PG_CATCH block? You mean the one in ReorderBufferProcessTXN? Yeah, that is mighty ugly. It might be all right given that it almost immediately does AbortCurrentTransaction, since that should basically clean up whatever is wrong. But I'm still full of questions after a brief look at it. * What is the sense in calling RollbackAndReleaseCurrentSubTransaction after having done AbortCurrentTransaction? * Is it really sane that we re-throw the error in some cases and not others? What is likely to catch that thrown error, and is it going to be prepared for us having already aborted the transaction? (It doesn't give me a warm feeling that the code coverage report shows that path to be un-exercised.) regards, tom lane
On Mon, Sep 13, 2021 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Am I the only that that thinks this code is doing far too much in a > > PG_CATCH block? > > You mean the one in ReorderBufferProcessTXN? Yeah, that is mighty > ugly. It might be all right given that it almost immediately does > AbortCurrentTransaction, since that should basically clean up > whatever is wrong. But I'm still full of questions after a brief > look at it. > > * What is the sense in calling RollbackAndReleaseCurrentSubTransaction > after having done AbortCurrentTransaction? > It is required for the cases where we are starting internal sub-transaction (when decoding via SQL SRF). The first AbortCurrentTransaction will make the current subtransaction state to TBLOCK_SUBABORT and then RollbackAndReleaseCurrentSubTransaction will pop and release the current subtransaction. Note that the same sequence is performed outside catch and if we comment out RollbackAndReleaseCurrentSubTransaction, there are a lot of failures in test_decoding. I have manually debugged and checked that if we don't call RollbackAndReleaseCurrentSubTransaction in the catch block, it will retain the transaction in a bad state which on the next operation leads to a FATAL error. > * Is it really sane that we re-throw the error in some cases and > not others? What is likely to catch that thrown error, and is it > going to be prepared for us having already aborted the transaction? > There are two different ways which deal with the re-thrown error. For WALSender processes, we don't start internal subtransaction and on error, they simply exist. For SQL SRFs, we start internal transactions which will be released in the catch block, and then the top transaction will be aborted after we re-throw the error. The cases where we don't immediately rethrow are specifically for streaming transactions where we might have already sent some changes to the subscriber and we want to continue processing and send the abort to subscribers. -- With Regards, Amit Kapila.