Thread: Fix memory counter update in reorderbuffer
Hi, I found a bug in the memory counter update in reorderbuffer. It was introduced by commit 5bec1d6bc5e, so pg17 and master are affected. In ReorderBufferCleanupTXN() we zero the transaction size and then free the transaction entry as follows: /* Update the memory counter */ ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); /* deallocate */ ReorderBufferReturnTXN(rb, txn); However, if the transaction entry has toast changes we free them in ReorderBufferToastReset() called from ReorderBufferToastReset(), and end up subtracting the memory usage from zero. Which results in an assertion failure: TRAP: failed Assert("(rb->size >= sz) && (txn->size >= sz)"), File: "reorderbuffer.c" This can happen for example if an error occurs while replaying transaction changes including toast changes. I've attached the patch that fixes the problem and includes a test case (the test part might not be committed as it slows the test case). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I found a bug in the memory counter update in reorderbuffer. It was > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > In ReorderBufferCleanupTXN() we zero the transaction size and then > free the transaction entry as follows: > > /* Update the memory counter */ > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > /* deallocate */ > ReorderBufferReturnTXN(rb, txn); > Why do we need to zero the transaction size explicitly? Shouldn't it automatically become zero after freeing all the changes? > However, if the transaction entry has toast changes we free them in > ReorderBufferToastReset() called from ReorderBufferToastReset(), > Here, you mean ReorderBufferToastReset() called from ReorderBufferReturnTXN(), right? BTW, commit 5bec1d6bc5e also introduced ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which is also worth considering while fixing the reported problem. It may not have the same problem as you have reported but we can consider whether setting txn size as zero explicitly is required or not. -- With Regards, Amit Kapila.
On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I found a bug in the memory counter update in reorderbuffer. It was > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > free the transaction entry as follows: > > > > /* Update the memory counter */ > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > /* deallocate */ > > ReorderBufferReturnTXN(rb, txn); > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > automatically become zero after freeing all the changes? It will become zero after freeing all the changes. However, since updating the max-heap when freeing each change could bring some overhead, we freed the changes without updating the memory counter, and then zerod it. > > > However, if the transaction entry has toast changes we free them in > > ReorderBufferToastReset() called from ReorderBufferToastReset(), > > > > Here, you mean ReorderBufferToastReset() called from > ReorderBufferReturnTXN(), right? Right. Thank you for pointing it out. > BTW, commit 5bec1d6bc5e also introduced > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > is also worth considering while fixing the reported problem. It may > not have the same problem as you have reported but we can consider > whether setting txn size as zero explicitly is required or not. The reason why it introduced ReorderBufferChangeMemoryUpdate() is the same as I mentioned above. And yes, as you mentioned, it doesn't have the same problem that I reported here. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I found a bug in the memory counter update in reorderbuffer. It was > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > > free the transaction entry as follows: > > > > > > /* Update the memory counter */ > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > > > /* deallocate */ > > > ReorderBufferReturnTXN(rb, txn); > > > > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > > automatically become zero after freeing all the changes? > > It will become zero after freeing all the changes. However, since > updating the max-heap when freeing each change could bring some > overhead, we freed the changes without updating the memory counter, > and then zerod it. > I think this should be covered in comments as it is not apparent. > > > BTW, commit 5bec1d6bc5e also introduced > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > is also worth considering while fixing the reported problem. It may > > not have the same problem as you have reported but we can consider > > whether setting txn size as zero explicitly is required or not. > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > same as I mentioned above. And yes, as you mentioned, it doesn't have > the same problem that I reported here. > I checked again and found that ReorderBufferResetTXN() first calls ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After that, it also tries to free spec_insert change which should have the same problem. So, what saves this path from the same problem? * + /* + * Update the memory counter of the transaction, removing it from + * the max-heap. + */ + ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); + Assert(txn->size == 0); + pfree(txn); Just before freeing the TXN, updating the size looks odd though I understand the idea is to remove TXN from max_heap. Anyway, let's first discuss whether the same problem exists in ReorderBufferResetTXN() code path, and if so, how we want to fix it. -- With Regards, Amit Kapila.
On Wed, Aug 7, 2024 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I found a bug in the memory counter update in reorderbuffer. It was > > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > > > free the transaction entry as follows: > > > > > > > > /* Update the memory counter */ > > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > > > > > /* deallocate */ > > > > ReorderBufferReturnTXN(rb, txn); > > > > > > > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > > > automatically become zero after freeing all the changes? > > > > It will become zero after freeing all the changes. However, since > > updating the max-heap when freeing each change could bring some > > overhead, we freed the changes without updating the memory counter, > > and then zerod it. > > > > I think this should be covered in comments as it is not apparent. Agreed. > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > is also worth considering while fixing the reported problem. It may > > > not have the same problem as you have reported but we can consider > > > whether setting txn size as zero explicitly is required or not. > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > the same problem that I reported here. > > > > I checked again and found that ReorderBufferResetTXN() first calls > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > that, it also tries to free spec_insert change which should have the > same problem. So, what saves this path from the same problem? Good catch. I've not created a test case for that but it seems to be possible to happen. I think that subtracting txn->size to reduce the memory counter to zero seems to be a wrong idea in the first place. If we want to save updating memory counter and max-heap, we should use the exact memory size that we freed. In other words, just change the memory usage update to a batch operation. > > * > + /* > + * Update the memory counter of the transaction, removing it from > + * the max-heap. > + */ > + ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > + Assert(txn->size == 0); > + > pfree(txn); > > Just before freeing the TXN, updating the size looks odd though I > understand the idea is to remove TXN from max_heap. Anyway, let's > first discuss whether the same problem exists in > ReorderBufferResetTXN() code path, and if so, how we want to fix it. Agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Aug 8, 2024 at 9:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Aug 7, 2024 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > > is also worth considering while fixing the reported problem. It may > > > > not have the same problem as you have reported but we can consider > > > > whether setting txn size as zero explicitly is required or not. > > > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > > the same problem that I reported here. > > > > > > > I checked again and found that ReorderBufferResetTXN() first calls > > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > > that, it also tries to free spec_insert change which should have the > > same problem. So, what saves this path from the same problem? > > Good catch. I've not created a test case for that but it seems to be > possible to happen. > > I think that subtracting txn->size to reduce the memory counter to > zero seems to be a wrong idea in the first place. If we want to save > updating memory counter and max-heap, we should use the exact memory > size that we freed. In other words, just change the memory usage > update to a batch operation. > Sounds reasonable but how would you find the size for a batch update operation? Are you planning to track it while freeing the individual changes? -- With Regards, Amit Kapila.
On Fri, Aug 9, 2024 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 8, 2024 at 9:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Aug 7, 2024 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > > > is also worth considering while fixing the reported problem. It may > > > > > not have the same problem as you have reported but we can consider > > > > > whether setting txn size as zero explicitly is required or not. > > > > > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > > > the same problem that I reported here. > > > > > > > > > > I checked again and found that ReorderBufferResetTXN() first calls > > > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > > > that, it also tries to free spec_insert change which should have the > > > same problem. So, what saves this path from the same problem? > > > > Good catch. I've not created a test case for that but it seems to be > > possible to happen. > > > > I think that subtracting txn->size to reduce the memory counter to > > zero seems to be a wrong idea in the first place. If we want to save > > updating memory counter and max-heap, we should use the exact memory > > size that we freed. In other words, just change the memory usage > > update to a batch operation. > > > > Sounds reasonable but how would you find the size for a batch update > operation? Are you planning to track it while freeing the individual > changes? Yes, one idea is to make ReorderBufferReturnChange() return the memory size that it just freed. Then the caller who wants to update the memory counter in a batch sums up the memory size. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Sat, Aug 10, 2024 at 5:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Aug 9, 2024 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Aug 8, 2024 at 9:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Aug 7, 2024 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > > > > is also worth considering while fixing the reported problem. It may > > > > > > not have the same problem as you have reported but we can consider > > > > > > whether setting txn size as zero explicitly is required or not. > > > > > > > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > > > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > > > > the same problem that I reported here. > > > > > > > > > > > > > I checked again and found that ReorderBufferResetTXN() first calls > > > > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > > > > that, it also tries to free spec_insert change which should have the > > > > same problem. So, what saves this path from the same problem? > > > > > > Good catch. I've not created a test case for that but it seems to be > > > possible to happen. > > > > > > I think that subtracting txn->size to reduce the memory counter to > > > zero seems to be a wrong idea in the first place. If we want to save > > > updating memory counter and max-heap, we should use the exact memory > > > size that we freed. In other words, just change the memory usage > > > update to a batch operation. > > > > > > > Sounds reasonable but how would you find the size for a batch update > > operation? Are you planning to track it while freeing the individual > > changes? > > Yes, one idea is to make ReorderBufferReturnChange() return the memory > size that it just freed. Then the caller who wants to update the > memory counter in a batch sums up the memory size. I've drafted the patch. I didn't change ReorderBufferReturnChange() but called ReorderBufferChangeSize() for individual change instead, as it's simpler.gi Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Aug 20, 2024 at 5:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Aug 16, 2024 at 12:22 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > > > Thank you for testing the patch! > > I'm slightly hesitant to add a test under src/test/subscription since > it's a bug in ReorderBuffer and not specific to logical replication. > If we reasonably cannot add a test under contrib/test_decoding, I'm > okay with adding it under src/test/subscription. > Sounds like a reasonable approach. > I've attached the updated patch with the commit message (but without a > test case for now). > The patch LGTM except for one minor comment. + /* All changes must be returned */ + Assert(txn->size == 0); Isn't it better to say: "All changes must be deallocated." in the above comment? -- With Regards, Amit Kapila.
On Fri, Aug 23, 2024 at 3:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Aug 20, 2024 at 9:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I've updated the updated patch with regression tests. > LGTM. -- With Regards, Amit Kapila.
On Sun, Aug 25, 2024 at 9:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 23, 2024 at 3:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Aug 20, 2024 at 9:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > I've updated the updated patch with regression tests. > > > > LGTM. Thank you for reviewing the patch. Pushed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Mon, Aug 26, 2024 at 11:25:53AM -0700, Masahiko Sawada wrote: > Thank you for reviewing the patch. Pushed. nitpick: I think this one needs a pgindent run. -- nathan
On Mon, Aug 26, 2024 at 2:27 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Mon, Aug 26, 2024 at 11:25:53AM -0700, Masahiko Sawada wrote: > > Thank you for reviewing the patch. Pushed. > > nitpick: I think this one needs a pgindent run. Thank you for pointing it out. I forgot to check with pgindent. I've pushed the change. Hope it fixes the problem. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com