Thread: Skip collecting decoded changes of already-aborted transactions
Hi, In logical decoding, we don't need to collect decoded changes of aborted transactions. While streaming changes, we can detect concurrent abort of the (sub)transaction, but there is no mechanism to skip decoding changes of transactions that are known to already be aborted. With the attached WIP patch, we check CLOG when decoding the transaction for the first time. If it's already known to be aborted, we skip collecting decoded changes of such transactions. That way, when logical replication is behind or restarts, we don't need to decode large transactions that have already aborted, which helps improve decoding performance. Feedback is very welcome. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
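The WIP patch itself is attached rather than inlined, so here is a minimal sketch of the approach described above. The helper name and its call site are illustrative assumptions; TransactionIdDidAbort() is the actual CLOG lookup under discussion (its pitfalls come up in the next reply):

/*
 * Hypothetical sketch, not the attached patch: when a transaction's
 * changes are first seen, consult CLOG; if the transaction is already
 * known to be aborted, discard what has been collected so far.
 */
static bool
SkipChangesOfAbortedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (!TransactionIdDidAbort(txn->xid))
        return false;           /* status unknown or committed: keep decoding */

    /* Discard the changes accumulated so far for this transaction. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
    return true;                /* caller can ignore further changes */
}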
Hi, On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote: > In logical decoding, we don't need to collect decoded changes of > aborted transactions. While streaming changes, we can detect > concurrent abort of the (sub)transaction but there is no mechanism to > skip decoding changes of transactions that are known to already be > aborted. With the attached WIP patch, we check CLOG when decoding the > transaction for the first time. If it's already known to be aborted, > we skip collecting decoded changes of such transactions. That way, > when the logical replication is behind or restarts, we don't need to > decode large transactions that already aborted, which helps improve > the decoding performance. It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't return true when a transaction was implicitly aborted due to a crash / restart. You're also supposed to use it only after a preceding TransactionIdIsInProgress() call. I'm not sure there are issues with not checking TransactionIdIsInProgress() first in this case, but I'm also not sure there aren't. A separate issue is that TransactionIdDidAbort() can end up being very slow if a lot of transactions are in progress concurrently. As soon as the clog buffers are extended, all time is spent copying pages from the kernel pagecache. I'd not at all be surprised if this change causes a substantial slowdown in workloads with lots of small transactions, where most transactions commit. Greetings, Andres Freund
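For reference, a sketch of the check ordering being described here; this is the general pattern rather than code from the patch. Because TransactionIdDidAbort() does not report transactions implicitly aborted by a crash or restart, the robust classification is "not in progress and did not commit":

/* Illustrative only: the safe order for classifying an XID's outcome. */
if (TransactionIdIsInProgress(xid))
{
    /* Still running (or prepared): its changes cannot be skipped. */
}
else if (TransactionIdDidCommit(xid))
{
    /* Committed: the changes must be decoded. */
}
else
{
    /* Aborted, including implicit aborts after a crash: safe to skip. */
}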
On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote: > > In logical decoding, we don't need to collect decoded changes of > > aborted transactions. While streaming changes, we can detect > > concurrent abort of the (sub)transaction but there is no mechanism to > > skip decoding changes of transactions that are known to already be > > aborted. With the attached WIP patch, we check CLOG when decoding the > > transaction for the first time. If it's already known to be aborted, > > we skip collecting decoded changes of such transactions. That way, > > when the logical replication is behind or restarts, we don't need to > > decode large transactions that already aborted, which helps improve > > the decoding performance. > Thank you for the comment. > It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't > return true when a transaction was implicitly aborted due to a crash / > restart. You're also supposed to use it only after a preceding > TransactionIdIsInProgress() call. > > I'm not sure there are issues with not checking TransactionIdIsInProgress() > first in this case, but I'm also not sure there aren't. Yeah, it seems to be better to use !TransactionIdDidCommit() with a preceding TransactionIdIsInProgress() check. > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > a lot of transactions are in progress concurrently. As soon as the clog > buffers are extended all time is spent copying pages from the kernel > pagecache. I'd not at all be surprised if this changed causes a substantial > slowdown in workloads with lots of small transactions, where most transactions > commit. > Indeed. So it should check the transaction status less frequently. It doesn't benefit much even if we can skip collecting decoded changes of small transactions. Another idea is that we check the status of only large transactions. That is, when the size of decoded changes of an aborted transaction exceeds logical_decoding_work_mem, we mark it as aborted, free its changes decoded so far, and skip further collection. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
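A rough sketch of that threshold idea, with the status-check helper and the flag assignment as assumptions (the size fields and the work-mem GUC are the existing ones): gate the CLOG lookup on the memory accounting that logical decoding already performs, so small transactions, the common case, never pay for it:

/* Hypothetical: only transactions past the work-mem threshold are checked. */
if (txn->total_size >= logical_decoding_work_mem * 1024L &&
    TransactionIdKnownAborted(txn->xid))    /* assumed helper */
{
    /* Free the changes decoded so far and remember the verdict. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
    txn->txn_flags |= RBTXN_IS_ABORTED;     /* new flag so future changes are skipped */
}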
On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > a lot of transactions are in progress concurrently. As soon as the clog > > buffers are extended all time is spent copying pages from the kernel > > pagecache. I'd not at all be surprised if this changed causes a substantial > > slowdown in workloads with lots of small transactions, where most transactions > > commit. > > > > Indeed. So it should check the transaction status less frequently. It > doesn't benefit much even if we can skip collecting decoded changes of > small transactions. Another idea is that we check the status of only > large transactions. That is, when the size of decoded changes of an > aborted transaction exceeds logical_decoding_work_mem, we mark it as > aborted , free its changes decoded so far, and skip further > collection. > Your idea might work for large transactions, but I have not come across reports of this being a problem. Do you see any such reports, and can we measure how much benefit there is with large transactions? Because we do have the handling of concurrent aborts during sys table scans, and that might help sometimes for large transactions. -- With Regards, Amit Kapila.
On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > > a lot of transactions are in progress concurrently. As soon as the clog > > > buffers are extended all time is spent copying pages from the kernel > > > pagecache. I'd not at all be surprised if this changed causes a substantial > > > slowdown in workloads with lots of small transactions, where most transactions > > > commit. > > > > > > > Indeed. So it should check the transaction status less frequently. It > > doesn't benefit much even if we can skip collecting decoded changes of > > small transactions. Another idea is that we check the status of only > > large transactions. That is, when the size of decoded changes of an > > aborted transaction exceeds logical_decoding_work_mem, we mark it as > > aborted , free its changes decoded so far, and skip further > > collection. > > > > Your idea might work for large transactions but I have not come across > reports where this is reported as a problem. Do you see any such > reports and can we see how much is the benefit with large > transactions? Because we do have the handling of concurrent aborts > during sys table scans and that might help sometimes for large > transactions. I've heard there was a case where a user had 29 million deletes in a single transaction with each one wrapped in a savepoint and rolled it back, which led to 11TB of spill files. If decoding such a large transaction fails for some reason (e.g., disk full), it would try decoding the same transaction again and again. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > > > a lot of transactions are in progress concurrently. As soon as the clog > > > > buffers are extended all time is spent copying pages from the kernel > > > > pagecache. I'd not at all be surprised if this changed causes a substantial > > > > slowdown in workloads with lots of small transactions, where most transactions > > > > commit. > > > > > > > > > > Indeed. So it should check the transaction status less frequently. It > > > doesn't benefit much even if we can skip collecting decoded changes of > > > small transactions. Another idea is that we check the status of only > > > large transactions. That is, when the size of decoded changes of an > > > aborted transaction exceeds logical_decoding_work_mem, we mark it as > > > aborted , free its changes decoded so far, and skip further > > > collection. > > > > > > > Your idea might work for large transactions but I have not come across > > reports where this is reported as a problem. Do you see any such > > reports and can we see how much is the benefit with large > > transactions? Because we do have the handling of concurrent aborts > > during sys table scans and that might help sometimes for large > > transactions. > > I've heard there was a case where a user had 29 million deletes in a > single transaction with each one wrapped in a savepoint and rolled it > back, which led to 11TB of spill files. If decoding such a large > transaction fails for some reasons (e.g. a disk full), it would try > decoding the same transaction again and again. > I was thinking why the existing handling of concurrent aborts doesn't handle such a case and it seems that we check that only on catalog access. However, in your case, the user probably is accessing the same relation without any concurrent DDL on the same table, so it would just be a cache look-up for catalogs. Your idea of checking aborts every logical_decoding_work_mem should work for such cases. -- With Regards, Amit Kapila.
On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > In logical decoding, we don't need to collect decoded changes of > aborted transactions. While streaming changes, we can detect > concurrent abort of the (sub)transaction but there is no mechanism to > skip decoding changes of transactions that are known to already be > aborted. With the attached WIP patch, we check CLOG when decoding the > transaction for the first time. If it's already known to be aborted, > we skip collecting decoded changes of such transactions. That way, > when the logical replication is behind or restarts, we don't need to > decode large transactions that already aborted, which helps improve > the decoding performance. > +1 for the idea of checking the transaction status only when we need to flush it to disk or send it downstream (if streaming of in-progress transactions is enabled). Although this check is costly, since we are planning it only for large transactions it is worth it if we can occasionally avoid disk or network I/O for aborted transactions. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > In logical decoding, we don't need to collect decoded changes of > > aborted transactions. While streaming changes, we can detect > > concurrent abort of the (sub)transaction but there is no mechanism to > > skip decoding changes of transactions that are known to already be > > aborted. With the attached WIP patch, we check CLOG when decoding the > > transaction for the first time. If it's already known to be aborted, > > we skip collecting decoded changes of such transactions. That way, > > when the logical replication is behind or restarts, we don't need to > > decode large transactions that already aborted, which helps improve > > the decoding performance. > > > +1 for the idea of checking the transaction status only when we need > to flush it to the disk or send it downstream (if streaming in > progress is enabled). Although this check is costly since we are > planning only for large transactions then it is worth it if we can > occasionally avoid disk or network I/O for the aborted transactions. > Thanks. I've attached the updated patch. With this patch, we check the transaction status only for large transactions at eviction time. For regression test purposes, I disable this transaction status check when logical_replication_mode is set to 'immediate'. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
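For context, a condensed sketch of where such a check plugs into the existing eviction path (ReorderBufferCheckMemoryLimit() in reorderbuffer.c); this follows the description above rather than the attached patch text, and the helper name matches the function that shows up in the compile error reported below:

/* Condensed, illustrative version of the eviction loop with the new check. */
while (rb->size >= logical_decoding_work_mem * 1024L)
{
    ReorderBufferTXN *txn = ReorderBufferLargestTXN(rb);

    /*
     * New: if the transaction already aborted, its collected changes were
     * discarded inside the check, so just pick the next candidate.
     */
    if (ReorderBufferCheckTXNAbort(rb, txn))
        continue;

    /* Otherwise evict as before by serializing the changes to disk. */
    ReorderBufferSerializeTXN(rb, txn);
}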
On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Hi, > > > > > > In logical decoding, we don't need to collect decoded changes of > > > aborted transactions. While streaming changes, we can detect > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > skip decoding changes of transactions that are known to already be > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > transaction for the first time. If it's already known to be aborted, > > > we skip collecting decoded changes of such transactions. That way, > > > when the logical replication is behind or restarts, we don't need to > > > decode large transactions that already aborted, which helps improve > > > the decoding performance. > > > > > +1 for the idea of checking the transaction status only when we need > > to flush it to the disk or send it downstream (if streaming in > > progress is enabled). Although this check is costly since we are > > planning only for large transactions then it is worth it if we can > > occasionally avoid disk or network I/O for the aborted transactions. > > > > Thanks. > > I've attached the updated patch. With this patch, we check the > transaction status for only large-transactions when eviction. For > regression test purposes, I disable this transaction status check when > logical_replication_mode is set to 'immediate'. Maybe some changes are missing from the patch, which gives the following errors: reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared (first use in this function) 3584 | if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)) | ^~~~~~~~~~~~~~~~~~~~~~~~ Regards, Vignesh
On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > In logical decoding, we don't need to collect decoded changes of > > > > aborted transactions. While streaming changes, we can detect > > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > > skip decoding changes of transactions that are known to already be > > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > > transaction for the first time. If it's already known to be aborted, > > > > we skip collecting decoded changes of such transactions. That way, > > > > when the logical replication is behind or restarts, we don't need to > > > > decode large transactions that already aborted, which helps improve > > > > the decoding performance. > > > > > > > +1 for the idea of checking the transaction status only when we need > > > to flush it to the disk or send it downstream (if streaming in > > > progress is enabled). Although this check is costly since we are > > > planning only for large transactions then it is worth it if we can > > > occasionally avoid disk or network I/O for the aborted transactions. > > > > > > > Thanks. > > > > I've attached the updated patch. With this patch, we check the > > transaction status for only large-transactions when eviction. For > > regression test purposes, I disable this transaction status check when > > logical_replication_mode is set to 'immediate'. > > May be there is some changes that are missing in the patch, which is > giving the following errors: > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared > (first use in this function) > 3584 | if (unlikely(logical_replication_mode == > LOGICAL_REP_MODE_IMMEDIATE)) > | ^~~~~~~~~~~~~~~~~~~~~~~~ With no update to the thread and the compilation still failing I'm marking this as returned with feedback. Please feel free to resubmit to the next CF when there is a new version of the patch. Regards, Vignesh
On Fri, Feb 2, 2024 at 12:48 AM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote: > > > > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > In logical decoding, we don't need to collect decoded changes of > > > > > aborted transactions. While streaming changes, we can detect > > > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > > > skip decoding changes of transactions that are known to already be > > > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > > > transaction for the first time. If it's already known to be aborted, > > > > > we skip collecting decoded changes of such transactions. That way, > > > > > when the logical replication is behind or restarts, we don't need to > > > > > decode large transactions that already aborted, which helps improve > > > > > the decoding performance. > > > > > > > > > +1 for the idea of checking the transaction status only when we need > > > > to flush it to the disk or send it downstream (if streaming in > > > > progress is enabled). Although this check is costly since we are > > > > planning only for large transactions then it is worth it if we can > > > > occasionally avoid disk or network I/O for the aborted transactions. > > > > > > > > > > Thanks. > > > > > > I've attached the updated patch. With this patch, we check the > > > transaction status for only large-transactions when eviction. For > > > regression test purposes, I disable this transaction status check when > > > logical_replication_mode is set to 'immediate'. > > > > May be there is some changes that are missing in the patch, which is > > giving the following errors: > > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: > > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared > > (first use in this function) > > 3584 | if (unlikely(logical_replication_mode == > > LOGICAL_REP_MODE_IMMEDIATE)) > > | ^~~~~~~~~~~~~~~~~~~~~~~~ > > With no update to the thread and the compilation still failing I'm > marking this as returned with feedback. Please feel free to resubmit > to the next CF when there is a new version of the patch. > I resumed working on this item. I've attached the new version patch. I rebased the patch to the current HEAD and updated comments and commit messages. The patch is straightforward and I'm somewhat satisfied with it, but I'm thinking of adding some tests for it. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I resumed working on this item. I've attached the new version patch.
> I rebased the patch to the current HEAD and updated comments and
> commit messages. The patch is straightforward and I'm somewhat
> satisfied with it, but I'm thinking of adding some tests for it.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
I just had a look at the patch; the patch no longer applies because of a removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments:
in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;
knew/know
/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;
/process/processes
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> I resumed working on this item. I've attached the new version patch. >> >> I rebased the patch to the current HEAD and updated comments and >> commit messages. The patch is straightforward and I'm somewhat >> satisfied with it, but I'm thinking of adding some tests for it. >> >> Regards, >> >> -- >> Masahiko Sawada >> Amazon Web Services: https://aws.amazon.com > > > I just had a look at the patch, the patch no longer applies because of a removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments: Thank you for your review comments. > in ReorderBufferCheckTXNAbort() > + /* Quick return if we've already knew the transaction status */ > + if (txn->aborted) > + return true; > > knew/know Maybe it should be "known"? > > /* > + * If logical_replication_mode is "immediate", we don't check the > + * transaction status so the caller always process this transaction. > + */ > + if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE) > + return false; > > /process/processes > Fixed. In addition to these changes, I've made some changes to the latest patch. Here is the summary: - Use txn_flags field to record the transaction status instead of two 'committed' and 'aborted' flags. - Add regression tests. - Update commit message. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
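For reference, the txn_flags bookkeeping summarized above amounts to something like the following sketch; the flag and macro names are the ones quoted in the reviews downthread, while their exact placement here is illustrative:

/* Cache the CLOG verdict in the transaction's flag bits. */
#define RBTXN_COMMITTED 0x0200
#define RBTXN_ABORTED   0x0400

#define rbtxn_did_commit(txn) (((txn)->txn_flags & RBTXN_COMMITTED) != 0)
#define rbtxn_did_abort(txn)  (((txn)->txn_flags & RBTXN_ABORTED) != 0)

Once either bit is set, later calls can return immediately instead of performing another CLOG lookup.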
On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In addition to these changes, I've made some changes to the latest
> patch. Here is the summary:
>
> - Use txn_flags field to record the transaction status instead of two
> 'committed' and 'aborted' flags.
> - Add regression tests.
> - Update commit message.
>
> Regards,
Hi Sawada-san,
Thanks for the updated patch. Some comments:
1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.
we discards/we discard
2. In function ReorderBufferCheckTXNAbort(): I haven't tested this, but I wonder how prepared transactions would be considered; they are neither committed nor in progress.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Mar 27, 2024 at 8:49 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> In addition to these changes, I've made some changes to the latest >> patch. Here is the summary: >> >> - Use txn_flags field to record the transaction status instead of two >> 'committed' and 'aborted' flags. >> - Add regression tests. >> - Update commit message. >> >> Regards, >> > > Hi Sawada-san, > > Thanks for the updated patch. Some comments: Thank you for the review comments! > > 1. > + * already aborted, we discards all changes accumulated so far and ignore > + * future changes, and return true. Otherwise return false. > > we discards/we discard Will fix it. > > 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered, they are neither committed, nor in progress. The transaction that is prepared but not yet resolved is considered in-progress. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
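A short illustration of why that holds, using the check ordering quoted in the reviews below: a prepared-but-unresolved transaction still has a proc array entry, so the in-progress test catches it first and its changes keep being collected:

/* Illustrative: prepared transactions never reach the abort branch. */
if (TransactionIdIsInProgress(txn->xid))
    return false;       /* running OR prepared: keep decoding */
if (TransactionIdDidCommit(txn->xid))
    return false;       /* committed: keep decoding */
/* Only genuinely aborted transactions fall through to be skipped. */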
Hi, here are some review comments for your patch v4-0001. ====== contrib/test_decoding/sql/stats.sql 1. Huh? The test fails because the "expected results" file for these new tests is missing from the patch. ====== .../replication/logical/reorderbuffer.c 2. static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, - bool txn_prepared); + bool txn_prepared, bool mark_streamed); IIUC this new 'mark_streamed' parameter is more like a prerequisite for the other conditions to decide to mark the tx as streamed -- i.e. it is more like 'can_mark_streamed', so I felt the name should be changed to be like that (everywhere it is used). ~~~ 3. ReorderBufferTruncateTXN - * 'txn_prepared' indicates that we have decoded the transaction at prepare - * time. + * If mark_streamed is true, we could mark the transaction as streamed. + * + * 'streaming_txn' indicates that the given transaction is a streaming transaction. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_streamed) ~ What's that new comment about 'streaming_txn' for? It seemed unrelated to the patch code. ~~~ 4. /* * Mark the transaction as streamed. * * The top-level transaction, is marked as streamed always, even if it * does not contain any changes (that is, when all the changes are in * subtransactions). * * For subtransactions, we only mark them as streamed when there are * changes in them. * * We do it this way because of aborts - we don't want to send aborts for * XIDs the downstream is not aware of. And of course, it always knows * about the toplevel xact (we send the XID in all messages), but we never * stream XIDs of empty subxacts. */ if (mark_streamed && (!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; ~~ With the patch introduction of the new parameter, I felt this code might be better if it was refactored as follows: /* Mark the transaction as streamed, if appropriate. */ if (can_mark_streamed) { /* ... large comment */ if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; } ~~~ 5. ReorderBufferPrepare - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) + if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn)) rb->prepare(rb, txn, txn->final_lsn); ~ Maybe I misunderstood this logic, but won't a "concurrent abort" cause your new Assert added in ReorderBufferProcessTXN to fail? + /* Update transaction status */ + Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0); ~~~ 6. ReorderBufferCheckTXNAbort + /* Check the transaction status using CLOG lookup */ + if (TransactionIdIsInProgress(txn->xid)) + return false; + + if (TransactionIdDidCommit(txn->xid)) + { + /* + * Remember the transaction is committed so that we can skip CLOG + * check next time, avoiding the pressure on CLOG lookup. + */ + txn->txn_flags |= RBTXN_COMMITTED; + return false; + } IIUC the purpose of the TransactionIdDidCommit() was to avoid the overhead of calling the TransactionIdIsInProgress(). So, shouldn't the order of these checks be swapped? Otherwise, there might be 1 extra unnecessary call to TransactionIdIsInProgress() next time. ====== src/include/replication/reorderbuffer.h 7. 
#define RBTXN_PREPARE 0x0040 #define RBTXN_SKIPPED_PREPARE 0x0080 #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100 +#define RBTXN_COMMITTED 0x0200 +#define RBTXN_ABORTED 0x0400 For consistency with the existing bitmask names, I guess these should be named: - RBTXN_COMMITTED --> RBTXN_IS_COMMITTED - RBTXN_ABORTED --> RBTXN_IS_ABORTED ~~~ 8. Similarly, IMO the macros should have the same names as the bitmasks, like the other nearby ones generally seem to. rbtxn_did_commit --> rbtxn_is_committed rbtxn_did_abort --> rbtxn_is_aborted ====== 9. Also, attached is a top-up patch for other cosmetic nitpicks: - comment wording - typos in comments - excessive or missing blank lines - etc. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Mar 27, 2024 at 4:49 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> In addition to these changes, I've made some changes to the latest >> patch. Here is the summary: >> >> - Use txn_flags field to record the transaction status instead of two >> 'committed' and 'aborted' flags. >> - Add regression tests. >> - Update commit message. >> >> Regards, >> > > Hi Sawada-san, > > Thanks for the updated patch. Some comments: > > 1. > + * already aborted, we discards all changes accumulated so far and ignore > + * future changes, and return true. Otherwise return false. > > we discards/we discard This comment is incorporated into the latest v5 patch I've just sent[1]. > > 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered,they are neither committed, nor in progress. > IIUC prepared transactions are considered as in-progress. Regards, [1] https://www.postgresql.org/message-id/CAD21AoDJE-bLdxt9T_z1rw74RN%3DE0n0%2BesYU0eo%2B-_P32EbuVg%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Hi Sawada-San, here are some review comments for the patch v5-0001. ====== Commit message. 1. This commit introduces an additional check to determine if a transaction is already aborted by a CLOG lookup, so the logical decoding skips further change also when it doesn't touch system catalogs. ~ Is that wording backwards? Is it meant to say: This commit introduces an additional CLOG lookup check to determine if a transaction is already aborted, so the ... ====== contrib/test_decoding/sql/stats.sql 2 +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; Why do the SELECT "= 0" like this, instead of just having zeros in the "expected" results? ====== .../replication/logical/reorderbuffer.c 3. static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, - bool txn_prepared); + bool txn_prepared, bool mark_streamed); That last parameter name ('mark_streamed') does not match the same parameter name in this function's definition. ~~~ ReorderBufferTruncateTXN: 4. if (txn_streaming && (!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; if (txn_prepared) { ~ Since the following condition was already "if (txn_prepared)" would it be better remove the "(!txn_prepared)" here and instead just refactor the code like: if (txn_prepared) { ... } else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) { ... } ~~~ ReorderBufferProcessTXN: 5. + + /* Remember the transaction is aborted */ + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); + curtxn->txn_flags |= RBTXN_IS_ABORTED; Missing period on comment. ~~~ ReorderBufferCheckTXNAbort: 6. + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't + * check the transaction status, so the caller always processes this + * transaction. This is to disable this check for regression tests. + */ +static bool +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) +{ + /* + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't + * check the transaction status, so the caller always processes this + * transaction. + */ + if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) + return false; + The wording of the sentence "This is to disable..." seemed a bit confusing. Maybe this area can be simplified by doing the following. 6a. Change the function comment to say more like below: When the GUC 'debug_logical_replication_streaming' is set to "immediate", we don't check the transaction status, meaning the caller will always process this transaction. This mode is used by regression tests to avoid unnecessary transaction status checking. ~ 6b. It is not necessary for this 2nd comment to repeat everything that was already said in the function comment. A simpler comment here might be all you need: SUGGESTION: Quick return for regression tests. ~~~ 7. Is it worth mentioning about this skipping of the transaction status check in the docs for this GUC? [1] ====== [1] https://www.postgresql.org/docs/devel/runtime-config-developer.html Kind Regards, Peter Smith. Fujitsu Australia.
On Tue, Nov 12, 2024 at 5:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached the updated patch. > Hi, here are some review comments for the latest v6-0001. ====== contrib/test_decoding/sql/stats.sql 1. +INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i); I didn't understand the meaning of "serialize-topbig--1". My guess is it is a typo that was supposed to say "toobig". Perhaps there should also be some comment to explain that this "toobig" stuff was done deliberately like this to exceed 'logical_decoding_work_mem' because that would normally (if it was not aborted) cause a spill to disk. ~~~ 2. +-- Check stats. We should not spill anything as the transaction is already +-- aborted. +SELECT pg_stat_force_next_flush(); +SELECT slot_name, spill_txns AS spill_txn, spill_count AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; + Those aliases seem unnecessary: "spill_txns AS spill_txn" and "spill_count AS spill_count" ====== .../replication/logical/reorderbuffer.c ReorderBufferCheckTXNAbort: 3. Other static functions are also declared at the top of this module. For consistency, shouldn't this be the same? ~~~ 4. + * We don't mark the transaction as streamed since this function can be + * called for non-streamed transactions too. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + ReorderBufferToastReset(rb, txn); Given the comment says "since this function can be called for non-streamed transactions too", would it be easier to pass rbtxn_is_streamed(txn) here instead of 'false', and then just remove the comment? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Sawada-San, here are some review comments for the patch v5-0001. > > > > Thank you for reviewing the patch! > > > ====== > > Commit message. > > > > 1. > > This commit introduces an additional check to determine if a > > transaction is already aborted by a CLOG lookup, so the logical > > decoding skips further change also when it doesn't touch system > > catalogs. > > > > ~ > > > > Is that wording backwards? Is it meant to say: > > > > This commit introduces an additional CLOG lookup check to determine if > > a transaction is already aborted, so the ... > > Fixed. > > > > > ====== > > contrib/test_decoding/sql/stats.sql > > > > 2 > > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS > > spill_count FROM pg_stat_replication_slots WHERE slot_name = > > 'regression_slot_stats4_twophase'; > > > > Why do the SELECT "= 0" like this, instead of just having zeros in the > > "expected" results? > > Indeed. I used "=0" like other queries in the same file do, but it > makes sense to me just to have zeros in the expected file. That way, > it would make it a bit easier to investigate in case of failures. > > > > > ====== > > .../replication/logical/reorderbuffer.c > > > > 3. > > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > - bool txn_prepared); > > + bool txn_prepared, bool mark_streamed); > > > > That last parameter name ('mark_streamed') does not match the same > > parameter name in this function's definition. > > Fixed. > > > > > ~~~ > > > > ReorderBufferTruncateTXN: > > > > 4. > > if (txn_streaming && (!txn_prepared) && > > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > txn->txn_flags |= RBTXN_IS_STREAMED; > > > > if (txn_prepared) > > { > > ~ > > > > Since the following condition was already "if (txn_prepared)" would it > > be better remove the "(!txn_prepared)" here and instead just refactor > > the code like: > > > > if (txn_prepared) > > { > > ... > > } > > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > { > > ... > > } > > Good idea. > > > > > ~~~ > > > > ReorderBufferProcessTXN: > > > > 5. > > + > > + /* Remember the transaction is aborted */ > > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > > > Missing period on comment. > > Fixed. > > > > > ~~~ > > > > ReorderBufferCheckTXNAbort: > > > > 6. > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > + * check the transaction status, so the caller always processes this > > + * transaction. This is to disable this check for regression tests. > > + */ > > +static bool > > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) > > +{ > > + /* > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > + * check the transaction status, so the caller always processes this > > + * transaction. > > + */ > > + if (unlikely(debug_logical_replication_streaming == > > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) > > + return false; > > + > > > > The wording of the sentence "This is to disable..." seemed a bit > > confusing. Maybe this area can be simplified by doing the following. > > > > 6a. 
> > Change the function comment to say more like below: > > > > When the GUC 'debug_logical_replication_streaming' is set to > > "immediate", we don't check the transaction status, meaning the caller > > will always process this transaction. This mode is used by regression > > tests to avoid unnecessary transaction status checking. > > > > ~ > > > > 6b. > > It is not necessary for this 2nd comment to repeat everything that was > > already said in the function comment. A simpler comment here might be > > all you need: > > > > SUGGESTION: > > Quick return for regression tests. > > Agreed with the above two comments. Fixed. > > > > > ~~~ > > > > 7. > > Is it worth mentioning about this skipping of the transaction status > > check in the docs for this GUC? [1] > > If we want to mention this optimization in the docs, we have to > explain how the optimization works too. I think it's too detailed. > > I've attached the updated patch. Few minor suggestions: 1) Can we use rbtxn_is_committed here? + /* Remember the transaction is aborted. */ + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); + curtxn->txn_flags |= RBTXN_IS_ABORTED; 2) Similarly here too: + /* + * Mark the transaction as aborted so we ignore future changes of this + * transaction. + */ + Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0); + txn->txn_flags |= RBTXN_IS_ABORTED; 3) Can we use rbtxn_is_aborted here? + /* + * Remember the transaction is committed so that we can skip CLOG + * check next time, avoiding the pressure on CLOG lookup. + */ + Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0); Regards, Vignesh
On Tue, Nov 12, 2024 at 7:29 PM vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Hi Sawada-San, here are some review comments for the patch v5-0001. > > > > > > > Thank you for reviewing the patch! > > > > > ====== > > > Commit message. > > > > > > 1. > > > This commit introduces an additional check to determine if a > > > transaction is already aborted by a CLOG lookup, so the logical > > > decoding skips further change also when it doesn't touch system > > > catalogs. > > > > > > ~ > > > > > > Is that wording backwards? Is it meant to say: > > > > > > This commit introduces an additional CLOG lookup check to determine if > > > a transaction is already aborted, so the ... > > > > Fixed. > > > > > > > > ====== > > > contrib/test_decoding/sql/stats.sql > > > > > > 2 > > > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS > > > spill_count FROM pg_stat_replication_slots WHERE slot_name = > > > 'regression_slot_stats4_twophase'; > > > > > > Why do the SELECT "= 0" like this, instead of just having zeros in the > > > "expected" results? > > > > Indeed. I used "=0" like other queries in the same file do, but it > > makes sense to me just to have zeros in the expected file. That way, > > it would make it a bit easier to investigate in case of failures. > > > > > > > > ====== > > > .../replication/logical/reorderbuffer.c > > > > > > 3. > > > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > - bool txn_prepared); > > > + bool txn_prepared, bool mark_streamed); > > > > > > That last parameter name ('mark_streamed') does not match the same > > > parameter name in this function's definition. > > > > Fixed. > > > > > > > > ~~~ > > > > > > ReorderBufferTruncateTXN: > > > > > > 4. > > > if (txn_streaming && (!txn_prepared) && > > > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > > txn->txn_flags |= RBTXN_IS_STREAMED; > > > > > > if (txn_prepared) > > > { > > > ~ > > > > > > Since the following condition was already "if (txn_prepared)" would it > > > be better remove the "(!txn_prepared)" here and instead just refactor > > > the code like: > > > > > > if (txn_prepared) > > > { > > > ... > > > } > > > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > > { > > > ... > > > } > > > > Good idea. > > > > > > > > ~~~ > > > > > > ReorderBufferProcessTXN: > > > > > > 5. > > > + > > > + /* Remember the transaction is aborted */ > > > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > > > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > > > > > Missing period on comment. > > > > Fixed. > > > > > > > > ~~~ > > > > > > ReorderBufferCheckTXNAbort: > > > > > > 6. > > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > > + * check the transaction status, so the caller always processes this > > > + * transaction. This is to disable this check for regression tests. > > > + */ > > > +static bool > > > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) > > > +{ > > > + /* > > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > > + * check the transaction status, so the caller always processes this > > > + * transaction. 
> > > + */ > > > + if (unlikely(debug_logical_replication_streaming == > > > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) > > > + return false; > > > + > > > > > > The wording of the sentence "This is to disable..." seemed a bit > > > confusing. Maybe this area can be simplified by doing the following. > > > > > > 6a. > > > Change the function comment to say more like below: > > > > > > When the GUC 'debug_logical_replication_streaming' is set to > > > "immediate", we don't check the transaction status, meaning the caller > > > will always process this transaction. This mode is used by regression > > > tests to avoid unnecessary transaction status checking. > > > > > > ~ > > > > > > 6b. > > > It is not necessary for this 2nd comment to repeat everything that was > > > already said in the function comment. A simpler comment here might be > > > all you need: > > > > > > SUGGESTION: > > > Quick return for regression tests. > > > > Agreed with the above two comments. Fixed. > > > > > > > > ~~~ > > > > > > 7. > > > Is it worth mentioning about this skipping of the transaction status > > > check in the docs for this GUC? [1] > > > > If we want to mention this optimization in the docs, we have to > > explain how the optimization works too. I think it's too detailed. > > > > I've attached the updated patch. > > Few minor suggestions: > 1) Can we use rbtxn_is_committed here? > + /* Remember the transaction is aborted. */ > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > 2) Similarly here too: > + /* > + * Mark the transaction as aborted so we ignore future changes of this > + * transaction. > + */ > + Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0); > + txn->txn_flags |= RBTXN_IS_ABORTED; > > 3) Can we use rbtxn_is_aborted here? > + /* > + * Remember the transaction is committed so that we > can skip CLOG > + * check next time, avoiding the pressure on CLOG lookup. > + */ > + Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0); > Thank you for reviewing the patch! These comments are incorporated into the latest v6 patch I just sent[1]. Regards, [1] https://www.postgresql.org/message-id/CAD21AoDtMjbc8YCQiX1K8%2BRKeahcX2MLt3gwApm5BWGfv14i5A%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Hi Sawada-San, Here are some more review comments for the latest (accidentally called v6 again?) v6-0001 patch. ====== contrib/test_decoding/sql/stats.sql 1. +-- Execute a transaction that is prepared and aborted. We detect that the +-- transaction is aborted before spilling changes, and then skip collecting +-- further changes. You had replied (referring to the above comment): I think we already mentioned the transaction is going to be spilled but actually not. ~ Yes, spilling was already mentioned in the current comment but I felt it assumes the reader is expected to know details of why it was going to be spilled in the first place. In other words, I thought the comment could include a bit more explanatory background info: (Also, it's not really "we detect" the abort -- it's the new postgres code of this patch that detects it.) SUGGESTION: Execute a transaction that is prepared but then aborted. The INSERT data exceeds the 'logical_decoding_work_mem' limit which normally would result in the transaction being spilled to disk, but now when Postgres detects the abort it skips the spilling and also skips collecting further changes. ~~~ 2. +-- Check if the transaction is not spilled as it's already aborted. +SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats4_twophase', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); +SELECT pg_stat_force_next_flush(); +SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; + /Check if the transaction is not spilled/Verify that the transaction was not spilled/ ====== .../replication/logical/reorderbuffer.c ReorderBufferResetTXN: 3. /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); Looking at the calling code for ReorderBufferResetTXN it seems this function can be called for streaming OR prepared. So is it OK here to be passing hardwired 'true' as the txn_streaming parameter, or should that be passing rbtxn_is_streamed(txn)? ~~~ ReorderBufferLargestStreamableTopTXN: 4. if ((largest == NULL || txn->total_size > largest_size) && (txn->total_size > 0) && !(rbtxn_has_partial_change(txn)) && - rbtxn_has_streamable_change(txn)) + rbtxn_has_streamable_change(txn) && !(rbtxn_is_aborted(txn))) { largest = txn; largest_size = txn->total_size; I felt that this increasingly complicated code would be a lot easier to understand if you just separate the conditions into: (a) the ones that filter out transactions you don't care about; (b) the ones that check for the largest size. For example, SUGGESTION: dlist_foreach(...) { ... /* Don't consider these kinds of transactions for eviction. */ if (rbtxn_has_partial_change(txn) || !rbtxn_has_streamable_change(txn) || rbtxn_is_aborted(txn)) continue; /* Find the largest of the eviction candidates. */ if ((largest == NULL || txn->total_size > largest_size) && (txn->total_size > 0)) { largest = txn; largest_size = txn->total_size; } } ~~~ ReorderBufferCheckMemoryLimit: 5. + /* skip the transaction if already aborted */ + if (ReorderBufferCheckTXNAbort(rb, txn)) + { + /* All changes should be truncated */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } The "discard all changes accumulated so far" side-effect happening here is not very apparent from the function name. Maybe a better name for ReorderBufferCheckTXNAbort() would be something like 'ReorderBufferCleanupIfAbortedTXN()'. 
====== Kind Regards, Peter Smith. Fujitsu Australia
Hi Sawada-San, Here are some review comments for patch v8-0001. ====== contrib/test_decoding/sql/stats.sql 1. +-- The INSERT changes are large enough to be spilled but not, because the +-- transaction is aborted. The logical decoding skips collecting further +-- changes too. The transaction is prepared to make sure the decoding processes +-- the aborted transaction. /to be spilled but not/to be spilled but will not be/ ====== .../replication/logical/reorderbuffer.c ReorderBufferTruncateTXN: 2. /* * Discard changes from a transaction (and subtransactions), either after - * streaming or decoding them at PREPARE. Keep the remaining info - - * transactions, tuplecids, invalidations and snapshots. + * streaming, decoding them at PREPARE, or detecting the transaction abort. + * Keep the remaining info - transactions, tuplecids, invalidations and + * snapshots. * * We additionally remove tuplecids after decoding the transaction at prepare * time as we only need to perform invalidation at rollback or commit prepared. * + * The given transaction is marked as streamed if appropriate and the caller + * asked it by passing 'mark_txn_streaming' being true. + * * 'txn_prepared' indicates that we have decoded the transaction at prepare * time. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_txn_streaming) I think the function comment should describe the parameters in the same order that they appear in the function signature. ~~~ 3. + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) + { ... + txn->txn_flags |= RBTXN_IS_STREAMED; + } I guess it doesn't matter much, but for the sake of readability, should the condition also be checking !rbtxn_is_streamed(txn) to avoid overwriting the RBTXN_IS_STREAMED bit when it was set already? ~~~ ReorderBufferTruncateTXNIfAborted: 4. + /* + * The transaction aborted. We discard the changes we've collected so far, + * and free all resources allocated for toast reconstruction. The full + * cleanup will happen as part of decoding ABORT record of this + * transaction. + * + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, false, false); 4a. The first part of the comment says "... and free all resources allocated for toast reconstruction", but the second part says "we don't need to reset toast reconstruction data here". Is that a contradiction? ~ 4b. Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd last param, like it used to? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, 15 Nov 2024 at 23:32, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Nov 14, 2024 at 7:07 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Sawada-Sn, > > > > Here are some review comments for patch v8-0001. > > Thank you for the comments. > > > > > ====== > > contrib/test_decoding/sql/stats.sql > > > > 1. > > +-- The INSERT changes are large enough to be spilled but not, because the > > +-- transaction is aborted. The logical decoding skips collecting further > > +-- changes too. The transaction is prepared to make sure the decoding processes > > +-- the aborted transaction. > > > > /to be spilled but not/to be spilled but will not be/ > > Fixed. > > > > > ====== > > .../replication/logical/reorderbuffer.c > > > > ReorderBufferTruncateTXN: > > > > 2. > > /* > > * Discard changes from a transaction (and subtransactions), either after > > - * streaming or decoding them at PREPARE. Keep the remaining info - > > - * transactions, tuplecids, invalidations and snapshots. > > + * streaming, decoding them at PREPARE, or detecting the transaction abort. > > + * Keep the remaining info - transactions, tuplecids, invalidations and > > + * snapshots. > > * > > * We additionally remove tuplecids after decoding the transaction at prepare > > * time as we only need to perform invalidation at rollback or commit prepared. > > * > > + * The given transaction is marked as streamed if appropriate and the caller > > + * asked it by passing 'mark_txn_streaming' being true. > > + * > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > * time. > > */ > > static void > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared) > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared, > > + bool mark_txn_streaming) > > > > I think the function comment should describe the parameters in the > > same order that they appear in the function signature. > > Not sure it should be. We sometimes describe the overall idea of the > function first while using arguments names, and then describe what > other arguments mean. > > > > > ~~~ > > > > 3. > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > (txn->nentries_mem != 0))) > > + { > > ... > > + txn->txn_flags |= RBTXN_IS_STREAMED; > > + } > > > > I guess it doesn't matter much, but for the sake of readability, > > should the condition also be checking !rbtxn_is_streamed(txn) to avoid > > overwriting the RBTXN_IS_STREAMED bit when it was set already? > > Not sure it improves readability because it adds one more check there. > If it's important not to re-set RBTXN_IS_STREAMED, it makes sense to > have that check and describe in the comment. But in this case, I think > we don't necessarily need to do that. > > > ~~~ > > > > ReorderBufferTruncateTXNIfAborted: > > > > 4. > > + /* > > + * The transaction aborted. We discard the changes we've collected so far, > > + * and free all resources allocated for toast reconstruction. The full > > + * cleanup will happen as part of decoding ABORT record of this > > + * transaction. > > + * > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, false, false); > > > > 4a. > > The first part of the comment says "... and free all resources > > allocated for toast reconstruction", but the second part says "we > > don't need to reset toast reconstruction data here". 
Is that a > > contradiction? > > Yes, the comment is out-of-date. Since this function is not called > while replaying the transaction, it should not have any toast > reconstruction data. > > > > > ~ > > > > 4b. > > Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd > > last param, like it used to? > > Actually it's not necessary because it should always be false. But > thinking more, it seems to be better to use rbtxn_preapred(txn) since > it's consistent with other places and it's not necessary to put > assumptions there. Few comments: 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted instead of having it at multiple callers, ReorderBufferResetTXN also has the Assert inside the function after truncate of the transaction: @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) Assert(txn->total_size > 0); Assert(rb->size >= txn->total_size); + /* skip the transaction if aborted */ + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } + ReorderBufferStreamTXN(rb, txn); } else @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) Assert(txn->size > 0); Assert(rb->size >= txn->size); + /* skip the transaction if aborted */ + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } 2) txn->txn_flags can be moved to the next line to keep it within 80 chars in this case: * Check the transaction status by looking CLOG and discard all changes if * the transaction is aborted. The transaction status is cached in txn->txn_flags * so we can skip future changes and avoid CLOG lookups on the next call. Return 3) Is there any scenario where the Assert can fail as the toast is not reset: + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } 4) This can be changed to a single line comment: + /* + * Quick return if the transaction status is already known. + */ + if (rbtxn_is_committed(txn)) + return false; Regards, Vignesh
Hi, Here are my review comments for patch v9-0001. These are only trivial nits for some code comments. Everything else looked good to me. ====== .../replication/logical/reorderbuffer.c ReorderBufferTruncateTXN: 1. + * The given transaction is marked as streamed if appropriate and the caller + * asked it by passing 'mark_txn_streaming' being true. /asked it/requested it/ /being true/as true/ ~~~ ReorderBufferPrepare: 2. + /* + * Remember if the transaction is already aborted to check if we detect + * that the transaction is concurrently aborted during the replay. + */ SUGGESTION: Remember if the transaction is already aborted so we can detect when the transaction is concurrently aborted during the replay. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > Few comments: Thank you for reviewing the patch! > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > instead of having it at multiple callers, ReorderBufferResetTXN also > has the Assert inside the function after truncate of the transaction: > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > Assert(txn->total_size > 0); > Assert(rb->size >= txn->total_size); > > + /* skip the transaction if aborted */ > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } > + > ReorderBufferStreamTXN(rb, txn); > } > else > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > Assert(txn->size > 0); > Assert(rb->size >= txn->size); > > + /* skip the transaction if aborted */ > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } Moved. > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > chars in this case: > * Check the transaction status by looking CLOG and discard all changes if > * the transaction is aborted. The transaction status is cached in > txn->txn_flags > * so we can skip future changes and avoid CLOG lookups on the next call. Return Fixed. > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > + * Since we don't check the transaction status while replaying the > + * transaction, we don't need to reset toast reconstruction data here. > + */ > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } IIUC we reconstruct TOAST data when replaying the transaction. On the other hand, this function is called while adding a decoded change but not when replaying the transaction. So we should not have any toast reconstruction data at this point unless I'm missing something. Do you have any scenario where we call ReorderBufferTruncateTXNIfAborted() while a transaction has TOAST reconstruction data? > > 4) This can be changed to a single line comment: > + /* > + * Quick return if the transaction status is already known. > + */ > + if (rbtxn_is_committed(txn)) > + return false; Fixed. I'll post the updated version patch soon. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
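To make the mechanism being reviewed here concrete, below is a minimal sketch of the check-and-truncate helper as discussed in this thread. The flag and function names follow the discussion, but the bit values are placeholders and details differ from any posted version; note how, per the caution raised earlier in the thread, TransactionIdIsInProgress() is consulted before TransactionIdDidCommit():

/*
 * Minimal sketch of the helper discussed in this thread. The CLOG verdict
 * is cached in txn->txn_flags, so repeated calls for the same transaction
 * are cheap bit tests rather than CLOG lookups. Bit values are
 * placeholders, not the patch's actual definitions.
 */
#define RBTXN_IS_COMMITTED 0x0100
#define RBTXN_IS_ABORTED   0x0200

#define rbtxn_is_committed(txn) (((txn)->txn_flags & RBTXN_IS_COMMITTED) != 0)
#define rbtxn_is_aborted(txn)   (((txn)->txn_flags & RBTXN_IS_ABORTED) != 0)

static bool
ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    /* Quick return if the transaction status is already known. */
    if (rbtxn_is_committed(txn))
        return false;
    if (rbtxn_is_aborted(txn))
    {
        /* Truncated by a previous call; nothing left to discard. */
        Assert(txn->size == 0 && txn->total_size == 0);
        return true;
    }

    /*
     * First call for this transaction: consult CLOG once and cache the
     * verdict. Per the caution upthread, check in-progress first.
     */
    if (TransactionIdIsInProgress(txn->xid))
        return false;
    if (TransactionIdDidCommit(txn->xid))
    {
        txn->txn_flags |= RBTXN_IS_COMMITTED;
        return false;
    }

    /* Neither in progress nor committed, so it must have aborted. */
    txn->txn_flags |= RBTXN_IS_ABORTED;

    /* Discard all changes collected so far. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);

    /* The Assert lives inside the helper, as suggested above. */
    Assert(txn->size == 0 && txn->total_size == 0);
    return true;
}

Caching the verdict in txn->txn_flags is what keeps the per-change overhead negligible in the commit-heavy benchmark reported below.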
On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > Few comments: > > Thank you for reviewing the patch! > > > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > > instead of having it at multiple callers, ReorderBufferResetTXN also > > has the Assert inside the function after truncate of the transaction: > > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > Assert(txn->total_size > 0); > > Assert(rb->size >= txn->total_size); > > > > + /* skip the transaction if aborted */ > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > + > > ReorderBufferStreamTXN(rb, txn); > > } > > else > > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > Assert(txn->size > 0); > > Assert(rb->size >= txn->size); > > > > + /* skip the transaction if aborted */ > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > Moved. > > > > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > > chars in this case: > > * Check the transaction status by looking CLOG and discard all changes if > > * the transaction is aborted. The transaction status is cached in > > txn->txn_flags > > * so we can skip future changes and avoid CLOG lookups on the next call. Return > > Fixed. > > > > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > IIUC we reconstruct TOAST data when replaying the transaction. On the > other hand, this function is called while adding a decoded change but > not when replaying the transaction. So we should not have any toast > reconstruction data at this point unless I'm missing something. Do you > have any scenario where we call ReorderBufferTruncateTXNIfAborted() > while a transaction has TOAST reconstruction data? I have checked further regarding the toast and verified the population of the toast hash. I agree with you on this. Overall, the patch appears to be in good shape. Regards, Vignesh
On Tue, Nov 26, 2024 at 10:01 PM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > > Few comments: > > > > Thank you for reviewing the patch! > > > > > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > > > instead of having it at multiple callers, ReorderBufferResetTXN also > > > has the Assert inside the function after truncate of the transaction: > > > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > > Assert(txn->total_size > 0); > > > Assert(rb->size >= txn->total_size); > > > > > > + /* skip the transaction if aborted */ > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > + > > > ReorderBufferStreamTXN(rb, txn); > > > } > > > else > > > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > > Assert(txn->size > 0); > > > Assert(rb->size >= txn->size); > > > > > > + /* skip the transaction if aborted */ > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > > Moved. > > > > > > > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > > > chars in this case: > > > * Check the transaction status by looking CLOG and discard all changes if > > > * the transaction is aborted. The transaction status is cached in > > > txn->txn_flags > > > * so we can skip future changes and avoid CLOG lookups on the next call. Return > > > > Fixed. > > > > > > > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > > > + * Since we don't check the transaction status while replaying the > > > + * transaction, we don't need to reset toast reconstruction data here. > > > + */ > > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > > > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > > IIUC we reconstruct TOAST data when replaying the transaction. On the > > other hand, this function is called while adding a decoded change but > > not when replaying the transaction. So we should not have any toast > > reconstruction data at this point unless I'm missing something. Do you > > have any scenario where we call ReorderBufferTruncateTXNIfAborted() > > while a transaction has TOAST reconstruction data? > > I have checked further regarding the toast and verified the population > of the toast hash. I agree with you on this. Overall, the patch > appears to be in good shape. Thank you for the confirmation! I thought we'd done performance tests with this patch but Michael-san pointed out we've not done yet. So I've done benchmark tests in two scenarios: A. Skip decoding large aborted transactions. 1. Preparation (SQL commands) create table test (c int); select pg_create_logical_replication_slot('s', 'test_decoding'); begin; insert into test select generate_series(1, 1_000_000); commit; begin; insert into test select generate_series(1, 1_000_000); rollback; begin; insert into test select generate_series(1, 1_000_000); rollback; 2. Performance tests (results are w/o patch vs. 
w/ patch) -- causes some spill/streamed transactions set logical_decoding_work_mem to '64MB'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 2636.208 ms vs. 2070.906 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 910.579 ms vs. 653.574 ms -- no spill/streamed transactions set logical_decoding_work_mem to '5GB'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 962.863 ms vs. 956.910 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 973.426 ms vs. 973.033 ms According to the results, skipping logical decoding of already-aborted transactions contributes performance improvements. B. Decoding medium-size transactions to check overheads of CLOG lookups. 1. Preparation (shell script) pgbench -i -s 1 postgres psql -c "create table test (c int)" psql -c "select pg_create_logical_replication_slot('s', 'test_decoding')" echo "insert into test select generate_series(1, 100)" > /tmp/bench.sql pgbench -t 10000 -c 10 -j 5 -f /tmp/bench.sql postgres 2. Performance tests -- spill/streamed transactions set logical_decoding_work_mem to '64'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 7230.537 ms vs. 7154.322 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 6702.438 ms vs. 6678.232 ms Overall, I don't see noticeable overheads of CLOG lookups. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
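For quick reference, the timings reported above (w/o patch vs. w/ patch) are summarized below; a bare '64' for logical_decoding_work_mem means 64kB, since that GUC's unit is kB:

  Scenario A (three 1M-row transactions, two of them aborted)
    work_mem   mode            w/o patch     w/ patch
    64MB       non-streaming   2636.208 ms   2070.906 ms
    64MB       streaming        910.579 ms    653.574 ms
    5GB        non-streaming    962.863 ms    956.910 ms
    5GB        streaming        973.426 ms    973.033 ms

  Scenario B (pgbench workload of many small committed transactions)
    work_mem   mode            w/o patch     w/ patch
    64kB       non-streaming   7230.537 ms   7154.322 ms
    64kB       streaming       6702.438 ms   6678.232 ms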
On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached a new version patch that incorporates all comments I got so far. > Review comments: =============== 1. + * The given transaction is marked as streamed if appropriate and the caller + * requested it by passing 'mark_txn_streaming' as true. + * * 'txn_prepared' indicates that we have decoded the transaction at prepare * time. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_txn_streaming) { ... } + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) + { + /* + * Mark the transaction as streamed, if appropriate. The comments related to the above changes don't clarify in which cases the 'mark_txn_streaming' should be set. Before this patch, it was clear from the comments and code about the cases where we would decide to mark it as streamed. 2. + /* + * Mark the transaction as aborted so we ignore future changes of this + * transaction. /so we ignore/so we can ignore/ 3. * Helper function for ReorderBufferProcessTXN to handle the concurrent - * abort of the streaming transaction. This resets the TXN such that it - * can be used to stream the remaining data of transaction being processed. - * This can happen when the subtransaction is aborted and we still want to - * continue processing the main or other subtransactions data. + * abort of the streaming (prepared) transaction. ... In the above comment, "... streaming (prepared)...", you added prepared to imply that this function handles concurrent abort for both in-progress and prepared transactions. Am I correct? If so, the current change makes it less clear. If you see the comments at its caller, they are clearer. 4. + /* + * Remember if the transaction is already aborted so we can detect when + * the transaction is concurrently aborted during the replay. + */ + already_aborted = rbtxn_is_aborted(txn); + ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, * when rollback prepared is decoded and sent, the downstream should be * able to rollback such a xact. See comments atop DecodePrepare. * - * Note, for the concurrent_abort + streaming case a stream_prepare was + * Note, for the concurrent abort + streaming case a stream_prepare was * already sent within the ReorderBufferReplay call above. */ - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) rb->prepare(rb, txn, txn->final_lsn); It is not clear from the comments how the 'already_aborted' is handled. I think after this patch we would have already truncated all its changes. If so, why do we need to try to replay the changes of such a xact? 5. +/* + * Check the transaction status by looking CLOG and discard all changes if + * the transaction is aborted. The transaction status is cached in + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the + * next call. Return true if the transaction is aborted, otherwise return + * false. + * + * When the 'debug_logical_replication_streaming' is set to "immediate", we + * don't check the transaction status, meaning the caller will always process + * this transaction. 
+ */ +static bool +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn) +{ I think this function is being invoked to mark a sub-transaction as aborted. It is better to explain in comments how it interacts with sub-transactions, why it is okay to mark them as aborted, and how the other parts of the system interact with it. -- With Regards, Amit Kapila.
On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached a new version patch that incorporates all comments I got so far. > > I think the patch is in good shape but I'm considering whether we > might want to call ReorderBufferToastReset() after truncating all > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > investigate further. > There’s something that seems a bit odd to me. Consider the case where the largest transaction(s) are aborted. If ReorderBufferCanStartStreaming() returns true, the changes from this transaction will only be discarded if it's a streamable transaction. However, if ReorderBufferCanStartStreaming() is false, the changes will be discarded regardless. What seems strange to me in this patch is truncating the changes of a large aborted transaction depending on whether we need to stream or spill but actually that should be completely independent IMHO. My concern is that if the largest transaction is aborted but isn’t yet streamable, we might end up picking the next transaction, which could be much smaller. This smaller transaction might not help us stay within the memory limit, and we could repeat this process for a few more transactions. In contrast, it might be more efficient to simply discard the large aborted transaction, even if it’s not streamable, to avoid this issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > I think the patch is in good shape but I'm considering whether we > > might want to call ReorderBufferToastReset() after truncating all > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > investigate further. > > > > There’s something that seems a bit odd to me. Consider the case where > the largest transaction(s) are aborted. If > ReorderBufferCanStartStreaming() returns true, the changes from this > transaction will only be discarded if it's a streamable transaction. > However, if ReorderBufferCanStartStreaming() is false, the changes > will be discarded regardless. > > What seems strange to me in this patch is truncating the changes of a > large aborted transaction depending on whether we need to stream or > spill but actually that should be completely independent IMHO. My > concern is that if the largest transaction is aborted but isn’t yet > streamable, we might end up picking the next transaction, which could > be much smaller. This smaller transaction might not help us stay > within the memory limit, and we could repeat this process for a few > more transactions. In contrast, it might be more efficient to simply > discard the large aborted transaction, even if it’s not streamable, to > avoid this issue. > If the largest transaction is non-streamable, won't the transaction returned by ReorderBufferLargestTXN() in the other case already suffice the need? -- With Regards, Amit Kapila.
On Tue, Dec 10, 2024 at 10:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 5. > +/* > + * Check the transaction status by looking CLOG and discard all changes if > + * the transaction is aborted. The transaction status is cached in > + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the > + * next call. Return true if the transaction is aborted, otherwise return > + * false. > + * > + * When the 'debug_logical_replication_streaming' is set to "immediate", we > + * don't check the transaction status, meaning the caller will always process > + * this transaction. > + */ > +static bool > +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn) > +{ > > I think this function is being invoked to mark a sub-transaction as > aborted. It is better to explain in comments how it interacts with > sub-transactions, why it is okay to mark them as aborted, and how the > other parts of the system interact with it. > The current name suggests that the main purpose is to truncate the txn, which is okay, but wouldn't it be better to name it along the lines of ReorderBufferCheckAndTruncateAbortedTXN()? In the following comment, can we move 'Return ...' to the next line to make the return values from the function clear? + * next call. Return true if the transaction is aborted, otherwise return + * false. -- With Regards, Amit Kapila.
On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > I think the patch is in good shape but I'm considering whether we > > > might want to call ReorderBufferToastReset() after truncating all > > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > > investigate further. > > > > > > > There’s something that seems a bit odd to me. Consider the case where > > the largest transaction(s) are aborted. If > > ReorderBufferCanStartStreaming() returns true, the changes from this > > transaction will only be discarded if it's a streamable transaction. > > However, if ReorderBufferCanStartStreaming() is false, the changes > > will be discarded regardless. > > > > What seems strange to me in this patch is truncating the changes of a > > large aborted transaction depending on whether we need to stream or > > spill but actually that should be completely independent IMHO. My > > concern is that if the largest transaction is aborted but isn’t yet > > streamable, we might end up picking the next transaction, which could > > be much smaller. This smaller transaction might not help us stay > > within the memory limit, and we could repeat this process for a few > > more transactions. In contrast, it might be more efficient to simply > > discard the large aborted transaction, even if it’s not streamable, to > > avoid this issue. > > > > If the largest transaction is non-streamable, won't the transaction > returned by ReorderBufferLargestTXN() in the other case already > suffice the need? I see your point, but I don’t think it’s quite the same. When ReorderBufferCanStartStreaming() is true, the function ReorderBufferLargestStreamableTopTXN() looks for the largest transaction among those that have a base_snapshot. So, if the largest transaction is aborted but hasn’t yet received a base_snapshot, it will instead select the largest transaction that does have a base_snapshot, which could be significantly smaller than the largest aborted transaction. I’m not saying this is a very common scenario, but I do feel that the logic behind truncating the largest transaction doesn’t seem entirely consistent. However, maybe this isn't a major issue. We could justify the current behavior by saying that before picking any transaction for streaming or spilling, we first check whether it has been aborted. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
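For reference, the selection logic Dilip describes looks roughly like this (paraphrased from reorderbuffer.c with assertions and some conditions elided; not exact source). The walk covers only transactions that already carry a base snapshot, so a large transaction without one is simply invisible here:

static ReorderBufferTXN *
ReorderBufferLargestStreamableTopTXN(ReorderBuffer *rb)
{
    dlist_iter  iter;
    Size        largest_size = 0;
    ReorderBufferTXN *largest = NULL;

    /*
     * Only top-level transactions that have received a base snapshot are
     * on this list; anything else is never considered for streaming.
     */
    dlist_foreach(iter, &rb->txns_by_base_snapshot_lsn)
    {
        ReorderBufferTXN *txn;

        txn = dlist_container(ReorderBufferTXN, base_snapshot_node, iter.cur);

        if (txn->total_size > largest_size &&
            !rbtxn_has_partial_change(txn) &&
            rbtxn_has_streamable_change(txn))
        {
            largest = txn;
            largest_size = txn->total_size;
        }
    }

    return largest;
}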
On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > > I think the patch is in good shape but I'm considering whether we > > > > might want to call ReorderBufferToastReset() after truncating all > > > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > > > investigate further. > > > > > > > > > > There’s something that seems a bit odd to me. Consider the case where > > > the largest transaction(s) are aborted. If > > > ReorderBufferCanStartStreaming() returns true, the changes from this > > > transaction will only be discarded if it's a streamable transaction. > > > However, if ReorderBufferCanStartStreaming() is false, the changes > > > will be discarded regardless. > > > > > > What seems strange to me in this patch is truncating the changes of a > > > large aborted transaction depending on whether we need to stream or > > > spill but actually that should be completely independent IMHO. My > > > concern is that if the largest transaction is aborted but isn’t yet > > > streamable, we might end up picking the next transaction, which could > > > be much smaller. This smaller transaction might not help us stay > > > within the memory limit, and we could repeat this process for a few > > > more transactions. In contrast, it might be more efficient to simply > > > discard the large aborted transaction, even if it’s not streamable, to > > > avoid this issue. > > > > > > > If the largest transaction is non-streamable, won't the transaction > > returned by ReorderBufferLargestTXN() in the other case already > > suffice the need? > > I see your point, but I don’t think it’s quite the same. When > ReorderBufferCanStartStreaming() is true, the function > ReorderBufferLargestStreamableTopTXN() looks for the largest > transaction among those that have a base_snapshot. So, if the largest > transaction is aborted but hasn’t yet received a base_snapshot, it > will instead select the largest transaction that does have a > base_snapshot, which could be significantly smaller than the largest > aborted transaction. IIUC the transaction entries in reorderbuffer have the base snapshot before decoding the first change (see SnapBuildProcessChange()). In which case the transaction doesn't have the base snapshot and has the largest amount of changes? Subtransaction entries could transfer its base snapshot to its parent transaction entry but such subtransactions will be picked by ReorderBufferLargestTXN(). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > If the largest transaction is non-streamable, won't the transaction > > > returned by ReorderBufferLargestTXN() in the other case already > > > suffice the need? > > > > I see your point, but I don’t think it’s quite the same. When > > ReorderBufferCanStartStreaming() is true, the function > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > transaction among those that have a base_snapshot. So, if the largest > > transaction is aborted but hasn’t yet received a base_snapshot, it > > will instead select the largest transaction that does have a > > base_snapshot, which could be significantly smaller than the largest > > aborted transaction. > > IIUC the transaction entries in reorderbuffer have the base snapshot > before decoding the first change (see SnapBuildProcessChange()). In > which case the transaction doesn't have the base snapshot and has the > largest amount of changes? Subtransaction entries could transfer its > base snapshot to its parent transaction entry but such subtransactions > will be picked by ReorderBufferLargestTXN(). > IIRC, there could be cases where reorder buffers of transactions can grow in size without having a base snapshot, I think transactions doing DDLs and generating a lot of INVALIDATION messages could fall in such a category. And that was one of the reasons why we were using txns_by_base_snapshot_lsn inside ReorderBufferLargestStreamableTopTXN(). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > suffice the need? > > > > > > I see your point, but I don’t think it’s quite the same. When > > > ReorderBufferCanStartStreaming() is true, the function > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > transaction among those that have a base_snapshot. So, if the largest > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > will instead select the largest transaction that does have a > > > base_snapshot, which could be significantly smaller than the largest > > > aborted transaction. > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > before decoding the first change (see SnapBuildProcessChange()). In > > which case the transaction doesn't have the base snapshot and has the > > largest amount of changes? Subtransaction entries could transfer its > > base snapshot to its parent transaction entry but such subtransactions > > will be picked by ReorderBufferLargestTXN(). > > > IIRC, there could be cases where reorder buffers of transactions can > grow in size without having a base snapshot, I think transactions > doing DDLs and generating a lot of INVALIDATION messages could fall in > such a category. > Are we recording such changes in the reorder buffer? If so, can you please share how? AFAICU, the main idea behind skipping aborts is to avoid sending a lot of data to the client that later needs to be discarded or cases where we spent resources/time spilling the changes that later need to be discarded. In that vein, the current idea of the patch where it truncates and skips aborted xacts before streaming or spilling them sounds reasonable. -- With Regards, Amit Kapila.
On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > > suffice the need? > > > > > > > > I see your point, but I don’t think it’s quite the same. When > > > > ReorderBufferCanStartStreaming() is true, the function > > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > > transaction among those that have a base_snapshot. So, if the largest > > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > > will instead select the largest transaction that does have a > > > > base_snapshot, which could be significantly smaller than the largest > > > > aborted transaction. > > > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > > before decoding the first change (see SnapBuildProcessChange()). In > > > which case the transaction doesn't have the base snapshot and has the > > > largest amount of changes? Subtransaction entries could transfer its > > > base snapshot to its parent transaction entry but such subtransactions > > > will be picked by ReorderBufferLargestTXN(). > > > > > IIRC, there could be cases where reorder buffers of transactions can > > grow in size without having a base snapshot, I think transactions > > doing DDLs and generating a lot of INVALIDATION messages could fall in > > such a category. > > > > Are we recording such changes in the reorder buffer? If so, can you > please share how? xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer, and for such changes we don't call SnapBuildProcessChange(), which means it is possible to collect such changes in the reorder buffer without setting the base_snapshot. > AFAICU, the main idea behind skipping aborts is to > avoid sending a lot of data to the client that later needs to be > discarded or cases where we spent resources/time spilling the changes > that later need to be discarded. In that vein, the current idea of the > patch where it truncates and skips aborted xacts before streaming or > spilling them sounds reasonable. I believe in one of my previous responses (a few emails above), I agreed that it's a reasonable goal to check for aborted transactions just before spilling or streaming, and if we detect an aborted transaction, we can avoid streaming/spilling and simply discard the changes. However, I wanted to make a point that if we have a large aborted transaction without a base snapshot (assuming that's possible), we might end up streaming many small transactions to stay under the memory limit. Even though we try to stay within the limit, we still might not succeed because the main issue is the large aborted transaction, which doesn't have a base snapshot. So, instead of streaming many small transactions, if we had selected the largest transaction first and checked if it was aborted, we could have avoided streaming all those smaller transactions. I agree this is a hypothetical scenario and may not be worth optimizing, and that's completely fair. I just wanted to clarify the point I raised when I first started reviewing this patch.
I haven't tried it myself, but I believe this scenario could be created by starting a transaction that performs multiple DDLs and then ultimately gets aborted. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
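To illustrate the asymmetry being discussed, here is a heavily trimmed, paraphrased sketch of the two decode paths (fast-forward, origin-filtering and xid-validity checks omitted): invalidation messages are queued into the reorder buffer without going through the snapshot builder, while DML records are dispatched only if SnapBuildProcessChange() accepts them:

    /*
     * In xact_decode()'s switch (paraphrased): invalidation messages are
     * queued without consulting the snapshot builder, so a transaction can
     * accumulate them before it has a base snapshot.
     */
    case XLOG_XACT_INVALIDATIONS:
        {
            xl_xact_invals *invals = (xl_xact_invals *) XLogRecGetData(r);

            /* no SnapBuildProcessChange() on this path */
            ReorderBufferAddInvalidations(ctx->reorder, XLogRecGetXid(r),
                                          buf->origptr,
                                          invals->nmsgs, invals->msgs);
            break;
        }

    /*
     * In heap_decode()'s switch (paraphrased): DML is dispatched only when
     * SnapBuildProcessChange() accepts it, which is also where the base
     * snapshot gets attached to the transaction.
     */
    case XLOG_HEAP_INSERT:
        if (SnapBuildProcessChange(builder, xid, buf->origptr))
            DecodeInsert(ctx, buf);
        break;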
On Wed, Dec 11, 2024 at 10:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > > > suffice the need? > > > > > > > > > > I see your point, but I don’t think it’s quite the same. When > > > > > ReorderBufferCanStartStreaming() is true, the function > > > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > > > transaction among those that have a base_snapshot. So, if the largest > > > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > > > will instead select the largest transaction that does have a > > > > > base_snapshot, which could be significantly smaller than the largest > > > > > aborted transaction. > > > > > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > > > before decoding the first change (see SnapBuildProcessChange()). In > > > > which case the transaction doesn't have the base snapshot and has the > > > > largest amount of changes? Subtransaction entries could transfer its > > > > base snapshot to its parent transaction entry but such subtransactions > > > > will be picked by ReorderBufferLargestTXN(). > > > > > > > IIRC, there could be cases where reorder buffers of transactions can > > > grow in size without having a base snapshot, I think transactions > > > doing DDLs and generating a lot of INVALIDATION messages could fall in > > > such a category. > > > > > > > Are we recording such changes in the reorder buffer? If so, can you > > please share how? > > xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer, > and for such changes we don't call SnapBuildProcessChange(), which means > it is possible to collect such changes in the reorder buffer without > setting the base_snapshot. DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog changes. I think that when decoding these system catalog changes, we end up calling SnapBuildProcessChange(). I understand that decoding XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but queues invalidation messages to the reorderbuffer, but I still don't understand cases where a transaction entry is quite big and has only a lot of invalidation messages. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog > changes. I think that when decoding these system catalog changes, we > end up calling SnapBuildProcessChange(). I understand that decoding > XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but > queues invalidation messages to the reorderbuffer, but I still don't > understand cases where a transaction entry is quite big and has only a > lot of invalidation messages. You are right that SnapBuildProcessChange() will be called when there are changes in the system catalog. However, it is very much possible that while you are processing the system catalog operation the snapbuild state is not yet SNAPBUILD_FULL_SNAPSHOT, and by the time you reach XLOG_XACT_INVALIDATIONS some concurrent transaction gets committed and the snapbuild state changes to SNAPBUILD_FULL_SNAPSHOT. However, I have to agree that such a transaction cannot really be very large, because it can contain invalidation messages from at most a single DDL command, so maybe we don't need to do anything special for them and we can go ahead with the approach you followed in the current patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
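The race Dilip describes hinges on the gate at the top of SnapBuildProcessChange(); a paraphrased sketch of the relevant parts (the snapshot building and refcounting that precede the base-snapshot assignment are elided):

bool
SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
{
    /*
     * Changes decoded before a full snapshot exists are not stored, and
     * consequently no base snapshot is attached to their transaction.
     */
    if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
        return false;

    /* ... further consistency checks elided ... */

    /*
     * The first stored change of a transaction is what attaches the base
     * snapshot, so a transaction whose earlier records were all filtered
     * out above still has none.
     */
    if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
        ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
                                     builder->snapshot);

    return true;
}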
On Sun, Dec 15, 2024 at 10:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog > > changes. I think that when decoding these system catalog changes, we > > end up calling SnapBuildProcessChange(). I understand that decoding > > XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but > > queues invalidation messages to the reorderbuffer, but I still don't > > understand cases where a transaction entry is quite big and has only a > > lot of invalidation messages. > > You are right that SnapBuildProcessChange() will be called when there > are changes in the system catalog. However, it is very much possible > that while you are processing the system catalog operation the > snapbuild state is not yet SNAPBUILD_FULL_SNAPSHOT, and by the time you > reach XLOG_XACT_INVALIDATIONS some concurrent transaction gets > committed and the snapbuild state changes to SNAPBUILD_FULL_SNAPSHOT. > However, I have to agree that such a transaction cannot really be > very large, because it can contain invalidation messages from at most > a single DDL command, so maybe we don't need to do anything special for > them and we can go ahead with the approach you followed in the current > patch. > Thanks, I also think we can proceed with the current approach. So, the pending task is to address a few comments raised by me. -- With Regards, Amit Kapila.
On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > Review comments: > > Thank you for reviewing the patch! > > > =============== > > 1. > > + * The given transaction is marked as streamed if appropriate and the caller > > + * requested it by passing 'mark_txn_streaming' as true. > > + * > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > * time. > > */ > > static void > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared) > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared, > > + bool mark_txn_streaming) > > { > > ... > > } > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > (txn->nentries_mem != 0))) > > + { > > + /* > > + * Mark the transaction as streamed, if appropriate. > > > > The comments related to the above changes don't clarify in which cases > > the 'mark_txn_streaming' should be set. Before this patch, it was > > clear from the comments and code about the cases where we would decide > > to mark it as streamed. > > I think we can rename it to txn_streaming for consistency with > txn_prepared. I've changed the comment for that. > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, ReorderBufferChange *specinsert) { /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn) * full cleanup will happen as part of the COMMIT PREPAREDs, so now * just truncate txn by removing changes and tuplecids. */ - ReorderBufferTruncateTXN(rb, txn, true); + ReorderBufferTruncateTXN(rb, txn, true, true); In both the above places, the patch unconditionally passes the 'txn_streaming' even for prepared transactions when it wouldn't be a streaming xact. Inside the function, the patch handled that by first checking whether the transaction is prepared (txn_prepared). So, the logic will work but the function signature and the way its callers are using make it difficult to use and extend in the future. I think for the first case, we should get the streaming parameter in ReorderBufferResetTXN(), and for the second case ReorderBufferStreamCommit(), we should pass it as false because by that time transaction is already streamed and prepared. We are invoking it for cleanup. Even when we call ReorderBufferTruncateTXN() from ReorderBufferCheckAndTruncateAbortedTXN(), it will be better to write a comment at the caller about why we are passing this parameter as false. > > > > > 4. > > + /* > > + * Remember if the transaction is already aborted so we can detect when > > + * the transaction is concurrently aborted during the replay. > > + */ > > + already_aborted = rbtxn_is_aborted(txn); > > + > > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, > > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); > > > > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, > > TransactionId xid, > > * when rollback prepared is decoded and sent, the downstream should be > > * able to rollback such a xact. See comments atop DecodePrepare. 
> > * > > - * Note, for the concurrent_abort + streaming case a stream_prepare was > > + * Note, for the concurrent abort + streaming case a stream_prepare was > > * already sent within the ReorderBufferReplay call above. > > */ > > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) > > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) > > rb->prepare(rb, txn, txn->final_lsn); > > > > It is not clear from the comments how the 'already_aborted' is > > handled. I think after this patch we would have already truncated all > > its changes. If so, why do we need to try to replay the changes of > > such a xact? > > I used ReorderBufferReplay() for convenience; it sends begin_prepare() > and prepare() appropriately, handles streaming-prepared transactions, > and updates statistics etc. But as you pointed out, it would not be > necessary to set up a historical snapshot etc. I agree that we don't > need to try replaying such aborted transactions but I'd like to > confirm we don't really need to execute invalidation messages evein in > aborted transactions. > We need to execute invalidations if we have loaded any cache entries, for example in the case of streaming. See comments in the function ReorderBufferAbort(). However, I find both the current changes and the previous patch a bit difficult to follow. How about if we instead invent a flag like RBTXN_SENT_PREPARE or something like that and then use that flag to decide whether to send prepare in ReorderBufferPrepare(). Then add comments for the cases in which prepare will be sent from ReorderBufferPrepare(). * + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + + /* All changes should be discarded */ + Assert(txn->size == 0); Can we expect the size to be zero without resetting the toast data? In ReorderBufferToastReset(), we call ReorderBufferReturnChange() which reduces the change size. So, won't that size still be accounted for in txn? -- With Regards, Amit Kapila.
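For illustration, one hypothetical shape of the RBTXN_SENT_PREPARE idea (not any posted patch; the bit value and the helper function are made up for this example):

#define RBTXN_SENT_PREPARE 0x0400    /* placeholder bit value */
#define rbtxn_sent_prepare(txn) (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0)

/* wherever PREPARE is actually sent (hypothetical helper): */
static void
ReorderBufferSendPrepare(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    rb->prepare(rb, txn, txn->final_lsn);
    txn->txn_flags |= RBTXN_SENT_PREPARE;
}

/*
 * ... and at the end of ReorderBufferPrepare(): send PREPARE only if it
 * has not gone out yet, e.g. because the transaction was detected as
 * already aborted and its replay was skipped.
 */
if (!rbtxn_sent_prepare(txn))
    ReorderBufferSendPrepare(rb, txn);

A flag recording "prepare was sent" makes the decision local, instead of having callers reconstruct it from concurrent-abort and streamed state.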
On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > > > > Review comments: > > > > Thank you for reviewing the patch! > > > > > =============== > > > 1. > > > + * The given transaction is marked as streamed if appropriate and the caller > > > + * requested it by passing 'mark_txn_streaming' as true. > > > + * > > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > > * time. > > > */ > > > static void > > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > bool txn_prepared) > > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > bool txn_prepared, > > > + bool mark_txn_streaming) > > > { > > > ... > > > } > > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > > (txn->nentries_mem != 0))) > > > + { > > > + /* > > > + * Mark the transaction as streamed, if appropriate. > > > > > > The comments related to the above changes don't clarify in which cases > > > the 'mark_txn_streaming' should be set. Before this patch, it was > > > clear from the comments and code about the cases where we would decide > > > to mark it as streamed. > > > > I think we can rename it to txn_streaming for consistency with > > txn_prepared. I've changed the comment for that. > > > > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn, > ReorderBufferChange *specinsert) > { > /* Discard the changes that we just streamed */ > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); > > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, > ReorderBufferTXN *txn) > * full cleanup will happen as part of the COMMIT PREPAREDs, so now > * just truncate txn by removing changes and tuplecids. > */ > - ReorderBufferTruncateTXN(rb, txn, true); > + ReorderBufferTruncateTXN(rb, txn, true, true); > > In both the above places, the patch unconditionally passes the > 'txn_streaming' even for prepared transactions when it wouldn't be a > streaming xact. Inside the function, the patch handled that by first > checking whether the transaction is prepared (txn_prepared). So, the > logic will work but the function signature and the way its callers are > using make it difficult to use and extend in the future. > Valid concern. > I think for the first case, we should get the streaming parameter in > ReorderBufferResetTXN(), I think we cannot pass 'rbtxn_is_streamed(txn)' to ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN() is called to handle the concurrent abort of the streaming transaction but the transaction might not have been marked as streamed at that time. Since ReorderBufferTruncateTXN() is responsible for both discarding changes and marking the transaction as streamed, we need to unconditionally pass txn_streaming = true in this case. > and for the second case > ReorderBufferStreamCommit(), we should pass it as false because by > that time transaction is already streamed and prepared. We are > invoking it for cleanup. Agreed. 
> Even when we call ReorderBufferTruncateTXN() > from ReorderBufferCheckAndTruncateAbortedTXN(), it will be better to > write a comment at the caller about why we are passing this parameter > as false. Agreed. On second thoughts, I think the confusion related to txn_streaming came from the fact that ReorderBufferTruncateTXN() does both discarding changes and marking the transaction as streamed. If we make the function do just discarding changes, we don't need to introduce the txn_streaming function argument. Instead, we need to have a separate function to mark the transaction as streamed and call it before ReorderBufferTruncateTXN() where appropriate. And ReorderBufferCheckAndTruncateAbortedTXN() just calls ReorderBufferTruncateTXN(). > > > > > > > > > 4. > > > + /* > > > + * Remember if the transaction is already aborted so we can detect when > > > + * the transaction is concurrently aborted during the replay. > > > + */ > > > + already_aborted = rbtxn_is_aborted(txn); > > > + > > > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, > > > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); > > > > > > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, > > > TransactionId xid, > > > * when rollback prepared is decoded and sent, the downstream should be > > > * able to rollback such a xact. See comments atop DecodePrepare. > > > * > > > - * Note, for the concurrent_abort + streaming case a stream_prepare was > > > + * Note, for the concurrent abort + streaming case a stream_prepare was > > > * already sent within the ReorderBufferReplay call above. > > > */ > > > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) > > > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) > > > rb->prepare(rb, txn, txn->final_lsn); > > > > > > It is not clear from the comments how the 'already_aborted' is > > > handled. I think after this patch we would have already truncated all > > > its changes. If so, why do we need to try to replay the changes of > > > such a xact? > > > > I used ReorderBufferReplay() for convenience; it sends begin_prepare() > > and prepare() appropriately, handles streaming-prepared transactions, > > and updates statistics etc. But as you pointed out, it would not be > > necessary to set up a historical snapshot etc. I agree that we don't > > need to try replaying such aborted transactions but I'd like to > > confirm we don't really need to execute invalidation messages evein in > > aborted transactions. > > > > We need to execute invalidations if we have loaded any cache entries, > for example in the case of streaming. See comments in the function > ReorderBufferAbort(). However, I find both the current changes and the > previous patch a bit difficult to follow. How about if we instead > invent a flag like RBTXN_SENT_PREPARE or something like that and then > use that flag to decide whether to send prepare in > ReorderBufferPrepare(). Then add comments for the cases in which > prepare will be sent from ReorderBufferPrepare(). The idea of using RBTXN_SENT_PREPARE sounds good to me. I'll use it. > > * > + * Since we don't check the transaction status while replaying the > + * transaction, we don't need to reset toast reconstruction data here. > + */ > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > + > + /* All changes should be discarded */ > + Assert(txn->size == 0); > > Can we expect the size to be zero without resetting the toast data? 
In > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which > reduces the change size. So, won't that size still be accounted for in > txn? IIUC the toast reconstruction data is created only while replaying the transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not called during that. So I think any toast data should not be accumulated at that time. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
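The separation Sawada proposes could look roughly like this (a sketch; the condition mirrors the one currently buried inside ReorderBufferTruncateTXN()):

/*
 * Sketch: mark the given transaction as streamed if appropriate. A
 * top-level transaction is always marked (even if all its changes live
 * in subtransactions); a subtransaction is marked only if it actually
 * had changes in memory.
 */
static void
ReorderBufferMaybeMarkTXNStreamed(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))
        txn->txn_flags |= RBTXN_IS_STREAMED;
}

/* streaming callers then mark explicitly, before discarding changes: */
ReorderBufferMaybeMarkTXNStreamed(rb, txn);
ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));

With this split, ReorderBufferTruncateTXN() no longer needs the txn_streaming argument, and the abort-truncation path simply never calls the marking helper.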
On Fri, Dec 20, 2024 at 12:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, > > ReorderBufferTXN *txn, > > ReorderBufferChange *specinsert) > > { > > /* Discard the changes that we just streamed */ > > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); > > > > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, > > ReorderBufferTXN *txn) > > * full cleanup will happen as part of the COMMIT PREPAREDs, so now > > * just truncate txn by removing changes and tuplecids. > > */ > > - ReorderBufferTruncateTXN(rb, txn, true); > > + ReorderBufferTruncateTXN(rb, txn, true, true); > > > > In both the above places, the patch unconditionally passes the > > 'txn_streaming' even for prepared transactions when it wouldn't be a > > streaming xact. Inside the function, the patch handled that by first > > checking whether the transaction is prepared (txn_prepared). So, the > > logic will work but the function signature and the way its callers are > > using make it difficult to use and extend in the future. > > > > Valid concern. > > > I think for the first case, we should get the streaming parameter in > > ReorderBufferResetTXN(), > > I think we cannot pass 'rbtxn_is_streamed(txn)' to > ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN() > is called to handle the concurrent abort of the streaming transaction > but the transaction might not have been marked as streamed at that > time. Since ReorderBufferTruncateTXN() is responsible for both > discarding changes and marking the transaction as streamed, we need to > unconditionally pass txn_streaming = true in this case. > Can't we use the 'stream_started' variable available at the call site of ReorderBufferResetTXN() for our purpose? > > On second thoughts, I think the confusion related to txn_streaming > came from the fact that ReorderBufferTruncateTXN() does both > discarding changes and marking the transaction as streamed. If we make > the function do just discarding changes, we don't need to introduce > the txn_streaming function argument. Instead, we need to have a > separate function to mark the transaction as streamed and call it > before ReorderBufferTruncateTXN() where appropriate. And > ReorderBufferCheckAndTruncateAbortedTXN() just calls > ReorderBufferTruncateTXN(). > That sounds good to me. IIRC, initially, ReorderBufferTruncateTXN() was used to truncate changes only for streaming transactions. Later, it evolved for prepared xacts and now for xacts where we explicitly detect whether they are aborted. So, I think it makes sense to improve it by following your suggestion. > > > > > * > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > + > > + /* All changes should be discarded */ > > + Assert(txn->size == 0); > > > > Can we expect the size to be zero without resetting the toast data? In > > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which > > reduces the change size. So, won't that size still be accounted for in > > txn? > > IIUC the toast reconstruction data is created only while replaying the > transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not > called during that.
So I think any toast data should not be > accumulated at that time. > How about the case where, in the first pass, we streamed the transaction partially and it reconstructed toast data, and then, in the second pass, when memory becomes full, the reorder buffer contains some partial data, so it tries to spill the data and finds that the transaction is aborted? I could be wrong here because I haven't tried to test this code path, but I see that it is theoretically possible. -- With Regards, Amit Kapila.
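If the partially-streamed scenario turns out to be reachable, one defensive option, in line with Sawada's earlier "just in case" remark, would be to reset the toast state before truncating so the size assertion stays valid; a sketch:

/* in the check-and-truncate path, before discarding the changes: */
if (txn->toast_hash != NULL)
    ReorderBufferToastReset(rb, txn);   /* frees any reconstruction state */

ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);

/* only now is all of the transaction's memory accounted back */
Assert(txn->size == 0 && txn->total_size == 0);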