Thread: Skip collecting decoded changes of already-aborted transactions

Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.
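
To illustrate, the core of the idea is roughly the following sketch (the
helper name is made up for this mail and is not the attached patch's
code; it assumes the function lives in reorderbuffer.c next to the
existing static helpers):

/*
 * Sketch only: when queueing the first change of a transaction, look its
 * status up in CLOG.  If it has already aborted, drop whatever has been
 * collected so far and tell the caller to ignore its further changes.
 */
static bool
SketchSkipKnownAbortedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (!TransactionIdDidAbort(txn->xid))
        return false;

    /* Discard the changes collected so far for this transaction. */
    ReorderBufferTruncateTXN(rb, txn, false);

    return true;
}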

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Skip collecting decoded changes of already-aborted transactions

From
Andres Freund
Date:
Hi,

On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote:
> In logical decoding, we don't need to collect decoded changes of
> aborted transactions. While streaming changes, we can detect
> concurrent abort of the (sub)transaction but there is no mechanism to
> skip decoding changes of transactions that are known to already be
> aborted. With the attached WIP patch, we check CLOG when decoding the
> transaction for the first time. If it's already known to be aborted,
> we skip collecting decoded changes of such transactions. That way,
> when the logical replication is behind or restarts, we don't need to
> decode large transactions that already aborted, which helps improve
> the decoding performance.

It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't
return true when a transaction was implicitly aborted due to a crash /
restart. You're also supposed to use it only after a preceding
TransactionIdIsInProgress() call.

I'm not sure there are issues with not checking TransactionIdIsInProgress()
first in this case, but I'm also not sure there aren't.

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended all time is spent copying pages from the kernel
pagecache.  I'd not at all be surprised if this change causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Greetings,

Andres Freund



Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote:
> > In logical decoding, we don't need to collect decoded changes of
> > aborted transactions. While streaming changes, we can detect
> > concurrent abort of the (sub)transaction but there is no mechanism to
> > skip decoding changes of transactions that are known to already be
> > aborted. With the attached WIP patch, we check CLOG when decoding the
> > transaction for the first time. If it's already known to be aborted,
> > we skip collecting decoded changes of such transactions. That way,
> > when the logical replication is behind or restarts, we don't need to
> > decode large transactions that already aborted, which helps improve
> > the decoding performance.
>

Thank you for the comment.

> It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't
> return true when a transaction was implicitly aborted due to a crash /
> restart.  You're also supposed to use it only after a preceding
> TransactionIdIsInProgress() call.
>
> I'm not sure there are issues with not checking TransactionIdIsInProgress()
> first in this case, but I'm also not sure there aren't.

Yeah, it seems to be better to use !TransactionIdDidCommit() with a
preceding TransactionIdIsInProgress() check.
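
Something along these lines, I think (a sketch only; the helper name is
made up for this mail, not the patch code):

/*
 * A transaction that was implicitly aborted by a crash/restart never
 * gets an abort record written to CLOG, so treat "no longer in progress
 * and not committed" as aborted instead of relying on
 * TransactionIdDidAbort().
 */
static bool
SketchTXNKnownAborted(TransactionId xid)
{
    /*
     * Check this first; otherwise a transaction that commits between the
     * two calls could be misread as aborted.
     */
    if (TransactionIdIsInProgress(xid))
        return false;

    return !TransactionIdDidCommit(xid);
}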

>
> A separate issue is that TransactionIdDidAbort() can end up being very slow if
> a lot of transactions are in progress concurrently. As soon as the clog
> buffers are extended all time is spent copying pages from the kernel
> pagecache.  I'd not at all be surprised if this change causes a substantial
> slowdown in workloads with lots of small transactions, where most transactions
> commit.
>

Indeed. So it should check the transaction status less frequently. It
doesn't benefit much even if we can skip collecting decoded changes of
small transactions. Another idea is that we check the status of only
large transactions. That is, when the size of decoded changes of an
aborted transaction exceeds logical_decoding_work_mem, we mark it as
aborted, free its changes decoded so far, and skip further
collection.
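
Continuing the sketch above, the eviction-time gating could look roughly
like this (again, the names are only for illustration):

/*
 * Sketch only: consult CLOG only once a transaction has grown past
 * logical_decoding_work_mem (which is in kB), so small transactions
 * never pay for the lookup.  A real patch would also need to remember
 * the result so the CLOG check isn't repeated for the same transaction.
 */
static bool
SketchEvictIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (txn->size < (Size) logical_decoding_work_mem * 1024)
        return false;

    if (!SketchTXNKnownAborted(txn->xid))
        return false;

    /* Known aborted: free what we have collected and skip the rest. */
    ReorderBufferTruncateTXN(rb, txn, false);

    return true;
}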

Regards

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Skip collecting decoded changes of already-aborted transactions

From
Amit Kapila
Date:
On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > a lot of transactions are in progress concurrently. As soon as the clog
> > buffers are extended all time is spent copying pages from the kernel
> > pagecache.  I'd not at all be surprised if this change causes a substantial
> > slowdown in workloads with lots of small transactions, where most transactions
> > commit.
> >
>
> Indeed. So it should check the transaction status less frequently. It
> doesn't benefit much even if we can skip collecting decoded changes of
> small transactions. Another idea is that we check the status of only
> large transactions. That is, when the size of decoded changes of an
> aborted transaction exceeds logical_decoding_work_mem, we mark it as
> aborted, free its changes decoded so far, and skip further
> collection.
>

Your idea might work for large transactions, but I have not come across
reports of this being a problem. Do you see any such
reports, and can we see how much benefit there is with large
transactions? Because we do have the handling of concurrent aborts
during sys table scans and that might help sometimes for large
transactions.

--
With Regards,
Amit Kapila.



Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > > a lot of transactions are in progress concurrently. As soon as the clog
> > > buffers are extended all time is spent copying pages from the kernel
> > > pagecache.  I'd not at all be surprised if this change causes a substantial
> > > slowdown in workloads with lots of small transactions, where most transactions
> > > commit.
> > >
> >
> > Indeed. So it should check the transaction status less frequently. It
> > doesn't benefit much even if we can skip collecting decoded changes of
> > small transactions. Another idea is that we check the status of only
> > large transactions. That is, when the size of decoded changes of an
> > aborted transaction exceeds logical_decoding_work_mem, we mark it as
> > aborted, free its changes decoded so far, and skip further
> > collection.
> >
>
> Your idea might work for large transactions, but I have not come across
> reports of this being a problem. Do you see any such
> reports, and can we see how much benefit there is with large
> transactions? Because we do have the handling of concurrent aborts
> during sys table scans and that might help sometimes for large
> transactions.

I've heard there was a case where a user had 29 million deletes in a
single transaction with each one wrapped in a savepoint and rolled it
back, which led to 11TB of spill files. If decoding such a large
transaction fails for some reason (e.g., disk full), it would try
decoding the same transaction again and again.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Skip collecting decoded changes of already-aborted transactions

From
Amit Kapila
Date:
On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > > > a lot of transactions are in progress concurrently. As soon as the clog
> > > > buffers are extended all time is spent copying pages from the kernel
> > > > pagecache.  I'd not at all be surprised if this change causes a substantial
> > > > slowdown in workloads with lots of small transactions, where most transactions
> > > > commit.
> > > >
> > >
> > > Indeed. So it should check the transaction status less frequently. It
> > > doesn't benefit much even if we can skip collecting decoded changes of
> > > small transactions. Another idea is that we check the status of only
> > > large transactions. That is, when the size of decoded changes of an
> > > aborted transaction exceeds logical_decoding_work_mem, we mark it as
> > > aborted, free its changes decoded so far, and skip further
> > > collection.
> > >
> >
> > Your idea might work for large transactions, but I have not come across
> > reports of this being a problem. Do you see any such
> > reports, and can we see how much benefit there is with large
> > transactions? Because we do have the handling of concurrent aborts
> > during sys table scans and that might help sometimes for large
> > transactions.
>
> I've heard there was a case where a user had 29 million deletes in a
> single transaction with each one wrapped in a savepoint and rolled it
> back, which led to 11TB of spill files. If decoding such a large
> transaction fails for some reason (e.g., disk full), it would try
> decoding the same transaction again and again.
>

I was wondering why the existing handling of concurrent aborts doesn't
cover such a case, and it seems that we check that only on catalog
access. However, in your case, the user is probably accessing the same
relation without any concurrent DDL on that table, so catalog access
would just be a cache lookup. Your idea of checking for aborts every
logical_decoding_work_mem should work for such cases.

--
With Regards,
Amit Kapila.



Re: Skip collecting decoded changes of already-aborted transactions

From
Dilip Kumar
Date:
On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> In logical decoding, we don't need to collect decoded changes of
> aborted transactions. While streaming changes, we can detect
> concurrent abort of the (sub)transaction but there is no mechanism to
> skip decoding changes of transactions that are known to already be
> aborted. With the attached WIP patch, we check CLOG when decoding the
> transaction for the first time. If it's already known to be aborted,
> we skip collecting decoded changes of such transactions. That way,
> when the logical replication is behind or restarts, we don't need to
> decode large transactions that already aborted, which helps improve
> the decoding performance.
>
+1 for the idea of checking the transaction status only when we need
to flush it to disk or send it downstream (if streaming of in-progress
transactions is enabled). Although this check is costly, since we are
planning it only for large transactions it is worth it if we can
occasionally avoid disk or network I/O for aborted transactions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Hi,
> >
> > In logical decoding, we don't need to collect decoded changes of
> > aborted transactions. While streaming changes, we can detect
> > concurrent abort of the (sub)transaction but there is no mechanism to
> > skip decoding changes of transactions that are known to already be
> > aborted. With the attached WIP patch, we check CLOG when decoding the
> > transaction for the first time. If it's already known to be aborted,
> > we skip collecting decoded changes of such transactions. That way,
> > when the logical replication is behind or restarts, we don't need to
> > decode large transactions that already aborted, which helps improve
> > the decoding performance.
> >
> +1 for the idea of checking the transaction status only when we need
> to flush it to disk or send it downstream (if streaming of in-progress
> transactions is enabled). Although this check is costly, since we are
> planning it only for large transactions it is worth it if we can
> occasionally avoid disk or network I/O for aborted transactions.
>

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status only for large transactions at eviction time. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Skip collecting decoded changes of already-aborted transactions

From
vignesh C
Date:
On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > In logical decoding, we don't need to collect decoded changes of
> > > aborted transactions. While streaming changes, we can detect
> > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > skip decoding changes of transactions that are known to already be
> > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > transaction for the first time. If it's already known to be aborted,
> > > we skip collecting decoded changes of such transactions. That way,
> > > when the logical replication is behind or restarts, we don't need to
> > > decode large transactions that already aborted, which helps improve
> > > the decoding performance.
> > >
> > +1 for the idea of checking the transaction status only when we need
> > to flush it to disk or send it downstream (if streaming of in-progress
> > transactions is enabled). Although this check is costly, since we are
> > planning it only for large transactions it is worth it if we can
> > occasionally avoid disk or network I/O for aborted transactions.
> >
>
> Thanks.
>
> I've attached the updated patch. With this patch, we check the
> transaction status only for large transactions at eviction time. For
> regression test purposes, I disable this transaction status check when
> logical_replication_mode is set to 'immediate'.

Maybe there are some changes missing in the patch, which is
giving the following errors:
reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
(first use in this function)
 3584 |         if (unlikely(logical_replication_mode ==
LOGICAL_REP_MODE_IMMEDIATE))
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~

Regards,
Vignesh



Re: Skip collecting decoded changes of already-aborted transactions

From
vignesh C
Date:
On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > In logical decoding, we don't need to collect decoded changes of
> > > > aborted transactions. While streaming changes, we can detect
> > > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > > skip decoding changes of transactions that are known to already be
> > > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > > transaction for the first time. If it's already known to be aborted,
> > > > we skip collecting decoded changes of such transactions. That way,
> > > > when the logical replication is behind or restarts, we don't need to
> > > > decode large transactions that already aborted, which helps improve
> > > > the decoding performance.
> > > >
> > > +1 for the idea of checking the transaction status only when we need
> > > to flush it to disk or send it downstream (if streaming of in-progress
> > > transactions is enabled). Although this check is costly, since we are
> > > planning it only for large transactions it is worth it if we can
> > > occasionally avoid disk or network I/O for aborted transactions.
> > >
> >
> > Thanks.
> >
> > I've attached the updated patch. With this patch, we check the
> > transaction status only for large transactions at eviction time. For
> > regression test purposes, I disable this transaction status check when
> > logical_replication_mode is set to 'immediate'.
>
> Maybe there are some changes missing in the patch, which is
> giving the following errors:
> reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
> reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
> (first use in this function)
>  3584 |         if (unlikely(logical_replication_mode ==
> LOGICAL_REP_MODE_IMMEDIATE))
>       |                      ^~~~~~~~~~~~~~~~~~~~~~~~

With no update to the thread and the compilation still failing I'm
marking this as returned with feedback.  Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh



Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Fri, Feb 2, 2024 at 12:48 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > In logical decoding, we don't need to collect decoded changes of
> > > > > aborted transactions. While streaming changes, we can detect
> > > > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > > > skip decoding changes of transactions that are known to already be
> > > > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > > > transaction for the first time. If it's already known to be aborted,
> > > > > we skip collecting decoded changes of such transactions. That way,
> > > > > when the logical replication is behind or restarts, we don't need to
> > > > > decode large transactions that already aborted, which helps improve
> > > > > the decoding performance.
> > > > >
> > > > +1 for the idea of checking the transaction status only when we need
> > > > to flush it to disk or send it downstream (if streaming of in-progress
> > > > transactions is enabled). Although this check is costly, since we are
> > > > planning it only for large transactions it is worth it if we can
> > > > occasionally avoid disk or network I/O for aborted transactions.
> > > >
> > >
> > > Thanks.
> > >
> > > I've attached the updated patch. With this patch, we check the
> > > transaction status only for large transactions at eviction time. For
> > > regression test purposes, I disable this transaction status check when
> > > logical_replication_mode is set to 'immediate'.
> >
> > Maybe there are some changes missing in the patch, which is
> > giving the following errors:
> > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
> > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
> > (first use in this function)
> >  3584 |         if (unlikely(logical_replication_mode ==
> > LOGICAL_REP_MODE_IMMEDIATE))
> >       |                      ^~~~~~~~~~~~~~~~~~~~~~~~
>
> With no update to the thread and the compilation still failing I'm
> marking this as returned with feedback.  Please feel free to resubmit
> to the next CF when there is a new version of the patch.
>

I resumed working on this item. I've attached the new version patch.

I rebased the patch to the current HEAD and updated comments and
commit messages. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Skip collecting decoded changes of already-aborted transactions

From
Ajin Cherian
Date:


On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I resumed working on this item. I've attached the new version patch.

I rebased the patch to the current HEAD and updated comments and
commit messages. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

I just had a look at the patch; it no longer applies because of the removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments:
in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;

knew/know

/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;

/process/processes

regards,
Ajin Cherian
Fujitsu Australia

Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
>
> On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> I resumed working on this item. I've attached the new version patch.
>>
>> I rebased the patch to the current HEAD and updated comments and
>> commit messages. The patch is straightforward and I'm somewhat
>> satisfied with it, but I'm thinking of adding some tests for it.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> Amazon Web Services: https://aws.amazon.com
>
>
> I just had a look at the patch; it no longer applies because of the
> removal of a header in a recent commit. Overall the patch looks fine,
> and I didn't find any issues. Some cosmetic comments:

Thank you for your review comments.

> in ReorderBufferCheckTXNAbort()
> + /* Quick return if we've already knew the transaction status */
> + if (txn->aborted)
> + return true;
>
> knew/know

Maybe it should be "known"?

>
> /*
> + * If logical_replication_mode is "immediate", we don't check the
> + * transaction status so the caller always process this transaction.
> + */
> + if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
> + return false;
>
> /process/processes
>

Fixed.

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Skip collecting decoded changes of already-aborted transactions

From
Ajin Cherian
Date:


On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,


Hi Sawada-san,

Thanks for the updated patch. Some comments:

1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.

we discards/we discard

2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered; they are neither committed nor in progress.

regards,
Ajin Cherian
Fujitsu Australia

Re: Skip collecting decoded changes of already-aborted transactions

From
Masahiko Sawada
Date:
On Wed, Mar 27, 2024 at 8:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
>
> On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> In addition to these changes, I've made some changes to the latest
>> patch. Here is the summary:
>>
>> - Use txn_flags field to record the transaction status instead of two
>> 'committed' and 'aborted' flags.
>> - Add regression tests.
>> - Update commit message.
>>
>> Regards,
>>
>
> Hi Sawada-san,
>
> Thanks for the updated patch. Some comments:

Thank you for the review comments!

>
> 1.
> + * already aborted, we discards all changes accumulated so far and ignore
> + * future changes, and return true. Otherwise return false.
>
> we discards/we discard

Will fix it.

>
> 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but
> I wonder how prepared transactions would be considered; they are
> neither committed nor in progress.

A transaction that is prepared but not yet resolved is considered
in-progress.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Skip collecting decoded changes of already-aborted transactions

From
Peter Smith
Date:
Hi, here are some review comments for your patch v4-0001.

======
contrib/test_decoding/sql/stats.sql

1.
Huh? The test fails because the "expected results" file for these new
tests is missing from the patch.

======
.../replication/logical/reorderbuffer.c

2.
 static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);

IIUC this new 'mark_streamed' parameter is more like a prerequisite
for the other conditions to decide to mark the tx as streamed -- i.e.
it is more like 'can_mark_streamed', so I felt the name should be
changed to be like that (everywhere it is used).

~~~

3. ReorderBufferTruncateTXN

- * 'txn_prepared' indicates that we have decoded the transaction at prepare
- * time.
+ * If mark_streamed is true, we could mark the transaction as streamed.
+ *
+ * 'streaming_txn' indicates that the given transaction is a
streaming transaction.
  */
 static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared,
+ bool mark_streamed)

~

What's that new comment about 'streaming_txn' for? It seemed unrelated
to the patch code.

~~~

4.
/*
* Mark the transaction as streamed.
*
* The top-level transaction, is marked as streamed always, even if it
* does not contain any changes (that is, when all the changes are in
* subtransactions).
*
* For subtransactions, we only mark them as streamed when there are
* changes in them.
*
* We do it this way because of aborts - we don't want to send aborts for
* XIDs the downstream is not aware of. And of course, it always knows
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
if (mark_streamed && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

~~

With the patch introduction of the new parameter, I felt this code
might be better if it was refactored as follows:

/* Mark the transaction as streamed, if appropriate. */
if (can_mark_streamed)
{
  /*
   ... large comment
   */
  if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
    txn->txn_flags |= RBTXN_IS_STREAMED;
}

~~~

5. ReorderBufferPrepare

- if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
+ if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn))
  rb->prepare(rb, txn, txn->final_lsn);

~

Maybe I misunderstood this logic, but won't a "concurrent abort" cause
your new Assert added in ReorderBufferProcessTXN to fail?

+ /* Update transaction status */
+ Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0);

~~~

6. ReorderBufferCheckTXNAbort

+ /* Check the transaction status using CLOG lookup */
+ if (TransactionIdIsInProgress(txn->xid))
+ return false;
+
+ if (TransactionIdDidCommit(txn->xid))
+ {
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ txn->txn_flags |= RBTXN_COMMITTED;
+ return false;
+ }

IIUC the purpose of the TransactionIdDidCommit() was to avoid the
overhead of calling the TransactionIdIsInProgress(). So, shouldn't the
order of these checks be swapped? Otherwise, there might be 1 extra
unnecessary call to TransactionIdIsInProgress() next time.

======
src/include/replication/reorderbuffer.h

7.
 #define RBTXN_PREPARE              0x0040
 #define RBTXN_SKIPPED_PREPARE   0x0080
 #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
+#define RBTXN_COMMITTED 0x0200
+#define RBTXN_ABORTED 0x0400

For consistency with the existing bitmask names, I guess these should be named:
- RBTXN_COMMITTED --> RBTXN_IS_COMMITTED
- RBTXN_ABORTED --> RBTXN_IS_ABORTED

~~~

8.
Similarly, IMO the macros should have the same names as the bitmasks,
like the other nearby ones generally seem to.

rbtxn_did_commit --> rbtxn_is_committed
rbtxn_did_abort --> rbtxn_is_aborted
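
i.e., roughly the following, mirroring how the existing rbtxn_is_streamed
macro is written (a sketch, not the patch text):

#define RBTXN_IS_COMMITTED  0x0200
#define RBTXN_IS_ABORTED    0x0400

/* Has this transaction been committed? */
#define rbtxn_is_committed(txn) \
( \
    ((txn)->txn_flags & RBTXN_IS_COMMITTED) != 0 \
)

/* Has this transaction been aborted? */
#define rbtxn_is_aborted(txn) \
( \
    ((txn)->txn_flags & RBTXN_IS_ABORTED) != 0 \
)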

======

9.
Also, attached is a top-up patch for other cosmetic nitpicks:
- comment wording
- typos in comments
- excessive or missing blank lines
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment