Thread: Skip collecting decoded changes of already-aborted transactions
Hi, In logical decoding, we don't need to collect decoded changes of aborted transactions. While streaming changes, we can detect concurrent abort of the (sub)transaction, but there is no mechanism to skip decoding changes of transactions that are known to already be aborted. With the attached WIP patch, we check CLOG when decoding the transaction for the first time. If it's already known to be aborted, we skip collecting decoded changes of such transactions. That way, when logical replication is behind or restarts, we don't need to decode large transactions that have already aborted, which helps improve decoding performance. Feedback is very welcome. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
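The WIP patch itself is attached rather than inlined, so here is a minimal sketch of the approach described above. The helper name and its call site are illustrative assumptions; TransactionIdDidAbort() is the actual CLOG lookup under discussion (its pitfalls come up in the next reply):

/*
 * Hypothetical sketch, not the attached patch: when a transaction's
 * changes are first seen, consult CLOG; if the transaction is already
 * known to be aborted, discard what has been collected so far.
 */
static bool
SkipChangesOfAbortedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (!TransactionIdDidAbort(txn->xid))
        return false;           /* status unknown or committed: keep decoding */

    /* Discard the changes accumulated so far for this transaction. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
    return true;                /* caller can ignore further changes */
}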
Hi, On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote: > In logical decoding, we don't need to collect decoded changes of > aborted transactions. While streaming changes, we can detect > concurrent abort of the (sub)transaction but there is no mechanism to > skip decoding changes of transactions that are known to already be > aborted. With the attached WIP patch, we check CLOG when decoding the > transaction for the first time. If it's already known to be aborted, > we skip collecting decoded changes of such transactions. That way, > when the logical replication is behind or restarts, we don't need to > decode large transactions that already aborted, which helps improve > the decoding performance. It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't return true when a transaction was implicitly aborted due to a crash / restart. You're also supposed to use it only after a preceding TransactionIdIsInProgress() call. I'm not sure there are issues with not checking TransactionIdIsInProgress() first in this case, but I'm also not sure there aren't. A separate issue is that TransactionIdDidAbort() can end up being very slow if a lot of transactions are in progress concurrently. As soon as the clog buffers are extended, all time is spent copying pages from the kernel pagecache. I'd not at all be surprised if this change causes a substantial slowdown in workloads with lots of small transactions, where most transactions commit. Greetings, Andres Freund
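For reference, a sketch of the check ordering being described here; this is the general pattern rather than code from the patch. Because TransactionIdDidAbort() does not report transactions implicitly aborted by a crash or restart, the robust classification is "not in progress and did not commit":

/* Illustrative only: the safe order for classifying an XID's outcome. */
if (TransactionIdIsInProgress(xid))
{
    /* Still running (or prepared): its changes cannot be skipped. */
}
else if (TransactionIdDidCommit(xid))
{
    /* Committed: the changes must be decoded. */
}
else
{
    /* Aborted, including implicit aborts after a crash: safe to skip. */
}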
On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote: > > In logical decoding, we don't need to collect decoded changes of > > aborted transactions. While streaming changes, we can detect > > concurrent abort of the (sub)transaction but there is no mechanism to > > skip decoding changes of transactions that are known to already be > > aborted. With the attached WIP patch, we check CLOG when decoding the > > transaction for the first time. If it's already known to be aborted, > > we skip collecting decoded changes of such transactions. That way, > > when the logical replication is behind or restarts, we don't need to > > decode large transactions that already aborted, which helps improve > > the decoding performance. > Thank you for the comment. > It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't > return true when a transaction was implicitly aborted due to a crash / > restart. You're also supposed to use it only after a preceding > TransactionIdIsInProgress() call. > > I'm not sure there are issues with not checking TransactionIdIsInProgress() > first in this case, but I'm also not sure there aren't. Yeah, it seems to be better to use !TransactionIdDidCommit() with a preceding TransactionIdIsInProgress() check. > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > a lot of transactions are in progress concurrently. As soon as the clog > buffers are extended all time is spent copying pages from the kernel > pagecache. I'd not at all be surprised if this changed causes a substantial > slowdown in workloads with lots of small transactions, where most transactions > commit. > Indeed. So it should check the transaction status less frequently. It doesn't benefit much even if we can skip collecting decoded changes of small transactions. Another idea is that we check the status of only large transactions. That is, when the size of decoded changes of an aborted transaction exceeds logical_decoding_work_mem, we mark it as aborted, free its changes decoded so far, and skip further collection. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
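A rough sketch of that threshold idea, with the status-check helper and the flag assignment as assumptions (the size fields and the work-mem GUC are the existing ones): gate the CLOG lookup on the memory accounting that logical decoding already performs, so small transactions, the common case, never pay for it:

/* Hypothetical: only transactions past the work-mem threshold are checked. */
if (txn->total_size >= logical_decoding_work_mem * 1024L &&
    TransactionIdKnownAborted(txn->xid))    /* assumed helper */
{
    /* Free the changes decoded so far and remember the verdict. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
    txn->txn_flags |= RBTXN_IS_ABORTED;     /* new flag so future changes are skipped */
}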
On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > a lot of transactions are in progress concurrently. As soon as the clog > > buffers are extended all time is spent copying pages from the kernel > > pagecache. I'd not at all be surprised if this changed causes a substantial > > slowdown in workloads with lots of small transactions, where most transactions > > commit. > > > > Indeed. So it should check the transaction status less frequently. It > doesn't benefit much even if we can skip collecting decoded changes of > small transactions. Another idea is that we check the status of only > large transactions. That is, when the size of decoded changes of an > aborted transaction exceeds logical_decoding_work_mem, we mark it as > aborted , free its changes decoded so far, and skip further > collection. > Your idea might work for large transactions, but I have not come across reports of this being a problem. Do you see any such reports, and can we measure how much benefit there is with large transactions? Because we do have the handling of concurrent aborts during sys table scans, and that might help sometimes for large transactions. -- With Regards, Amit Kapila.
On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > > a lot of transactions are in progress concurrently. As soon as the clog > > > buffers are extended all time is spent copying pages from the kernel > > > pagecache. I'd not at all be surprised if this changed causes a substantial > > > slowdown in workloads with lots of small transactions, where most transactions > > > commit. > > > > > > > Indeed. So it should check the transaction status less frequently. It > > doesn't benefit much even if we can skip collecting decoded changes of > > small transactions. Another idea is that we check the status of only > > large transactions. That is, when the size of decoded changes of an > > aborted transaction exceeds logical_decoding_work_mem, we mark it as > > aborted , free its changes decoded so far, and skip further > > collection. > > > > Your idea might work for large transactions but I have not come across > reports where this is reported as a problem. Do you see any such > reports and can we see how much is the benefit with large > transactions? Because we do have the handling of concurrent aborts > during sys table scans and that might help sometimes for large > transactions. I've heard there was a case where a user had 29 million deletes in a single transaction with each one wrapped in a savepoint and rolled it back, which led to 11TB of spill files. If decoding such a large transaction fails for some reason (e.g., disk full), it would try decoding the same transaction again and again. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if > > > > a lot of transactions are in progress concurrently. As soon as the clog > > > > buffers are extended all time is spent copying pages from the kernel > > > > pagecache. I'd not at all be surprised if this changed causes a substantial > > > > slowdown in workloads with lots of small transactions, where most transactions > > > > commit. > > > > > > > > > > Indeed. So it should check the transaction status less frequently. It > > > doesn't benefit much even if we can skip collecting decoded changes of > > > small transactions. Another idea is that we check the status of only > > > large transactions. That is, when the size of decoded changes of an > > > aborted transaction exceeds logical_decoding_work_mem, we mark it as > > > aborted , free its changes decoded so far, and skip further > > > collection. > > > > > > > Your idea might work for large transactions but I have not come across > > reports where this is reported as a problem. Do you see any such > > reports and can we see how much is the benefit with large > > transactions? Because we do have the handling of concurrent aborts > > during sys table scans and that might help sometimes for large > > transactions. > > I've heard there was a case where a user had 29 million deletes in a > single transaction with each one wrapped in a savepoint and rolled it > back, which led to 11TB of spill files. If decoding such a large > transaction fails for some reasons (e.g. a disk full), it would try > decoding the same transaction again and again. > I was thinking why the existing handling of concurrent aborts doesn't handle such a case and it seems that we check that only on catalog access. However, in your case, the user probably is accessing the same relation without any concurrent DDL on the same table, so it would just be a cache look-up for catalogs. Your idea of checking aborts every logical_decoding_work_mem should work for such cases. -- With Regards, Amit Kapila.
On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > In logical decoding, we don't need to collect decoded changes of > aborted transactions. While streaming changes, we can detect > concurrent abort of the (sub)transaction but there is no mechanism to > skip decoding changes of transactions that are known to already be > aborted. With the attached WIP patch, we check CLOG when decoding the > transaction for the first time. If it's already known to be aborted, > we skip collecting decoded changes of such transactions. That way, > when the logical replication is behind or restarts, we don't need to > decode large transactions that already aborted, which helps improve > the decoding performance. > +1 for the idea of checking the transaction status only when we need to flush it to disk or send it downstream (if streaming of in-progress transactions is enabled). Although this check is costly, since we are planning it only for large transactions it is worth it if we can occasionally avoid disk or network I/O for aborted transactions. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > In logical decoding, we don't need to collect decoded changes of > > aborted transactions. While streaming changes, we can detect > > concurrent abort of the (sub)transaction but there is no mechanism to > > skip decoding changes of transactions that are known to already be > > aborted. With the attached WIP patch, we check CLOG when decoding the > > transaction for the first time. If it's already known to be aborted, > > we skip collecting decoded changes of such transactions. That way, > > when the logical replication is behind or restarts, we don't need to > > decode large transactions that already aborted, which helps improve > > the decoding performance. > > > +1 for the idea of checking the transaction status only when we need > to flush it to the disk or send it downstream (if streaming in > progress is enabled). Although this check is costly since we are > planning only for large transactions then it is worth it if we can > occasionally avoid disk or network I/O for the aborted transactions. > Thanks. I've attached the updated patch. With this patch, we check the transaction status only for large transactions at eviction time. For regression test purposes, I disable this transaction status check when logical_replication_mode is set to 'immediate'. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
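For context, a condensed sketch of where such a check plugs into the existing eviction path (ReorderBufferCheckMemoryLimit() in reorderbuffer.c); this follows the description above rather than the attached patch text, and the helper name matches the function that shows up in the compile error reported below:

/* Condensed, illustrative version of the eviction loop with the new check. */
while (rb->size >= logical_decoding_work_mem * 1024L)
{
    ReorderBufferTXN *txn = ReorderBufferLargestTXN(rb);

    /*
     * New: if the transaction already aborted, its collected changes were
     * discarded inside the check, so just pick the next candidate.
     */
    if (ReorderBufferCheckTXNAbort(rb, txn))
        continue;

    /* Otherwise evict as before by serializing the changes to disk. */
    ReorderBufferSerializeTXN(rb, txn);
}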
On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Hi, > > > > > > In logical decoding, we don't need to collect decoded changes of > > > aborted transactions. While streaming changes, we can detect > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > skip decoding changes of transactions that are known to already be > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > transaction for the first time. If it's already known to be aborted, > > > we skip collecting decoded changes of such transactions. That way, > > > when the logical replication is behind or restarts, we don't need to > > > decode large transactions that already aborted, which helps improve > > > the decoding performance. > > > > > +1 for the idea of checking the transaction status only when we need > > to flush it to the disk or send it downstream (if streaming in > > progress is enabled). Although this check is costly since we are > > planning only for large transactions then it is worth it if we can > > occasionally avoid disk or network I/O for the aborted transactions. > > > > Thanks. > > I've attached the updated patch. With this patch, we check the > transaction status for only large-transactions when eviction. For > regression test purposes, I disable this transaction status check when > logical_replication_mode is set to 'immediate'. Maybe some changes are missing from the patch, which gives the following errors: reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared (first use in this function) 3584 | if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)) | ^~~~~~~~~~~~~~~~~~~~~~~~ Regards, Vignesh
On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > In logical decoding, we don't need to collect decoded changes of > > > > aborted transactions. While streaming changes, we can detect > > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > > skip decoding changes of transactions that are known to already be > > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > > transaction for the first time. If it's already known to be aborted, > > > > we skip collecting decoded changes of such transactions. That way, > > > > when the logical replication is behind or restarts, we don't need to > > > > decode large transactions that already aborted, which helps improve > > > > the decoding performance. > > > > > > > +1 for the idea of checking the transaction status only when we need > > > to flush it to the disk or send it downstream (if streaming in > > > progress is enabled). Although this check is costly since we are > > > planning only for large transactions then it is worth it if we can > > > occasionally avoid disk or network I/O for the aborted transactions. > > > > > > > Thanks. > > > > I've attached the updated patch. With this patch, we check the > > transaction status for only large-transactions when eviction. For > > regression test purposes, I disable this transaction status check when > > logical_replication_mode is set to 'immediate'. > > May be there is some changes that are missing in the patch, which is > giving the following errors: > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared > (first use in this function) > 3584 | if (unlikely(logical_replication_mode == > LOGICAL_REP_MODE_IMMEDIATE)) > | ^~~~~~~~~~~~~~~~~~~~~~~~ With no update to the thread and the compilation still failing I'm marking this as returned with feedback. Please feel free to resubmit to the next CF when there is a new version of the patch. Regards, Vignesh
On Fri, Feb 2, 2024 at 12:48 AM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote: > > > > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > In logical decoding, we don't need to collect decoded changes of > > > > > aborted transactions. While streaming changes, we can detect > > > > > concurrent abort of the (sub)transaction but there is no mechanism to > > > > > skip decoding changes of transactions that are known to already be > > > > > aborted. With the attached WIP patch, we check CLOG when decoding the > > > > > transaction for the first time. If it's already known to be aborted, > > > > > we skip collecting decoded changes of such transactions. That way, > > > > > when the logical replication is behind or restarts, we don't need to > > > > > decode large transactions that already aborted, which helps improve > > > > > the decoding performance. > > > > > > > > > +1 for the idea of checking the transaction status only when we need > > > > to flush it to the disk or send it downstream (if streaming in > > > > progress is enabled). Although this check is costly since we are > > > > planning only for large transactions then it is worth it if we can > > > > occasionally avoid disk or network I/O for the aborted transactions. > > > > > > > > > > Thanks. > > > > > > I've attached the updated patch. With this patch, we check the > > > transaction status for only large-transactions when eviction. For > > > regression test purposes, I disable this transaction status check when > > > logical_replication_mode is set to 'immediate'. > > > > May be there is some changes that are missing in the patch, which is > > giving the following errors: > > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’: > > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared > > (first use in this function) > > 3584 | if (unlikely(logical_replication_mode == > > LOGICAL_REP_MODE_IMMEDIATE)) > > | ^~~~~~~~~~~~~~~~~~~~~~~~ > > With no update to the thread and the compilation still failing I'm > marking this as returned with feedback. Please feel free to resubmit > to the next CF when there is a new version of the patch. > I resumed working on this item. I've attached the new version patch. I rebased the patch to the current HEAD and updated comments and commit messages. The patch is straightforward and I'm somewhat satisfied with it, but I'm thinking of adding some tests for it. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I resumed working on this item. I've attached the new version patch.
> I rebased the patch to the current HEAD and updated comments and
> commit messages. The patch is straightforward and I'm somewhat
> satisfied with it, but I'm thinking of adding some tests for it.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
I just had a look at the patch; the patch no longer applies because of a removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments:
in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;
knew/know
/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;
/process/processes
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> I resumed working on this item. I've attached the new version patch. >> >> I rebased the patch to the current HEAD and updated comments and >> commit messages. The patch is straightforward and I'm somewhat >> satisfied with it, but I'm thinking of adding some tests for it. >> >> Regards, >> >> -- >> Masahiko Sawada >> Amazon Web Services: https://aws.amazon.com > > > I just had a look at the patch, the patch no longer applies because of a removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments: Thank you for your review comments. > in ReorderBufferCheckTXNAbort() > + /* Quick return if we've already knew the transaction status */ > + if (txn->aborted) > + return true; > > knew/know Maybe it should be "known"? > > /* > + * If logical_replication_mode is "immediate", we don't check the > + * transaction status so the caller always process this transaction. > + */ > + if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE) > + return false; > > /process/processes > Fixed. In addition to these changes, I've made some changes to the latest patch. Here is the summary: - Use txn_flags field to record the transaction status instead of two 'committed' and 'aborted' flags. - Add regression tests. - Update commit message. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
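For reference, the txn_flags bookkeeping summarized above amounts to something like the following sketch; the flag and macro names are the ones quoted in the reviews downthread, while their exact placement here is illustrative:

/* Cache the CLOG verdict in the transaction's flag bits. */
#define RBTXN_COMMITTED 0x0200
#define RBTXN_ABORTED   0x0400

#define rbtxn_did_commit(txn) (((txn)->txn_flags & RBTXN_COMMITTED) != 0)
#define rbtxn_did_abort(txn)  (((txn)->txn_flags & RBTXN_ABORTED) != 0)

Once either bit is set, later calls can return immediately instead of performing another CLOG lookup.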
On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In addition to these changes, I've made some changes to the latest
> patch. Here is the summary:
>
> - Use txn_flags field to record the transaction status instead of two
> 'committed' and 'aborted' flags.
> - Add regression tests.
> - Update commit message.
>
> Regards,
Hi Sawada-san,
Thanks for the updated patch. Some comments:
1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.
we discards/we discard
2. In function ReorderBufferCheckTXNAbort(): I haven't tested this, but I wonder how prepared transactions would be considered; they are neither committed nor in progress.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Mar 27, 2024 at 8:49 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> In addition to these changes, I've made some changes to the latest >> patch. Here is the summary: >> >> - Use txn_flags field to record the transaction status instead of two >> 'committed' and 'aborted' flags. >> - Add regression tests. >> - Update commit message. >> >> Regards, >> > > Hi Sawada-san, > > Thanks for the updated patch. Some comments: Thank you for the review comments! > > 1. > + * already aborted, we discards all changes accumulated so far and ignore > + * future changes, and return true. Otherwise return false. > > we discards/we discard Will fix it. > > 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered, they are neither committed, nor in progress. The transaction that is prepared but not yet resolved is considered in-progress. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
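A short illustration of why that holds, using the check ordering quoted in the reviews below: a prepared-but-unresolved transaction still has a proc array entry, so the in-progress test catches it first and its changes keep being collected:

/* Illustrative: prepared transactions never reach the abort branch. */
if (TransactionIdIsInProgress(txn->xid))
    return false;       /* running OR prepared: keep decoding */
if (TransactionIdDidCommit(txn->xid))
    return false;       /* committed: keep decoding */
/* Only genuinely aborted transactions fall through to be skipped. */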
Hi, here are some review comments for your patch v4-0001. ====== contrib/test_decoding/sql/stats.sql 1. Huh? The test fails because the "expected results" file for these new tests is missing from the patch. ====== .../replication/logical/reorderbuffer.c 2. static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, - bool txn_prepared); + bool txn_prepared, bool mark_streamed); IIUC this new 'mark_streamed' parameter is more like a prerequisite for the other conditions to decide to mark the tx as streamed -- i.e. it is more like 'can_mark_streamed', so I felt the name should be changed to be like that (everywhere it is used). ~~~ 3. ReorderBufferTruncateTXN - * 'txn_prepared' indicates that we have decoded the transaction at prepare - * time. + * If mark_streamed is true, we could mark the transaction as streamed. + * + * 'streaming_txn' indicates that the given transaction is a streaming transaction. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_streamed) ~ What's that new comment about 'streaming_txn' for? It seemed unrelated to the patch code. ~~~ 4. /* * Mark the transaction as streamed. * * The top-level transaction, is marked as streamed always, even if it * does not contain any changes (that is, when all the changes are in * subtransactions). * * For subtransactions, we only mark them as streamed when there are * changes in them. * * We do it this way because of aborts - we don't want to send aborts for * XIDs the downstream is not aware of. And of course, it always knows * about the toplevel xact (we send the XID in all messages), but we never * stream XIDs of empty subxacts. */ if (mark_streamed && (!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; ~~ With the patch introduction of the new parameter, I felt this code might be better if it was refactored as follows: /* Mark the transaction as streamed, if appropriate. */ if (can_mark_streamed) { /* ... large comment */ if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; } ~~~ 5. ReorderBufferPrepare - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) + if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn)) rb->prepare(rb, txn, txn->final_lsn); ~ Maybe I misunderstood this logic, but won't a "concurrent abort" cause your new Assert added in ReorderBufferProcessTXN to fail? + /* Update transaction status */ + Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0); ~~~ 6. ReorderBufferCheckTXNAbort + /* Check the transaction status using CLOG lookup */ + if (TransactionIdIsInProgress(txn->xid)) + return false; + + if (TransactionIdDidCommit(txn->xid)) + { + /* + * Remember the transaction is committed so that we can skip CLOG + * check next time, avoiding the pressure on CLOG lookup. + */ + txn->txn_flags |= RBTXN_COMMITTED; + return false; + } IIUC the purpose of the TransactionIdDidCommit() was to avoid the overhead of calling the TransactionIdIsInProgress(). So, shouldn't the order of these checks be swapped? Otherwise, there might be 1 extra unnecessary call to TransactionIdIsInProgress() next time. ====== src/include/replication/reorderbuffer.h 7. 
#define RBTXN_PREPARE 0x0040 #define RBTXN_SKIPPED_PREPARE 0x0080 #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100 +#define RBTXN_COMMITTED 0x0200 +#define RBTXN_ABORTED 0x0400 For consistency with the existing bitmask names, I guess these should be named: - RBTXN_COMMITTED --> RBTXN_IS_COMMITTED - RBTXN_ABORTED --> RBTXN_IS_ABORTED ~~~ 8. Similarly, IMO the macros should have the same names as the bitmasks, like the other nearby ones generally seem to. rbtxn_did_commit --> rbtxn_is_committed rbtxn_did_abort --> rbtxn_is_aborted ====== 9. Also, attached is a top-up patch for other cosmetic nitpicks: - comment wording - typos in comments - excessive or missing blank lines - etc. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Mar 27, 2024 at 4:49 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> In addition to these changes, I've made some changes to the latest >> patch. Here is the summary: >> >> - Use txn_flags field to record the transaction status instead of two >> 'committed' and 'aborted' flags. >> - Add regression tests. >> - Update commit message. >> >> Regards, >> > > Hi Sawada-san, > > Thanks for the updated patch. Some comments: > > 1. > + * already aborted, we discards all changes accumulated so far and ignore > + * future changes, and return true. Otherwise return false. > > we discards/we discard This comment is incorporated into the latest v5 patch I've just sent[1]. > > 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered,they are neither committed, nor in progress. > IIUC prepared transactions are considered as in-progress. Regards, [1] https://www.postgresql.org/message-id/CAD21AoDJE-bLdxt9T_z1rw74RN%3DE0n0%2BesYU0eo%2B-_P32EbuVg%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Hi Sawada-San, here are some review comments for the patch v5-0001. ====== Commit message. 1. This commit introduces an additional check to determine if a transaction is already aborted by a CLOG lookup, so the logical decoding skips further change also when it doesn't touch system catalogs. ~ Is that wording backwards? Is it meant to say: This commit introduces an additional CLOG lookup check to determine if a transaction is already aborted, so the ... ====== contrib/test_decoding/sql/stats.sql 2 +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; Why do the SELECT "= 0" like this, instead of just having zeros in the "expected" results? ====== .../replication/logical/reorderbuffer.c 3. static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, - bool txn_prepared); + bool txn_prepared, bool mark_streamed); That last parameter name ('mark_streamed') does not match the same parameter name in this function's definition. ~~~ ReorderBufferTruncateTXN: 4. if (txn_streaming && (!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; if (txn_prepared) { ~ Since the following condition was already "if (txn_prepared)" would it be better remove the "(!txn_prepared)" here and instead just refactor the code like: if (txn_prepared) { ... } else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) { ... } ~~~ ReorderBufferProcessTXN: 5. + + /* Remember the transaction is aborted */ + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); + curtxn->txn_flags |= RBTXN_IS_ABORTED; Missing period on comment. ~~~ ReorderBufferCheckTXNAbort: 6. + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't + * check the transaction status, so the caller always processes this + * transaction. This is to disable this check for regression tests. + */ +static bool +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) +{ + /* + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't + * check the transaction status, so the caller always processes this + * transaction. + */ + if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) + return false; + The wording of the sentence "This is to disable..." seemed a bit confusing. Maybe this area can be simplified by doing the following. 6a. Change the function comment to say more like below: When the GUC 'debug_logical_replication_streaming' is set to "immediate", we don't check the transaction status, meaning the caller will always process this transaction. This mode is used by regression tests to avoid unnecessary transaction status checking. ~ 6b. It is not necessary for this 2nd comment to repeat everything that was already said in the function comment. A simpler comment here might be all you need: SUGGESTION: Quick return for regression tests. ~~~ 7. Is it worth mentioning about this skipping of the transaction status check in the docs for this GUC? [1] ====== [1] https://www.postgresql.org/docs/devel/runtime-config-developer.html Kind Regards, Peter Smith. Fujitsu Australia.
On Tue, Nov 12, 2024 at 5:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached the updated patch. > Hi, here are some review comments for the latest v6-0001. ====== contrib/test_decoding/sql/stats.sql 1. +INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i); I didn't understand the meaning of "serialize-topbig--1". My guess is it is a typo that was supposed to say "toobig". Perhaps there should also be some comment to explain that this "toobig" stuff was done deliberately like this to exceed 'logical_decoding_work_mem' because that would normally (if it was not aborted) cause a spill to disk. ~~~ 2. +-- Check stats. We should not spill anything as the transaction is already +-- aborted. +SELECT pg_stat_force_next_flush(); +SELECT slot_name, spill_txns AS spill_txn, spill_count AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; + Those aliases seem unnecessary: "spill_txns AS spill_txn" and "spill_count AS spill_count" ====== .../replication/logical/reorderbuffer.c ReorderBufferCheckTXNAbort: 3. Other static functions are also declared at the top of this module. For consistency, shouldn't this be the same? ~~~ 4. + * We don't mark the transaction as streamed since this function can be + * called for non-streamed transactions too. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + ReorderBufferToastReset(rb, txn); Given the comment says "since this function can be called for non-streamed transactions too", would it be easier to pass rbtxn_is_streamed(txn) here instead of 'false', and then just remove the comment? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Sawada-San, here are some review comments for the patch v5-0001. > > > > Thank you for reviewing the patch! > > > ====== > > Commit message. > > > > 1. > > This commit introduces an additional check to determine if a > > transaction is already aborted by a CLOG lookup, so the logical > > decoding skips further change also when it doesn't touch system > > catalogs. > > > > ~ > > > > Is that wording backwards? Is it meant to say: > > > > This commit introduces an additional CLOG lookup check to determine if > > a transaction is already aborted, so the ... > > Fixed. > > > > > ====== > > contrib/test_decoding/sql/stats.sql > > > > 2 > > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS > > spill_count FROM pg_stat_replication_slots WHERE slot_name = > > 'regression_slot_stats4_twophase'; > > > > Why do the SELECT "= 0" like this, instead of just having zeros in the > > "expected" results? > > Indeed. I used "=0" like other queries in the same file do, but it > makes sense to me just to have zeros in the expected file. That way, > it would make it a bit easier to investigate in case of failures. > > > > > ====== > > .../replication/logical/reorderbuffer.c > > > > 3. > > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > - bool txn_prepared); > > + bool txn_prepared, bool mark_streamed); > > > > That last parameter name ('mark_streamed') does not match the same > > parameter name in this function's definition. > > Fixed. > > > > > ~~~ > > > > ReorderBufferTruncateTXN: > > > > 4. > > if (txn_streaming && (!txn_prepared) && > > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > txn->txn_flags |= RBTXN_IS_STREAMED; > > > > if (txn_prepared) > > { > > ~ > > > > Since the following condition was already "if (txn_prepared)" would it > > be better remove the "(!txn_prepared)" here and instead just refactor > > the code like: > > > > if (txn_prepared) > > { > > ... > > } > > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > { > > ... > > } > > Good idea. > > > > > ~~~ > > > > ReorderBufferProcessTXN: > > > > 5. > > + > > + /* Remember the transaction is aborted */ > > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > > > Missing period on comment. > > Fixed. > > > > > ~~~ > > > > ReorderBufferCheckTXNAbort: > > > > 6. > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > + * check the transaction status, so the caller always processes this > > + * transaction. This is to disable this check for regression tests. > > + */ > > +static bool > > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) > > +{ > > + /* > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > + * check the transaction status, so the caller always processes this > > + * transaction. > > + */ > > + if (unlikely(debug_logical_replication_streaming == > > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) > > + return false; > > + > > > > The wording of the sentence "This is to disable..." seemed a bit > > confusing. Maybe this area can be simplified by doing the following. > > > > 6a. 
> > Change the function comment to say more like below: > > > > When the GUC 'debug_logical_replication_streaming' is set to > > "immediate", we don't check the transaction status, meaning the caller > > will always process this transaction. This mode is used by regression > > tests to avoid unnecessary transaction status checking. > > > > ~ > > > > 6b. > > It is not necessary for this 2nd comment to repeat everything that was > > already said in the function comment. A simpler comment here might be > > all you need: > > > > SUGGESTION: > > Quick return for regression tests. > > Agreed with the above two comments. Fixed. > > > > > ~~~ > > > > 7. > > Is it worth mentioning about this skipping of the transaction status > > check in the docs for this GUC? [1] > > If we want to mention this optimization in the docs, we have to > explain how the optimization works too. I think it's too detailed. > > I've attached the updated patch. Few minor suggestions: 1) Can we use rbtxn_is_committed here? + /* Remember the transaction is aborted. */ + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); + curtxn->txn_flags |= RBTXN_IS_ABORTED; 2) Similarly here too: + /* + * Mark the transaction as aborted so we ignore future changes of this + * transaction. + */ + Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0); + txn->txn_flags |= RBTXN_IS_ABORTED; 3) Can we use rbtxn_is_aborted here? + /* + * Remember the transaction is committed so that we can skip CLOG + * check next time, avoiding the pressure on CLOG lookup. + */ + Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0); Regards, Vignesh
On Tue, Nov 12, 2024 at 7:29 PM vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Hi Sawada-San, here are some review comments for the patch v5-0001. > > > > > > > Thank you for reviewing the patch! > > > > > ====== > > > Commit message. > > > > > > 1. > > > This commit introduces an additional check to determine if a > > > transaction is already aborted by a CLOG lookup, so the logical > > > decoding skips further change also when it doesn't touch system > > > catalogs. > > > > > > ~ > > > > > > Is that wording backwards? Is it meant to say: > > > > > > This commit introduces an additional CLOG lookup check to determine if > > > a transaction is already aborted, so the ... > > > > Fixed. > > > > > > > > ====== > > > contrib/test_decoding/sql/stats.sql > > > > > > 2 > > > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS > > > spill_count FROM pg_stat_replication_slots WHERE slot_name = > > > 'regression_slot_stats4_twophase'; > > > > > > Why do the SELECT "= 0" like this, instead of just having zeros in the > > > "expected" results? > > > > Indeed. I used "=0" like other queries in the same file do, but it > > makes sense to me just to have zeros in the expected file. That way, > > it would make it a bit easier to investigate in case of failures. > > > > > > > > ====== > > > .../replication/logical/reorderbuffer.c > > > > > > 3. > > > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > - bool txn_prepared); > > > + bool txn_prepared, bool mark_streamed); > > > > > > That last parameter name ('mark_streamed') does not match the same > > > parameter name in this function's definition. > > > > Fixed. > > > > > > > > ~~~ > > > > > > ReorderBufferTruncateTXN: > > > > > > 4. > > > if (txn_streaming && (!txn_prepared) && > > > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > > txn->txn_flags |= RBTXN_IS_STREAMED; > > > > > > if (txn_prepared) > > > { > > > ~ > > > > > > Since the following condition was already "if (txn_prepared)" would it > > > be better remove the "(!txn_prepared)" here and instead just refactor > > > the code like: > > > > > > if (txn_prepared) > > > { > > > ... > > > } > > > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) > > > { > > > ... > > > } > > > > Good idea. > > > > > > > > ~~~ > > > > > > ReorderBufferProcessTXN: > > > > > > 5. > > > + > > > + /* Remember the transaction is aborted */ > > > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > > > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > > > > > Missing period on comment. > > > > Fixed. > > > > > > > > ~~~ > > > > > > ReorderBufferCheckTXNAbort: > > > > > > 6. > > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > > + * check the transaction status, so the caller always processes this > > > + * transaction. This is to disable this check for regression tests. > > > + */ > > > +static bool > > > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn) > > > +{ > > > + /* > > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't > > > + * check the transaction status, so the caller always processes this > > > + * transaction. 
> > > + */ > > > + if (unlikely(debug_logical_replication_streaming == > > > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)) > > > + return false; > > > + > > > > > > The wording of the sentence "This is to disable..." seemed a bit > > > confusing. Maybe this area can be simplified by doing the following. > > > > > > 6a. > > > Change the function comment to say more like below: > > > > > > When the GUC 'debug_logical_replication_streaming' is set to > > > "immediate", we don't check the transaction status, meaning the caller > > > will always process this transaction. This mode is used by regression > > > tests to avoid unnecessary transaction status checking. > > > > > > ~ > > > > > > 6b. > > > It is not necessary for this 2nd comment to repeat everything that was > > > already said in the function comment. A simpler comment here might be > > > all you need: > > > > > > SUGGESTION: > > > Quick return for regression tests. > > > > Agreed with the above two comments. Fixed. > > > > > > > > ~~~ > > > > > > 7. > > > Is it worth mentioning about this skipping of the transaction status > > > check in the docs for this GUC? [1] > > > > If we want to mention this optimization in the docs, we have to > > explain how the optimization works too. I think it's too detailed. > > > > I've attached the updated patch. > > Few minor suggestions: > 1) Can we use rbtxn_is_committed here? > + /* Remember the transaction is aborted. */ > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0); > + curtxn->txn_flags |= RBTXN_IS_ABORTED; > > 2) Similarly here too: > + /* > + * Mark the transaction as aborted so we ignore future changes of this > + * transaction. > + */ > + Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0); > + txn->txn_flags |= RBTXN_IS_ABORTED; > > 3) Can we use rbtxn_is_aborted here? > + /* > + * Remember the transaction is committed so that we > can skip CLOG > + * check next time, avoiding the pressure on CLOG lookup. > + */ > + Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0); > Thank you for reviewing the patch! These comments are incorporated into the latest v6 patch I just sent[1]. Regards, [1] https://www.postgresql.org/message-id/CAD21AoDtMjbc8YCQiX1K8%2BRKeahcX2MLt3gwApm5BWGfv14i5A%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Hi Sawada-San, Here are some more review comments for the latest (accidentally called v6 again?) v6-0001 patch. ====== contrib/test_decoding/sql/stats.sql 1. +-- Execute a transaction that is prepared and aborted. We detect that the +-- transaction is aborted before spilling changes, and then skip collecting +-- further changes. You had replied (referring to the above comment): I think we already mentioned the transaction is going to be spilled but actually not. ~ Yes, spilling was already mentioned in the current comment but I felt it assumes the reader is expected to know details of why it was going to be spilled in the first place. In other words, I thought the comment could include a bit more explanatory background info: (Also, it's not really "we detect" the abort -- it's the new postgres code of this patch that detects it.) SUGGESTION: Execute a transaction that is prepared but then aborted. The INSERT data exceeds the 'logical_decoding_work_mem' limit which normally would result in the transaction being spilled to disk, but now when Postgres detects the abort it skips the spilling and also skips collecting further changes. ~~~ 2. +-- Check if the transaction is not spilled as it's already aborted. +SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats4_twophase', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); +SELECT pg_stat_force_next_flush(); +SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; + /Check if the transaction is not spilled/Verify that the transaction was not spilled/ ====== .../replication/logical/reorderbuffer.c ReorderBufferResetTXN: 3. /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); Looking at the calling code for ReorderBufferResetTXN it seems this function can be called for streaming OR prepared. So is it OK here to be passing hardwired 'true' as the txn_streaming parameter, or should that be passing rbtxn_is_streamed(txn)? ~~~ ReorderBufferLargestStreamableTopTXN: 4. if ((largest == NULL || txn->total_size > largest_size) && (txn->total_size > 0) && !(rbtxn_has_partial_change(txn)) && - rbtxn_has_streamable_change(txn)) + rbtxn_has_streamable_change(txn) && !(rbtxn_is_aborted(txn))) { largest = txn; largest_size = txn->total_size; I felt that this increasingly complicated code would be a lot easier to understand if you just separate the conditions into: (a) the ones that filter out transactions you don't care about; (b) the ones that check for the largest size. For example, SUGGESTION: dlist_foreach(...) { ... /* Don't consider these kinds of transactions for eviction. */ if (rbtxn_has_partial_change(txn) || !rbtxn_has_streamable_change(txn) || rbtxn_is_aborted(txn)) continue; /* Find the largest of the eviction candidates. */ if ((largest == NULL || txn->total_size > largest_size) && (txn->total_size > 0)) { largest = txn; largest_size = txn->total_size; } } ~~~ ReorderBufferCheckMemoryLimit: 5. + /* skip the transaction if already aborted */ + if (ReorderBufferCheckTXNAbort(rb, txn)) + { + /* All changes should be truncated */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } The "discard all changes accumulated so far" side-effect happening here is not very apparent from the function name. Maybe a better name for ReorderBufferCheckTXNAbort() would be something like 'ReorderBufferCleanupIfAbortedTXN()'. 
====== Kind Regards, Peter Smith. Fujitsu Australia
Hi Sawada-San, Here are some review comments for patch v8-0001. ====== contrib/test_decoding/sql/stats.sql 1. +-- The INSERT changes are large enough to be spilled but not, because the +-- transaction is aborted. The logical decoding skips collecting further +-- changes too. The transaction is prepared to make sure the decoding processes +-- the aborted transaction. /to be spilled but not/to be spilled but will not be/ ====== .../replication/logical/reorderbuffer.c ReorderBufferTruncateTXN: 2. /* * Discard changes from a transaction (and subtransactions), either after - * streaming or decoding them at PREPARE. Keep the remaining info - - * transactions, tuplecids, invalidations and snapshots. + * streaming, decoding them at PREPARE, or detecting the transaction abort. + * Keep the remaining info - transactions, tuplecids, invalidations and + * snapshots. * * We additionally remove tuplecids after decoding the transaction at prepare * time as we only need to perform invalidation at rollback or commit prepared. * + * The given transaction is marked as streamed if appropriate and the caller + * asked it by passing 'mark_txn_streaming' being true. + * * 'txn_prepared' indicates that we have decoded the transaction at prepare * time. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_txn_streaming) I think the function comment should describe the parameters in the same order that they appear in the function signature. ~~~ 3. + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) + { ... + txn->txn_flags |= RBTXN_IS_STREAMED; + } I guess it doesn't matter much, but for the sake of readability, should the condition also be checking !rbtxn_is_streamed(txn) to avoid overwriting the RBTXN_IS_STREAMED bit when it was set already? ~~~ ReorderBufferTruncateTXNIfAborted: 4. + /* + * The transaction aborted. We discard the changes we've collected so far, + * and free all resources allocated for toast reconstruction. The full + * cleanup will happen as part of decoding ABORT record of this + * transaction. + * + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, false, false); 4a. The first part of the comment says "... and free all resources allocated for toast reconstruction", but the second part says "we don't need to reset toast reconstruction data here". Is that a contradiction? ~ 4b. Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd last param, like it used to? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, 15 Nov 2024 at 23:32, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Nov 14, 2024 at 7:07 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Sawada-Sn, > > > > Here are some review comments for patch v8-0001. > > Thank you for the comments. > > > > > ====== > > contrib/test_decoding/sql/stats.sql > > > > 1. > > +-- The INSERT changes are large enough to be spilled but not, because the > > +-- transaction is aborted. The logical decoding skips collecting further > > +-- changes too. The transaction is prepared to make sure the decoding processes > > +-- the aborted transaction. > > > > /to be spilled but not/to be spilled but will not be/ > > Fixed. > > > > > ====== > > .../replication/logical/reorderbuffer.c > > > > ReorderBufferTruncateTXN: > > > > 2. > > /* > > * Discard changes from a transaction (and subtransactions), either after > > - * streaming or decoding them at PREPARE. Keep the remaining info - > > - * transactions, tuplecids, invalidations and snapshots. > > + * streaming, decoding them at PREPARE, or detecting the transaction abort. > > + * Keep the remaining info - transactions, tuplecids, invalidations and > > + * snapshots. > > * > > * We additionally remove tuplecids after decoding the transaction at prepare > > * time as we only need to perform invalidation at rollback or commit prepared. > > * > > + * The given transaction is marked as streamed if appropriate and the caller > > + * asked it by passing 'mark_txn_streaming' being true. > > + * > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > * time. > > */ > > static void > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared) > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared, > > + bool mark_txn_streaming) > > > > I think the function comment should describe the parameters in the > > same order that they appear in the function signature. > > Not sure it should be. We sometimes describe the overall idea of the > function first while using arguments names, and then describe what > other arguments mean. > > > > > ~~~ > > > > 3. > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > (txn->nentries_mem != 0))) > > + { > > ... > > + txn->txn_flags |= RBTXN_IS_STREAMED; > > + } > > > > I guess it doesn't matter much, but for the sake of readability, > > should the condition also be checking !rbtxn_is_streamed(txn) to avoid > > overwriting the RBTXN_IS_STREAMED bit when it was set already? > > Not sure it improves readability because it adds one more check there. > If it's important not to re-set RBTXN_IS_STREAMED, it makes sense to > have that check and describe in the comment. But in this case, I think > we don't necessarily need to do that. > > > ~~~ > > > > ReorderBufferTruncateTXNIfAborted: > > > > 4. > > + /* > > + * The transaction aborted. We discard the changes we've collected so far, > > + * and free all resources allocated for toast reconstruction. The full > > + * cleanup will happen as part of decoding ABORT record of this > > + * transaction. > > + * > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, false, false); > > > > 4a. > > The first part of the comment says "... and free all resources > > allocated for toast reconstruction", but the second part says "we > > don't need to reset toast reconstruction data here". 
Is that a > > contradiction? > > Yes, the comment is out-of-date. Since this function is not called > while replaying the transaction, it should not have any toast > reconstruction data. > > > > > ~ > > > > 4b. > > Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd > > last param, like it used to? > > Actually it's not necessary because it should always be false. But > thinking more, it seems to be better to use rbtxn_preapred(txn) since > it's consistent with other places and it's not necessary to put > assumptions there. Few comments: 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted instead of having it at multiple callers, ReorderBufferResetTXN also has the Assert inside the function after truncate of the transaction: @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) Assert(txn->total_size > 0); Assert(rb->size >= txn->total_size); + /* skip the transaction if aborted */ + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } + ReorderBufferStreamTXN(rb, txn); } else @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) Assert(txn->size > 0); Assert(rb->size >= txn->size); + /* skip the transaction if aborted */ + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } 2) txn->txn_flags can be moved to the next line to keep it within 80 chars in this case: * Check the transaction status by looking CLOG and discard all changes if * the transaction is aborted. The transaction status is cached in txn->txn_flags * so we can skip future changes and avoid CLOG lookups on the next call. Return 3) Is there any scenario where the Assert can fail as the toast is not reset: + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) + { + /* All changes should be discarded */ + Assert(txn->size == 0 && txn->total_size == 0); + continue; + } 4) This can be changed to a single line comment: + /* + * Quick return if the transaction status is already known. + */ + if (rbtxn_is_committed(txn)) + return false; Regards, Vignesh
Hi, Here are my review comments for patch v9-0001. These are only trivial nits for some code comments. Everything else looked good to me. ====== .../replication/logical/reorderbuffer.c ReorderBufferTruncateTXN: 1. + * The given transaction is marked as streamed if appropriate and the caller + * asked it by passing 'mark_txn_streaming' being true. /asked it/requested it/ /being true/as true/ ~~~ ReorderBufferPrepare: 2. + /* + * Remember if the transaction is already aborted to check if we detect + * that the transaction is concurrently aborted during the replay. + */ SUGGESTION: Remember if the transaction is already aborted so we can detect when the transaction is concurrently aborted during the replay. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > Few comments: Thank you for reviewing the patch! > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > instead of having it at multiple callers, ReorderBufferResetTXN also > has the Assert inside the function after truncate of the transaction: > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > Assert(txn->total_size > 0); > Assert(rb->size >= txn->total_size); > > + /* skip the transaction if aborted */ > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } > + > ReorderBufferStreamTXN(rb, txn); > } > else > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > Assert(txn->size > 0); > Assert(rb->size >= txn->size); > > + /* skip the transaction if aborted */ > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } Moved. > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > chars in this case: > * Check the transaction status by looking CLOG and discard all changes if > * the transaction is aborted. The transaction status is cached in > txn->txn_flags > * so we can skip future changes and avoid CLOG lookups on the next call. Return Fixed. > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > + * Since we don't check the transaction status while replaying the > + * transaction, we don't need to reset toast reconstruction data here. > + */ > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > + { > + /* All changes should be discarded */ > + Assert(txn->size == 0 && txn->total_size == 0); > + continue; > + } IIUC we reconstruct TOAST data when replaying the transaction. On the other hand, this function is called while adding a decoded change but not when replaying the transaction. So we should not have any toast reconstruction data at this point unless I'm missing something. Do you have any scenario where we call ReorderBufferTruncateTXNIfAborted() while a transaction has TOAST reconstruction data? > > 4) This can be changed to a single line comment: > + /* > + * Quick return if the transaction status is already known. > + */ > + if (rbtxn_is_committed(txn)) > + return false; Fixed. I'll post the updated version patch soon. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
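To make the mechanism being reviewed here concrete, below is a minimal sketch of the check-and-truncate helper as discussed in this thread. The flag and function names follow the discussion, but the bit values are placeholders and details differ from any posted version; note how, per the caution raised earlier in the thread, TransactionIdIsInProgress() is consulted before TransactionIdDidCommit():

/*
 * Minimal sketch of the helper discussed in this thread. The CLOG verdict
 * is cached in txn->txn_flags, so repeated calls for the same transaction
 * are cheap bit tests rather than CLOG lookups. Bit values are
 * placeholders, not the patch's actual definitions.
 */
#define RBTXN_IS_COMMITTED 0x0100
#define RBTXN_IS_ABORTED   0x0200

#define rbtxn_is_committed(txn) (((txn)->txn_flags & RBTXN_IS_COMMITTED) != 0)
#define rbtxn_is_aborted(txn)   (((txn)->txn_flags & RBTXN_IS_ABORTED) != 0)

static bool
ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    /* Quick return if the transaction status is already known. */
    if (rbtxn_is_committed(txn))
        return false;
    if (rbtxn_is_aborted(txn))
    {
        /* Truncated by a previous call; nothing left to discard. */
        Assert(txn->size == 0 && txn->total_size == 0);
        return true;
    }

    /*
     * First call for this transaction: consult CLOG once and cache the
     * verdict. Per the caution upthread, check in-progress first.
     */
    if (TransactionIdIsInProgress(txn->xid))
        return false;
    if (TransactionIdDidCommit(txn->xid))
    {
        txn->txn_flags |= RBTXN_IS_COMMITTED;
        return false;
    }

    /* Neither in progress nor committed, so it must have aborted. */
    txn->txn_flags |= RBTXN_IS_ABORTED;

    /* Discard all changes collected so far. */
    ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);

    /* The Assert lives inside the helper, as suggested above. */
    Assert(txn->size == 0 && txn->total_size == 0);
    return true;
}

Caching the verdict in txn->txn_flags is what keeps the per-change overhead negligible in the commit-heavy benchmark reported below.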
On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > Few comments: > > Thank you for reviewing the patch! > > > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > > instead of having it at multiple callers, ReorderBufferResetTXN also > > has the Assert inside the function after truncate of the transaction: > > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > Assert(txn->total_size > 0); > > Assert(rb->size >= txn->total_size); > > > > + /* skip the transaction if aborted */ > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > + > > ReorderBufferStreamTXN(rb, txn); > > } > > else > > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > Assert(txn->size > 0); > > Assert(rb->size >= txn->size); > > > > + /* skip the transaction if aborted */ > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > Moved. > > > > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > > chars in this case: > > * Check the transaction status by looking CLOG and discard all changes if > > * the transaction is aborted. The transaction status is cached in > > txn->txn_flags > > * so we can skip future changes and avoid CLOG lookups on the next call. Return > > Fixed. > > > > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > + { > > + /* All changes should be discarded */ > > + Assert(txn->size == 0 && txn->total_size == 0); > > + continue; > > + } > > IIUC we reconstruct TOAST data when replaying the transaction. On the > other hand, this function is called while adding a decoded change but > not when replaying the transaction. So we should not have any toast > reconstruction data at this point unless I'm missing something. Do you > have any scenario where we call ReorderBufferTruncateTXNIfAborted() > while a transaction has TOAST reconstruction data? I have checked further regarding the toast and verified the population of the toast hash. I agree with you on this. Overall, the patch appears to be in good shape. Regards, Vignesh
On Tue, Nov 26, 2024 at 10:01 PM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > > Few comments: > > > > Thank you for reviewing the patch! > > > > > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted > > > instead of having it at multiple callers, ReorderBufferResetTXN also > > > has the Assert inside the function after truncate of the transaction: > > > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > > Assert(txn->total_size > 0); > > > Assert(rb->size >= txn->total_size); > > > > > > + /* skip the transaction if aborted */ > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > + > > > ReorderBufferStreamTXN(rb, txn); > > > } > > > else > > > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb) > > > Assert(txn->size > 0); > > > Assert(rb->size >= txn->size); > > > > > > + /* skip the transaction if aborted */ > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > > Moved. > > > > > > > > 2) txn->txn_flags can be moved to the next line to keep it within 80 > > > chars in this case: > > > * Check the transaction status by looking CLOG and discard all changes if > > > * the transaction is aborted. The transaction status is cached in > > > txn->txn_flags > > > * so we can skip future changes and avoid CLOG lookups on the next call. Return > > > > Fixed. > > > > > > > > 3) Is there any scenario where the Assert can fail as the toast is not reset: > > > + * Since we don't check the transaction status while replaying the > > > + * transaction, we don't need to reset toast reconstruction data here. > > > + */ > > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > > > > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn)) > > > + { > > > + /* All changes should be discarded */ > > > + Assert(txn->size == 0 && txn->total_size == 0); > > > + continue; > > > + } > > > > IIUC we reconstruct TOAST data when replaying the transaction. On the > > other hand, this function is called while adding a decoded change but > > not when replaying the transaction. So we should not have any toast > > reconstruction data at this point unless I'm missing something. Do you > > have any scenario where we call ReorderBufferTruncateTXNIfAborted() > > while a transaction has TOAST reconstruction data? > > I have checked further regarding the toast and verified the population > of the toast hash. I agree with you on this. Overall, the patch > appears to be in good shape. Thank you for the confirmation! I thought we'd done performance tests with this patch but Michael-san pointed out we've not done yet. So I've done benchmark tests in two scenarios: A. Skip decoding large aborted transactions. 1. Preparation (SQL commands) create table test (c int); select pg_create_logical_replication_slot('s', 'test_decoding'); begin; insert into test select generate_series(1, 1_000_000); commit; begin; insert into test select generate_series(1, 1_000_000); rollback; begin; insert into test select generate_series(1, 1_000_000); rollback; 2. Performance tests (results are w/o patch vs. 
w/ patch) -- causes some spill/streamed transactions set logical_decoding_work_mem to '64MB'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 2636.208 ms vs. 2070.906 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 910.579 ms vs. 653.574 ms -- no spill/streamed transactions set logical_decoding_work_mem to '5GB'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 962.863 ms vs. 956.910 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 973.426 ms vs. 973.033 ms According to the results, skipping logical decoding of already-aborted transactions contributes performance improvements. B. Decoding medium-size transactions to check overheads of CLOG lookups. 1. Preparation (shell script) pgbench -i -s 1 postgres psql -c "create table test (c int)" psql -c "select pg_create_logical_replication_slot('s', 'test_decoding')" echo "insert into test select generate_series(1, 100)" > /tmp/bench.sql pgbench -t 10000 -c 10 -j 5 -f /tmp/bench.sql postgres 2. Performance tests -- spill/streamed transactions set logical_decoding_work_mem to '64'; select 'non-streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'false'); -> 7230.537 ms vs. 7154.322 ms select 'streaming', count(*) from pg_logical_slot_peek_changes('s', null, null, 'stream-changes', 'true'); -> 6702.438 ms vs. 6678.232 ms Overall, I don't see noticeable overheads of CLOG lookups. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
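For quick reference, the timings reported above (w/o patch vs. w/ patch) are summarized below; a bare '64' for logical_decoding_work_mem means 64kB, since that GUC's unit is kB:

  Scenario A (three 1M-row transactions, two of them aborted)
    work_mem   mode            w/o patch     w/ patch
    64MB       non-streaming   2636.208 ms   2070.906 ms
    64MB       streaming        910.579 ms    653.574 ms
    5GB        non-streaming    962.863 ms    956.910 ms
    5GB        streaming        973.426 ms    973.033 ms

  Scenario B (pgbench workload of many small committed transactions)
    work_mem   mode            w/o patch     w/ patch
    64kB       non-streaming   7230.537 ms   7154.322 ms
    64kB       streaming       6702.438 ms   6678.232 ms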
On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached a new version patch that incorporates all comments I got so far. > Review comments: =============== 1. + * The given transaction is marked as streamed if appropriate and the caller + * requested it by passing 'mark_txn_streaming' as true. + * * 'txn_prepared' indicates that we have decoded the transaction at prepare * time. */ static void -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared) +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared, + bool mark_txn_streaming) { ... } + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) + { + /* + * Mark the transaction as streamed, if appropriate. The comments related to the above changes don't clarify in which cases the 'mark_txn_streaming' should be set. Before this patch, it was clear from the comments and code about the cases where we would decide to mark it as streamed. 2. + /* + * Mark the transaction as aborted so we ignore future changes of this + * transaction. /so we ignore/so we can ignore/ 3. * Helper function for ReorderBufferProcessTXN to handle the concurrent - * abort of the streaming transaction. This resets the TXN such that it - * can be used to stream the remaining data of transaction being processed. - * This can happen when the subtransaction is aborted and we still want to - * continue processing the main or other subtransactions data. + * abort of the streaming (prepared) transaction. ... In the above comment, "... streaming (prepared)...", you added prepared to imply that this function handles concurrent abort for both in-progress and prepared transactions. Am I correct? If so, the current change makes it less clear. If you see the comments at its caller, they are clearer. 4. + /* + * Remember if the transaction is already aborted so we can detect when + * the transaction is concurrently aborted during the replay. + */ + already_aborted = rbtxn_is_aborted(txn); + ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, * when rollback prepared is decoded and sent, the downstream should be * able to rollback such a xact. See comments atop DecodePrepare. * - * Note, for the concurrent_abort + streaming case a stream_prepare was + * Note, for the concurrent abort + streaming case a stream_prepare was * already sent within the ReorderBufferReplay call above. */ - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) rb->prepare(rb, txn, txn->final_lsn); It is not clear from the comments how the 'already_aborted' is handled. I think after this patch we would have already truncated all its changes. If so, why do we need to try to replay the changes of such a xact? 5. +/* + * Check the transaction status by looking CLOG and discard all changes if + * the transaction is aborted. The transaction status is cached in + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the + * next call. Return true if the transaction is aborted, otherwise return + * false. + * + * When the 'debug_logical_replication_streaming' is set to "immediate", we + * don't check the transaction status, meaning the caller will always process + * this transaction. 
+ */ +static bool +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn) +{ I think this function is being invoked to mark a sub-transaction as aborted. It is better to explain in comments how it interacts with sub-transactions, why it is okay to mark them as aborted, and how the other parts of the system interact with it. -- With Regards, Amit Kapila.
On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached a new version patch that incorporates all comments I got so far. > > I think the patch is in good shape but I'm considering whether we > might want to call ReorderBufferToastReset() after truncating all > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > investigate further. > There’s something that seems a bit odd to me. Consider the case where the largest transaction(s) are aborted. If ReorderBufferCanStartStreaming() returns true, the changes from this transaction will only be discarded if it's a streamable transaction. However, if ReorderBufferCanStartStreaming() is false, the changes will be discarded regardless. What seems strange to me in this patch is truncating the changes of a large aborted transaction depending on whether we need to stream or spill but actually that should be completely independent IMHO. My concern is that if the largest transaction is aborted but isn’t yet streamable, we might end up picking the next transaction, which could be much smaller. This smaller transaction might not help us stay within the memory limit, and we could repeat this process for a few more transactions. In contrast, it might be more efficient to simply discard the large aborted transaction, even if it’s not streamable, to avoid this issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > I think the patch is in good shape but I'm considering whether we > > might want to call ReorderBufferToastReset() after truncating all > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > investigate further. > > > > There’s something that seems a bit odd to me. Consider the case where > the largest transaction(s) are aborted. If > ReorderBufferCanStartStreaming() returns true, the changes from this > transaction will only be discarded if it's a streamable transaction. > However, if ReorderBufferCanStartStreaming() is false, the changes > will be discarded regardless. > > What seems strange to me in this patch is truncating the changes of a > large aborted transaction depending on whether we need to stream or > spill but actually that should be completely independent IMHO. My > concern is that if the largest transaction is aborted but isn’t yet > streamable, we might end up picking the next transaction, which could > be much smaller. This smaller transaction might not help us stay > within the memory limit, and we could repeat this process for a few > more transactions. In contrast, it might be more efficient to simply > discard the large aborted transaction, even if it’s not streamable, to > avoid this issue. > If the largest transaction is non-streamable, won't the transaction returned by ReorderBufferLargestTXN() in the other case already suffice the need? -- With Regards, Amit Kapila.
On Tue, Dec 10, 2024 at 10:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 5. > +/* > + * Check the transaction status by looking CLOG and discard all changes if > + * the transaction is aborted. The transaction status is cached in > + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the > + * next call. Return true if the transaction is aborted, otherwise return > + * false. > + * > + * When the 'debug_logical_replication_streaming' is set to "immediate", we > + * don't check the transaction status, meaning the caller will always process > + * this transaction. > + */ > +static bool > +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn) > +{ > > I think this function is being invoked to mark a sub-transaction as > aborted. It is better to explain in comments how it interacts with > sub-transactions, why it is okay to mark them as aborted, and how the > other parts of the system interact with it. > The current name suggests that the main purpose is to truncate the txn, which is okay, but wouldn't it be better to name it along the lines of ReorderBufferCheckAndTruncateAbortedTXN()? In the following comment, can we move 'Return ...' to the next line to make the return values from the function clear? + * next call. Return true if the transaction is aborted, otherwise return + * false. -- With Regards, Amit Kapila.
On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > I think the patch is in good shape but I'm considering whether we > > > might want to call ReorderBufferToastReset() after truncating all > > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > > investigate further. > > > > > > > There’s something that seems a bit odd to me. Consider the case where > > the largest transaction(s) are aborted. If > > ReorderBufferCanStartStreaming() returns true, the changes from this > > transaction will only be discarded if it's a streamable transaction. > > However, if ReorderBufferCanStartStreaming() is false, the changes > > will be discarded regardless. > > > > What seems strange to me in this patch is truncating the changes of a > > large aborted transaction depending on whether we need to stream or > > spill but actually that should be completely independent IMHO. My > > concern is that if the largest transaction is aborted but isn’t yet > > streamable, we might end up picking the next transaction, which could > > be much smaller. This smaller transaction might not help us stay > > within the memory limit, and we could repeat this process for a few > > more transactions. In contrast, it might be more efficient to simply > > discard the large aborted transaction, even if it’s not streamable, to > > avoid this issue. > > > > If the largest transaction is non-streamable, won't the transaction > returned by ReorderBufferLargestTXN() in the other case already > suffice the need? I see your point, but I don’t think it’s quite the same. When ReorderBufferCanStartStreaming() is true, the function ReorderBufferLargestStreamableTopTXN() looks for the largest transaction among those that have a base_snapshot. So, if the largest transaction is aborted but hasn’t yet received a base_snapshot, it will instead select the largest transaction that does have a base_snapshot, which could be significantly smaller than the largest aborted transaction. I’m not saying this is a very common scenario, but I do feel that the logic behind truncating the largest transaction doesn’t seem entirely consistent. However, maybe this isn't a major issue. We could justify the current behavior by saying that before picking any transaction for streaming or spilling, we first check whether it has been aborted. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
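For reference, the selection logic Dilip describes looks roughly like this (paraphrased from reorderbuffer.c with assertions and some conditions elided; not exact source). The walk covers only transactions that already carry a base snapshot, so a large transaction without one is simply invisible here:

static ReorderBufferTXN *
ReorderBufferLargestStreamableTopTXN(ReorderBuffer *rb)
{
    dlist_iter  iter;
    Size        largest_size = 0;
    ReorderBufferTXN *largest = NULL;

    /*
     * Only top-level transactions that have received a base snapshot are
     * on this list; anything else is never considered for streaming.
     */
    dlist_foreach(iter, &rb->txns_by_base_snapshot_lsn)
    {
        ReorderBufferTXN *txn;

        txn = dlist_container(ReorderBufferTXN, base_snapshot_node, iter.cur);

        if (txn->total_size > largest_size &&
            !rbtxn_has_partial_change(txn) &&
            rbtxn_has_streamable_change(txn))
        {
            largest = txn;
            largest_size = txn->total_size;
        }
    }

    return largest;
}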
On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > > I think the patch is in good shape but I'm considering whether we > > > > might want to call ReorderBufferToastReset() after truncating all > > > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will > > > > investigate further. > > > > > > > > > > There’s something that seems a bit odd to me. Consider the case where > > > the largest transaction(s) are aborted. If > > > ReorderBufferCanStartStreaming() returns true, the changes from this > > > transaction will only be discarded if it's a streamable transaction. > > > However, if ReorderBufferCanStartStreaming() is false, the changes > > > will be discarded regardless. > > > > > > What seems strange to me in this patch is truncating the changes of a > > > large aborted transaction depending on whether we need to stream or > > > spill but actually that should be completely independent IMHO. My > > > concern is that if the largest transaction is aborted but isn’t yet > > > streamable, we might end up picking the next transaction, which could > > > be much smaller. This smaller transaction might not help us stay > > > within the memory limit, and we could repeat this process for a few > > > more transactions. In contrast, it might be more efficient to simply > > > discard the large aborted transaction, even if it’s not streamable, to > > > avoid this issue. > > > > > > > If the largest transaction is non-streamable, won't the transaction > > returned by ReorderBufferLargestTXN() in the other case already > > suffice the need? > > I see your point, but I don’t think it’s quite the same. When > ReorderBufferCanStartStreaming() is true, the function > ReorderBufferLargestStreamableTopTXN() looks for the largest > transaction among those that have a base_snapshot. So, if the largest > transaction is aborted but hasn’t yet received a base_snapshot, it > will instead select the largest transaction that does have a > base_snapshot, which could be significantly smaller than the largest > aborted transaction. IIUC the transaction entries in reorderbuffer have the base snapshot before decoding the first change (see SnapBuildProcessChange()). In which case the transaction doesn't have the base snapshot and has the largest amount of changes? Subtransaction entries could transfer its base snapshot to its parent transaction entry but such subtransactions will be picked by ReorderBufferLargestTXN(). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > If the largest transaction is non-streamable, won't the transaction > > > returned by ReorderBufferLargestTXN() in the other case already > > > suffice the need? > > > > I see your point, but I don’t think it’s quite the same. When > > ReorderBufferCanStartStreaming() is true, the function > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > transaction among those that have a base_snapshot. So, if the largest > > transaction is aborted but hasn’t yet received a base_snapshot, it > > will instead select the largest transaction that does have a > > base_snapshot, which could be significantly smaller than the largest > > aborted transaction. > > IIUC the transaction entries in reorderbuffer have the base snapshot > before decoding the first change (see SnapBuildProcessChange()). In > which case the transaction doesn't have the base snapshot and has the > largest amount of changes? Subtransaction entries could transfer its > base snapshot to its parent transaction entry but such subtransactions > will be picked by ReorderBufferLargestTXN(). > IIRC, there could be cases where reorder buffers of transactions can grow in size without having a base snapshot, I think transactions doing DDLs and generating a lot of INVALIDATION messages could fall in such a category. And that was one of the reasons why we were using txns_by_base_snapshot_lsn inside ReorderBufferLargestStreamableTopTXN(). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > suffice the need? > > > > > > I see your point, but I don’t think it’s quite the same. When > > > ReorderBufferCanStartStreaming() is true, the function > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > transaction among those that have a base_snapshot. So, if the largest > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > will instead select the largest transaction that does have a > > > base_snapshot, which could be significantly smaller than the largest > > > aborted transaction. > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > before decoding the first change (see SnapBuildProcessChange()). In > > which case the transaction doesn't have the base snapshot and has the > > largest amount of changes? Subtransaction entries could transfer its > > base snapshot to its parent transaction entry but such subtransactions > > will be picked by ReorderBufferLargestTXN(). > > > IIRC, there could be cases where reorder buffers of transactions can > grow in size without having a base snapshot, I think transactions > doing DDLs and generating a lot of INVALIDATION messages could fall in > such a category. > Are we recording such changes in the reorder buffer? If so, can you please share how? AFAICU, the main idea behind skipping aborts is to avoid sending a lot of data to the client that later needs to be discarded or cases where we spent resources/time spilling the changes that later need to be discarded. In that vein, the current idea of the patch where it truncates and skips aborted xacts before streaming or spilling them sounds reasonable. -- With Regards, Amit Kapila.
On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > > suffice the need? > > > > > > > > I see your point, but I don’t think it’s quite the same. When > > > > ReorderBufferCanStartStreaming() is true, the function > > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > > transaction among those that have a base_snapshot. So, if the largest > > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > > will instead select the largest transaction that does have a > > > > base_snapshot, which could be significantly smaller than the largest > > > > aborted transaction. > > > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > > before decoding the first change (see SnapBuildProcessChange()). In > > > which case the transaction doesn't have the base snapshot and has the > > > largest amount of changes? Subtransaction entries could transfer its > > > base snapshot to its parent transaction entry but such subtransactions > > > will be picked by ReorderBufferLargestTXN(). > > > > > IIRC, there could be cases where reorder buffers of transactions can > > grow in size without having a base snapshot, I think transactions > > doing DDLs and generating a lot of INVALIDATION messages could fall in > > such a category. > > > > Are we recording such changes in the reorder buffer? If so, can you > please share how? xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer, and for such changes we don't call SnapBuildProcessChange(), which means it is possible to collect such changes in the reorder buffer without setting the base_snapshot. > AFAICU, the main idea behind skipping aborts is to > avoid sending a lot of data to the client that later needs to be > discarded or cases where we spent resources/time spilling the changes > that later need to be discarded. In that vein, the current idea of the > patch where it truncates and skips aborted xacts before streaming or > spilling them sounds reasonable. I believe in one of my previous responses (a few emails above), I agreed that it's a reasonable goal to check for aborted transactions just before spilling or streaming, and if we detect an aborted transaction, we can avoid streaming/spilling and simply discard the changes. However, I wanted to make a point that if we have a large aborted transaction without a base snapshot (assuming that's possible), we might end up streaming many small transactions to stay under the memory limit. Even though we try to stay within the limit, we still might not succeed because the main issue is the large aborted transaction, which doesn't have a base snapshot. So, instead of streaming many small transactions, if we had selected the largest transaction first and checked if it was aborted, we could have avoided streaming all those smaller transactions. I agree this is a hypothetical scenario and may not be worth optimizing, and that's completely fair. I just wanted to clarify the point I raised when I first started reviewing this patch.
I haven't tried it myself, but I believe this scenario could be created by starting a transaction that performs multiple DDLs and then ultimately gets aborted. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
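To illustrate the asymmetry being discussed, here is a heavily trimmed, paraphrased sketch of the two decode paths (fast-forward, origin-filtering and xid-validity checks omitted): invalidation messages are queued into the reorder buffer without going through the snapshot builder, while DML records are dispatched only if SnapBuildProcessChange() accepts them:

    /*
     * In xact_decode()'s switch (paraphrased): invalidation messages are
     * queued without consulting the snapshot builder, so a transaction can
     * accumulate them before it has a base snapshot.
     */
    case XLOG_XACT_INVALIDATIONS:
        {
            xl_xact_invals *invals = (xl_xact_invals *) XLogRecGetData(r);

            /* no SnapBuildProcessChange() on this path */
            ReorderBufferAddInvalidations(ctx->reorder, XLogRecGetXid(r),
                                          buf->origptr,
                                          invals->nmsgs, invals->msgs);
            break;
        }

    /*
     * In heap_decode()'s switch (paraphrased): DML is dispatched only when
     * SnapBuildProcessChange() accepts it, which is also where the base
     * snapshot gets attached to the transaction.
     */
    case XLOG_HEAP_INSERT:
        if (SnapBuildProcessChange(builder, xid, buf->origptr))
            DecodeInsert(ctx, buf);
        break;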
On Wed, Dec 11, 2024 at 10:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > > If the largest transaction is non-streamable, won't the transaction > > > > > > returned by ReorderBufferLargestTXN() in the other case already > > > > > > suffice the need? > > > > > > > > > > I see your point, but I don’t think it’s quite the same. When > > > > > ReorderBufferCanStartStreaming() is true, the function > > > > > ReorderBufferLargestStreamableTopTXN() looks for the largest > > > > > transaction among those that have a base_snapshot. So, if the largest > > > > > transaction is aborted but hasn’t yet received a base_snapshot, it > > > > > will instead select the largest transaction that does have a > > > > > base_snapshot, which could be significantly smaller than the largest > > > > > aborted transaction. > > > > > > > > IIUC the transaction entries in reorderbuffer have the base snapshot > > > > before decoding the first change (see SnapBuildProcessChange()). In > > > > which case the transaction doesn't have the base snapshot and has the > > > > largest amount of changes? Subtransaction entries could transfer its > > > > base snapshot to its parent transaction entry but such subtransactions > > > > will be picked by ReorderBufferLargestTXN(). > > > > > > > IIRC, there could be cases where reorder buffers of transactions can > > > grow in size without having a base snapshot, I think transactions > > > doing DDLs and generating a lot of INVALIDATION messages could fall in > > > such a category. > > > > > > > Are we recording such changes in the reorder buffer? If so, can you > > please share how? > > xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer, > and for such changes we don't call SnapBuildProcessChange(), which means > it is possible to collect such changes in the reorder buffer without > setting the base_snapshot. DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog changes. I think that when decoding these system catalog changes, we end up calling SnapBuildProcessChange(). I understand that decoding XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but queues invalidation messages to the reorderbuffer, but I still don't understand cases where a transaction entry is quite big and has only a lot of invalidation messages. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog > changes. I think that when decoding these system catalog changes, we > end up calling SnapBuildProcessChange(). I understand that decoding > XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but > queues invalidation messages to the reorderbuffer, but I still don't > understand cases where a transaction entry is quite big and has only a > lot of invalidation messages. You are right that SnapBuildProcessChange() will be called when there are changes in the system catalog. However, it is very much possible that while you are processing the system catalog operation the snapbuild state is not yet SNAPBUILD_FULL_SNAPSHOT, and by the time you reach XLOG_XACT_INVALIDATIONS some concurrent transaction gets committed and the snapbuild state changes to SNAPBUILD_FULL_SNAPSHOT. However, I have to agree that such a transaction cannot really be very large, because it can contain invalidation messages from at most a single DDL command, so maybe we don't need to do anything special for them and we can go ahead with the approach you followed in the current patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
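The race Dilip describes hinges on the gate at the top of SnapBuildProcessChange(); a paraphrased sketch of the relevant parts (the snapshot building and refcounting that precede the base-snapshot assignment are elided):

bool
SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
{
    /*
     * Changes decoded before a full snapshot exists are not stored, and
     * consequently no base snapshot is attached to their transaction.
     */
    if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
        return false;

    /* ... further consistency checks elided ... */

    /*
     * The first stored change of a transaction is what attaches the base
     * snapshot, so a transaction whose earlier records were all filtered
     * out above still has none.
     */
    if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
        ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
                                     builder->snapshot);

    return true;
}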
On Sun, Dec 15, 2024 at 10:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog > > changes. I think that when decoding these system catalog changes, we > > end up calling SnapBuildProcessChange(). I understand that decoding > > XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() but > > queues invalidation messages to the reorderbuffer, but I still don't > > understand cases where a transaction entry is quite big and has only a > > lot of invalidation messages. > > You are right that SnapBuildProcessChange() will be called when there > are changes in the system catalog. However, it is very much possible > that while you are processing the system catalog operation the > snapbuild state is not yet SNAPBUILD_FULL_SNAPSHOT, and by the time you > reach XLOG_XACT_INVALIDATIONS some concurrent transaction gets > committed and the snapbuild state changes to SNAPBUILD_FULL_SNAPSHOT. > However, I have to agree that such a transaction cannot really be > very large, because it can contain invalidation messages from at most > a single DDL command, so maybe we don't need to do anything special for > them and we can go ahead with the approach you followed in the current > patch. > Thanks, I also think we can proceed with the current approach. So, the pending task is to address a few comments raised by me. -- With Regards, Amit Kapila.
On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > Review comments: > > Thank you for reviewing the patch! > > > =============== > > 1. > > + * The given transaction is marked as streamed if appropriate and the caller > > + * requested it by passing 'mark_txn_streaming' as true. > > + * > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > * time. > > */ > > static void > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared) > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > bool txn_prepared, > > + bool mark_txn_streaming) > > { > > ... > > } > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > (txn->nentries_mem != 0))) > > + { > > + /* > > + * Mark the transaction as streamed, if appropriate. > > > > The comments related to the above changes don't clarify in which cases > > the 'mark_txn_streaming' should be set. Before this patch, it was > > clear from the comments and code about the cases where we would decide > > to mark it as streamed. > > I think we can rename it to txn_streaming for consistency with > txn_prepared. I've changed the comment for that. > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, ReorderBufferChange *specinsert) { /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn) * full cleanup will happen as part of the COMMIT PREPAREDs, so now * just truncate txn by removing changes and tuplecids. */ - ReorderBufferTruncateTXN(rb, txn, true); + ReorderBufferTruncateTXN(rb, txn, true, true); In both the above places, the patch unconditionally passes the 'txn_streaming' even for prepared transactions when it wouldn't be a streaming xact. Inside the function, the patch handled that by first checking whether the transaction is prepared (txn_prepared). So, the logic will work but the function signature and the way its callers are using make it difficult to use and extend in the future. I think for the first case, we should get the streaming parameter in ReorderBufferResetTXN(), and for the second case ReorderBufferStreamCommit(), we should pass it as false because by that time transaction is already streamed and prepared. We are invoking it for cleanup. Even when we call ReorderBufferTruncateTXN() from ReorderBufferCheckAndTruncateAbortedTXN(), it will be better to write a comment at the caller about why we are passing this parameter as false. > > > > > 4. > > + /* > > + * Remember if the transaction is already aborted so we can detect when > > + * the transaction is concurrently aborted during the replay. > > + */ > > + already_aborted = rbtxn_is_aborted(txn); > > + > > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, > > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); > > > > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, > > TransactionId xid, > > * when rollback prepared is decoded and sent, the downstream should be > > * able to rollback such a xact. See comments atop DecodePrepare. 
> > * > > - * Note, for the concurrent_abort + streaming case a stream_prepare was > > + * Note, for the concurrent abort + streaming case a stream_prepare was > > * already sent within the ReorderBufferReplay call above. > > */ > > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) > > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) > > rb->prepare(rb, txn, txn->final_lsn); > > > > It is not clear from the comments how the 'already_aborted' is > > handled. I think after this patch we would have already truncated all > > its changes. If so, why do we need to try to replay the changes of > > such a xact? > > I used ReorderBufferReplay() for convenience; it sends begin_prepare() > and prepare() appropriately, handles streaming-prepared transactions, > and updates statistics etc. But as you pointed out, it would not be > necessary to set up a historical snapshot etc. I agree that we don't > need to try replaying such aborted transactions but I'd like to > confirm we don't really need to execute invalidation messages evein in > aborted transactions. > We need to execute invalidations if we have loaded any cache entries, for example in the case of streaming. See comments in the function ReorderBufferAbort(). However, I find both the current changes and the previous patch a bit difficult to follow. How about if we instead invent a flag like RBTXN_SENT_PREPARE or something like that and then use that flag to decide whether to send prepare in ReorderBufferPrepare(). Then add comments for the cases in which prepare will be sent from ReorderBufferPrepare(). * + * Since we don't check the transaction status while replaying the + * transaction, we don't need to reset toast reconstruction data here. + */ + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); + + /* All changes should be discarded */ + Assert(txn->size == 0); Can we expect the size to be zero without resetting the toast data? In ReorderBufferToastReset(), we call ReorderBufferReturnChange() which reduces the change size. So, won't that size still be accounted for in txn? -- With Regards, Amit Kapila.
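For illustration, one hypothetical shape of the RBTXN_SENT_PREPARE idea (not any posted patch; the bit value and the helper function are made up for this example):

#define RBTXN_SENT_PREPARE 0x0400    /* placeholder bit value */
#define rbtxn_sent_prepare(txn) (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0)

/* wherever PREPARE is actually sent (hypothetical helper): */
static void
ReorderBufferSendPrepare(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    rb->prepare(rb, txn, txn->final_lsn);
    txn->txn_flags |= RBTXN_SENT_PREPARE;
}

/*
 * ... and at the end of ReorderBufferPrepare(): send PREPARE only if it
 * has not gone out yet, e.g. because the transaction was detected as
 * already aborted and its replay was skipped.
 */
if (!rbtxn_sent_prepare(txn))
    ReorderBufferSendPrepare(rb, txn);

A flag recording "prepare was sent" makes the decision local, instead of having callers reconstruct it from concurrent-abort and streamed state.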
On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've attached a new version patch that incorporates all comments I got so far. > > > > > > > > > > Review comments: > > > > Thank you for reviewing the patch! > > > > > =============== > > > 1. > > > + * The given transaction is marked as streamed if appropriate and the caller > > > + * requested it by passing 'mark_txn_streaming' as true. > > > + * > > > * 'txn_prepared' indicates that we have decoded the transaction at prepare > > > * time. > > > */ > > > static void > > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > bool txn_prepared) > > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > bool txn_prepared, > > > + bool mark_txn_streaming) > > > { > > > ... > > > } > > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || > > > (txn->nentries_mem != 0))) > > > + { > > > + /* > > > + * Mark the transaction as streamed, if appropriate. > > > > > > The comments related to the above changes don't clarify in which cases > > > the 'mark_txn_streaming' should be set. Before this patch, it was > > > clear from the comments and code about the cases where we would decide > > > to mark it as streamed. > > > > I think we can rename it to txn_streaming for consistency with > > txn_prepared. I've changed the comment for that. > > > > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn, > ReorderBufferChange *specinsert) > { > /* Discard the changes that we just streamed */ > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); > > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, > ReorderBufferTXN *txn) > * full cleanup will happen as part of the COMMIT PREPAREDs, so now > * just truncate txn by removing changes and tuplecids. > */ > - ReorderBufferTruncateTXN(rb, txn, true); > + ReorderBufferTruncateTXN(rb, txn, true, true); > > In both the above places, the patch unconditionally passes the > 'txn_streaming' even for prepared transactions when it wouldn't be a > streaming xact. Inside the function, the patch handled that by first > checking whether the transaction is prepared (txn_prepared). So, the > logic will work but the function signature and the way its callers are > using make it difficult to use and extend in the future. > Valid concern. > I think for the first case, we should get the streaming parameter in > ReorderBufferResetTXN(), I think we cannot pass 'rbtxn_is_streamed(txn)' to ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN() is called to handle the concurrent abort of the streaming transaction but the transaction might not have been marked as streamed at that time. Since ReorderBufferTruncateTXN() is responsible for both discarding changes and marking the transaction as streamed, we need to unconditionally pass txn_streaming = true in this case. > and for the second case > ReorderBufferStreamCommit(), we should pass it as false because by > that time transaction is already streamed and prepared. We are > invoking it for cleanup. Agreed. 
> Even when we call ReorderBufferTruncateTXN() > from ReorderBufferCheckAndTruncateAbortedTXN(), it will be better to > write a comment at the caller about why we are passing this parameter > as false. Agreed. On second thoughts, I think the confusion related to txn_streaming came from the fact that ReorderBufferTruncateTXN() does both discarding changes and marking the transaction as streamed. If we make the function do just discarding changes, we don't need to introduce the txn_streaming function argument. Instead, we need to have a separate function to mark the transaction as streamed and call it before ReorderBufferTruncateTXN() where appropriate. And ReorderBufferCheckAndTruncateAbortedTXN() just calls ReorderBufferTruncateTXN(). > > > > > > > > > 4. > > > + /* > > > + * Remember if the transaction is already aborted so we can detect when > > > + * the transaction is concurrently aborted during the replay. > > > + */ > > > + already_aborted = rbtxn_is_aborted(txn); > > > + > > > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn, > > > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn); > > > > > > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, > > > TransactionId xid, > > > * when rollback prepared is decoded and sent, the downstream should be > > > * able to rollback such a xact. See comments atop DecodePrepare. > > > * > > > - * Note, for the concurrent_abort + streaming case a stream_prepare was > > > + * Note, for the concurrent abort + streaming case a stream_prepare was > > > * already sent within the ReorderBufferReplay call above. > > > */ > > > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn)) > > > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)) > > > rb->prepare(rb, txn, txn->final_lsn); > > > > > > It is not clear from the comments how the 'already_aborted' is > > > handled. I think after this patch we would have already truncated all > > > its changes. If so, why do we need to try to replay the changes of > > > such a xact? > > > > I used ReorderBufferReplay() for convenience; it sends begin_prepare() > > and prepare() appropriately, handles streaming-prepared transactions, > > and updates statistics etc. But as you pointed out, it would not be > > necessary to set up a historical snapshot etc. I agree that we don't > > need to try replaying such aborted transactions but I'd like to > > confirm we don't really need to execute invalidation messages evein in > > aborted transactions. > > > > We need to execute invalidations if we have loaded any cache entries, > for example in the case of streaming. See comments in the function > ReorderBufferAbort(). However, I find both the current changes and the > previous patch a bit difficult to follow. How about if we instead > invent a flag like RBTXN_SENT_PREPARE or something like that and then > use that flag to decide whether to send prepare in > ReorderBufferPrepare(). Then add comments for the cases in which > prepare will be sent from ReorderBufferPrepare(). The idea of using RBTXN_SENT_PREPARE sounds good to me. I'll use it. > > * > + * Since we don't check the transaction status while replaying the > + * transaction, we don't need to reset toast reconstruction data here. > + */ > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > + > + /* All changes should be discarded */ > + Assert(txn->size == 0); > > Can we expect the size to be zero without resetting the toast data? 
In > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which > reduces the change size. So, won't that size still be accounted for in > txn? IIUC the toast reconstruction data is created only while replaying the transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not called during that. So I think any toast data should not be accumulated at that time. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
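The separation Sawada proposes could look roughly like this (a sketch; the condition mirrors the one currently buried inside ReorderBufferTruncateTXN()):

/*
 * Sketch: mark the given transaction as streamed if appropriate. A
 * top-level transaction is always marked (even if all its changes live
 * in subtransactions); a subtransaction is marked only if it actually
 * had changes in memory.
 */
static void
ReorderBufferMaybeMarkTXNStreamed(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))
        txn->txn_flags |= RBTXN_IS_STREAMED;
}

/* streaming callers then mark explicitly, before discarding changes: */
ReorderBufferMaybeMarkTXNStreamed(rb, txn);
ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));

With this split, ReorderBufferTruncateTXN() no longer needs the txn_streaming argument, and the abort-truncation path simply never calls the marking helper.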
On Fri, Dec 20, 2024 at 12:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, > > ReorderBufferTXN *txn, > > ReorderBufferChange *specinsert) > > { > > /* Discard the changes that we just streamed */ > > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn)); > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true); > > > > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, > > ReorderBufferTXN *txn) > > * full cleanup will happen as part of the COMMIT PREPAREDs, so now > > * just truncate txn by removing changes and tuplecids. > > */ > > - ReorderBufferTruncateTXN(rb, txn, true); > > + ReorderBufferTruncateTXN(rb, txn, true, true); > > > > In both the above places, the patch unconditionally passes the > > 'txn_streaming' even for prepared transactions when it wouldn't be a > > streaming xact. Inside the function, the patch handled that by first > > checking whether the transaction is prepared (txn_prepared). So, the > > logic will work but the function signature and the way its callers are > > using make it difficult to use and extend in the future. > > > > Valid concern. > > > I think for the first case, we should get the streaming parameter in > > ReorderBufferResetTXN(), > > I think we cannot pass 'rbtxn_is_streamed(txn)' to > ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN() > is called to handle the concurrent abort of the streaming transaction > but the transaction might not have been marked as streamed at that > time. Since ReorderBufferTruncateTXN() is responsible for both > discarding changes and marking the transaction as streamed, we need to > unconditionally pass txn_streaming = true in this case. > Can't we use the 'stream_started' variable available at the call site of ReorderBufferResetTXN() for our purpose? > > On second thoughts, I think the confusion related to txn_streaming > came from the fact that ReorderBufferTruncateTXN() does both > discarding changes and marking the transaction as streamed. If we make > the function do just discarding changes, we don't need to introduce > the txn_streaming function argument. Instead, we need to have a > separate function to mark the transaction as streamed and call it > before ReorderBufferTruncateTXN() where appropriate. And > ReorderBufferCheckAndTruncateAbortedTXN() just calls > ReorderBufferTruncateTXN(). > That sounds good to me. IIRC, initially, ReorderBufferTruncateTXN() was used to truncate changes only for streaming transactions. Later, it evolved for prepared xacts and now for xacts where we explicitly detect whether they are aborted. So, I think it makes sense to improve it by following your suggestion. > > > > > * > > + * Since we don't check the transaction status while replaying the > > + * transaction, we don't need to reset toast reconstruction data here. > > + */ > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false); > > + > > + /* All changes should be discarded */ > > + Assert(txn->size == 0); > > > > Can we expect the size to be zero without resetting the toast data? In > > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which > > reduces the change size. So, won't that size still be accounted for in > > txn? > > IIUC the toast reconstruction data is created only while replaying the > transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not > called during that.
So I think any toast data should not be > accumulated at that time. > How about the case where, in the first pass, we streamed the transaction partially and it reconstructed toast data, and then, in the second pass, when memory becomes full, the reorder buffer contains some partial data, so it tries to spill the data and finds that the transaction is aborted? I could be wrong here because I haven't tried to test this code path, but I see that it is theoretically possible. -- With Regards, Amit Kapila.
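If the partially-streamed scenario turns out to be reachable, one defensive option, in line with Sawada's earlier "just in case" remark, would be to reset the toast state before truncating so the size assertion stays valid; a sketch:

/* in the check-and-truncate path, before discarding the changes: */
if (txn->toast_hash != NULL)
    ReorderBufferToastReset(rb, txn);   /* frees any reconstruction state */

ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);

/* only now is all of the transaction's memory accounted back */
Assert(txn->size == 0 && txn->total_size == 0);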