Thread: Bug in logical decoding of in-progress transactions

Bug in logical decoding of in-progress transactions

From
Amit Kapila
Date:
Hi,

There is a recent build farm failure [1] in one of the test_decoding
tests, as pointed out by Tom Lane [2]. The failure report is shown below:

@@ -71,6 +71,8 @@
                    data
 ------------------------------------------
  opening a streamed block for transaction
+ closing a streamed block for transaction
+ opening a streamed block for transaction
  streaming change for transaction
  streaming change for transaction
  streaming change for transaction
@@ -83,7 +85,7 @@
  streaming change for transaction
  closing a streamed block for transaction
  committing streamed transaction
-(13 rows)
+(15 rows)

Here, the symptoms are quite similar to what we fixed in commit
82a0ba7707, which is that an extra empty transaction is being decoded
in the test. It can happen even if we have instructed the test to 'skip
empty xacts' for streaming transactions, because the test_decoding
plugin APIs (related to streaming changes for in-progress xacts) make
no effort to skip such empty xacts. It was kept intentionally like
that under the assumption that we would never try to stream empty
xacts, but on closer inspection of the code, it seems to me that
assumption was not correct. Basically, we can pick for streaming a
transaction that has change messages for
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, and we don't send such
messages downstream; rather, they are just used to update the internal
state. So, in this particular failure, it is possible that an autovacuum
transaction got such a change message added by one of the other
committed xacts, and on trying to stream it we get these additional
messages. The fix is to skip empty xacts when indicated by the user in
streaming APIs of test_decoding.
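
The idea of the fix can be sketched roughly as follows. This is a small
standalone model, not the test_decoding code: PluginState, ChangeKind,
emit() and stream_txn() are hypothetical names invented for illustration,
and the output is collected in a string instead of going through the real
output plugin API. It assumes that internal snapshot changes never reach
the plugin's change callback, so a streamed block holding only such
changes produces no visible change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for change kinds (not the real PostgreSQL enum). */
typedef enum { CHANGE_INSERT, CHANGE_MESSAGE, CHANGE_INTERNAL_SNAPSHOT } ChangeKind;

/* Hypothetical plugin state for one decoding session. */
typedef struct {
    bool skip_empty_xacts;      /* user passed 'skip-empty-xacts' */
    bool stream_wrote_changes;  /* did the current block emit a change? */
    char out[512];              /* collected output, one line per message */
} PluginState;

static void emit(PluginState *s, const char *line)
{
    /* Room for the line, the newline and the terminating NUL. */
    assert(strlen(s->out) + strlen(line) + 2 <= sizeof(s->out));
    strcat(s->out, line);
    strcat(s->out, "\n");
}

static void stream_start(PluginState *s)
{
    s->stream_wrote_changes = false;
    if (s->skip_empty_xacts)
        return;                 /* defer until the first visible change */
    emit(s, "opening a streamed block for transaction");
}

static void stream_change(PluginState *s)
{
    /* Emit the deferred "opening" line if we have not done so yet. */
    if (s->skip_empty_xacts && !s->stream_wrote_changes)
        emit(s, "opening a streamed block for transaction");
    s->stream_wrote_changes = true;
    emit(s, "streaming change for transaction");
}

static void stream_stop(PluginState *s)
{
    if (s->skip_empty_xacts && !s->stream_wrote_changes)
        return;                 /* nothing was opened, so nothing to close */
    emit(s, "closing a streamed block for transaction");
}

/* Model of streaming one block: internal changes never reach the plugin. */
void stream_txn(PluginState *s, const ChangeKind *changes, size_t n)
{
    stream_start(s);
    for (size_t i = 0; i < n; i++)
        if (changes[i] != CHANGE_INTERNAL_SNAPSHOT)
            stream_change(s);
    stream_stop(s);
}
```

With skip_empty_xacts set, a block that contains only
CHANGE_INTERNAL_SNAPSHOT produces no output at all, whereas without it
the model prints the empty opening/closing pair seen in the failure.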

Thoughts?

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-09-09+03%3A42%3A19
[2] - https://www.postgresql.org/message-id/118303.1599691636%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.



Re: Bug in logical decoding of in-progress transactions

From
Dilip Kumar
Date:
On Thu, Sep 10, 2020 at 11:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The fix is to skip empty xacts when indicated by the user in
> streaming APIs of test_decoding.
>
> Thoughts?

Yeah, that's an issue.
 



I have written a test case to reproduce the issue.  I have also prepared a patch to skip the empty transaction, and with that patch the issue is fixed.  One side effect is that it will skip any empty stream even if the transaction as a whole is not empty.  As such I don't see any problem with that, but it is not exactly what the user has asked for.

logical_decoding_work_mem=64kB

SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

CREATE TABLE stream_test(data text);

-- consume DDL
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 'select array_agg(md5(g::text))::text from generate_series(1, 80000) g';

--session1
BEGIN;
CREATE TABLE stream_test1(data text);

--session2
BEGIN;
CREATE TABLE stream_test2(data text);
COMMIT;

--session3
BEGIN;
INSERT INTO stream_test SELECT large_val();
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '1', 'skip-empty-xacts', '1', 'stream-changes', '1');

                       data                      
--------------------------------------------------
 opening a streamed block for transaction TXN 508
 closing a streamed block for transaction TXN 508
 opening a streamed block for transaction TXN 510
 streaming change for TXN 510
 closing a streamed block for transaction TXN 510
(5 rows)


After patch
                       data                      
--------------------------------------------------
 opening a streamed block for transaction TXN 510
 streaming change for TXN 510
 closing a streamed block for transaction TXN 510


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Bug in logical decoding of in-progress transactions

From
Amit Kapila
Date:
On Thu, Sep 10, 2020 at 11:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have written a test case to reproduce the issue.  I have also prepared a patch to skip the empty transaction, and
> with that patch the issue is fixed.  One side effect is that it will skip any empty stream even if the
> transaction as a whole is not empty.  As such I don't see any problem with that, but it is not exactly what the user has asked for.
>

Isn't that true for non-streaming xacts as well? Basically, the
skip-empty-xacts option indicates that if there is no change for
'tuple' or 'message', we skip it.

--
With Regards,
Amit Kapila.



Re: Bug in logical decoding of in-progress transactions

From
Dilip Kumar
Date:
On Thu, Sep 10, 2020 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't that true for non-streaming xacts as well? Basically, the
> skip-empty-xacts option indicates that if there is no change for
> 'tuple' or 'message', we skip it.

Yeah, that's right. 

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Bug in logical decoding of in-progress transactions

From
Dilip Kumar
Date:
On Thu, Sep 10, 2020 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have removed some comments which are not valid after this patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Bug in logical decoding of in-progress transactions

From
Amit Kapila
Date:
On Thu, Sep 10, 2020 at 12:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 10, 2020 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>>> >
>>> > I have written a test case to reproduce the same.
>

Can we write an isolation test for this scenario? See some similar
tests in contrib/test_decoding/specs. If that is possible, then we can
probably remove the test which failed and instead write an isolation
test involving three transactions as shown by you. Also, please
prepare two separate patches (one for the test and the other for the
code) if you are able to convert the existing test to an isolation
test, as that will make it easier to test the fix.

>
> I have removed some comments which are not valid after this patch.
>

A few comments:
===============
1. We need to set xact_wrote_changes in pg_decode_stream_truncate() as
well, in addition to the APIs in which you have already set it.
2.
+static void
+pg_output_stream_start(LogicalDecodingContext *ctx, TestDecodingData *data, ReorderBufferTXN *txn, bool last_write)
+{
  OutputPluginPrepareWrite(ctx, true);
  if (data->include_xids)
  appendStringInfo(ctx->out, "opening a streamed block for transaction TXN %u", txn->xid);
@@ -601,16 +610,15 @@ pg_decode_stream_start(LogicalDecodingContext *ctx,
  OutputPluginWrite(ctx, true);

In this API, we need to use 'last_write' in OutputPluginPrepareWrite()
and OutputPluginWrite().

The attached patch fixes both these comments.
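
To illustrate the second comment, here is a minimal standalone mock, not
the actual plugin code. MockContext, mock_prepare_write() and mock_write()
are hypothetical stand-ins for the context and for
OutputPluginPrepareWrite()/OutputPluginWrite(); the check that both calls
see the same flag is an assumption of this mock only. The sketch relies on
the documented meaning of last_write, namely whether a given write is the
callback's last write: when the deferred "opening" line is emitted from
inside a change callback, the change itself still follows, so the helper
has to forward the caller's flag rather than hard-code true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 8

/* One recorded write together with the flag it was sent with. */
typedef struct {
    const char *text;
    bool last_write;
} WriteRecord;

/* Hypothetical stand-in for the decoding context. */
typedef struct {
    WriteRecord writes[MAX_WRITES];
    size_t nwrites;
    bool prepared_last_write;   /* flag seen by the prepare step */
} MockContext;

static void mock_prepare_write(MockContext *ctx, bool last_write)
{
    ctx->prepared_last_write = last_write;
}

static void mock_write(MockContext *ctx, const char *text, bool last_write)
{
    /* In this mock, prepare and write must be given the same flag. */
    assert(ctx->prepared_last_write == last_write);
    assert(ctx->nwrites < MAX_WRITES);
    ctx->writes[ctx->nwrites++] = (WriteRecord){ text, last_write };
}

/* Helper that forwards last_write to both calls instead of passing true. */
static void output_stream_start(MockContext *ctx, bool last_write)
{
    mock_prepare_write(ctx, last_write);
    mock_write(ctx, "opening a streamed block for transaction", last_write);
}

/* Change callback model: may first emit the deferred stream start. */
void stream_change_with_deferred_start(MockContext *ctx, bool start_pending)
{
    if (start_pending)
        output_stream_start(ctx, false);    /* the change still follows */
    mock_prepare_write(ctx, true);
    mock_write(ctx, "streaming change for transaction", true);
}
```

Calling stream_change_with_deferred_start() with start_pending set records
two writes, the first with last_write false and the second with true.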

-- 
With Regards,
Amit Kapila.

Attachment

Re: Bug in logical decoding of in-progress transactions

From
Dilip Kumar
Date:
On Thu, Sep 10, 2020 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Can we write an isolation test for this scenario? See some similar
> tests in contrib/test_decoding/specs. If that is possible, then we can
> probably remove the test which failed and instead write an isolation
> test involving three transactions as shown by you. Also, please
> prepare two separate patches (one for the test and the other for the
> code) if you are able to convert the existing test to an isolation
> test, as that will make it easier to test the fix.

I have written an isolation test for this.  IMHO, we should not try to merge stream.sql into this isolation test, mainly for two reasons:
a) This isolation test is very specific: at the point where we try to stream, the transaction must have incomplete changes, so if we add more operations like message/truncate/abort then it will become unpredictable.  Currently, I have kept it to just one big tuple, so it is guaranteed that whenever logical_decoding_work_mem is exceeded the transaction will have partial changes.
b) We could add other operations in the transaction to cover the stream changes, but those are not specific to the isolation test.
So I feel it is better to put only the specific scenario in the isolation test.

 

> The attached patch fixes both these comments.

Okay,  there is some change in stream.out so I have included that in the first patch.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Bug in logical decoding of in-progress transactions

From
Amit Kapila
Date:
On Thu, Sep 10, 2020 at 4:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
>
> Okay,  there is some change in stream.out so I have included that in the first patch.
>

Pushed after minor changes in comments.

-- 
With Regards,
Amit Kapila.