Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Ajin Cherian
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id CAFPTHDYy7_50QzsXbYp4cr2L-PiOZ0+ya=+xwzYzLo0UWRO=Gg@mail.gmail.com
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
On Sun, Nov 29, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Then once you found which existing test covers
> that, you can try to generate prepared transaction behavior as
> mentioned above.

I found the test case that exercises that code: it is the
ondisk_startup spec in test_decoding. Using that, I was able to
reproduce the problem with the following setup, which uses 4 sessions
(this could be reduced to 3, but I'm sharing exactly what I tested):

s1(session 1):
postgres=# begin;
BEGIN
postgres=*# SELECT pg_current_xact_id();
 pg_current_xact_id
--------------------
                546
(1 row)
--------------------the above commands leave a transaction running
s2:
CREATE TABLE do_write(id serial primary key);
SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');

---------------------this will hang because txn 546 is still pending

s3:
postgres=# begin;
BEGIN
postgres=*# SELECT pg_current_xact_id();
 pg_current_xact_id
--------------------
                547
(1 row)
-------------------------------- leave another txn pending---

s1:
postgres=*# ALTER TABLE do_write ADD COLUMN addedbys2 int;
ALTER TABLE
postgres=*# commit;
------------------------------commit the first txn; this moves the
snapshot builder to the SNAPBUILD_FULL_SNAPSHOT state
2020-11-30 03:31:07.354 EST [16312] LOG:  logical decoding found
initial consistent point at 0/1730A18
2020-11-30 03:31:07.354 EST [16312] DETAIL:  Waiting for transactions
(approximately 1) older than 553 to end.


s4:
postgres=# begin;
BEGIN
postgres=*# INSERT INTO do_write DEFAULT VALUES;
INSERT 0 1
postgres=*# prepare transaction 'test1';
PREPARE TRANSACTION
-------------- leave this transaction prepared

s3:
postgres=*# commit;
COMMIT
----------------- this causes the s2 call to return, since a
consistent point has now been reached.
2020-11-30 03:31:34.200 EST [16312] LOG:  logical decoding found
consistent point at 0/1730D58

s4:
commit prepared 'test1';

s2:
postgres=# SELECT * FROM pg_logical_slot_get_changes('isolation_slot',
NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0',
'skip-empty-xacts', '1');
    lsn    | xid |          data
-----------+-----+-------------------------
 0/1730FC8 | 553 | COMMIT PREPARED 'test1'
(1 row)

In the pg_logical_slot_get_changes() output we see only the COMMIT
PREPARED, but neither the INSERT nor the PREPARE. I debugged this and
found that in DecodePrepare the PREPARE is skipped by
SnapBuildXactNeedsSkip, because the prepare LSN is prior to the
start_decoding_at point. So the reason for skipping the PREPARE is
similar to the reason it would be skipped on a restart after a
previous decode run.
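
To make the LSN comparison concrete, here is a minimal toy model in C.
It is not the PostgreSQL source: make_lsn, record_needs_skip and
check_scenario are names made up for illustration, start_decoding_at
is assumed to equal the consistent point from the log, and the
prepare LSN is a hypothetical value, since the transcript does not
show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* 64-bit WAL position, as in PostgreSQL */

/* Build an LSN from the "hi/lo" hex form printed in the server log. */
static XLogRecPtr
make_lsn(uint32_t hi, uint32_t lo)
{
    return ((XLogRecPtr) hi << 32) | lo;
}

/* Toy skip test: a record located before start_decoding_at is not decoded. */
static bool
record_needs_skip(XLogRecPtr start_decoding_at, XLogRecPtr record_lsn)
{
    return record_lsn < start_decoding_at;
}

/* Replay the ordering of the records in the scenario described above. */
static void
check_scenario(void)
{
    /* ASSUMPTION: decoding starts at the consistent point 0/1730D58. */
    XLogRecPtr start_decoding_at = make_lsn(0, 0x1730D58);
    /* HYPOTHETICAL: some LSN before the consistent point. */
    XLogRecPtr prepare_lsn = make_lsn(0, 0x1730C00);
    /* LSN of COMMIT PREPARED, taken from the get_changes output. */
    XLogRecPtr commit_prepared_lsn = make_lsn(0, 0x1730FC8);

    assert(record_needs_skip(start_decoding_at, prepare_lsn));
    assert(!record_needs_skip(start_decoding_at, commit_prepared_lsn));
}
```

In this model the PREPARE falls before the starting point and is
dropped, while the COMMIT PREPARED falls after it and is decoded,
which matches the single row seen in the output.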

One possible fix would be similar to what you suggested: in
DecodePrepare, add a DecodingContextReady(ctx) check. If it returns
false, the PREPARE was prior to a consistent snapshot, so set a flag
in the txn accordingly (say RBTXN_PREPARE_NOT_DECODED?). If this flag
is detected while handling the COMMIT PREPARED, handle it as you
would handle a COMMIT. This ensures that all the changes of the
transaction are sent out, and at the same time the subscriber side
does not need to handle a prepared transaction that does not exist on
its side.
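
The control flow of this idea can be sketched with a small standalone
C model. This is only an illustration, not a patch: ToyTxn, ToyAction
and the toy_* functions are hypothetical, TOY_PREPARE_NOT_DECODED
stands in for the proposed RBTXN_PREPARE_NOT_DECODED flag, and the
context_ready argument stands in for the DecodingContextReady(ctx)
check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed RBTXN_PREPARE_NOT_DECODED flag bit. */
#define TOY_PREPARE_NOT_DECODED 0x01

/* What the decoder would send downstream for a given record. */
typedef enum
{
    SEND_NOTHING,           /* record skipped */
    SEND_PREPARE,           /* changes sent, followed by PREPARE */
    SEND_COMMIT_PREPARED,   /* only the COMMIT PREPARED is sent */
    SEND_FULL_COMMIT        /* changes sent, followed by a plain COMMIT */
} ToyAction;

/* Hypothetical, heavily reduced transaction state. */
typedef struct ToyTxn
{
    uint32_t flags;
} ToyTxn;

/* PREPARE record: decode it if the context is ready, else remember the skip. */
static ToyAction
toy_decode_prepare(ToyTxn *txn, bool context_ready)
{
    assert(txn != NULL);
    if (!context_ready)
    {
        txn->flags |= TOY_PREPARE_NOT_DECODED;
        return SEND_NOTHING;
    }
    return SEND_PREPARE;
}

/* COMMIT PREPARED record: fall back to a plain commit if PREPARE was skipped. */
static ToyAction
toy_decode_commit_prepared(const ToyTxn *txn)
{
    assert(txn != NULL);
    if (txn->flags & TOY_PREPARE_NOT_DECODED)
        return SEND_FULL_COMMIT;
    return SEND_COMMIT_PREPARED;
}
```

In this sketch, a transaction whose PREPARE arrived before the
consistent point ends up being sent as an ordinary commit with all of
its changes, while a transaction prepared afterwards keeps the
two-phase path.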

Let me know what you think of this.

regards,
Ajin Cherian
Fujitsu Australia


