Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From | Ajin Cherian |
---|---|
Subject | Re: [HACKERS] logical decoding of two-phase transactions |
Date | |
Msg-id | CAFPTHDYy7_50QzsXbYp4cr2L-PiOZ0+ya=+xwzYzLo0UWRO=Gg@mail.gmail.com |
In response to | Re: [HACKERS] logical decoding of two-phase transactions (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: [HACKERS] logical decoding of two-phase transactions |
List | pgsql-hackers |
On Sun, Nov 29, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Then once you found which existing test covers
> that, you can try to generate prepared transaction behavior as
> mentioned above.

I found the test case that exercises that code: it is the ondisk_startup spec in test_decoding. Using that, I was able to reproduce the problem with the following setup, using 4 sessions (this could be reduced to 3, but I am sharing what I actually tested):

s1 (session 1):
    postgres=# begin;
    BEGIN
    postgres=*# SELECT pg_current_xact_id();
     pg_current_xact_id
    --------------------
                    546
    (1 row)
    -- the above commands leave a transaction running

s2:
    CREATE TABLE do_write(id serial primary key);
    SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
    -- this will hang because txn 546 is pending

s3:
    postgres=# begin;
    BEGIN
    postgres=*# SELECT pg_current_xact_id();
     pg_current_xact_id
    --------------------
                    547
    (1 row)
    -- leave another txn pending

s1:
    postgres=*# ALTER TABLE do_write ADD COLUMN addedbys2 int;
    ALTER TABLE
    postgres=*# commit;
    -- commit the first txn; this will cause the state to move to SNAPBUILD_FULL_SNAPSHOT

    2020-11-30 03:31:07.354 EST [16312] LOG: logical decoding found initial consistent point at 0/1730A18
    2020-11-30 03:31:07.354 EST [16312] DETAIL: Waiting for transactions (approximately 1) older than 553 to end.

s4:
    postgres=# begin;
    BEGIN
    postgres=*# INSERT INTO do_write DEFAULT VALUES;
    INSERT 0 1
    postgres=*# prepare transaction 'test1';
    PREPARE TRANSACTION
    -- leave this transaction prepared

s3:
    postgres=*# commit;
    COMMIT
    -- this will cause the s2 call to return; a consistent point has been reached

    2020-11-30 03:31:34.200 EST [16312] LOG: logical decoding found consistent point at 0/1730D58

s4:
    commit prepared 'test1';

s2:
    postgres=# SELECT * FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
        lsn    | xid |          data
    -----------+-----+-------------------------
     0/1730FC8 | 553 | COMMIT PREPARED 'test1'
    (1 row)

In pg_logical_slot_get_changes() we see only the COMMIT PREPARED, but no INSERT and no PREPARE. I debugged this and I see that in DecodePrepare the prepare is skipped because the prepare LSN is prior to the start_decoding_at point, so it is skipped by the SnapBuildXactNeedsSkip check. So the reason for skipping the PREPARE is similar to the reason why it would have been skipped on a restart after a previous decode run.

One possible fix would be similar to what you suggested: in DecodePrepare, add the check DecodingContextReady(ctx). If it is false, that indicates the PREPARE was prior to a consistent snapshot; in that case, set a flag in the txn accordingly (say RBTXN_PREPARE_NOT_DECODED?). If this flag is detected while handling the COMMIT PREPARED, then handle it like you would handle a COMMIT. This would ensure that all the changes of the transaction are sent out and, at the same time, the subscriber side does not need to try to handle a prepared transaction that does not exist on its side.

Let me know what you think of this.

regards,
Ajin Cherian
Fujitsu Australia
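
The control flow of the proposed fix can be sketched as a small standalone model. This is only an illustration under stated assumptions, not PostgreSQL source: the struct, the enum, and the toy_* functions are hypothetical stand-ins, the flag name follows the RBTXN_PREPARE_NOT_DECODED suggestion above, and the context_ready argument stands in for the result of the DecodingContextReady(ctx) check mentioned in the mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, named after the suggestion in the mail. */
#define TOY_RBTXN_PREPARE_NOT_DECODED 0x01u

/* Toy stand-in for a reorder-buffer transaction; not the real struct. */
typedef struct ToyTxn
{
    uint32_t flags;
} ToyTxn;

/* What the decoder would hand to the output plugin in this model. */
typedef enum ToyAction
{
    TOY_SKIP,                 /* nothing is sent */
    TOY_SEND_PREPARE,         /* changes + PREPARE are sent */
    TOY_SEND_COMMIT_PREPARED, /* only COMMIT PREPARED is sent */
    TOY_SEND_AS_PLAIN_COMMIT  /* changes + ordinary COMMIT are sent */
} ToyAction;

/*
 * Model of the PREPARE path: if no consistent snapshot had been reached
 * yet (context_ready is false), remember that the PREPARE was not decoded.
 */
static ToyAction
toy_decode_prepare(ToyTxn *txn, bool context_ready)
{
    if (!context_ready)
    {
        txn->flags |= TOY_RBTXN_PREPARE_NOT_DECODED;
        return TOY_SKIP;
    }
    return TOY_SEND_PREPARE;
}

/*
 * Model of the COMMIT PREPARED path: a transaction whose PREPARE was
 * skipped is replayed as an ordinary commit, so the subscriber never has
 * to look up a prepared transaction it has not seen.
 */
static ToyAction
toy_decode_commit_prepared(const ToyTxn *txn)
{
    if (txn->flags & TOY_RBTXN_PREPARE_NOT_DECODED)
        return TOY_SEND_AS_PLAIN_COMMIT;
    return TOY_SEND_COMMIT_PREPARED;
}
```

In the scenario above, the PREPARE of 'test1' would take the first branch of toy_decode_prepare, and the later COMMIT PREPARED would then be sent as a plain commit carrying the INSERT.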