RE: Found issues related with logical replication and 2PC - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Found issues related with logical replication and 2PC
Date
Msg-id TYAPR01MB56927A6DBB25C9C79B16D9C4F5BA2@TYAPR01MB5692.jpnprd01.prod.outlook.com
In response to Re: Found issues related with logical replication and 2PC  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Found issues related with logical replication and 2PC
List pgsql-hackers
Dear Amit,

>
> The code changes look mostly good to me. I have changed/added a few
> comments in the attached modified version.
>

Thanks for updating the patch! It LGTM. I've tested your patch and confirmed
that it does not cause data loss. I used a source tree with v3 applied, plus an
additional change that logs each received replication message [1].

Method
======

1. Construct a logical replication system with two_phase = true and
   synchronous_commit = false.
2. Attach to the subscriber's walwriter (e.g., with a debugger) to stop the process.
3. Start a transaction and prepare it on the publisher.
4. Wait until the worker replies to the publisher.
5. Stop the subscriber.
6. Restart the subscriber.
7. Do COMMIT PREPARED.

The attached script constructs the same situation.

Result
======

After step 5, I ran pg_waldump and confirmed that the PREPARE record existed on
the subscriber.

```
$ pg_waldump data_sub/pg_wal/000000010000000000000001
...
rmgr: Transaction len..., desc: PREPARE gid pg_gid_16389_741: ...
rmgr: XLOG        len..., desc: CHECKPOINT_SHUTDOWN ...
```

Also, after step 7, I confirmed that only the COMMIT PREPARED message was sent,
because the log contained the line below. "75" is the ASCII code for 'K', which
indicates that the replication message was COMMIT PREPARED.
```
LOG:  XXX got message 75
```
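
As a quick sanity check of that mapping, the logged number can be converted back
to its ASCII character and looked up. This is only a sketch: the helper names
msg_char and msg_name are made up for illustration, and the table covers only the
two-phase message types ('b', 'P', 'K', 'r') as I understand them from the
logical replication protocol, so anything else is reported as "other".

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch or the attached script).

# Convert a decimal byte value, as printed by the debug elog, to its ASCII character.
msg_char() {
    printf "\\$(printf '%03o' "$1")"
}

# Name the two-phase related logical replication message for that byte.
msg_name() {
    case "$(msg_char "$1")" in
        b) echo "BEGIN PREPARE" ;;
        P) echo "PREPARE" ;;
        K) echo "COMMIT PREPARED" ;;
        r) echo "ROLLBACK PREPARED" ;;
        *) echo "other" ;;
    esac
}

msg_name 75   # prints "COMMIT PREPARED"
```

If the PREPARE had been resent after the restart, the log would presumably also
have shown "got message 98" ('b') and "got message 80" ('P') before 75.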



Additionally, I ran another test, which is basically the same as above except that
1) XLogFlush() in EndPrepare() was commented out and 2) kill -9 was used at step 5
to emulate a crash. Since the PREPAREd transaction cannot survive on the subscriber
in this case, the COMMIT PREPARED command on the publisher causes an ERROR on the
subscriber.
```
ERROR:  prepared transaction with identifier "pg_gid_16389_741" does not exist
CONTEXT:  processing remote data for replication origin "pg_16389" during message
            type "COMMIT PREPARED" in transaction 741, finished at 0/15463C0
```
I think this shows that the backend process itself ensures the PREPARE record is
persisted, so data loss won't occur.


[1]:
```
@@ -3297,6 +3297,8 @@ apply_dispatch(StringInfo s)
     saved_command = apply_error_callback_arg.command;
     apply_error_callback_arg.command = action;
 
+    elog(LOG, "XXX got message %d", action);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

