Dear Haiyang,
Thanks for checking! This reply still focuses only on "Issue 2" in your notation.
>## Issue 2
>Inspired by your spec case, I've reorganized the spec case provided in [2]. The new test in attachment
>is able to reproduce the issue mentioned in [1] even before commit 6b77048e5.
Good finding. I also confirmed that the workload fails after reverting 6b77048e5.
I also confirmed that the patch [1] fixes the workload.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_create_part1" "s0_savepoint_release"
"s2_init" "s1_checkpoint" "s1_get_changes" "s0_commit" "s2_get_changes"
## Analysis
The point is that a snapshot serialized by another replication slot can be reused.
When the first get_changes is called, a consistent snapshot can be serialized because
of the XLOG_RUNNING_XACTS record (see later).
The get_changes for the second slot reuses that snapshot, so it can read WAL records properly.
(If the first slot did not exist, the state of the snapshot would be
SNAPBUILD_BUILDING_SNAPSHOT, so no records would be read.)
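To illustrate the reuse described above, here is a toy model in C. It is only a sketch, not the snapbuild.c logic: the names ToyState and toy_state_after_running_xacts() are hypothetical, and it assumes that a snapshot serialized at exactly the LSN of the RUNNING_XACTS record is what lets a slot become consistent while transactions are still running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for two of the snapshot builder states (hypothetical names). */
typedef enum ToyState
{
    TOY_BUILDING_SNAPSHOT,  /* still waiting; no changes are decoded */
    TOY_CONSISTENT          /* changes can be decoded */
} ToyState;

/*
 * Toy model: a slot sees a RUNNING_XACTS record at "lsn" while a transaction
 * is still running. If some other slot already serialized a snapshot at that
 * LSN, the slot can restore it and become consistent; otherwise it stays in
 * the building state.
 */
ToyState
toy_state_after_running_xacts(uint64_t lsn,
                              const uint64_t *serialized_lsns, size_t n)
{
    assert(serialized_lsns != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (serialized_lsns[i] == lsn)
            return TOY_CONSISTENT;
    }
    return TOY_BUILDING_SNAPSHOT;
}
```

With an empty list of serialized LSNs (no first slot), the function returns TOY_BUILDING_SNAPSHOT, which corresponds to the "no records are read" case.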
In the second get_changes, the records below are read. The first (LOCK, RUNNING_XACTS)
pair is generated by the slot creation, and the second pair comes from the
CHECKPOINT. I.e., it reads all records since the slot creation.
```
...lsn: 0/01906DB8, prev 0/01906D58, desc: LOCK ...
...lsn: 0/01906DF0, prev 0/01906DB8, desc: RUNNING_XACTS ...
...lsn: 0/01906E30, prev 0/01906DF0, desc: LOCK ...
...lsn: 0/01906E68, prev 0/01906E30, desc: RUNNING_XACTS ...
...lsn: 0/01906EA8, prev 0/01906E68, desc: CHECKPOINT_ONLINE ...
...lsn: 0/01906F20, prev 0/01906EA8, desc: COMMIT ... subxacts: 728; ... inval msgs: ...
```
Also, the final COMMIT record contains the info for a subtransaction and the
XACT_XINFO_HAS_INVALS flag, so DecodeCommit()->SnapBuildXidSetCatalogChanges()
is called for the transactions.
As a result, two ReorderBufferTXNs are created with the same LSN, which fails the
assertion in AssertTXNLsnOrder().
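The ordering condition can be sketched with a small standalone model. This is not the reorderbuffer.c code: ToyTXN, ToyLsn and toy_lsn_order_ok() are hypothetical names, and the model assumes the check requires first_lsn to be strictly increasing across the tracked transactions. It returns false where the real code would hit the Assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for XLogRecPtr; 0 plays the role of an invalid LSN. */
typedef uint64_t ToyLsn;

/* Toy stand-in for ReorderBufferTXN: only the field relevant here. */
typedef struct ToyTXN
{
    ToyLsn first_lsn;   /* LSN at which the transaction was first seen */
} ToyTXN;

/*
 * Returns true when every entry has a valid first_lsn and the values are
 * strictly increasing; returns false otherwise.
 */
bool
toy_lsn_order_ok(const ToyTXN *txns, size_t n)
{
    ToyLsn prev = 0;

    assert(txns != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (txns[i].first_lsn == 0)
            return false;           /* invalid LSN */
        if (prev != 0 && prev >= txns[i].first_lsn)
            return false;           /* equal or out-of-order LSN */
        prev = txns[i].first_lsn;
    }
    return true;
}
```

If two entries both carry the LSN of the COMMIT record (0/01906F20 in the dump above), toy_lsn_order_ok() returns false, which models the failure seen in this workload.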
I will update the patch if the above analysis is correct.
>The approach in [3] is also LGFM.
Thanks. I agree that we should avoid relaxing the Assert() condition as much as possible.
[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207790E98F0A563280CC39FCF5262%40TYCPR01MB12077.jpnprd01.prod.outlook.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/