Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT - Mailing list pgsql-hackers
| From | ocean_li_996 |
|---|---|
| Subject | Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT |
| Date | |
| Msg-id | 1d8ffe1.9688.19c0b39303e.Coremail.ocean_li_996@163.com Whole thread Raw |
| In response to | Re: Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT (Ajin Cherian <itsajin@gmail.com>) |
| List | pgsql-hackers |
Hi Ajin,
At 2026-01-29 11:33:59, "Ajin Cherian" <itsajin@gmail.com> wrote:
>I tested with patch 0002+0004 on HEAD, and the test added by patch
>0002 is failing like below.
>
>not ok 15 - snapshot_build 289 ms
># (test process exited with exit code 1)
>
>I see the postgres crashed and when I look at the core file, I see the
>below stack trace:
>Core was generated by
>`/home/ajin/postgresql/postgres3/postgres/tmp_install/home/ajin/install-oss/bin/postgres
>'' '' '' '' '' '' '''.
>Program terminated with signal SIGABRT, Aborted.
>#0 0x00007436092c7e9c in __pthread_kill_implementation () from /lib64/libc.so.6
>Missing rpms, try: dnf --enablerepo='*debug*' install
>zlib-ng-compat-debuginfo-2.2.3-2.el10.x86_64
>glibc-debuginfo-2.39-38.el10.x86_64
>libicu-debuginfo-74.2-5.el10.x86_64
>libstdc++-debuginfo-14.3.1-2.1.el10.x86_64
>libgcc-debuginfo-14.3.1-2.1.el10.x86_64
>(gdb) bt
>#0 0x00007436092c7e9c in __pthread_kill_implementation () from /lib64/libc.so.6
>#1 0x0000743609271a96 in raise () from /lib64/libc.so.6
>#2 0x00007436092598fa in abort () from /lib64/libc.so.6
>#3 0x0000000000a00a93 in ExceptionalCondition
>(conditionName=conditionName@entry=0xb1e039 "txn->ninvalidations ==
>0", fileName=fileName@entry=0xb1dcd8 "reorderbuffer.c",
>lineNumber=lineNumber@entry=3207) at assert.c:65
>#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40,
>xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207
>#5 0x0000000000823aea in DecodeCommit (ctx=ctx@entry=0x3cf869d0,
>buf=buf@entry=0x7ffd069389a0, parsed=parsed@entry=0x7ffd069387d0,
>xid=xid@entry=1136, two_phase=false) at decode.c:707
>#6 0x000000000082497c in xact_decode (ctx=ctx@entry=0x3cf869d0,
>buf=buf@entry=0x7ffd069389a0) at decode.c:237
>#7 0x00000000008246eb in LogicalDecodingProcessRecord
>(ctx=ctx@entry=0x3cf869d0, record=0x3cf86da8) at decode.c:116
>#8 0x0000000000829602 in DecodingContextFindStartpoint
>(ctx=ctx@entry=0x3cf869d0) at logical.c:647
>#9 0x000000000084e8ea in create_logical_replication_slot
>(name=name@entry=0x3ce8fb70 "isolation_slot",
>plugin=plugin@entry=0x3ce8fc10 "test_decoding",
>temporary=temporary@entry=false, two_phase=two_phase@entry=false,
>failover=failover@entry=false,
> restart_lsn=restart_lsn@entry=0, find_startpoint=true) at slotfuncs.c:177
>#10 0x000000000084f300 in pg_create_logical_replication_slot
>(fcinfo=<optimized out>) at slotfuncs.c:207
>#11 0x00000000006c77c4 in ExecMakeTableFunctionResult
>(setexpr=0x3cf61b18, econtext=0x3cf61968, argContext=<optimized out>,
>expectedDesc=0x3cf79f48, randomAccess=false) at execSRF.c:234
>#12 0x00000000006da839 in FunctionNext (node=node@entry=0x3cf61758) at
>nodeFunctionscan.c:94
>
>(gdb) f 4
>#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40,
>xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207
>3207 Assert(txn->ninvalidations == 0);
>(gdb) p *txn
>$1 = {txn_flags = 1, xid = 1136, toplevel_xid = 0, gid = 0x0,
>first_lsn = 34814328, final_lsn = 34819568, end_lsn = 0, toptxn = 0x0,
>restart_decoding_lsn = 0, origin_id = 0, origin_lsn = 0, {commit_time
>= 0, prepare_time = 0, abort_time = 0}, base_snapshot = 0x0,
> base_snapshot_lsn = 0, base_snapshot_node = {prev = 0x0, next =
>0x0}, snapshot_now = 0x0, command_id = 4294967295, nentries = 14,
>nentries_mem = 14, changes = {head = {prev = 0x3cfb5510, next =
>0x3cfb4ae8}}, tuplecids = {head = {prev = 0x3cfb5440, next =
>0x3cfb4a80}},
> ntuplecids = 13, tuplecid_hash = 0x0, toast_hash = 0x0, subtxns =
>{head = {prev = 0x3cfb2c68, next = 0x3cfb2c68}}, nsubtxns = 0,
>ninvalidations = 22, invalidations = 0x3cf96c50,
>ninvalidations_distributed = 0, invalidations_distributed = 0x0, node
>= {prev = 0x3cfb2b30,
> next = 0x3cf96a48}, catchange_node = {prev = 0x3cf96a68, next =
>0x3cf96a68}, txn_node = {first_child = 0x0, next_sibling = 0x0,
>prev_or_parent = 0x0}, size = 1472, total_size = 1472,
>output_plugin_private = 0x0}
>
>Looks like the assert in ReorderBufferForget failed because
>ninvalidations is not 0.
>
>You need to build with asserts enabled.
Oops, I forgot to add the --enable-cassert option. After adding it,I was able to reproduce the issue.
I spent some time researching the code. Now, I am wonder about therelationship between the invalidation message and the base snapshotof that XID.
1) IIUC, current design requires that an XID with an invalidation messagemust have a base snapshot (see the handling in SnapBuildCommitTxn andReorderBufferForget regarding base snapshots). This is consistent withthe requirement when adding other regular changes. In that case, whenadding an invalidation message, should we also apply the same checks asfor other regular changes? Specifically, for transactions without enoughinformation to decode, we would skip adding the invalidation message (seeSnapBuildProcessChange). This should be fine because at that point nodecoding has been done yet, so there is no cache to invalidate. Based onthis idea, I have updated the 0003 and 0004 patches (see attached v7.1-0003and v7.1-0004).
2) When executing an invalidation message with an XID, do we necessarilyrequire that the XID has a base snapshot? I did not see a strict relationshipbetween the two. In other words, when executing an invalidation message, wecould potentially remove the check for xid->base_snapshot. From another perspective,before executing an invalidation message, the current logic already processesthe base snapshot, and a base snapshot may be not present (see the handlingin SnapBuildCommitTxn with if (needs_snapshot)). Based on this idea, I haveupdated the 0003 and 0004 patches (see attached v7.2-0003 and v7.2-0004).
Regards,Haiyang Li
Attachment
pgsql-hackers by date: