Re: BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump - Mailing list pgsql-bugs
| From | Xuneng Zhou |
|---|---|
| Subject | Re: BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump |
| Date | |
| Msg-id | CABPTF7W883TcGrY71vVDTdEJgxJ993_Fe+fHTT+1Ttk6oDH3YQ@mail.gmail.com Whole thread Raw |
| In response to | BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump (PG Bug reporting form <noreply@postgresql.org>) |
| Responses |
Re: BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump
|
| List | pgsql-bugs |
Hi Haiyang,
On Tue, Nov 11, 2025 at 12:53 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 19109
> Logged by: haiyang li
> Email address: mohen.lhy@alibaba-inc.com
> PostgreSQL version: 18.0
> Operating system: any
> Description:
>
> Hi all,
> I got some similar core-dump files when running logical decoding test in
> PostgreSQL 14.19 (2
> times in debug mode) and PostgreSQL 18.0(1 time in release mode). All the
> provided information
> is based on 14.19 due to more information about the core-dump.
>
> The stack trace is as follows:
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> ...
> #2 <signal handler called>
> #3 0x00000000007f41c6 in nocachegetattr (tuple=0x6ddd578, attnum=3,
> tupleDesc=0x7fb4586e8e48) at heaptuple.c:698
> #4 0x00007fb4589d579d in tuple_to_stringinfo (s=0x2fb0bb0,
> tupdesc=0x7fb4586e8e48, tuple=0x6ddd578, skip_nulls=false) at
> test_decoding.c:560
> #5 0x00007fb4589d5b2d in pg_decode_change (ctx=0x2fac888, txn=0x2fea668,
> relation=0x7fb4586db898, change=0x7145cf8) at test_decoding.c:645
> #6 0x0000000000e66be6 in change_cb_wrapper (cache=0x2fbc8f8, txn=0x2fea668,
> relation=0x7fb4586db898, change=0x7145cf8) at logical.c:1076
> #7 0x0000000000e72d2a in ReorderBufferApplyChange (rb=0x2fbc8f8,
> txn=0x2fea668, relation=0x7fb4586db898, change=0x7145cf8, streaming=false)
> at reorderbuffer.c:2020
> #8 0x0000000000e73712 in ReorderBufferProcessTXN (rb=0x2fbc8f8,
> txn=0x2fea668, commit_lsn=36675124736, snapshot_now=0x2fc4580, command_id=0,
> streaming=false) at reorderbuffer.c:2319
> #9 0x0000000000e743e8 in ReorderBufferReplay (txn=0x2fea668, rb=0x2fbc8f8,
> xid=10798, commit_lsn=36675124736, end_lsn=36675124784,
> commit_time=815706623738566, origin_id=0, origin_lsn=0) at
> reorderbuffer.c:2773
> #10 0x0000000000e74466 in ReorderBufferCommit (rb=0x2fbc8f8, xid=10798,
> commit_lsn=36675124736, end_lsn=36675124784, commit_time=815706623738566,
> origin_id=0, origin_lsn=0) at reorderbuffer.c:2797
> #11 0x0000000000e614fb in DecodeCommit (ctx=0x2fac888, buf=0x7ffd647e7ea0,
> parsed=0x7ffd647e7d40, xid=10798, two_phase=false) at decode.c:848
> #12 0x0000000000e605b2 in DecodeXactOp (ctx=0x2fac888, buf=0x7ffd647e7ea0)
> at decode.c:303
> #13 0x0000000000e600d1 in LogicalDecodingProcessRecord (ctx=0x2fac888,
> record=0x2facc48) at decode.c:144
> #14 0x0000000000e69981 in pg_logical_slot_get_changes_guts
> (fcinfo=0x2fa0808, confirm=true, binary=false) at logicalfuncs.c:296
> #15 0x0000000000e69b88 in pg_logical_slot_get_changes (fcinfo=0x2fa0808) at
> logicalfuncs.c:365
> ...
> #35 0x0000000000f8df2b in exec_simple_query (query_string=0x2d732e8 "SELECT
> * FROM pg_logical_slot_get_changes("test_logical_decode_slot_0", NULL, NULL,
> "stream-changes", "on")") at postgres.c:1360
> ...
>
> (gdb) f 3
> #3 0x00000000007f41c6 in nocachegetattr (tuple=0x6ddd578, attnum=3,
> tupleDesc=0x7fb4586e8e48) at heaptuple.c:698
> 698 off = att_align_pointer(off,
> att->attalign, -1,
> (gdb) p tupleDesc->attrs[0]
> $3 = {attrelid = 35727, attname = {data = "id", '\000' <repeats 61 times>},
> atttypid = 23, attstattarget = -1, attlen = 4, attnum = 1, attndims = 0,
> attcacheoff = 0, atttypmod = -1, attbyval = true, attalign = 105 'i',
> attstorage = 112 'p', attcompression = 0 '\000', attnotnull = true,
> atthasdef = false,
> atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
> attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
> attisinvisible = false}
> (gdb) p tupleDesc->attrs[1]
> $4 = {attrelid = 35727, attname = {data = "unique1", '\000' <repeats 56
> times>}, atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims
> = 0,
> attcacheoff = 4, atttypmod = -1, attbyval = true, attalign = 105 'i',
> attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
> atthasdef = false,
> atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
> attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
> attisinvisible = false}
> (gdb) p tupleDesc->attrs[2]
> $5 = {attrelid = 35727, attname = {data = "hundred", '\000' <repeats 56
> times>}, atttypid = 1700, attstattarget = -1, attlen = -1, attnum = 3,
> attndims = 0,
> attcacheoff = 8, atttypmod = -1, attbyval = false, attalign = 105 'i',
> attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false,
> atthasdef = false,
> atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
> attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
> attisinvisible = false}
>
> (gdb) f 7
> #7 0x0000000000e72d2a in ReorderBufferApplyChange (rb=0x2fbc8f8,
> txn=0x2fea668, relation=0x7fb4586db898, change=0x7145cf8, streaming=false)
> at reorderbuffer.c:2020
> 2020 rb->apply_change(rb, txn, relation, change);
> (gdb) p change->lsn
> $16 = 36675122736
>
> (gdb) f 8
> #8 0x0000000000e73712 in ReorderBufferProcessTXN (rb=0x2fbc8f8,
> txn=0x2fea668, commit_lsn=36675124736, snapshot_now=0x2fc4580, command_id=0,
> streaming=false)
> at reorderbuffer.c:2319
> 2319
> ReorderBufferApplyChange(rb, txn, relation, change,
> (gdb) p *snapshot_now->xip@20
> $17 = {10713, 10731, 10744, 10745, 10746, 10747, 10748, 10749, 10753, 10754,
> 10757, 10758, 10766, 10770, 10771, 10773, 10785, 10786, 10789, 10790}
> (gdb) p *snapshot_now
> $18 = {snapshot_type = SNAPSHOT_HISTORIC_MVCC, xmin = 10713, xmax = 10791,
> xip = 0x2fc4658, xcnt = 20, subxip = 0x0, subxcnt = 0, suboverflowed =
> false,
> takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken
> = 0, vistest = 0x0, active_count = 5, regd_count = 0
> ...
> ```
>
> After further investigation, I found that the immediate cause of the crash
> was that the unique1
> column is actually of type numeric (atttypid = 1700), but here it was
> mistakenly treated as
> int4 (atttypid = 23). This type mismatch led to a segmentation fault when
> retrieving the data.
> (After restoring from backup to the next LSN after txn 10798 commit
> (8/8A01FE30), I was able to
> confirm that the unique1 column is indeed of type numeric.)
>
> logical slot information:
> -[ RECORD 1 ]-------+---------------------------
> slot_name | test_logical_decode_slot_0
> plugin | test_decoding
> slot_type | logical
> datoid | 33795
> database | ddl_full0
> temporary | f
> active | f
> active_pid |
> xmin |
> catalog_xmin | 10713
> restart_lsn | 8/3C425D18
> confirmed_flush_lsn | 8/7B80FBA0
> wal_status | extended
> safe_wal_size |
> two_phase | f
>
> Between txn 10713 and 10798, txn 10716 and 10722 successfully executed the
> column type change SQL:
> ALTER TABLE IF EXISTS t1 ALTER COLUMN unique1 SET DATA TYPE NUMERIC;
>
> Relevant WAL records:
> ```
> -- 10716
> rmgr: Standby len (rec/tot): 42/ 42, tx: 10716, lsn:
> 8/4259E558, prev 8/4259C6C0, desc: LOCK xid 10716 db 33795 rel 35727
> rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
> 8/4259E588, prev 8/4259E558, desc: NEW_CID rel 1663/33795/1249; tid 84/1;
> cmin: 4294967295, cmax: 0, combo: 4294967295
> rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
> 8/4259E5C8, prev 8/4259E588, desc: NEW_CID rel 1663/33795/1249; tid 84/2;
> cmin: 0, cmax: 4294967295, combo: 4294967295
> rmgr: Heap len (rec/tot): 105/ 105, tx: 10716, lsn:
> 8/4259E608, prev 8/4259E5C8, desc: HOT_UPDATE off 1 xmax 10716 flags 0x60 ;
> new off 2 xmax 0, blkref #0: rel 1663/33795/1249 blk 84
> rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
> 8/4259E678, prev 8/4259E608, desc: NEW_CID rel 1663/33795/2619; tid 6/24;
> cmin: 4294967295, cmax: 0, combo: 4294967295
> rmgr: Heap len (rec/tot): 54/ 54, tx: 10716, lsn:
> 8/4259E6B8, prev 8/4259E678, desc: DELETE off 24 flags 0x00 KEYS_UPDATED ,
> blkref #0: rel 1663/33795/2619 blk 6
> ...
> rmgr: Transaction len (rec/tot): 4505/ 4505, tx: 10716, lsn:
> 8/4323DB78, prev 8/4323BCE0, desc: COMMIT 2025-11-06 01:10:08.196860 UTC;
> rels: base/33795/1632572 base/33795/1632575 base/33795/1632576
> base/33795/1632606 base/33795/1632604 base/33795/1632603 base/33795/1632602
> base/33795/1632601; inval msgs: catcache 51 catcache 50 catcache 51 catcache
> 50 catcache 50 catcache 51 catcache 50 catcache 50 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 51 catcache
> 50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 51 catcache 50 catcache 32 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 51 catcache 50 catcache 76
> catcache 75 catcache 76 catcache 75 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
> catcache 50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache
> 51 catcache 50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 76 catcache
> 75 catcache 76 catcache 75 catcache 51 catcache 50 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 59 relcache
> 1640003 relcache 1640003 relcache 1640004 relcache 1640000 snapshot 2608
> snapshot 1214 relcache 1632575 snapshot 2608 relcache 1632576 relcache
> 1632575 snapshot 2608 snapshot 2608 snapshot 2608 relcache 35727 relcache
> 1622119 relcache 1622119 relcache 35727 relcache 1606414 relcache 1606414
> relcache 35727 relcache 1545373 relcache 1545373 relcache 35727 relcache
> 1537952 relcache 1537952 relcache 35727 relcache 35733 relcache 35733
> relcache 35727 relcache 1640000 snapshot 2608 relcache 1640000 snapshot 2608
> relcache 1640003 relcache 1640004 relcache 1640004 relcache 1640003 snapshot
> 2608 relcache 1640003 snapshot 2608 relcache 1640000 snapshot 1214 relcache
> 35727
>
> -- 10722
> rmgr: Standby len (rec/tot): 42/ 42, tx: 10722, lsn:
> 8/4946A640, prev 8/4946A468, desc: LOCK xid 10722 db 33795 rel 35727
> rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10722, lsn:
> 8/4946A670, prev 8/4946A640, desc: NEW_CID rel 1663/33795/1249; tid 84/2;
> cmin: 4294967295, cmax: 0, combo: 4294967295
> rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10722, lsn:
> 8/4946A6B0, prev 8/4946A670, desc: NEW_CID rel 1663/33795/1249; tid 84/6;
> cmin: 0, cmax: 4294967295, combo: 4294967295
> rmgr: Heap len (rec/tot): 76/ 76, tx: 10722, lsn:
> 8/4946A6F0, prev 8/4946A6B0, desc: HOT_UPDATE off 2 xmax 10722 flags 0x20 ;
> new off 6 xmax 0, blkref #0: rel 1663/33795/1249 blk 84
> rmgr: Transaction len (rec/tot): 78/ 78, tx: 10722, lsn:
> 8/4946A740, prev 8/4946A6F0, desc: INVALIDATION ; inval msgs: catcache 7
> catcache 6 relcache 35727
> rmgr: Transaction len (rec/tot): 98/ 98, tx: 10722, lsn:
> 8/4946A790, prev 8/4946A740, desc: COMMIT 2025-11-06 01:10:09.687549 UTC;
> inval msgs: catcache 7 catcache 6 relcache 35727
>
> -- 10798
> rmgr: Heap len (rec/tot): 1211/ 1211, tx: 10798, lsn:
> 8/8A01F630, prev 8/8A01F560, desc: INSERT off 11 flags 0x08, blkref #0: rel
> 1663/33795/1677447 blk 0
> rmgr: Btree len (rec/tot): 64/ 64, tx: 10798, lsn:
> 8/8A01FAF0, prev 8/8A01F630, desc: INSERT_LEAF off 5, blkref #0: rel
> 1663/33795/1695353 blk 1
> rmgr: Gin len (rec/tot): 174/ 174, tx: 10798, lsn:
> 8/8A01FB30, prev 8/8A01FAF0, desc: UPDATE_META_PAGE , blkref #0: rel
> 1663/33795/1695354 blk 0, blkref #1: rel 1663/33795/1695354 blk 2
> rmgr: Btree len (rec/tot): 64/ 64, tx: 10798, lsn:
> 8/8A01FBE0, prev 8/8A01FB30, desc: INSERT_LEAF off 5, blkref #0: rel
> 1663/33795/1695355 blk 1
> rmgr: Hash len (rec/tot): 72/ 72, tx: 10798, lsn:
> 8/8A01FC20, prev 8/8A01FBE0, desc: INSERT off 6, blkref #0: rel
> 1663/33795/1695356 blk 2, blkref #1: rel 1663/33795/1695356 blk 0
> rmgr: Gist len (rec/tot): 322/ 322, tx: 10798, lsn:
> 8/8A01FC68, prev 8/8A01FC20, desc: PAGE_UPDATE , blkref #0: rel
> 1663/33795/1695357 blk 0
> rmgr: Btree len (rec/tot): 80/ 80, tx: 10798, lsn:
> 8/8A01FDB0, prev 8/8A01FC68, desc: INSERT_LEAF off 3, blkref #0: rel
> 1663/33795/1695358 blk 2
> rmgr: Transaction len (rec/tot): 46/ 46, tx: 10798, lsn:
> 8/8A01FE00, prev 8/8A01FDB0, desc: COMMIT 2025-11-06 01:10:23.738566 UTC
>
> -- ALL RUNING_XACTS WAL RECORD
> rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
> 8/3C425D18, prev 8/3C425CD8, desc: RUNNING_XACTS nextXid 10674
> latestCompletedXid 10673 oldestRunningXid 10639; 2 xacts: 10639 10664
> ...
> rmgr: Standby len (rec/tot): 62/ 62, tx: 0, lsn:
> 8/66737378, prev 8/66736B80, desc: RUNNING_XACTS nextXid 10742
> latestCompletedXid 10741 oldestRunningXid 10713; 3 xacts: 10731 10713 10729
> ...
> rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
> 8/89C23500, prev 8/89C234A0, desc: RUNNING_XACTS nextXid 10798
> latestCompletedXid 10797 oldestRunningXid 10767; 4 xacts: 10780 10769 10795
> 10767
> ```
>
> From the snapshot_now information for txn 10798, txn 10716 and 10722 are not
> present in xip,
> which appears unexpected.
Thanks for your report and analysis!
> I have not yet figured out the root cause and will continue analyzing.
> If you need the complete WAL log from 8/3C425D18 to 8/8A01FE30, please let
> me know — I can provide it.
That would be helpful. Could you also share the test or script that
reproduces the core dump?
pgsql-bugs by date: