BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump - Mailing list pgsql-bugs
| From | PG Bug reporting form |
|---|---|
| Subject | BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump |
| Date | |
| Msg-id | 19109-4a9ce36715d69632@postgresql.org Whole thread Raw |
| List | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 19109
Logged by: haiyang li
Email address: mohen.lhy@alibaba-inc.com
PostgreSQL version: 18.0
Operating system: any
Description:
Hi all,
I got some similar core-dump files when running logical decoding test in
PostgreSQL 14.19 (2
times in debug mode) and PostgreSQL 18.0(1 time in release mode). All the
provided information
is based on 14.19 due to more information about the core-dump.
The stack trace is as follows:
```
Program terminated with signal SIGSEGV, Segmentation fault.
...
#2 <signal handler called>
#3 0x00000000007f41c6 in nocachegetattr (tuple=0x6ddd578, attnum=3,
tupleDesc=0x7fb4586e8e48) at heaptuple.c:698
#4 0x00007fb4589d579d in tuple_to_stringinfo (s=0x2fb0bb0,
tupdesc=0x7fb4586e8e48, tuple=0x6ddd578, skip_nulls=false) at
test_decoding.c:560
#5 0x00007fb4589d5b2d in pg_decode_change (ctx=0x2fac888, txn=0x2fea668,
relation=0x7fb4586db898, change=0x7145cf8) at test_decoding.c:645
#6 0x0000000000e66be6 in change_cb_wrapper (cache=0x2fbc8f8, txn=0x2fea668,
relation=0x7fb4586db898, change=0x7145cf8) at logical.c:1076
#7 0x0000000000e72d2a in ReorderBufferApplyChange (rb=0x2fbc8f8,
txn=0x2fea668, relation=0x7fb4586db898, change=0x7145cf8, streaming=false)
at reorderbuffer.c:2020
#8 0x0000000000e73712 in ReorderBufferProcessTXN (rb=0x2fbc8f8,
txn=0x2fea668, commit_lsn=36675124736, snapshot_now=0x2fc4580, command_id=0,
streaming=false) at reorderbuffer.c:2319
#9 0x0000000000e743e8 in ReorderBufferReplay (txn=0x2fea668, rb=0x2fbc8f8,
xid=10798, commit_lsn=36675124736, end_lsn=36675124784,
commit_time=815706623738566, origin_id=0, origin_lsn=0) at
reorderbuffer.c:2773
#10 0x0000000000e74466 in ReorderBufferCommit (rb=0x2fbc8f8, xid=10798,
commit_lsn=36675124736, end_lsn=36675124784, commit_time=815706623738566,
origin_id=0, origin_lsn=0) at reorderbuffer.c:2797
#11 0x0000000000e614fb in DecodeCommit (ctx=0x2fac888, buf=0x7ffd647e7ea0,
parsed=0x7ffd647e7d40, xid=10798, two_phase=false) at decode.c:848
#12 0x0000000000e605b2 in DecodeXactOp (ctx=0x2fac888, buf=0x7ffd647e7ea0)
at decode.c:303
#13 0x0000000000e600d1 in LogicalDecodingProcessRecord (ctx=0x2fac888,
record=0x2facc48) at decode.c:144
#14 0x0000000000e69981 in pg_logical_slot_get_changes_guts
(fcinfo=0x2fa0808, confirm=true, binary=false) at logicalfuncs.c:296
#15 0x0000000000e69b88 in pg_logical_slot_get_changes (fcinfo=0x2fa0808) at
logicalfuncs.c:365
...
#35 0x0000000000f8df2b in exec_simple_query (query_string=0x2d732e8 "SELECT
* FROM pg_logical_slot_get_changes("test_logical_decode_slot_0", NULL, NULL,
"stream-changes", "on")") at postgres.c:1360
...
(gdb) f 3
#3 0x00000000007f41c6 in nocachegetattr (tuple=0x6ddd578, attnum=3,
tupleDesc=0x7fb4586e8e48) at heaptuple.c:698
698 off = att_align_pointer(off,
att->attalign, -1,
(gdb) p tupleDesc->attrs[0]
$3 = {attrelid = 35727, attname = {data = "id", '\000' <repeats 61 times>},
atttypid = 23, attstattarget = -1, attlen = 4, attnum = 1, attndims = 0,
attcacheoff = 0, atttypmod = -1, attbyval = true, attalign = 105 'i',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = true,
atthasdef = false,
atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
attisinvisible = false}
(gdb) p tupleDesc->attrs[1]
$4 = {attrelid = 35727, attname = {data = "unique1", '\000' <repeats 56
times>}, atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims
= 0,
attcacheoff = 4, atttypmod = -1, attbyval = true, attalign = 105 'i',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
atthasdef = false,
atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
attisinvisible = false}
(gdb) p tupleDesc->attrs[2]
$5 = {attrelid = 35727, attname = {data = "hundred", '\000' <repeats 56
times>}, atttypid = 1700, attstattarget = -1, attlen = -1, attnum = 3,
attndims = 0,
attcacheoff = 8, atttypmod = -1, attbyval = false, attalign = 105 'i',
attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false,
atthasdef = false,
atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0,
attisinvisible = false}
(gdb) f 7
#7 0x0000000000e72d2a in ReorderBufferApplyChange (rb=0x2fbc8f8,
txn=0x2fea668, relation=0x7fb4586db898, change=0x7145cf8, streaming=false)
at reorderbuffer.c:2020
2020 rb->apply_change(rb, txn, relation, change);
(gdb) p change->lsn
$16 = 36675122736
(gdb) f 8
#8 0x0000000000e73712 in ReorderBufferProcessTXN (rb=0x2fbc8f8,
txn=0x2fea668, commit_lsn=36675124736, snapshot_now=0x2fc4580, command_id=0,
streaming=false)
at reorderbuffer.c:2319
2319
ReorderBufferApplyChange(rb, txn, relation, change,
(gdb) p *snapshot_now->xip@20
$17 = {10713, 10731, 10744, 10745, 10746, 10747, 10748, 10749, 10753, 10754,
10757, 10758, 10766, 10770, 10771, 10773, 10785, 10786, 10789, 10790}
(gdb) p *snapshot_now
$18 = {snapshot_type = SNAPSHOT_HISTORIC_MVCC, xmin = 10713, xmax = 10791,
xip = 0x2fc4658, xcnt = 20, subxip = 0x0, subxcnt = 0, suboverflowed =
false,
takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken
= 0, vistest = 0x0, active_count = 5, regd_count = 0
...
```
After further investigation, I found that the immediate cause of the crash
was that the unique1
column is actually of type numeric (atttypid = 1700), but here it was
mistakenly treated as
int4 (atttypid = 23). This type mismatch led to a segmentation fault when
retrieving the data.
(After restoring from backup to the next LSN after txn 10798 commit
(8/8A01FE30), I was able to
confirm that the unique1 column is indeed of type numeric.)
logical slot information:
-[ RECORD 1 ]-------+---------------------------
slot_name | test_logical_decode_slot_0
plugin | test_decoding
slot_type | logical
datoid | 33795
database | ddl_full0
temporary | f
active | f
active_pid |
xmin |
catalog_xmin | 10713
restart_lsn | 8/3C425D18
confirmed_flush_lsn | 8/7B80FBA0
wal_status | extended
safe_wal_size |
two_phase | f
Between txn 10713 and 10798, txn 10716 and 10722 successfully executed the
column type change SQL:
ALTER TABLE IF EXISTS t1 ALTER COLUMN unique1 SET DATA TYPE NUMERIC;
Relevant WAL records:
```
-- 10716
rmgr: Standby len (rec/tot): 42/ 42, tx: 10716, lsn:
8/4259E558, prev 8/4259C6C0, desc: LOCK xid 10716 db 33795 rel 35727
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
8/4259E588, prev 8/4259E558, desc: NEW_CID rel 1663/33795/1249; tid 84/1;
cmin: 4294967295, cmax: 0, combo: 4294967295
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
8/4259E5C8, prev 8/4259E588, desc: NEW_CID rel 1663/33795/1249; tid 84/2;
cmin: 0, cmax: 4294967295, combo: 4294967295
rmgr: Heap len (rec/tot): 105/ 105, tx: 10716, lsn:
8/4259E608, prev 8/4259E5C8, desc: HOT_UPDATE off 1 xmax 10716 flags 0x60 ;
new off 2 xmax 0, blkref #0: rel 1663/33795/1249 blk 84
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10716, lsn:
8/4259E678, prev 8/4259E608, desc: NEW_CID rel 1663/33795/2619; tid 6/24;
cmin: 4294967295, cmax: 0, combo: 4294967295
rmgr: Heap len (rec/tot): 54/ 54, tx: 10716, lsn:
8/4259E6B8, prev 8/4259E678, desc: DELETE off 24 flags 0x00 KEYS_UPDATED ,
blkref #0: rel 1663/33795/2619 blk 6
...
rmgr: Transaction len (rec/tot): 4505/ 4505, tx: 10716, lsn:
8/4323DB78, prev 8/4323BCE0, desc: COMMIT 2025-11-06 01:10:08.196860 UTC;
rels: base/33795/1632572 base/33795/1632575 base/33795/1632576
base/33795/1632606 base/33795/1632604 base/33795/1632603 base/33795/1632602
base/33795/1632601; inval msgs: catcache 51 catcache 50 catcache 51 catcache
50 catcache 50 catcache 51 catcache 50 catcache 50 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 51 catcache
50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 51 catcache 50 catcache 32 catcache 7
catcache 6 catcache 7 catcache 6 catcache 51 catcache 50 catcache 76
catcache 75 catcache 76 catcache 75 catcache 51 catcache 50 catcache 51
catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
catcache 50 catcache 51 catcache 50 catcache 51 catcache 50 catcache 51
catcache 50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache
51 catcache 50 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 76 catcache
75 catcache 76 catcache 75 catcache 51 catcache 50 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 59 relcache
1640003 relcache 1640003 relcache 1640004 relcache 1640000 snapshot 2608
snapshot 1214 relcache 1632575 snapshot 2608 relcache 1632576 relcache
1632575 snapshot 2608 snapshot 2608 snapshot 2608 relcache 35727 relcache
1622119 relcache 1622119 relcache 35727 relcache 1606414 relcache 1606414
relcache 35727 relcache 1545373 relcache 1545373 relcache 35727 relcache
1537952 relcache 1537952 relcache 35727 relcache 35733 relcache 35733
relcache 35727 relcache 1640000 snapshot 2608 relcache 1640000 snapshot 2608
relcache 1640003 relcache 1640004 relcache 1640004 relcache 1640003 snapshot
2608 relcache 1640003 snapshot 2608 relcache 1640000 snapshot 1214 relcache
35727
-- 10722
rmgr: Standby len (rec/tot): 42/ 42, tx: 10722, lsn:
8/4946A640, prev 8/4946A468, desc: LOCK xid 10722 db 33795 rel 35727
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10722, lsn:
8/4946A670, prev 8/4946A640, desc: NEW_CID rel 1663/33795/1249; tid 84/2;
cmin: 4294967295, cmax: 0, combo: 4294967295
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 10722, lsn:
8/4946A6B0, prev 8/4946A670, desc: NEW_CID rel 1663/33795/1249; tid 84/6;
cmin: 0, cmax: 4294967295, combo: 4294967295
rmgr: Heap len (rec/tot): 76/ 76, tx: 10722, lsn:
8/4946A6F0, prev 8/4946A6B0, desc: HOT_UPDATE off 2 xmax 10722 flags 0x20 ;
new off 6 xmax 0, blkref #0: rel 1663/33795/1249 blk 84
rmgr: Transaction len (rec/tot): 78/ 78, tx: 10722, lsn:
8/4946A740, prev 8/4946A6F0, desc: INVALIDATION ; inval msgs: catcache 7
catcache 6 relcache 35727
rmgr: Transaction len (rec/tot): 98/ 98, tx: 10722, lsn:
8/4946A790, prev 8/4946A740, desc: COMMIT 2025-11-06 01:10:09.687549 UTC;
inval msgs: catcache 7 catcache 6 relcache 35727
-- 10798
rmgr: Heap len (rec/tot): 1211/ 1211, tx: 10798, lsn:
8/8A01F630, prev 8/8A01F560, desc: INSERT off 11 flags 0x08, blkref #0: rel
1663/33795/1677447 blk 0
rmgr: Btree len (rec/tot): 64/ 64, tx: 10798, lsn:
8/8A01FAF0, prev 8/8A01F630, desc: INSERT_LEAF off 5, blkref #0: rel
1663/33795/1695353 blk 1
rmgr: Gin len (rec/tot): 174/ 174, tx: 10798, lsn:
8/8A01FB30, prev 8/8A01FAF0, desc: UPDATE_META_PAGE , blkref #0: rel
1663/33795/1695354 blk 0, blkref #1: rel 1663/33795/1695354 blk 2
rmgr: Btree len (rec/tot): 64/ 64, tx: 10798, lsn:
8/8A01FBE0, prev 8/8A01FB30, desc: INSERT_LEAF off 5, blkref #0: rel
1663/33795/1695355 blk 1
rmgr: Hash len (rec/tot): 72/ 72, tx: 10798, lsn:
8/8A01FC20, prev 8/8A01FBE0, desc: INSERT off 6, blkref #0: rel
1663/33795/1695356 blk 2, blkref #1: rel 1663/33795/1695356 blk 0
rmgr: Gist len (rec/tot): 322/ 322, tx: 10798, lsn:
8/8A01FC68, prev 8/8A01FC20, desc: PAGE_UPDATE , blkref #0: rel
1663/33795/1695357 blk 0
rmgr: Btree len (rec/tot): 80/ 80, tx: 10798, lsn:
8/8A01FDB0, prev 8/8A01FC68, desc: INSERT_LEAF off 3, blkref #0: rel
1663/33795/1695358 blk 2
rmgr: Transaction len (rec/tot): 46/ 46, tx: 10798, lsn:
8/8A01FE00, prev 8/8A01FDB0, desc: COMMIT 2025-11-06 01:10:23.738566 UTC
-- ALL RUNING_XACTS WAL RECORD
rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
8/3C425D18, prev 8/3C425CD8, desc: RUNNING_XACTS nextXid 10674
latestCompletedXid 10673 oldestRunningXid 10639; 2 xacts: 10639 10664
...
rmgr: Standby len (rec/tot): 62/ 62, tx: 0, lsn:
8/66737378, prev 8/66736B80, desc: RUNNING_XACTS nextXid 10742
latestCompletedXid 10741 oldestRunningXid 10713; 3 xacts: 10731 10713 10729
...
rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
8/89C23500, prev 8/89C234A0, desc: RUNNING_XACTS nextXid 10798
latestCompletedXid 10797 oldestRunningXid 10767; 4 xacts: 10780 10769 10795
10767
```
From the snapshot_now information for txn 10798, txn 10716 and 10722 are not
present in xip,
which appears unexpected.
I have not yet figured out the root cause and will continue analyzing.
If you need the complete WAL log from 8/3C425D18 to 8/8A01FE30, please let
me know — I can provide it.
regards,
Haiyang Li
pgsql-bugs by date: