Thread: PANIC in heap_delete during ALTER TABLE

PANIC in heap_delete during ALTER TABLE

From
Robins Tharakan
Date:
Hi,

During ALTER TABLE, heap_delete() can PANIC with the error
"PANIC:  wrong buffer passed to visibilitymap_clear".

Given below are backtraces on master, REL_15_STABLE and REL_13_STABLE.
There was a similar discussion last year [1], although that was about
heap_update() and was soon fixed [2]; thus reporting this as a separate
thread.


Here are 4 concurrent PANICs running ALTER TABLE on separate tables
(on 1509abe2c5@master):

=== Error Log (Shows all 4 PIDs above) ===
2022-08-20 12:37:53.937 UTC [3739799] PANIC:  wrong buffer passed to
visibilitymap_clear
2022-08-20 12:37:53.937 UTC [3739799] STATEMENT:  ALTER TABLE ONLY sch12.bool
   ALTER COLUMN b SET DATA TYPE  TEXT  USING
public.postgres_fdw_disconnect_all()
2022-08-20 12:37:53.937 UTC [3737759] PANIC:  wrong buffer passed to
visibilitymap_clear
2022-08-20 12:37:53.937 UTC [3737759] STATEMENT:  ALTER TABLE ONLY sch13.bool
   ALTER COLUMN b SET DATA TYPE  TEXT  USING pg_catalog.pg_rotate_logfile_old()
2022-08-20 12:37:53.941 UTC [3740312] PANIC:  wrong buffer passed to
visibilitymap_clear
2022-08-20 12:37:53.941 UTC [3740312] STATEMENT:  ALTER TABLE ONLY sch2.flot
   ALTER COLUMN f SET DATA TYPE  TEXT  USING pg_catalog.ceiling(
    cast(sch2.flot.f as float8))
2022-08-20 12:37:53.945 UTC [3737305] PANIC:  wrong buffer passed to
visibilitymap_clear
2022-08-20 12:37:53.945 UTC [3737305] STATEMENT:  ALTER TABLE ONLY sch27.chr
   ALTER COLUMN c SET DATA TYPE  TEXT  USING sch27.chr.c


=== Backtrace - PID 3737759 - 1509abe2c5@master ===
#3  0x0000556a0e4a669f in visibilitymap_clear (rel=0x7fcec5fba700,
    heapBlk=10, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x0000556a0e48401a in heap_delete (relation=0x7fcec5fba700,
    tid=0x7fcec5d50f3c, cid=1, crosscheck=0x0, wait=true,
    tmfd=0x7ffe1987e530, changingPart=false) at heapam.c:2931
#5  0x0000556a0e484567 in simple_heap_delete (relation=0x7fcec5fba700,
    tid=0x7fcec5d50f3c) at heapam.c:3078
#6  0x0000556a0e55d566 in CatalogTupleDelete (heapRel=0x7fcec5fba700,
    tid=0x7fcec5d50f3c) at indexing.c:352
#7  0x0000556a0e559bc7 in index_drop (indexId=20788, concurrent=false,
    concurrent_lock_mode=false) at index.c:2349
#8  0x0000556a0e54bb3e in doDeletion (object=0x556a120d32b0, flags=1)
    at dependency.c:1408
#9  0x0000556a0e54b88b in deleteOneObject (object=0x556a120d32b0,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:1317
#10 0x0000556a0e549ff1 in deleteObjectsInList (targetObjects=0x556a0fd884e0,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:278
#11 0x0000556a0e54a1fe in performMultipleDeletions (objects=0x556a0fed5688,
    behavior=DROP_RESTRICT, flags=1) at dependency.c:439
#12 0x0000556a0e6a0d4d in ATPostAlterTypeCleanup (wqueue=0x7ffe1987e9a8,
    tab=0x556a11fcc8f8, lockmode=8) at tablecmds.c:13196
#13 0x0000556a0e68dbc6 in ATRewriteCatalogs (wqueue=0x7ffe1987e9a8,
    lockmode=8, context=0x7ffe1987eba0) at tablecmds.c:4888
#14 0x0000556a0e68cda1 in ATController (parsetree=0x556a0f945e88,
    rel=0x7fcec5ef9e70, cmds=0x556a0f945e50, recurse=false, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4455
#15 0x0000556a0e68c9b6 in AlterTable (stmt=0x556a0f945e88, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4101
#16 0x0000556a0e995140 in ProcessUtilitySlow (pstate=0x556a11fcc7e0,
    pstmt=0x556a0f946198,
    queryString=0x556a0f944fb0 "ALTER TABLE ONLY sch13.bool\n   ALTER
COLUMN b SET DATA TYPE  TEXT  USING
pg_catalog.pg_rotate_logfile_old()",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x556a0f946268, qc=0x7ffe1987f4e0) at utility.c:1325


=== Backtrace FULL - PID 3737759 - 1509abe2c5@master ===
#2  0x0000556a0eb5246d in errfinish (
    filename=0x556a0ebeb0a8 "visibilitymap.c", lineno=156,
    funcname=0x556a0ebeb250 <__func__.12780> "visibilitymap_clear")
    at elog.c:675
        edata = 0x556a0eeed2c0 <errordata>
        elevel = 23
        oldcontext = 0x556a11fcc6c0
        econtext = 0x0
        __func__ = "errfinish"
#3  0x0000556a0e4a669f in visibilitymap_clear (rel=0x7fcec5fba700,
    heapBlk=10, buf=0, flags=3 '\003') at visibilitymap.c:156
        __errno_location = <optimized out>
        mapBlock = 0
        mapByte = 2
        mapOffset = 4
        mask = 48 '0'
        map = 0x7fcec71f4680 "\177\006"
        cleared = false
        __func__ = "visibilitymap_clear"
#4  0x0000556a0e48401a in heap_delete (relation=0x7fcec5fba700,
    tid=0x7fcec5d50f3c, cid=1, crosscheck=0x0, wait=true,
    tmfd=0x7ffe1987e530, changingPart=false) at heapam.c:2931
        result = TM_Ok
        xid = 84612
        lp = 0x7fcecb11f1dc
        tp = {t_len = 166, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 10},
            ip_posid = 18}, t_tableOid = 2610, t_data = 0x7fcecb1202a0}
        page = 0x7fcecb11f180 "\036"
        block = 10
        buffer = 8021
        vmbuffer = 0
        new_xmax = 84612
        new_infomask = 0
        new_infomask2 = 8192
        have_tuple_lock = true
        iscombo = false
        all_visible_cleared = true
        old_key_tuple = 0x0
        old_key_copied = false
        __func__ = "heap_delete"



=== Backtrace - PID 3740312 - 1509abe2c5@master ===
#3  0x0000556a0e4a669f in visibilitymap_clear (rel=0x7fcec5d9eda8, heapBlk=6,
    buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x0000556a0e48401a in heap_delete (relation=0x7fcec5d9eda8,
    tid=0x7fcec5da102c, cid=1, crosscheck=0x0, wait=true,
    tmfd=0x7ffe1987e530, changingPart=false) at heapam.c:2931
#5  0x0000556a0e484567 in simple_heap_delete (relation=0x7fcec5d9eda8,
    tid=0x7fcec5da102c) at heapam.c:3078
#6  0x0000556a0e55d566 in CatalogTupleDelete (heapRel=0x7fcec5d9eda8,
    tid=0x7fcec5da102c) at indexing.c:352
#7  0x0000556a0e559bc7 in index_drop (indexId=20606, concurrent=false,
    concurrent_lock_mode=false) at index.c:2349
#8  0x0000556a0e54bb3e in doDeletion (object=0x556a108fadf0, flags=1)
    at dependency.c:1408
#9  0x0000556a0e54b88b in deleteOneObject (object=0x556a108fadf0,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:1317
#10 0x0000556a0e549ff1 in deleteObjectsInList (targetObjects=0x556a121823d0,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:278
#11 0x0000556a0e54a1fe in performMultipleDeletions (objects=0x556a108f9058,
    behavior=DROP_RESTRICT, flags=1) at dependency.c:439
#12 0x0000556a0e6a0d4d in ATPostAlterTypeCleanup (wqueue=0x7ffe1987e9a8,
    tab=0x556a1022c128, lockmode=8) at tablecmds.c:13196
#13 0x0000556a0e68dbc6 in ATRewriteCatalogs (wqueue=0x7ffe1987e9a8,
    lockmode=8, context=0x7ffe1987eba0) at tablecmds.c:4888
#14 0x0000556a0e68cda1 in ATController (parsetree=0x556a0f946138,
    rel=0x556a0fe345f0, cmds=0x556a0f946100, recurse=false, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4455
#15 0x0000556a0e68c9b6 in AlterTable (stmt=0x556a0f946138, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4101
#16 0x0000556a0e995140 in ProcessUtilitySlow (pstate=0x556a1022c010,
    pstmt=0x556a0f946448,
    queryString=0x556a0f944fb0 "ALTER TABLE ONLY sch2.flot\n   ALTER
COLUMN f SET DATA TYPE  TEXT  USING pg_catalog.ceiling(\n
cast(sch2.flot.f as float8))", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0,
    dest=0x556a0f946518, qc=0x7ffe1987f4e0) at utility.c:1325

=== Backtrace - PID 3739799 - 1509abe2c5@master ===
#3  0x0000556a0e4a669f in visibilitymap_clear (rel=0x556a0fe00490, heapBlk=8,
    buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x0000556a0e48401a in heap_delete (relation=0x556a0fe00490,
    tid=0x556a10886dcc, cid=1, crosscheck=0x0, wait=true,
    tmfd=0x7ffe1987e530, changingPart=false) at heapam.c:2931
#5  0x0000556a0e484567 in simple_heap_delete (relation=0x556a0fe00490,
    tid=0x556a10886dcc) at heapam.c:3078
#6  0x0000556a0e55d566 in CatalogTupleDelete (heapRel=0x556a0fe00490,
    tid=0x556a10886dcc) at indexing.c:352
#7  0x0000556a0e559bc7 in index_drop (indexId=20720, concurrent=false,
    concurrent_lock_mode=false) at index.c:2349
#8  0x0000556a0e54bb3e in doDeletion (object=0x556a12093570, flags=1)
    at dependency.c:1408
#9  0x0000556a0e54b88b in deleteOneObject (object=0x556a12093570,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:1317
#10 0x0000556a0e549ff1 in deleteObjectsInList (targetObjects=0x556a0fc0f160,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:278
#11 0x0000556a0e54a1fe in performMultipleDeletions (objects=0x556a103286c8,
    behavior=DROP_RESTRICT, flags=1) at dependency.c:439
#12 0x0000556a0e6a0d4d in ATPostAlterTypeCleanup (wqueue=0x7ffe1987e9a8,
    tab=0x556a10c86428, lockmode=8) at tablecmds.c:13196
#13 0x0000556a0e68dbc6 in ATRewriteCatalogs (wqueue=0x7ffe1987e9a8,
    lockmode=8, context=0x7ffe1987eba0) at tablecmds.c:4888
#14 0x0000556a0e68cda1 in ATController (parsetree=0x556a0f945e80,
    rel=0x556a0fee4d08, cmds=0x556a0f945e48, recurse=false, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4455
#15 0x0000556a0e68c9b6 in AlterTable (stmt=0x556a0f945e80, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4101
#16 0x0000556a0e995140 in ProcessUtilitySlow (pstate=0x556a10c86310,
    pstmt=0x556a0f946190,
    queryString=0x556a0f944fb0 "ALTER TABLE ONLY sch12.bool\n   ALTER
COLUMN b SET DATA TYPE  TEXT  USING
public.postgres_fdw_disconnect_all()",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x556a0f946260, qc=0x7ffe1987f4e0) at utility.c:1325

=== Backtrace - PID 3737305 - 1509abe2c5@master ===
#3  0x0000556a0e4a669f in visibilitymap_clear (rel=0x7fcec64529e0,
    heapBlk=14, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x0000556a0e48401a in heap_delete (relation=0x7fcec64529e0,
    tid=0x7fcec6434d5c, cid=1, crosscheck=0x0, wait=true,
    tmfd=0x7ffe1987e530, changingPart=false) at heapam.c:2931
#5  0x0000556a0e484567 in simple_heap_delete (relation=0x7fcec64529e0,
    tid=0x7fcec6434d5c) at heapam.c:3078
#6  0x0000556a0e55d566 in CatalogTupleDelete (heapRel=0x7fcec64529e0,
    tid=0x7fcec6434d5c) at indexing.c:352
#7  0x0000556a0e559bc7 in index_drop (indexId=21557, concurrent=false,
    concurrent_lock_mode=false) at index.c:2349
#8  0x0000556a0e54bb3e in doDeletion (object=0x556a10f0d340, flags=1)
    at dependency.c:1408
#9  0x0000556a0e54b88b in deleteOneObject (object=0x556a10f0d340,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:1317
#10 0x0000556a0e549ff1 in deleteObjectsInList (targetObjects=0x556a10703990,
    depRel=0x7ffe1987e7e0, flags=1) at dependency.c:278
#11 0x0000556a0e54a1fe in performMultipleDeletions (objects=0x556a0fc85878,
    behavior=DROP_RESTRICT, flags=1) at dependency.c:439
#12 0x0000556a0e6a0d4d in ATPostAlterTypeCleanup (wqueue=0x7ffe1987e9a8,
    tab=0x556a10bfc5b8, lockmode=8) at tablecmds.c:13196
#13 0x0000556a0e68dbc6 in ATRewriteCatalogs (wqueue=0x7ffe1987e9a8,
    lockmode=8, context=0x7ffe1987eba0) at tablecmds.c:4888
#14 0x0000556a0e68cda1 in ATController (parsetree=0x556a0f945ec8,
    rel=0x7fcec6443ab0, cmds=0x556a0f945e90, recurse=false, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4455
#15 0x0000556a0e68c9b6 in AlterTable (stmt=0x556a0f945ec8, lockmode=8,
    context=0x7ffe1987eba0) at tablecmds.c:4101
#16 0x0000556a0e995140 in ProcessUtilitySlow (pstate=0x556a10bfc4a0,
    pstmt=0x556a0f9461d8,
    queryString=0x556a0f944fb0 "ALTER TABLE ONLY sch27.chr\n   ALTER
COLUMN c SET DATA TYPE  TEXT  USING sch27.chr.c",
context=PROCESS_UTILITY_TOPLEVEL,
    params=0x0, queryEnv=0x0, dest=0x556a0f9462a8, qc=0x7ffe1987f4e0)
    at utility.c:1325




Below are backtraces of 2 concurrent crashes on REL_15_STABLE (~month old)

=== Backtrace - c7e21e9664@REL_15_STABLE ===
#3  0x000055ceba5222f8 in visibilitymap_clear (rel=0x7f568cdc3b80,
heapBlk=39, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x000055ceba4fae90 in heap_delete (relation=0x7f568cdc3b80,
tid=0x55cebc44319c, cid=2, crosscheck=0x0, wait=true,
tmfd=0x7fff74227990, changingPart=false) at heapam.c:2931
#5  0x000055ceba4fb43f in simple_heap_delete (relation=0x7f568cdc3b80,
tid=0x55cebc44319c) at heapam.c:3078
#6  0x000055ceba5df034 in CatalogTupleDelete (heapRel=0x7f568cdc3b80,
tid=0x55cebc44319c) at indexing.c:352
#7  0x000055ceba5d2dc4 in DeleteRelationTuple (relid=19720) at heap.c:1549
#8  0x000055ceba5db6fa in index_drop (indexId=19720, concurrent=false,
concurrent_lock_mode=false) at index.c:2370
#9  0x000055ceba5cd347 in doDeletion (object=0x55cebb750ca4, flags=1)
at dependency.c:1403
#10 0x000055ceba5cd094 in deleteOneObject (object=0x55cebb750ca4,
depRel=0x7fff74227c70, flags=1) at dependency.c:1312
#11 0x000055ceba5cb7fa in deleteObjectsInList
(targetObjects=0x55cebd5d1000, depRel=0x7fff74227c70, flags=1) at
dependency.c:273
#12 0x000055ceba5cba07 in performMultipleDeletions
(objects=0x55cebb750088, behavior=DROP_RESTRICT, flags=1) at
dependency.c:434
#13 0x000055ceba723f95 in ATPostAlterTypeCleanup
(wqueue=0x7fff74227e38, tab=0x55cebb6437e8, lockmode=8) at
tablecmds.c:13203
#14 0x000055ceba71056d in ATRewriteCatalogs (wqueue=0x7fff74227e38,
lockmode=8, context=0x7fff74228030) at tablecmds.c:4869
#15 0x000055ceba70f724 in ATController (parsetree=0x55cebb1041a0,
rel=0x55cebc0a5008, cmds=0x55cebb104148, recurse=false, lockmode=8,
context=0x7fff74228030) at tablecmds.c:4437
#16 0x000055ceba70f339 in AlterTable (stmt=0x55cebb1041a0, lockmode=8,
context=0x7fff74228030) at tablecmds.c:4083
#17 0x000055ceba9f4543 in ProcessUtilitySlow (pstate=0x55cebb6436d0,
pstmt=0x55cebb1044f0,
    queryString=0x55cebb102f30 "ALTER TABLE ONLY sch15.chr\n   ALTER
COLUMN c SET DATA TYPE  TEXT  USING pg_catalog.varchar(\n
cast(pg_catalog.session_user() as name))",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55cebb1045e0, qc=0x7fff74228970) at utility.c:1325

=== Backtrace - c7e21e9664@REL_15_STABLE ===
#3  0x000055ceba5222f8 in visibilitymap_clear (rel=0x7f568cc4f940,
heapBlk=56, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x000055ceba4fae90 in heap_delete (relation=0x7f568cc4f940,
tid=0x55cebb41a624, cid=1, crosscheck=0x0, wait=true,
tmfd=0x7fff74227990, changingPart=false) at heapam.c:2931
#5  0x000055ceba4fb43f in simple_heap_delete (relation=0x7f568cc4f940,
tid=0x55cebb41a624) at heapam.c:3078
#6  0x000055ceba5df034 in CatalogTupleDelete (heapRel=0x7f568cc4f940,
tid=0x55cebb41a624) at indexing.c:352
#7  0x000055ceba5d2dc4 in DeleteRelationTuple (relid=21181) at heap.c:1549
#8  0x000055ceba5db6fa in index_drop (indexId=21181, concurrent=false,
concurrent_lock_mode=false) at index.c:2370
#9  0x000055ceba5cd347 in doDeletion (object=0x55cebb93a990, flags=1)
at dependency.c:1403
#10 0x000055ceba5cd094 in deleteOneObject (object=0x55cebb93a990,
depRel=0x7fff74227c70, flags=1) at dependency.c:1312
#11 0x000055ceba5cb7fa in deleteObjectsInList
(targetObjects=0x55cebde36870, depRel=0x7fff74227c70, flags=1) at
dependency.c:273
#12 0x000055ceba5cba07 in performMultipleDeletions
(objects=0x55cebb93abc8, behavior=DROP_RESTRICT, flags=1) at
dependency.c:434
#13 0x000055ceba723f95 in ATPostAlterTypeCleanup
(wqueue=0x7fff74227e38, tab=0x55cebb669928, lockmode=8) at
tablecmds.c:13203
#14 0x000055ceba71056d in ATRewriteCatalogs (wqueue=0x7fff74227e38,
lockmode=8, context=0x7fff74228030) at tablecmds.c:4869
#15 0x000055ceba70f724 in ATController (parsetree=0x55cebb81f838,
rel=0x7f568cc4a8a8, cmds=0x55cebb81f7e0, recurse=false, lockmode=8,
context=0x7fff74228030) at tablecmds.c:4437
#16 0x000055ceba70f339 in AlterTable (stmt=0x55cebb81f838, lockmode=8,
context=0x7fff74228030) at tablecmds.c:4083
#17 0x000055ceba9f4543 in ProcessUtilitySlow (pstate=0x55cebb669810,
pstmt=0x55cebb81fb88,
    queryString=0x55cebb102f30 "ALTER TABLE ONLY sch10.num\n   ALTER
COLUMN n SET DATA TYPE  TEXT  USING pg_catalog.numeric_add(\n
cast(pg_catalog.numeric_div_trunc(\n      cast(sch10.num.n as
\"numeric\"),\n      cast(sch10.num.n as \""...,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55cebb81fc78, qc=0x7fff74228970) at utility.c:1325



Below are backtraces of 2 concurrent crashes on REL_13_STABLE (~month old)

=== Backtrace - 7cdd0c2d7c@REL_13_STABLE ===
#3  0x00005572a01852f8 in visibilitymap_clear (rel=0x7fc91e5bedc8,
heapBlk=128, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x00005572a015de90 in heap_delete (relation=0x7fc91e5bedc8,
tid=0x5572a2f63b2c, cid=2, crosscheck=0x0, wait=true,
tmfd=0x7ffda170ae60, changingPart=false) at heapam.c:2931
#5  0x00005572a015e43f in simple_heap_delete (relation=0x7fc91e5bedc8,
tid=0x5572a2f63b2c) at heapam.c:3078
#6  0x00005572a0242034 in CatalogTupleDelete (heapRel=0x7fc91e5bedc8,
tid=0x5572a2f63b2c) at indexing.c:352
#7  0x00005572a0235e7b in DeleteAttributeTuples (relid=628267) at heap.c:1586
#8  0x00005572a023e6f0 in index_drop (indexId=628267,
concurrent=false, concurrent_lock_mode=false) at index.c:2365
#9  0x00005572a0230347 in doDeletion (object=0x5572a2f99944, flags=1)
at dependency.c:1403
#10 0x00005572a0230094 in deleteOneObject (object=0x5572a2f99944,
depRel=0x7ffda170b1a0, flags=1) at dependency.c:1312
#11 0x00005572a022e7fa in deleteObjectsInList
(targetObjects=0x5572a2f63088, depRel=0x7ffda170b1a0, flags=1) at
dependency.c:273
#12 0x00005572a022ea07 in performMultipleDeletions
(objects=0x5572a2f74998, behavior=DROP_RESTRICT, flags=1) at
dependency.c:434
#13 0x00005572a0386f95 in ATPostAlterTypeCleanup
(wqueue=0x7ffda170b368, tab=0x5572a2fb9790, lockmode=8) at
tablecmds.c:13203
#14 0x00005572a037356d in ATRewriteCatalogs (wqueue=0x7ffda170b368,
lockmode=8, context=0x7ffda170b560) at tablecmds.c:4869
#15 0x00005572a0372724 in ATController (parsetree=0x5572a266ee60,
rel=0x5572a2c56e68, cmds=0x5572a266ee08, recurse=false, lockmode=8,
context=0x7ffda170b560) at tablecmds.c:4437
#16 0x00005572a0372339 in AlterTable (stmt=0x5572a266ee60, lockmode=8,
context=0x7ffda170b560) at tablecmds.c:4083
#17 0x00005572a0657543 in ProcessUtilitySlow (pstate=0x5572a2fb9640,
pstmt=0x5572a266f1b0, queryString=0x5572a266df30 "ALTER TABLE ONLY
sch7.bool\n   ALTER COLUMN b SET DATA TYPE  TEXT  USING
pg_catalog.pg_is_wal_replay_paused()",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x5572a266f2a0, qc=0x7ffda170bea0) at utility.c:1325


=== Backtrace - 7cdd0c2d7c@REL_13_STABLE ===
#3  0x00005572a01852f8 in visibilitymap_clear (rel=0x7fc91e5bedc8,
heapBlk=98, buf=0, flags=3 '\003') at visibilitymap.c:156
#4  0x00005572a015de90 in heap_delete (relation=0x7fc91e5bedc8,
tid=0x5572a308f82c, cid=1, crosscheck=0x0, wait=true,
tmfd=0x7ffda170ae60, changingPart=false) at heapam.c:2931
#5  0x00005572a015e43f in simple_heap_delete (relation=0x7fc91e5bedc8,
tid=0x5572a308f82c) at heapam.c:3078
#6  0x00005572a0242034 in CatalogTupleDelete (heapRel=0x7fc91e5bedc8,
tid=0x5572a308f82c) at indexing.c:352
#7  0x00005572a0235e7b in DeleteAttributeTuples (relid=537682) at heap.c:1586
#8  0x00005572a023e6f0 in index_drop (indexId=537682,
concurrent=false, concurrent_lock_mode=false) at index.c:2365
#9  0x00005572a0230347 in doDeletion (object=0x5572a3092080, flags=1)
at dependency.c:1403
#10 0x00005572a0230094 in deleteOneObject (object=0x5572a3092080,
depRel=0x7ffda170b1a0, flags=1) at dependency.c:1312
#11 0x00005572a022e7fa in deleteObjectsInList
(targetObjects=0x5572a30bdef8, depRel=0x7ffda170b1a0, flags=1) at
dependency.c:273
#12 0x00005572a022ea07 in performMultipleDeletions
(objects=0x5572a308fa10, behavior=DROP_RESTRICT, flags=1) at
dependency.c:434
#13 0x00005572a0386f95 in ATPostAlterTypeCleanup
(wqueue=0x7ffda170b368, tab=0x5572a31a98f8, lockmode=8) at
tablecmds.c:13203
#14 0x00005572a037356d in ATRewriteCatalogs (wqueue=0x7ffda170b368,
lockmode=8, context=0x7ffda170b560) at tablecmds.c:4869
#15 0x00005572a0372724 in ATController (parsetree=0x5572a266fb40,
rel=0x5572a28b3628, cmds=0x5572a266fae8, recurse=false, lockmode=8,
context=0x7ffda170b560) at tablecmds.c:4437
#16 0x00005572a0372339 in AlterTable (stmt=0x5572a266fb40, lockmode=8,
context=0x7ffda170b560) at tablecmds.c:4083
#17 0x00005572a0657543 in ProcessUtilitySlow (pstate=0x5572a31a97e0,
pstmt=0x5572a266fd20, queryString=0x5572a266df30 "ALTER TABLE ONLY
sch2.chr\n   ALTER COLUMN c SET DATA TYPE  TEXT  USING
pg_catalog.varcharin(\n    cast(public.gtrgm_out(\n
cast(cast(null as public.gtrgm) as public.gtrgm)) as cstring),\n
cast(pg"..., context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x5572a30b2fe0, qc=0x7ffda170bea0) at utility.c:1325



The issue isn't easily reproducible and thus posting this as an
aggregated report. Let me know, if changing something in the (SQLSmith
based) test-setup could help triaging this further. Also, can provide
full backtraces, if required.

Reference:
1) https://www.postgresql.org/message-id/2247102.1618008027@sss.pgh.pa.us
2) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5f12bc94dcc60403bdfec6ea3d97aa616a512d9e

-
Robins Tharakan
Amazon Web Services



Re: PANIC in heap_delete during ALTER TABLE

From
Jeff Davis
Date:
On Mon, 2022-08-22 at 13:32 +0930, Robins Tharakan wrote:
> Hi,
>
> During ALTER TABLE, heap_delete() can PANIC with the error
> "PANIC:  wrong buffer passed to visibilitymap_clear".

Thank you for the detailed report!

> Given below are backtraces on master, REL_15_STABLE and
> REL_13_STABLE.
> There was a similar discussion last year [1], although that was about
> heap_update() and was soon fixed [2]; thus reporting this as a
> separate
> thread.

Yes, it looks like it's essentially the same issue.

heap_delete() does recheck after locking the buffer correctly at the
top, but there are other paths below which unlock/relock the buffer
without checking to see if the VM pin should be acquired.

Repro on master:

---------------------------------------
GDB attached to session 2:
   break vacuumlazy.c:1243
   continue

Session 1:
   create table foo(i int) with (autovacuum_enabled=false);
   create index foo_idx on foo (i);
   insert into foo select generate_series(1,100);
   delete from foo where i = 49;

Session 2:
   vacuum foo; -- hits breakpoint

Session 3:
   begin;
   select i from foo where i = 50 for update;

Session 1:
   delete from foo where i = 50;

GDB:
   continue

Session 3:
   commmit;

---------------------------------------

I attached a hasty patch that seems to fix it. I'll need to look more
closely later, and I'll also need some review because there have been
multiple bugs in this area.

Regards,
    Jeff Davis


Attachment

Re: PANIC in heap_delete during ALTER TABLE

From
Peter Geoghegan
Date:
On Wed, Sep 7, 2022 at 2:53 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Yes, it looks like it's essentially the same issue.
>
> heap_delete() does recheck after locking the buffer correctly at the
> top, but there are other paths below which unlock/relock the buffer
> without checking to see if the VM pin should be acquired.

This certainly vindicates Tom's decision to backpatch his fix for the
heap_update() issue. Clearly this new variant bug can't have anything to do
with the 14-era behavioral change to lazy_vacuum_heap_page() from
commit 8523492d4e. That behavioral change was clearly a factor in the
heap_update() issue, but can't be relevant here. Since the repro we have
shows that 13 is also affected.

I wonder why it took this long to hear anything about it, though.
Maybe it just worked out that way because the earlier (or was it
later?) heap_update() issue created a public record that somebody like Robins
could find.


--
Peter Geoghegan



Re: PANIC in heap_delete during ALTER TABLE

From
Jeff Davis
Date:
On Wed, 2022-09-07 at 15:44 -0700, Peter Geoghegan wrote:
> This certainly vindicates Tom's decision to backpatch his fix for the
> heap_update() issue. Clearly this new variant bug can't have anything
> to do
> with the 14-era behavioral change to lazy_vacuum_heap_page() from
> commit 8523492d4e. That behavioral change was clearly a factor in the
> heap_update() issue, but can't be relevant here. Since the repro we
> have
> shows that 13 is also affected.

I wasn't able to repro on 13, but Robins' report indicates that it's
possible.

Regards,
    Jeff Davis




Re: PANIC in heap_delete during ALTER TABLE

From
Jeff Davis
Date:
On Thu, 2022-09-08 at 10:30 -0700, Jeff Davis wrote:
> I wasn't able to repro on 13, but Robins' report indicates that it's
> possible.

I looked again, and I still can't find a way to repro on 13,
specifically the sha1 you mentioned (7cdd0c2d7c).

heap_delete() lets go of its exclusive lock (but not pin) in a couple
places (just like on master). But it looks like all the places that set
PD_ALL_VISIBLE are doing so under a cleanup lock (unlike master), which
means that it can't change for the duration of heap_delete().

I'm hesitant to commit anything here if I can't repro the problem on
13. I must be missing something.

Regards,
    Jeff Davis




Re: PANIC in heap_delete during ALTER TABLE

From
Peter Geoghegan
Date:
On Thu, Sep 15, 2022 at 3:25 PM Jeff Davis <pgsql@j-davis.com> wrote:
> heap_delete() lets go of its exclusive lock (but not pin) in a couple
> places (just like on master). But it looks like all the places that set
> PD_ALL_VISIBLE are doing so under a cleanup lock (unlike master), which
> means that it can't change for the duration of heap_delete().
>
> I'm hesitant to commit anything here if I can't repro the problem on
> 13. I must be missing something.

What about Hot Standby? There was never a rule that said that we had
to have a cleanup lock to set PD_ALL_VISIBLE during original execution
(or in recovery), even before Postgres 14. I believe that Postgres 14
was the first version to do it without a cleanup lock during original
execution. But it wasn't the first version to do it during recovery.

Based on a quick look just now: the heap_xlog_visible() REDO routine
is processed as an independent atomic action on replicas. It doesn't
require a cleanup lock for this, and never has. It may or may not be
preceded by a prune operation for the same page (in any case obviously
we won't keep the cleanup lock after the prune REDO routine runs).

-- 
Peter Geoghegan



Re: PANIC in heap_delete during ALTER TABLE

From
Jeff Davis
Date:
On Wed, 2022-09-07 at 14:53 -0700, Jeff Davis wrote:
> I attached a hasty patch that seems to fix it. I'll need to look more
> closely later, and I'll also need some review because there have been
> multiple bugs in this area.

I plan to commit something close to this patch in about 48 hours and
backport to 12, like 5f12bc94dc was.

I would prefer if I was able to repro on 13 first, and some review
would be nice given the area involved and how close we are to release.
But it seems like fixing a bug is better than not. If someone thinks
this is the wrong time to commit and wants me to hold off, please let
me know.

Regards,
    Jeff Davis




Re: PANIC in heap_delete during ALTER TABLE

From
Michael Paquier
Date:
On Tue, Sep 20, 2022 at 11:32:05AM -0700, Jeff Davis wrote:
> I would prefer if I was able to repro on 13 first, and some review
> would be nice given the area involved and how close we are to release.
> But it seems like fixing a bug is better than not. If someone thinks
> this is the wrong time to commit and wants me to hold off, please let
> me know.

Doing something in the next 48 hours will give enough room to the
buildfarm to check this stuff for 15 RC.  So the timing is fine in my
opinion.
--
Michael

Attachment

Re: PANIC in heap_delete during ALTER TABLE

From
Robins Tharakan
Date:
Thanks Jeff for picking this up.

> I plan to commit something close to this patch in about 48 hours and
> backport to 12, like 5f12bc94dc was.
>
Using the repro provided, I can confirm that the patch works on master
(6bcda4a7212) and thus also +1 on going ahead with pushing this.

> I would prefer if I was able to repro on 13 first, and some review
> would be nice given the area involved and how close we are to release.
> But it seems like fixing a bug is better than not.

Outside of the repro steps, I applied this patch on my test-instance
(the rationale was to attempt the same workload that generated the
original set of crashes) and ran the workload but didn't get anything
to report - which isn't conclusive, but IMHO still good.

Given that the original set of crashes were generated over a month of
tracking, I'll keep an eye on the results for the next few weeks and come
back in case I have something interesting.

-
robins



Re: PANIC in heap_delete during ALTER TABLE

From
Jeff Davis
Date:
On Wed, 2022-09-21 at 20:48 +0930, Robins Tharakan wrote:
> Thanks Jeff for picking this up.
>
> > I plan to commit something close to this patch in about 48 hours
> > and
> > backport to 12, like 5f12bc94dc was.
> >
> Using the repro provided, I can confirm that the patch works on
> master
> (6bcda4a7212) and thus also +1 on going ahead with pushing this.

Committed.

--
Jeff Davis
PostgreSQL Contributor Team - AWS





Re: PANIC in heap_delete during ALTER TABLE

From
Peter Geoghegan
Date:
On Thu, Sep 22, 2022 at 11:47 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Committed.

My thoughts on the fix (a little belated):

The patch looks very much like it can only affect cases that are going
to crash anyway (by forcing a restart). Plus you're just adding
another reason to restart in heap_delete() -- there were already a few
reasons why that might be necessary. So the fix is very well targeted.
Even when I think about it in the most adversarial/pessimistic way
possible, it's still very difficult to imagine anybody being worse off
than they would have been without the patch. OTOH it's readily
apparent that this fixes a crash.

There are no absolute certainties, but I do feel confident that
committing this fix was the right response, all things considered.

-- 
Peter Geoghegan