PANIC: wrong buffer passed to visibilitymap_clear - Mailing list pgsql-hackers

From 王伟(学弈)
Subject PANIC: wrong buffer passed to visibilitymap_clear
Date
Msg-id 9c75c5d7-5e87-413b-9587-4f77834e2a79.rogers.ww@alibaba-inc.com
Whole thread Raw
Responses Re: PANIC: wrong buffer passed to visibilitymap_clear
Re: PANIC: wrong buffer passed to visibilitymap_clear
List pgsql-hackers
Hi,
I recently find this problem while testing PG14 with sysbench.
Then I look through the emails from pgsql-hackers and find a previous similary bug which is https://www.postgresql.org/message-id/flat/2247102.1618008027%40sss.pgh.pa.us. But the bugfix commit(34f581c39e97e2ea237255cf75cccebccc02d477) is already patched to PG14.

The following is the stack of coredump.
#0  0x00007f2ba9bfa277 in raise () from /lib64/libc.so.6
#1  0x00007f2ba9bfb968 in abort () from /lib64/libc.so.6
#2  0x00000000009416bb in errfinish () at elog.c:717
#3  0x000000000049518b in visibilitymap_clear (rel=<optimized out>, heapBlk=<optimized out>, buf=<optimized out>, flags=<optimized out>, polar_record=<optimized out>) at visibilitymap.c:142
#4  0x000000000054c2df in heap_update () at heapam.c:3948
#5  0x0000000000555538 in heapam_tuple_update (relation=0x7f2b909cfef8, otid=0x7fff930c577a, slot=0x2612528, cid=0, snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7fff930c5690, lockmode=0x7fff930c5684, update_indexes=0x7fff930c5681)
    at heapam_handler.c:327
#6  0x00000000006e04f3 in table_tuple_update (update_indexes=0x7fff930c5681, lockmode=0x7fff930c5684, tmfd=0x7fff930c5690, wait=true, crosscheck=<optimized out>, snapshot=<optimized out>, cid=<optimized out>, slot=0x2612528, otid=0x26126d0,
    rel=0x7f2b909cfef8) at ../../../src/include/access/tableam.h:1509
#7  ExecUpdate () at nodeModifyTable.c:1785
#8  0x00000000006e0f2a in ExecModifyTable () at nodeModifyTable.c:2592
#9  0x00000000006b909c in ExecProcNode (node=0x24d4eb0) at ../../../src/include/executor/executor.h:257
#10 ExecutePlan (execute_once=<optimized out>, dest=0xaa52e0 <donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x24d4eb0, estate=0x24d4c08)
    at execMain.c:1553
#11 standard_ExecutorRun () at execMain.c:363
#12 0x00007f2babd893dd in pgss_ExecutorRun (queryDesc=0x25dcab8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1018
#13 0x00007f2babd816fa in explain_ExecutorRun (queryDesc=0x25dcab8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:334
#14 0x0000000000828ac8 in ProcessQuery (plan=0x25c9e88, sourceText=0x25dc9a8 "UPDATE sbtest8 SET k=k+1 WHERE id=$1", params=0x25dca28, queryEnv=0x0, dest=<optimized out>, qc=0x7fff930c5e40) at pquery.c:160
#15 0x00000000008294d8 in PortalRunMulti (portal=portal@entry=0x25697c8, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=0xaa52e0 <donothingDR>, dest@entry=0x24a6568, altdest=0xaa52e0 <donothingDR>,
    altdest@entry=0x24a6568, qc=qc@entry=0x7fff930c5e40) at pquery.c:1277
#16 0x0000000000829929 in PortalRun () at pquery.c:797
#17 0x0000000000826f57 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x24a6158 "") at postgres.c:2306
#18 PostgresMain () at postgres.c:4826
#19 0x00000000007a1e8a in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4910
#20 BackendStartup (port=<optimized out>) at postmaster.c:4621
#21 ServerLoop () at postmaster.c:1823
#22 0x00000000007a2c4b in PostmasterMain () at postmaster.c:1488
#23 0x000000000050c5d0 in main (argc=3, argv=0x24a0f50) at main.c:209

I'm wondering whether there's another code path to lead this problem happened. Since, I take a deep dig via gdb which turns out that newbuffer is not euqal to buffer. In other words, the function RelationGetBufferForTuple must have been called just now.
Besides, why didn't we re-check the flag after RelationGetBufferForTuple was called?
But I'm confused about the heap_update and RelationGetBufferForTuple functions which are too long to understand for me. Can anyone give me a hand?

--
Best regards,
rogers.ww

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Next
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: RE: Support tab completion for upper character inputs in psql