Hello,
my colleague Jürg Senn recently found a backend crash in Postgres 11 and trimmed it down to a reproducible test case.
Testing revealed the crash is still present in last week's master branch but not in 10. I further simplified the test
case to:
CREATE EXTENSION btree_gist;
CREATE TABLE segfault (i int);
CREATE INDEX ON segfault USING gist ((i + 10));
INSERT INTO segfault VALUES (1);
UPDATE segfault SET i = 2;
Our tests were performed on macOS: PostgreSQL 12devel on x86_64-apple-darwin17.7.0, compiled by Apple LLVM version
9.1.0 (clang-902.0.39.2), 64-bit
Apparently the segfault happens within the check for the possibility of HOT updates with expression / functional
indexes, which was added by c203d6cf8 in 11.
The backtrace on the segfault is:
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xb)
* frame #0: 0x00007fff51b06cc0 libsystem_platform.dylib`_platform_memcmp + 288
frame #1: 0x0000000108b8b7e3 postgres`datumIsEqual(value1=11, value2=12, typByVal=false, typLen=8) at datum.c:249
frame #2: 0x000000010869ca52 postgres`ProjIndexIsUnchanged(relation=0x000000010b292778, oldtup=0x00007ffee75e28b8, newtup=0x00007f9f84007e70) at heapam.c:4551
frame #3: 0x000000010869b60f postgres`heap_update(relation=0x000000010b292778, otid=0x00007ffee75e2bc8, newtup=0x00007f9f84007e70, cid=0, crosscheck=0x0000000000000000, wait=true, hufd=0x00007ffee75e2a98, lockmode=0x00007ffee75e2a80) at heapam.c:4242
frame #4: 0x000000010892c381 postgres`ExecUpdate(mtstate=0x00007f9f840066c0, tupleid=0x00007ffee75e2bc8, oldtuple=0x0000000000000000, slot=0x00007f9f84007708, planSlot=0x00007f9f84006bd8, epqstate=0x00007f9f84006780, estate=0x00007f9f84006318, canSetTag=true) at nodeModifyTable.c:1208
frame #5: 0x000000010892a57e postgres`ExecModifyTable(pstate=0x00007f9f840066c0) at nodeModifyTable.c:2172
frame #6: 0x00000001088fbc12 postgres`ExecProcNodeFirst(node=0x00007f9f840066c0) at execProcnode.c:445
frame #7: 0x00000001088f5002 postgres`ExecProcNode(node=0x00007f9f840066c0) at executor.h:237
frame #8: 0x00000001088f0a31 postgres`ExecutePlan(estate=0x00007f9f84006318, planstate=0x00007f9f840066c0, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x00007f9f83005e40, execute_once=true) at execMain.c:1707
frame #9: 0x00000001088f08fc postgres`standard_ExecutorRun(queryDesc=0x00007f9f83005b18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
frame #10: 0x00000001088f06c2 postgres`ExecutorRun(queryDesc=0x00007f9f83005b18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
frame #11: 0x0000000108b2edfe postgres`ProcessQuery(plan=0x00007f9f830015b0, sourceText="UPDATE segfault SET i = 2;", params=0x0000000000000000, queryEnv=0x0000000000000000, dest=0x00007f9f83005e40, completionTag="") at pquery.c:161
frame #12: 0x0000000108b2de78 postgres`PortalRunMulti(portal=0x00007f9f82842518, isTopLevel=true, setHoldSnapshot=false, dest=0x00007f9f83005e40, altdest=0x00007f9f83005e40, completionTag="") at pquery.c:1286
frame #13: 0x0000000108b2d540 postgres`PortalRun(portal=0x00007f9f82842518, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007f9f83005e40, altdest=0x00007f9f83005e40, completionTag="") at pquery.c:799
frame #14: 0x0000000108b28a21 postgres`exec_simple_query(query_string="UPDATE segfault SET i = 2;") at postgres.c:1215
frame #15: 0x0000000108b27bb8 postgres`PostgresMain(argc=1, argv=0x00007f9f82815138, dbname="bussmann", username="bussmann") at postgres.c:4243
frame #16: 0x0000000108a655b4 postgres`BackendRun(port=0x00007f9f80e00910) at postmaster.c:4377
frame #17: 0x0000000108a649b8 postgres`BackendStartup(port=0x00007f9f80e00910) at postmaster.c:4068
frame #18: 0x0000000108a6385a postgres`ServerLoop at postmaster.c:1700
frame #19: 0x0000000108a61158 postgres`PostmasterMain(argc=3, argv=0x00007f9f81900580) at postmaster.c:1373
frame #20: 0x000000010896c219 postgres`main(argc=3, argv=0x00007f9f81900580) at main.c:228
frame #21: 0x00007fff517f9015 libdyld.dylib`start + 1
Interestingly, the debugger has no issue accessing the memory in question:
(Datum) value1 = 11
(Datum) value2 = 12
(bool) typByVal = false
(int) typLen = 8
(bool) res = false
(Size) size1 = 8
(Size) size2 = 8
(char *) s1 = 0x000000000000000b ""
(char *) s2 = 0x000000000000000c ""
How can this be investigated further?
The top of the stack trace is similar to the one in
https://www.postgresql.org/message-id/20181101140216.GA2954%40hermes.hilbert.loc but I'm not sure these are actually
related.
I was unable to reproduce the segfault with index types other than GiST, nor without the opclasses provided by the
btree_gist extension. However, none of that specifically appears in the stack frames.
Poking around the issue further, I came across the following:
# ALTER INDEX segfault_expr_idx SET (RECHECK_ON_UPDATE = off);
ERROR: unrecognized parameter "recheck_on_update"
According to the commit referred to earlier, this option should work for all index types, but apparently it does not
for GiST. In reloptions.h:54 we have
> RELOPT_KIND_INDEX = RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST
which is missing RELOPT_KIND_GIST and RELOPT_KIND_BRIN. Adding these at least makes it possible to set and use the
parameter on the GiST index. Trivial patch for that attached.
Tobias