Re: BUG #15032: Segmentation fault when running a particular query - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #15032: Segmentation fault when running a particular query |
Date | |
Msg-id | 13015.1516992308@sss.pgh.pa.us |
In response to | BUG #15032: Segmentation fault when running a particular query (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #15032: Segmentation fault when running a particular query |
List | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes:
> ## Query that results in segmentation fault

Unsurprisingly, the given info is not enough to reproduce the crash. However, looking at the stack trace:

> (gdb) bt
> #0 index_markpos (scan=0x0) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
> #1 0x000055a812746c68 in ExecMergeJoin (pstate=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeMergejoin.c:1188
> #2 0x000055a81272cf3f in ExecProcNode (node=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
> #3 EvalPlanQualNext (epqstate=epqstate@entry=0x55a81318c518) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:3005
> #4 0x000055a81272d342 in EvalPlanQual (estate=estate@entry=0x55a81318c018,
> epqstate=epqstate@entry=0x55a81318c518,
> relation=relation@entry=0x7f4e4e25ab68, rti=1, lockmode=<optimized out>,
> tid=tid@entry=0x7ffd54492330, priorXmax=8959603) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:2521
> #5 0x000055a812747af7 in ExecUpdate (mtstate=mtstate@entry=0x55a81318c468,
> tupleid=tupleid@entry=0x7ffd54492450, oldtuple=oldtuple@entry=0x0,
> slot=<optimized out>, slot@entry=0x55a8131a2f08,
> planSlot=planSlot@entry=0x55a81319db60,
> epqstate=epqstate@entry=0x55a81318c518, estate=0x55a81318c018, canSetTag=1
> '\001') at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1113

it seems fairly clear that somebody passed a NULL scandesc pointer to index_markpos. Looking at the only two callers of that function, this must mean that either an IndexScan's node->iss_ScanDesc or an IndexOnlyScan's node->ioss_ScanDesc was null. (We don't see ExecIndexMarkPos in the trace because the compiler optimized the tail call.)

And that leads me to commit 09529a70b, which changed the logic in those node types to allow initialization of the index scandesc to be delayed to the first tuple fetch, rather than necessarily performed during ExecInitNode. Because this is happening inside an EvalPlanQual, it's unsurprising that we'd be taking an unusual code path. I believe what happened was that the IndexScan node returned a jammed-in EPQ tuple on its first call, and so hadn't opened the scandesc at all, while ExecMergeJoin would do an ExecMarkPos if the tuple matched (which it typically would if we'd gotten to EPQ), whereupon kaboom.

It's tempting to think that this is an oversight in commit 09529a70b and we need to rectify it by something along the lines of teaching ExecIndexMarkPos and ExecIndexOnlyMarkPos to initialize the scandesc if needed before calling index_markpos.

However, on further reflection, it seems like this is a bug of far older standing, to wit that ExecIndexMarkPos/ExecIndexRestrPos are doing entirely the wrong thing when EPQ is active. It's not meaningful, or at least not correct, to be messing with the index scan state at all in that case. Rather, what the scan is supposed to do is return the single jammed-in EPQ tuple, and what "restore" ought to mean is "clear my es_epqScanDone flag so that that tuple can be returned again".

It's not clear to me whether the failure to do that has any real consequences though. It would only matter if there's more than one tuple available on the outer side of the mergejoin, which I think there never would be in an EPQ situation. Still, if there ever were more outer tuples, the mergejoin would misbehave and maybe even crash itself (because it probably assumes that restoring to a point where there had been a tuple would allow it to re-fetch that tuple successfully).

So what I'm inclined to do is teach the mark/restore infrastructure to do the right thing with EPQ state when EPQ is active.
But I'm not clear on whether that needs to be back-patched earlier than v10.

regards, tom lane
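
The mark/restore behavior proposed in the message can be illustrated with a small toy model. This is a minimal sketch and not PostgreSQL source: the struct ToyScan, its fields, and the functions toy_next, toy_markpos and toy_restrpos are all hypothetical stand-ins, and the "descriptor" is just a pointer that stays NULL until the first real fetch. It assumes that "mark" remembers the most recently returned tuple and that "restore" makes the next fetch return that tuple again. When the EPQ flag is set, mark and restore leave the descriptor alone, and restore only clears the scan-done flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an index scan node; not a PostgreSQL struct. */
typedef struct ToyScan
{
    const int *source;        /* tuples the scan would read */
    size_t     ntuples;       /* number of tuples in source */
    const int *desc;          /* "scan descriptor": NULL until the first real fetch */
    size_t     pos;           /* index of the next tuple to return */
    size_t     marked;        /* position remembered by toy_markpos */
    bool       epq_active;    /* true while rechecking one substituted tuple */
    int        epq_tuple;     /* the substituted tuple */
    bool       epq_scan_done; /* set once the substituted tuple was returned */
} ToyScan;

/* Fetch the next tuple into *out; returns false at end of scan. */
bool toy_next(ToyScan *scan, int *out)
{
    assert(scan != NULL && out != NULL);
    if (scan->epq_active)
    {
        /* Return the single substituted tuple once; the descriptor is never opened. */
        if (scan->epq_scan_done)
            return false;
        scan->epq_scan_done = true;
        *out = scan->epq_tuple;
        return true;
    }
    if (scan->desc == NULL)
        scan->desc = scan->source;    /* lazy open on the first real fetch */
    if (scan->pos >= scan->ntuples)
        return false;
    *out = scan->desc[scan->pos++];
    return true;
}

/* Remember the current position; a no-op while EPQ is active. */
void toy_markpos(ToyScan *scan)
{
    assert(scan != NULL);
    if (scan->epq_active)
        return;                       /* only one tuple exists; leave the descriptor alone */
    /* Remember the tuple most recently returned (or the start if none yet). */
    scan->marked = (scan->pos > 0) ? scan->pos - 1 : 0;
}

/* Go back to the marked position; under EPQ, allow the tuple to be returned again. */
void toy_restrpos(ToyScan *scan)
{
    assert(scan != NULL);
    if (scan->epq_active)
    {
        scan->epq_scan_done = false;  /* the substituted tuple may be fetched again */
        return;
    }
    scan->pos = scan->marked;
}
```

In this model, a caller that fetches the substituted tuple, marks, and later restores gets the same tuple back on the next fetch, and the descriptor pointer is never dereferenced or opened along the way. The alternative mentioned in the message, opening the descriptor inside the mark routine, would also avoid the NULL pointer but would still manipulate scan state that is irrelevant while EPQ is active.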