Thread: mark/restore failures on unsorted merge joins
From a report by user "kes" on irc, I constructed this test case: create table marktst (id integer, val numrange, exclude using gist (val with &&)); insert into marktst select i, numrange(i, i+1, '[)') from generate_series(1,1000) i; vacuum marktst; analyze marktst; select * from (select val from marktst where val && numrange(10,11)) s1 full join (values (1)) v(a) on true; ERROR: function ammarkpos is not defined for index marktst_val_excl for which the query plan is: QUERY PLAN -------------------------------------------------------------------------------------------- Merge Full Join (cost=0.14..4.21 rows=2 width=20) -> Result (cost=0.00..0.01 rows=1 width=4) -> Index Only Scan using marktst_val_excl on marktst (cost=0.14..4.18 rows=2 width=16) Index Cond: (val && '[10,11)'::numrange) (4 rows) The problem is that the planner calls ExecSupportsMarkRestore to find out whether a Materialize node is needed, and that function looks no further than the Path type of T_Index[Only]Path in order to return true, even though in this case it's a GiST index which does not support mark/restore. (Usually this can't be a problem because the merge join would need sorted input, thus the index scan would be a btree; but a merge join that doesn't actually have any sort keys could take unsorted input from any index type.) Going forward, this looks like IndexOptInfo needs another am* boolean field, but that's probably not appropriate for the back branches; maybe as a workaround, ExecSupportsMarkRestore should just check for btree? -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > The problem is that the planner calls ExecSupportsMarkRestore to find > out whether a Materialize node is needed, and that function looks no > further than the Path type of T_Index[Only]Path in order to return true, > even though in this case it's a GiST index which does not support > mark/restore. > (Usually this can't be a problem because the merge join would need > sorted input, thus the index scan would be a btree; but a merge join > that doesn't actually have any sort keys could take unsorted input from > any index type.) Sounds like the right analysis. > Going forward, this looks like IndexOptInfo needs another am* boolean > field, but that's probably not appropriate for the back branches; maybe > as a workaround, ExecSupportsMarkRestore should just check for btree? Uh, why would you not just look to see if the ammarkpos/amrestrpos fields are non-null? regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> The problem is that the planner calls ExecSupportsMarkRestore to >> find out whether a Materialize node is needed, and that function >> looks no further than the Path type of T_Index[Only]Path in order to >> return true, even though in this case it's a GiST index which does >> not support mark/restore. >> (Usually this can't be a problem because the merge join would need >> sorted input, thus the index scan would be a btree; but a merge join >> that doesn't actually have any sort keys could take unsorted input >> from any index type.) Tom> Sounds like the right analysis. >> Going forward, this looks like IndexOptInfo needs another am* >> boolean field, but that's probably not appropriate for the back >> branches; maybe as a workaround, ExecSupportsMarkRestore should just >> check for btree? Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos Tom> fields are non-null? We don't (in the back branches) seem to have a pointer to the IndexAmRoutine handy, only the oid? Obviously we can look it up from the oid, but is that more overhead than we want in a join cost function, given that this will be called for all potential mergejoins considered, not just JOIN_FULL? Or is the overhead not worth bothering about? -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos > Tom> fields are non-null? > We don't (in the back branches) seem to have a pointer to the > IndexAmRoutine handy, only the oid? Oh, sorry, I misread your comment to be that you wanted to add a field to IndexAmRoutine. You're right, the real issue here is that ExecSupportsMarkRestore lacks any convenient access to the needed info, and we need to add a bool to IndexOptInfo to fix that. I don't see any compelling reason why you couldn't add the field at the end in the back branches; that's what we usually do to avoid ABI breaks. Although actually (counts fields...) it looks like there's at least one pad byte after amcanparallel, so you could add a bool there without any ABI consequence, resulting in a reasonably natural field order in all branches. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Oh, sorry, I misread your comment to be that you wanted to add a Tom> field to IndexAmRoutine. You're right, the real issue here is that Tom> ExecSupportsMarkRestore lacks any convenient access to the needed Tom> info, and we need to add a bool to IndexOptInfo to fix that. Tom> I don't see any compelling reason why you couldn't add the field Tom> at the end in the back branches; that's what we usually do to Tom> avoid ABI breaks. Although actually (counts fields...) it looks Tom> like there's at least one pad byte after amcanparallel, so you Tom> could add a bool there without any ABI consequence, resulting in a Tom> reasonably natural field order in all branches. I guess that's close enough; this should suffice then. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c index e2154ba86a..0c10f1d35c 100644 --- a/src/backend/executor/execAmi.c +++ b/src/backend/executor/execAmi.c @@ -417,6 +417,11 @@ ExecSupportsMarkRestore(Path *pathnode) { case T_IndexScan: case T_IndexOnlyScan: + /* + * Not all index types support mark/restore. + */ + return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos; + case T_Material: case T_Sort: return true; diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 52c01eb86b..3e94256d34 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -284,6 +284,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->amhasgettuple = (amroutine->amgettuple != NULL); info->amhasgetbitmap = amroutine->amgetbitmap != NULL && relation->rd_tableam->scan_bitmap_next_block != NULL; + info->amcanmarkpos = (amroutine->ammarkpos != NULL && + amroutine->amrestrpos != NULL); info->amcostestimate = amroutine->amcostestimate; Assert(info->amcostestimate != NULL); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index abe6f570e3..5a10c1855d 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -864,6 +864,7 @@ struct IndexOptInfo bool amhasgettuple; /* does AM have amgettuple interface? */ bool amhasgetbitmap; /* does AM have amgetbitmap interface? */ bool amcanparallel; /* does AM support parallel scan? */ + bool amcanmarkpos; /* does AM support mark/restore? */ /* Rather than include amapi.h here, we declare amcostestimate like this */ void (*amcostestimate) (); /* AM's cost estimator */ };
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > I guess that's close enough; this should suffice then. Looks about right. Not sure if we need to bother with a regression test case; once that's in, it'd be hard to break it. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I guess that's close enough; this should suffice then. Tom> Looks about right. Not sure if we need to bother with a regression Tom> test case; once that's in, it'd be hard to break it. We could check the EXPLAIN output (since the Materialize node would show up), but it's not easy to get stable plans since the choice of which path to put on the outside is not fixed. Based on what I found when actually testing the code, it probably wouldn't be worth the effort. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Looks about right. Not sure if we need to bother with a regression > Tom> test case; once that's in, it'd be hard to break it. > We could check the EXPLAIN output (since the Materialize node would show > up), but it's not easy to get stable plans since the choice of which > path to put on the outside is not fixed. Based on what I found when > actually testing the code, it probably wouldn't be worth the effort. If it's not easy to test, I agree it's not worth it. (Given how long it took anyone to notice this, the difficulty of making a stable test case is unsurprising, perhaps.) regards, tom lane