Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup |
Date | |
Msg-id | 22850.1515710814@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup
|
List | pgsql-bugs |
I wrote: > FWIW, I'm not really comfortable that the proposed patch is correct > or complete. It may just need more study to get there. I've done another round of study on this patch. The attached updated version is the same code as Heikki proposed (minus the incorrect restriction to queries with HAVING quals), but I reworked the comments and expanded the regression test cases. One thing that wasn't clear to me before was whether we need wrap_non_vars for this case or not; we don't for outer joins, so I was unconvinced about it here. It turns out we do: the point of the wrapper is to prevent constant folding or other expression preprocessing from merging a pulled-up expression with the surrounding expression, resulting in something that won't match the grouping set expression when it comes time to do that matching. For instance if we have a boolean subquery output expression, say "x = y as cond", and that gets hoisted into an upper expression "not cond", then without the PHV wrapper we will happily simplify that to "x <> y" which will not match the grouping set expression. There's a regression test below that misbehaves if you take out the "wrap_non_vars = true" line. I spent some time thinking about Andrew's observation that we don't really need the wrappers everyplace. It's true, but pullup_replace_vars is far from being able to do the right thing there, and I'm not sure that trying to teach it to do so is reasonable. (I'm inclined to think that the idea I threw out upthread about converting grouping expressions into Vars belonging to a new RTE kind might be the way to go.) In any case I don't think we'd possibly come out with a patch simple enough to back-patch. So let's leave that optimization for future work. I think the attached is probably ready to go, though I've not checked yet whether it will work pre-9.6. Anyone want to do more review? regards, tom lane diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 0e2a220..45d82da 100644 *** a/src/backend/optimizer/prep/prepjointree.c --- b/src/backend/optimizer/prep/prepjointree.c *************** pull_up_simple_subquery(PlannerInfo *roo *** 1003,1013 **** /* * The subquery's targetlist items are now in the appropriate form to ! * insert into the top query, but if we are under an outer join then ! * non-nullable items and lateral references may have to be turned into ! * PlaceHolderVars. If we are dealing with an appendrel member then ! * anything that's not a simple Var has to be turned into a ! * PlaceHolderVar. Set up required context data for pullup_replace_vars. */ rvcontext.root = root; rvcontext.targetlist = subquery->targetList; --- 1003,1010 ---- /* * The subquery's targetlist items are now in the appropriate form to ! * insert into the top query, except that we may need to wrap them in ! * PlaceHolderVars. Set up required context data for pullup_replace_vars. */ rvcontext.root = root; rvcontext.targetlist = subquery->targetList; *************** pull_up_simple_subquery(PlannerInfo *roo *** 1019,1032 **** rvcontext.relids = NULL; rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; ! rvcontext.need_phvs = (lowest_nulling_outer_join != NULL || ! containing_appendrel != NULL); ! rvcontext.wrap_non_vars = (containing_appendrel != NULL); /* initialize cache array with indexes 0 .. length(tlist) */ rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) * sizeof(Node *)); /* * Replace all of the top query's references to the subquery's outputs * with copies of the adjusted subtlist items, being careful not to * replace any of the jointree structure. (This'd be a lot cleaner if we --- 1016,1064 ---- rvcontext.relids = NULL; rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; ! /* these flags will be set below, if needed */ ! rvcontext.need_phvs = false; ! rvcontext.wrap_non_vars = false; /* initialize cache array with indexes 0 .. length(tlist) */ rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) * sizeof(Node *)); /* + * If we are under an outer join then non-nullable items and lateral + * references may have to be turned into PlaceHolderVars. + */ + if (lowest_nulling_outer_join != NULL) + rvcontext.need_phvs = true; + + /* + * If we are dealing with an appendrel member then anything that's not a + * simple Var has to be turned into a PlaceHolderVar. We force this to + * ensure that what we pull up doesn't get merged into a surrounding + * expression during later processing and then fail to match the + * expression actually available from the appendrel. + */ + if (containing_appendrel != NULL) + { + rvcontext.need_phvs = true; + rvcontext.wrap_non_vars = true; + } + + /* + * If the parent query uses grouping sets, we need a PlaceHolderVar for + * anything that's not a simple Var. Again, this ensures that expressions + * retain their separate identity so that they will match grouping set + * columns when appropriate. (It'd be sufficient to wrap values used in + * grouping set columns, and do so only in non-aggregated portions of the + * tlist and havingQual, but that would require a lot of infrastructure + * that pullup_replace_vars hasn't currently got.) + */ + if (parse->groupingSets) + { + rvcontext.need_phvs = true; + rvcontext.wrap_non_vars = true; + } + + /* * Replace all of the top query's references to the subquery's outputs * with copies of the adjusted subtlist items, being careful not to * replace any of the jointree structure. (This'd be a lot cleaner if we diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out index 833d515..cbfdbfd 100644 *** a/src/test/regress/expected/groupingsets.out --- b/src/test/regress/expected/groupingsets.out *************** select g as alias1, g as alias2 *** 389,394 **** --- 389,439 ---- 3 | (6 rows) + -- check that pulled-up subquery outputs still go to null when appropriate + select four, x + from (select four, ten, 'foo'::text as x from tenk1) as t + group by grouping sets (four, x) + having x = 'foo'; + four | x + ------+----- + | foo + (1 row) + + select four, x || 'x' + from (select four, ten, 'foo'::text as x from tenk1) as t + group by grouping sets (four, x) + order by four; + four | ?column? + ------+---------- + 0 | + 1 | + 2 | + 3 | + | foox + (5 rows) + + select (x+y)*1, sum(z) + from (select 1 as x, 2 as y, 3 as z) s + group by grouping sets (x+y, x); + ?column? | sum + ----------+----- + 3 | 3 + | 3 + (2 rows) + + select x, not x as not_x, q2 from + (select *, q1 = 1 as x from int8_tbl i1) as t + group by grouping sets(x, q2) + order by x, q2; + x | not_x | q2 + ---+-------+------------------- + f | t | + | | -4567890123456789 + | | 123 + | | 456 + | | 4567890123456789 + (5 rows) + -- simple rescan tests select a, b, sum(v.x) from (values (1),(2)) v(x), gstest_data(v.x) diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql index 2b4ab69..b28d821 100644 *** a/src/test/regress/sql/groupingsets.sql --- b/src/test/regress/sql/groupingsets.sql *************** select g as alias1, g as alias2 *** 152,157 **** --- 152,177 ---- from generate_series(1,3) g group by alias1, rollup(alias2); + -- check that pulled-up subquery outputs still go to null when appropriate + select four, x + from (select four, ten, 'foo'::text as x from tenk1) as t + group by grouping sets (four, x) + having x = 'foo'; + + select four, x || 'x' + from (select four, ten, 'foo'::text as x from tenk1) as t + group by grouping sets (four, x) + order by four; + + select (x+y)*1, sum(z) + from (select 1 as x, 2 as y, 3 as z) s + group by grouping sets (x+y, x); + + select x, not x as not_x, q2 from + (select *, q1 = 1 as x from int8_tbl i1) as t + group by grouping sets(x, q2) + order by x, q2; + -- simple rescan tests select a, b, sum(v.x)
pgsql-bugs by date: