Thread: pgsql: Fix incorrect matching of subexpressions in outer-join plan node
Fix incorrect matching of subexpressions in outer-join plan nodes. Previously we would re-use input subexpressions in all expression trees attached to a Join plan node. However, if it's an outer join and the subexpression appears in the nullable-side input, this is potentially incorrect for apparently-matching subexpressions that came from above the outer join (ie, targetlist and qpqual expressions), because the executor will treat the subexpression value as NULL when maybe it should not be. The case is fairly hard to hit because (a) you need a non-strict subexpression (else NULL is correct), and (b) we don't usually compute expressions in the outputs of non-toplevel plan nodes. But we might do so if the expressions are sort keys for a mergejoin, for example. Probably in the long run we should make a more explicit distinction between Vars appearing above and below an outer join, but that will be a major planner redesign and not at all back-patchable. For the moment, just hack set_join_references so that it will not match any non-Var expressions coming from nullable inputs to expressions that came from above the join. (This is somewhat overkill, in that a strict expression could still be matched, but it doesn't seem worth the effort to check that.) Per report from Qingqing Zhou. The added regression test case is based on his example. This has been broken for a very long time, so back-patch to all active branches. Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/ca6805338fba010cc3f8b842905d7a62e280b7ab Modified Files -------------- src/backend/optimizer/plan/setrefs.c | 67 +++++++++++++++++++++++++++------- src/test/regress/expected/join.out | 48 ++++++++++++++++++++++++ src/test/regress/sql/join.sql | 20 ++++++++++ 3 files changed, 121 insertions(+), 14 deletions(-)
Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node
From
Qingqing Zhou
Date:
On Sat, Apr 4, 2015 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > planner redesign and not at all back-patchable. For the moment, just hack > set_join_references so that it will not match any non-Var expressions > coming from nullable inputs to expressions that came from above the join. > (This is somewhat overkill, in that a strict expression could still be > matched, but it doesn't seem worth the effort to check that.) > Just curious, checking an expression strictness is convenient in build_tlist_index() like this: TargetEntry *tle = (TargetEntry *) lfirst(l); + if (contain_nonstrict_functions((Node *)tle)) + *hasnonstrict = true; + And this will almost avoid the overkill at once. Why not do this? Thanks, Qingqing
Qingqing Zhou <zhouqq.postgres@gmail.com> writes: > On Sat, Apr 4, 2015 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (This is somewhat overkill, in that a strict expression could still be >> matched, but it doesn't seem worth the effort to check that.) > Just curious, checking an expression strictness is convenient in > build_tlist_index() like this: > TargetEntry *tle = (TargetEntry *) lfirst(l); > + if (contain_nonstrict_functions((Node *)tle)) > + *hasnonstrict = true; > + > And this will almost avoid the overkill at once. Why not do this? Because it's expensive (a syscache lookup per function or operator). And that test alone would be insufficient anyway: you'd also have to check that there was an appropriate Var in the expression, cf the comment for contain_nonstrict_functions. regards, tom lane
Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node
From
Qingqing Zhou
Date:
On Mon, Apr 6, 2015 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Because it's expensive (a syscache lookup per function or operator). > And that test alone would be insufficient anyway: you'd also have to > check that there was an appropriate Var in the expression, cf the > comment for contain_nonstrict_functions. > Looks like not only contain_nonstrict_functions() is expensive, but other contain_xxx family members. In FuncExpr structure, for example, we already cached values like retset, resulttype, but why not volatile and isstrict? In this way, we avoid cache lookup every time. So as OpExpr etc. Regards, Qingqing
[ catching up on email post-vacation ] Qingqing Zhou <zhouqq.postgres@gmail.com> writes: > On Mon, Apr 6, 2015 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Because it's expensive (a syscache lookup per function or operator). >> And that test alone would be insufficient anyway: you'd also have to >> check that there was an appropriate Var in the expression, cf the >> comment for contain_nonstrict_functions. > Looks like not only contain_nonstrict_functions() is expensive, but > other contain_xxx family members. > In FuncExpr structure, for example, we already cached values like > retset, resulttype, but why not volatile and isstrict? In this way, we > avoid cache lookup every time. So as OpExpr etc. The reason we don't store those in the expression tree is that changing those properties (eg with CREATE OR REPLACE FUNCTION) would break stored views referencing the function. It's okay to store resulttype and retset because we disallow changing the output type (including set-ness) of a function; but we don't want to disallow changing strictness for instance. regards, tom lane
Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node
From
Andres Freund
Date:
On 2015-04-24 13:30:49 -0400, Tom Lane wrote: > [ catching up on email post-vacation ] Welcome back Tom! Hope you had a nice and relaxing time. > It's okay to store resulttype and retset > because we disallow changing the output type (including set-ness) of a > function; but we don't want to disallow changing strictness for > instance. I don't think we forbid changing the volatility. I'm pretty sure I've changed that using CREATE OR REPLACE in the past. Yep: postgres[l]=# CREATE FUNCTION blarg() RETURNS int VOLATILE LANGUAGE SQL AS $$SELECT 1;$$; CREATE FUNCTION postgres[l]=# CREATE OR REPLACE FUNCTION blarg() RETURNS int STABLE LANGUAGE SQL AS $$SELECT 1;$$; CREATE FUNCTION Greetings, Andres Freund