Thread: pgsql: Fix incorrect matching of subexpressions in outer-join plan node

Fix incorrect matching of subexpressions in outer-join plan nodes.

Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node.  However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.

The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes.  But we might do
so if the expressions are sort keys for a mergejoin, for example.

Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable.  For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)

Per report from Qingqing Zhou.  The added regression test case is based
on his example.

This has been broken for a very long time, so back-patch to all active
branches.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/ca6805338fba010cc3f8b842905d7a62e280b7ab

Modified Files
--------------
src/backend/optimizer/plan/setrefs.c |   67 +++++++++++++++++++++++++++-------
src/test/regress/expected/join.out   |   48 ++++++++++++++++++++++++
src/test/regress/sql/join.sql        |   20 ++++++++++
3 files changed, 121 insertions(+), 14 deletions(-)


Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node

From
Qingqing Zhou
Date:
On Sat, Apr 4, 2015 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> planner redesign and not at all back-patchable.  For the moment, just hack
> set_join_references so that it will not match any non-Var expressions
> coming from nullable inputs to expressions that came from above the join.
> (This is somewhat overkill, in that a strict expression could still be
> matched, but it doesn't seem worth the effort to check that.)
>

Just curious, checking an expression strictness is convenient in
build_tlist_index() like this:

   TargetEntry *tle = (TargetEntry *) lfirst(l);

+  if (contain_nonstrict_functions((Node *)tle))
+            *hasnonstrict = true;
+

And this will almost avoid the overkill at once. Why not do this?

Thanks,
Qingqing


Qingqing Zhou <zhouqq.postgres@gmail.com> writes:
> On Sat, Apr 4, 2015 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (This is somewhat overkill, in that a strict expression could still be
>> matched, but it doesn't seem worth the effort to check that.)

> Just curious, checking an expression strictness is convenient in
> build_tlist_index() like this:

>    TargetEntry *tle = (TargetEntry *) lfirst(l);

> +  if (contain_nonstrict_functions((Node *)tle))
> +            *hasnonstrict = true;
> +

> And this will almost avoid the overkill at once. Why not do this?

Because it's expensive (a syscache lookup per function or operator).
And that test alone would be insufficient anyway: you'd also have to
check that there was an appropriate Var in the expression, cf the
comment for contain_nonstrict_functions.

            regards, tom lane


Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node

From
Qingqing Zhou
Date:
On Mon, Apr 6, 2015 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Because it's expensive (a syscache lookup per function or operator).
> And that test alone would be insufficient anyway: you'd also have to
> check that there was an appropriate Var in the expression, cf the
> comment for contain_nonstrict_functions.
>

Looks like not only contain_nonstrict_functions() is expensive, but
other contain_xxx family members.

In FuncExpr structure, for example, we already cached values like
retset, resulttype, but why not volatile and isstrict? In this way, we
avoid cache lookup every time. So as OpExpr etc.

Regards,
Qingqing


[ catching up on email post-vacation ]

Qingqing Zhou <zhouqq.postgres@gmail.com> writes:
> On Mon, Apr 6, 2015 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Because it's expensive (a syscache lookup per function or operator).
>> And that test alone would be insufficient anyway: you'd also have to
>> check that there was an appropriate Var in the expression, cf the
>> comment for contain_nonstrict_functions.

> Looks like not only contain_nonstrict_functions() is expensive, but
> other contain_xxx family members.

> In FuncExpr structure, for example, we already cached values like
> retset, resulttype, but why not volatile and isstrict? In this way, we
> avoid cache lookup every time. So as OpExpr etc.

The reason we don't store those in the expression tree is that changing
those properties (eg with CREATE OR REPLACE FUNCTION) would break stored
views referencing the function.  It's okay to store resulttype and retset
because we disallow changing the output type (including set-ness) of a
function; but we don't want to disallow changing strictness for instance.

            regards, tom lane


Re: pgsql: Fix incorrect matching of subexpressions in outer-join plan node

From
Andres Freund
Date:
On 2015-04-24 13:30:49 -0400, Tom Lane wrote:
> [ catching up on email post-vacation ]

Welcome back Tom! Hope you had a nice and relaxing time.

> It's okay to store resulttype and retset
> because we disallow changing the output type (including set-ness) of a
> function; but we don't want to disallow changing strictness for
> instance.

I don't think we forbid changing the volatility. I'm pretty sure I've
changed that using CREATE OR REPLACE in the past.

Yep:
postgres[l]=# CREATE FUNCTION blarg() RETURNS int VOLATILE LANGUAGE SQL AS $$SELECT 1;$$;
CREATE FUNCTION
postgres[l]=# CREATE OR REPLACE FUNCTION blarg() RETURNS int STABLE LANGUAGE SQL AS $$SELECT 1;$$;
CREATE FUNCTION

Greetings,

Andres Freund