Home > mailing lists

Re: Expression push down from Join Node to below node. - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Expression push down from Join Node to below node.
Date	June 14 06:33:25
Msg-id	871prnyp2y.fsf@163.com Whole thread Raw
In response to	Expression push down from Join Node to below node. (Shubhankar Anand Kulkarni <shubhankar.ak@zohocorp.com>)
List	pgsql-hackers

Tree view

Shubhankar Anand Kulkarni <shubhankar.ak@zohocorp.com> writes:

Hi, 
> Apologies for delayed response, 
> I believe there may be a slight misunderstanding regarding the use case we're aiming to improve. Let me clarify it
from
> the beginning.
>
> """
> Example :
> SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and t1.a > 5;
> """
>
...
>
> Whereas, in your first reply you stated that we need to call the udf1  prior
> to the filter( here t1.a >5 ), hence needed to discuss this point once
> with you.

You changed my query at [1] and then reuse the rest words there.
My query was:

"""
SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where t2.fid = t1.id; 
"""

and the filter I meant was the "join filter (t2.fid = t1.id)" rather than
the scan filter (t1.a > 5) in your case. 

You may argue that my query is not exactly same with yours, you query
is (where there is a same UDF as targetlist in the join qual): 

"""
SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and t1.a > 5;
"""

But the keypoint we need to pay much more attention to impacts of such
push down. we have to distinguish the middle cases. e.g.

SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2
where udf1(t1.a) = t2.a
and t2.fid = t1.id  (*****)  -- this is added by me.
and t1.a > 5;

Otherwise the UDF would be called more than the current master code.

NOTE: it is possible to run "t2.fid = t1.id" before udf1(t1.a) = t2.a. 

> From execScan.c, we can see that the qual is evaluated first and then
> the projection is formed.

sure..

>> Hash Cond involves two sides of the relations, how could we push down
>> it under one side of it? I can't follow up on this well, have you a draft
>> plan / code for your idea?
>
> We are not pushing down the entire hash condition as a whole which is not possible.
> Instead, we just push those expression, which involves only columns from any one side ( inner side or outer side)
> of the join.
>
> In our case, udf1(t1.a) is a function expression that only involves
> columns from the outer side (t1). Therefore, we can safely push it down to the
> outer join path's target list.
>
> Actually in general, we are assuming that it is safe and not a overhead
> to push down any such expression involved in hash-condition to the respective
> lower node's targetlist, as the expression will be evaluated same number of times
> as it would have been evaluated without the push down.
> This enables us to reuse the result of the expression that's present  in both the join condition and the final
> projection.
>
> If you see in createplan.c, we are calling the following functions:
>
>  hashclauses = get_actual_clauses(best_path->path_hashclauses);
>  /*
>  * Rearrange hashclauses, if needed, so that the outer variable is always
>  * on the left.
>  */
>  hashclauses = get_switched_clauses(best_path->path_hashclauses,
>     best_path->jpath.outerjoinpath->parent->relids);
>  OpExpr *clause = (OpExpr *) linitial(hashclauses);
>
> After this, we know that the outer varaiable is on left side and inner is in right side
> of opExpr so we can recurse into, linitial(clause) and find any FuncExpr ( or any other expr node)
> and then push it to the outer join-path as shown below:
>
>  FuncExpr *fexpr = (FuncExpr *) node;
>
>  best_path->jpath.outerjoinpath->pathtarget->exprs =
>  lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr);
>
> Hence the tentative draft code would be like:
>
>  OpExpr *clause = (OpExpr *) linitial(hashclauses);
>  node = (Node *) linitial(clause->args)
>  if(IsA(node, FuncExpr)) {               //for now directly checking for funcexpr
>
>  FuncExpr *fexpr = (FuncExpr *) node;
>  best_path->jpath.outerjoinpath->pathtarget->exprs =
>  lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr);
>  }
>
>  The next step would be to replace the funcExpr in this clause, with a new Var
>
>  newVar = makeVar(
>  OUTER_VAR,
>  list_length(best_path->jpath.outerjoinpath->pathtarget->exprs),
>  exprType(((Node *) fexpr)), exprTypmod((Node *) fexpr),
>  exprCollation((Node *) fexpr), 0
>              );
>
>  linitial(clause->args) = (Node *) newVar; //replace the funcExpr with
>  the new Var.

I am worried about if this is safe to do and there is no such kind of
change in create_plan stage with my current knowledge. But I am sure
the costing will be mismatched between the real (udf is called only in
scan stage after your patch) and the number in explain  (udf is called in
join stage, both targetlist and qual). and It is not easy to say that
the costing mismatch doesn't matter.

In my knowledge, if we could know a UDF push down would never increase
the calls of the UDF before costing with a limitted effort, then we
could push down it and costing with the result and get the bestpath at
the final. 

-- 
Best Regards
Andy Fan

pgsql-hackers by date:

From: Corey Huinker
Date: 14 June, 05:29:35
Subject: Re: pg_dump --with-* options

From: Junwang Zhao
Date: 14 June, 07:10:32
Subject: Re: Add a bound check to TidRangeEval

Re: Expression push down from Join Node to below node. - Mailing list pgsql-hackers

Previous

Next