Re: Expression push down from Join Node to below node. - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Expression push down from Join Node to below node. |
Date | |
Msg-id | 871prnyp2y.fsf@163.com Whole thread Raw |
In response to | Expression push down from Join Node to below node. (Shubhankar Anand Kulkarni <shubhankar.ak@zohocorp.com>) |
List | pgsql-hackers |
Shubhankar Anand Kulkarni <shubhankar.ak@zohocorp.com> writes: Hi, > Apologies for delayed response, > I believe there may be a slight misunderstanding regarding the use case we're aiming to improve. Let me clarify it from > the beginning. > > """ > Example : > SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and t1.a > 5; > """ > ... > > Whereas, in your first reply you stated that we need to call the udf1 prior > to the filter( here t1.a >5 ), hence needed to discuss this point once > with you. You changed my query at [1] and then reuse the rest words there. My query was: """ SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where t2.fid = t1.id; """ and the filter I meant was the "join filter (t2.fid = t1.id)" rather than the scan filter (t1.a > 5) in your case. You may argue that my query is not exactly same with yours, you query is (where there is a same UDF as targetlist in the join qual): """ SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and t1.a > 5; """ But the keypoint we need to pay much more attention to impacts of such push down. we have to distinguish the middle cases. e.g. SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and t2.fid = t1.id (*****) -- this is added by me. and t1.a > 5; Otherwise the UDF would be called more than the current master code. NOTE: it is possible to run "t2.fid = t1.id" before udf1(t1.a) = t2.a. > From execScan.c, we can see that the qual is evaluated first and then > the projection is formed. sure.. >> Hash Cond involves two sides of the relations, how could we push down >> it under one side of it? I can't follow up on this well, have you a draft >> plan / code for your idea? > > We are not pushing down the entire hash condition as a whole which is not possible. > Instead, we just push those expression, which involves only columns from any one side ( inner side or outer side) > of the join. > > In our case, udf1(t1.a) is a function expression that only involves > columns from the outer side (t1). Therefore, we can safely push it down to the > outer join path's target list. > > Actually in general, we are assuming that it is safe and not a overhead > to push down any such expression involved in hash-condition to the respective > lower node's targetlist, as the expression will be evaluated same number of times > as it would have been evaluated without the push down. > This enables us to reuse the result of the expression that's present in both the join condition and the final > projection. > > If you see in createplan.c, we are calling the following functions: > > hashclauses = get_actual_clauses(best_path->path_hashclauses); > /* > * Rearrange hashclauses, if needed, so that the outer variable is always > * on the left. > */ > hashclauses = get_switched_clauses(best_path->path_hashclauses, > best_path->jpath.outerjoinpath->parent->relids); > OpExpr *clause = (OpExpr *) linitial(hashclauses); > > After this, we know that the outer varaiable is on left side and inner is in right side > of opExpr so we can recurse into, linitial(clause) and find any FuncExpr ( or any other expr node) > and then push it to the outer join-path as shown below: > > FuncExpr *fexpr = (FuncExpr *) node; > > best_path->jpath.outerjoinpath->pathtarget->exprs = > lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr); > > Hence the tentative draft code would be like: > > OpExpr *clause = (OpExpr *) linitial(hashclauses); > node = (Node *) linitial(clause->args) > if(IsA(node, FuncExpr)) { //for now directly checking for funcexpr > > FuncExpr *fexpr = (FuncExpr *) node; > best_path->jpath.outerjoinpath->pathtarget->exprs = > lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr); > } > > The next step would be to replace the funcExpr in this clause, with a new Var > > newVar = makeVar( > OUTER_VAR, > list_length(best_path->jpath.outerjoinpath->pathtarget->exprs), > exprType(((Node *) fexpr)), exprTypmod((Node *) fexpr), > exprCollation((Node *) fexpr), 0 > ); > > linitial(clause->args) = (Node *) newVar; //replace the funcExpr with > the new Var. I am worried about if this is safe to do and there is no such kind of change in create_plan stage with my current knowledge. But I am sure the costing will be mismatched between the real (udf is called only in scan stage after your patch) and the number in explain (udf is called in join stage, both targetlist and qual). and It is not easy to say that the costing mismatch doesn't matter. In my knowledge, if we could know a UDF push down would never increase the calls of the UDF before costing with a limitted effort, then we could push down it and costing with the result and get the bestpath at the final. -- Best Regards Andy Fan
pgsql-hackers by date: