Some problems regarding the self-join elimination code - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Some problems regarding the self-join elimination code |
Date | |
Msg-id | CAMbWs49PE3CvnV8vrQ0Dr=HqgZZmX0tdNbzVNJxqc8yg-8kDQQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Some problems regarding the self-join elimination code
|
List | pgsql-hackers |
While working on another patch, I looked at ChangeVarNodes() and found some issues introduced by the self-join elimination commit. I think it'd better to fix or improve them. * ChangeVarNodes_walker includes special handling for RestrictInfo nodes. And it adjusts RestrictInfo.orclause only if the rt_index of the relation to be removed is contained in RestrictInfo.clause_relids. I don't think this is correct. Even if the to-be-removed relation is not present in clause_relids, it can still be found somewhere within the orclause. As an example, consider: create table t (a int primary key, b int, c int); explain (costs off) select * from t t1 inner join t t2 on t1.a = t2.a left join t t3 on t1.c = 1 and (t2.b = t3.b or t2.b = 1); server closed the connection unexpectedly This query triggers an Assert failure in match_orclause_to_indexcol, and the root cause is that the removed relation is still present in the outer_relids of the OR-clauses. To fix, I think we may need to run ChangeVarNodes_walker on the orclause in all cases. * After the adjustment, the num_base_rels is recalculated as: + rinfo->num_base_rels = bms_num_members(rinfo->clause_relids); I don't think this is correct either. num_base_rels is the number of base rels in clause_relids and should not count outer-join relids. And we have no guarantee that clause_relids does not include any outer-join relids. To fix, I think we should exclude all outer-join relids. Perhaps we can leverage root->outer_join_rels to achieve this. * I'm not sure why the self-join elimination commit removed all the comments in ChangeVarNodes(Extended). I think these comments are very helpful for understanding the code. - /* - * Must be prepared to start with a Query or a bare expression tree; if - * it's a Query, go straight to query_tree_walker to make sure that - * sublevels_up doesn't get incremented prematurely. - */ if (node && IsA(node, Query)) { Query *qry = (Query *) node; - /* - * If we are starting at a Query, and sublevels_up is zero, then we - * must also fix rangetable indexes in the Query itself --- namely - * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks - * entries. sublevels_up cannot be zero when recursing into a - * subquery, so there's no need to have the same logic inside - * ChangeVarNodes_walker. - */ if (sublevels_up == 0) { ListCell *l; @@ -701,7 +772,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up) if (qry->mergeTargetRelation == rt_index) qry->mergeTargetRelation = new_index; - /* this is unlikely to ever be used, but ... */ * Speaking of comments, I’ve noticed that some changes are lacking comments. For example, there are almost no comments regarding the special handling of RestrictInfo nodes in ChangeVarNodes_walker, except for an explanation about adding the NullTest. BTW, there's a typo in that explanation: qual() should be equal(). This is not a thorough review of the code; I just randomly looked at ChangeVarNodes(). It's possible that there are other issues that haven't been discovered, but I think we should at least fix these. I'll provide a patch for the fixes later. Apologies for being so nitpicky. Thanks Richard
pgsql-hackers by date: