Re: Common Table Expressions (WITH RECURSIVE) patch - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Common Table Expressions (WITH RECURSIVE) patch |
Date | |
Msg-id | 13449.1221589943@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Common Table Expressions (WITH RECURSIVE) patch (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: Common Table Expressions (WITH RECURSIVE) patch
Re: Common Table Expressions (WITH RECURSIVE) patch Re: Common Table Expressions (WITH RECURSIVE) patch Re: Common Table Expressions (WITH RECURSIVE) patch Re: Common Table Expressions (WITH RECURSIVE) patch |
List | pgsql-hackers |
Tatsuo Ishii <ishii@postgresql.org> writes: > Included is the latest patches against CVS HEAD. I spent some time reading this patch. Here are a few comments in no particular order: RangeRecursive node lacks copyfuncs/equalfuncs support. Query.recursive is missed in equalfuncs.c. But rather than fix that, get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe, and what is the value of that test? The callers know perfectly well whether they are operating on a recursive RTE or not. You might as well just delete all the useless qual-pushdown-attempt code from set_recursion_pathlist, and not need to touch qual_is_pushdown_safe at all. Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. I dislike putting state into ExecutorState; that makes it impossible to have multiple recursion nodes in one plan tree. It would probably be better for the Recursion and RecursiveScan nodes to talk to each other directly (compare HashJoin/Hash); although since they are not adjacent in the plan tree I admit I'm not sure how to do that. es_disallow_tuplestore doesn't seem to need to be in ExecutorState at all, it could as well be in RecursionState. I don't really like the way that Append nodes are being abused here. It would be better to allow nodeRecursion.c to duplicate a little code from nodeAppend.c, and have the child plans be direct children of the Recursion node. BTW, is it actually possible to have more than two children? I didn't spend enough time analyzing the restrictions in parse_cte to be sure. If there are always just two then you could simplify the representation by treating it like a join node instead of an append. (The RTE_RECURSIVE representation sure makes it look like there can be only two...) Mark/restore support seems useless ... note the comment on ExecSupportsMarkRestore (which should be updated if this code isn't removed). RecursiveScan claims to support backwards fetch, but does not in fact contain code to do so. (Given that it will always be underneath Recursion, which can't do backwards fetch, I see little point in adding such code; fix execAmi.c instead.) ExecInitRecursion doesn't seem to be on the same page about whether it supports backward scan as execAmi.c, either. This comment in nodeRecursivescan.c seems bogus:/* * Do not initialize scan tuple type, result tuple type and * projectioninfo in ExecInitRecursivescan. These types are * initialized after initializing Recursion node. */ because the code seems to be doing exactly what the comment says it doesn't. Numerous comments appear to have been copied-and-pasted and not modified from the original. Somebody will have to go over all that text. ruleutils.c fails completely for non-recursive WITH. It *must* regenerate such a query with a WITH clause, not as a flattened subquery which is what you seem to be doing. This isn't negotiable because the semantics are different. This will mean at least some change in the parsetree representation. Perhaps we could add a bool to subquery RTEs to mark them as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE seem a bit ugly too. If it thinks that can't happen it should Assert so, not just fall through silently. commentary for ParseState.p_ctenamespace is gratuitously unlike the comment style for the other fields, and p_recursive_namespace isn't documented at all. ParseState.p_in_with_clause is unused, should be removed. The WithClause struct definition is poorly commented. It should be stated that it is used only pre-parse-analysis (assuming that continues to be true after you get done fixing ruleutils.c...), and it doesn't say what the elements of the subquery list are (specifically, what node type). A lot of the other added structs and fields could use better commenting too. For that matter "subquery" is a poor name for WithClause's list of CTEs, especially so since it's hard to search for. It should be a plural name and I'd be inclined to use something like "ctes" not "subqueries". The term "subquery" is too overloaded already, so any place you can refer to a WITH-list member as a CTE you should do so. WithClause node may need a location field, and almost certainly has to be handled somehow in exprLocation(). The error reports in parse_cte.c *desperately* need error locations. Why does transformWithClause do parse_sub_analyze twice? I'm not sure that's even safe, and it's surely unnecessary. Also, what happens if a subquery isn't a SelectStmt? Silently doing nothing doesn't seem like a good plan there. Why are we putting essentially the same information into both p_recursive_namespace and p_ctenamespace? Is there really a need for both lists? The code added to transformFromClauseItem seems quite wrong since it searches both lists even if it found a match in the first one. This whole area looks like it needs refactoring. Costing is all bogus, but we knew that... Why does set_recursion_pathlist think that the subquery might have useful pathkeys? We know it must always be a UNION ALL, no? PlanState.has_recursivescan seems like a complete kluge. Can't it just be removed? It looks to me like it is working around bugs that hopefully aren't there anymore. There is certainly no reason why a recursive CTE should be more in need of rescanning than any other kind of plan. If it is needed then the current implementation is completely broken anyway, since it would only detect a RecursiveScan node that is directly underneath an agg or hash node. Please pay some attention to keeping things in logical, consistent orders. For instance the withClause field was inserted into _copySelectStmt() in a different place from where it was inserted in the actual struct declaration, which is confusing. parseTypeString() ought to check for null withClause. expression_tree_walker/mutator support seems entirely broken for RTE_RECURSIVE RTEs. Shouldn't it be recursing into the subquery? Missed adding non_recursive_query to the "zap unneeded substructure" part of set_plan_references (assuming it really is unneeded). There seem to be quite a number of places where RTE_SUBQUERY RTEs are handled but the patch fails to add RTE_RECURSIVE handling ... It's a really bad idea to use RTE subquery field over again for RTE_RECURSIVE, especially without any comment saying you did that. I would suggest two pointers in the RTE_RECURSIVE field list instead. Do we really have to make RECURSIVE a fully reserved keyword? (Actually, the patch makes it worse than reserved, by failing to add it to the reserved_keywords list.) checkCteTargetList is completely broken: it will only notice illegal sublinks that are at the very top level of a targetlist expression. checkWhereClause is very far short of adequate as well. Need to recurse here, or find some other way. Given that the subexpressions haven't been analyzed yet, this seems a bit messy --- expression_tree_walker doesn't know about pre-analysis node trees, so you can't use it. I'd suggest replacing this whole set of routines with just one recursive routine that doesn't make pre-assumptions about which node types can be found where. Alternatively, is there any way of delaying the validity checks until *after* parse analysis of the expressions, so that you could use expression_tree_walker et al? BTW, it seems like a lot of the logic there could be simplified by depending on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE. There are a number of places that are taking the larger of two values in baroque, hard-to-follow ways. I wonder if checkCteSelectStmt is detecting nonlinearity correctly. Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to miss the problem in something like (self union (self union other)) ? Maybe what you really need is a bitmask: NON_RECURSIVE = 0, RECURSIVE_SELF = 1, RECURSIVE_OTHER = 2, RECURSIVE_BOTH = 3 /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */ and then you can merge two values via OR instead of MAX. regards, tom lane
pgsql-hackers by date: