Thread: WIP: Hierarchical Queries - stage 1
Hi everyone, After spending several days reading PostgreSQL source code (and another couple of days coding), I've managed to come up with some alpha code that attempts to implement non-recursive WITH common table expressions. Having got this far, I feel that I need to ask for advice from some more experienced PostgreSQL hackers and so I post the alpha version here. The patch attempts to alter the parser grammar, and using code modified from that used when processing subselects in the FROM clause, attempts to treat the CTE as if it were presented the same as a FROM subselect. My main issue at the moment is that the code in transformFromClauseItem seems a terrible hack, mainly because the grammar returns each string within the FROM clause as a RangeVar, and transformFromClauseItem assumes that each RangeVar represents a physical relation. Of course, this is not the case when referencing a CTE and so the code first checks to see if an entry has already been created when processing the WITH clause; if it does then we return NULL to indicate that transformFromClause should do nothing. Messy, but I wanted to see what other developers thought before jumping in and rewriting this part of the code. Another point to think about is what should a query return if the SELECT doesn't refer to a CTE? For example: - WITH foo(a, b) AS (SELECT * FROM pg_class) SELECT * FROM pg_class; Since I've inserted the CTE as a range table entry of type RTE_SUBQUERY then the current behaviour is to perform a cross-join between foo and bar which I'm not sure is the correct behaviour since CTEs seem to be more like views in this respect. Finally, the current code fails when supplying an additional alias to a CTE in the select statement and then trying to refer to it, e.g. - with myrel(p1) as (select oid from pg_class) select myrel.p1 from myrel AS foo, pg_class AS bar WHERE myrel.p1 = bar.oid; -- WORKS - with myrel(p1) as (select oid from pg_class) select myrel.p1 from myrel AS foo, pg_class AS bar WHERE foo.p1 = bar.oid; -- FAILS ERROR: missing FROM-clause entry for table "foo" at character 103 So in this case, should foo be accepted as a valid alias for myrel? My feeling is that I will end up with an RTE_CTE range table entry kind which borrows from the current subquery behaviour, but it would be very useful to see where the similarity between the two range table entry types ends. TIA, Mark.
Attachment
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > My main issue at the moment is that the code in transformFromClauseItem > seems a terrible hack, mainly because the grammar returns each string > within the FROM clause as a RangeVar, and transformFromClauseItem > assumes that each RangeVar represents a physical relation. Of course, > this is not the case when referencing a CTE and so the code first checks > to see if an entry has already been created when processing the WITH > clause; if it does then we return NULL to indicate that > transformFromClause should do nothing. Messy, but I wanted to see what > other developers thought before jumping in and rewriting this part of > the code. You really can't get away with having the identical representation for CTEs and ordinary sub-selects in the range table. For instance, it looks like your patch will think that select ... from (select ...) as x, x, ... is legal when it certainly is not. I think you need either a new RTEKind or an additional flag in the RTE to show that it's a CTE rather than a plain subselect. I'm not entirely sure that you even want the CTEs in the rangetable at all --- that still needs some thought. > Another point to think about is what should a query return if the SELECT > doesn't refer to a CTE? The spec ought to make this perfectly clear ... or perhaps not so clear, but I'm sure it's defined. > - with myrel(p1) as (select oid from pg_class) select myrel.p1 from > myrel AS foo, pg_class AS bar WHERE foo.p1 = bar.oid; -- FAILS > So in this case, should foo be accepted as a valid alias for myrel? This comes back to the question of whether the CTE per se should be an RTE at all. Maybe only the reference to it should be an RTE. The behavior when seeing a plain RangeVar in FROM would be to first search the side list of valid CTEs, and only on failure go looking for a real table. regards, tom lane
Hi Tom, Thanks for your initial thoughts on this. On Wed, 2006-09-20 at 20:13 -0400, Tom Lane wrote: (cut) > You really can't get away with having the identical representation for > CTEs and ordinary sub-selects in the range table. For instance, it > looks like your patch will think that > > select ... from (select ...) as x, x, ... > > is legal when it certainly is not. I think you need either a new > RTEKind or an additional flag in the RTE to show that it's a CTE rather > than a plain subselect. I'm not entirely sure that you even want the > CTEs in the rangetable at all --- that still needs some thought. For semantic reasons, I can see why you are questioning whether or not the CTE should be contained within the rangetable - there is an implicit hint that RTEs reflect entries within the FROM clause, but then I also see that the rewriter adds RTEs when substituting view definitions into queries. The comments in parsenodes.h also suggest that an RTE is a namespace/data source reference for a named entity within the query. The main problem I can see with keeping the CTEs outside the rangetable is that according to the source, jointree nodes must currently have RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of maintaining the CTEs separately would involve something along the lines of keeping a separate CTETable and creating some form of CTETBLREF node that could be referenced within the jointree. While arguably it may be semantically neater, it appears to involve quite a bit of extra work... could you explain in more detail as to why you feel that CTEs should remain outside the rangetable? > This comes back to the question of whether the CTE per se should be an > RTE at all. Maybe only the reference to it should be an RTE. The > behavior when seeing a plain RangeVar in FROM would be to first search > the side list of valid CTEs, and only on failure go looking for a real > table. This is effectively what the patch does, albeit not particularly elegantly. I'll spend some time on making those changes a bit more refined. Kind regards, Mark.
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > The main problem I can see with keeping the CTEs outside the rangetable > is that according to the source, jointree nodes must currently have > RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of > maintaining the CTEs separately would involve something along the lines > of keeping a separate CTETable and creating some form of CTETBLREF node > that could be referenced within the jointree. No, what I'm thinking is that a *reference* to a CTE, from within the main query's FROM list, would create a "CTERef" RTE and then you'd have a normal RANGETBLREF node linking to that in the jointree. This solves the problem of where do you put the alias: on the RTE. What's not clear to me at this point is whether there can be multiple references in a query to the same CTE --- if there can, I suspect you must have a data structure like this. regards, tom lane