Thread: Common Table Expressions (WITH RECURSIVE) patch
These are my initial comments about the Common Table Expressions (CTE) patch, also known as WITH [RECURSIVE]. These comments are based on the patch here: http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php This is a semantically complex feature, and the standard is fairly complex as well. So I'm approaching this by making my own interpretations from the standard first (I included my interpretations and section references at the end of this email) and comparing to the behavior of the patch. The following examples may be inconsistent with the standard. Some have already been mentioned, and I don't think they all need to be fixed for 8.4, but I mention them here for completeness. * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i < 10), bar(i) as (values(1) union all selecti+1 from foo where i < 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i ------------------- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. * RHS only: with recursive foo(i) as (select i+1 from foo where i < 10 union all values(1)) select * from foo; ERROR: Left hand sideof UNION ALL must be a non-recursive term in a recursive query The standard does not require that the recursive term be on the RHS. * UNION ALL only: with recursive foo(i) as (values(1) union select i+1 from foo where i < 10) select * from foo; ERROR: non-recursive termand recursive term must be combined with UNION ALL The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the EXCEPT). * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i < 10 union all select i+1 from foo where i < X) t) select * from foo; Produces 10 rows of output regardless of what "X" is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i < 10 union all select i+1 from foowhere i < 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i < 10 except selecti+1 from foo where i < 5) t) select * from foo; ERROR: table "foo" has 0 columns available but 1 columns specified This query works if you replace "except" with "union". This should be fixed for 8.4. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i < 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i< 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select *from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should befixed for 8.4. * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The standard does not seem to say that these should be rejected. The following are my interpretations of relevant parts of the SQL standard (200n), and the associated sections. These are only my interpretation, so let me know if you interpret the standard differently. Non-linear recursion forbidden: 7.13: Syntax Rules: 2.g.ii My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] maybe the same <query name>. 7.13: Syntax Rules: 2.g.iv EXCEPT can't be used for recursive queries if a recursive reference appears on the RHS: 7.13: Syntax Rules: 2.g.iii.1 INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: 7.13: Syntax Rules: 2.g.iii.5 recursive references must appear in FROM clause: 7.13: Syntax Rules: 2.g.iii.3 My interpretation of this rule is that itdoes not allow a recursive reference in a subquery in the targetlist or a subquery in the where clause. stratum defined: 7.13: Syntax Rules: 2.f 7.13: Syntax Rules: 2.g.i.1 recursive query must have anchor for every stratum: 7.13: Syntax Rules: 2.g.i.3 outer joins not allowed to join recursive references: 7.13: Syntax Rules: 2.g.iii.6 7.13: Syntax Rules: 2.g.iii.7 Aggregates/HAVING disallowed: 7.13: Syntax Rules: 2.g.iii.4.B Mutual recursion defined: 7.13: Syntax Rules: 2.g.i.1 Evaluate each WITH entry once, even if it's referenced multiple times: 7.13: General Rules: 1 7.13: General Rules: 2.b Seealso: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php Evaluation order with mutual recursion: 7.13: General Rules: 2.a 7.13: General Rules: 2.b Evaluation semantics: 7.13: General Rules: 2.c DISTINCT: 7.13: General Rules: 2.c.iv 7.13: General Rules: 2.c.ix.3.A I will provide comments about the code and documentation soon. This is a very useful feature. Regards,Jeff Davis
>>>>> "Jeff" == Jeff Davis <pgsql@j-davis.com> writes: Jeff> * Mutual Recursion: This limitation isn't at all uncommon in other implementations; DB2 docs for example say: "If more than one common table expression is defined in the same statement, cyclic references between the common table expressions are not permitted. A cyclic reference occurs when two common table expressions dt1 and dt2 are created such that dt1 refers to dt2 and dt2 refers to dt1." http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm MSSQL's documentation is less clear, but it also appears not to allow mutual recursion (not allowing forward references to CTEs). "A CTE can reference itself and previously defined CTEs in the same WITH clause. Forward referencing is not allowed." http://msdn.microsoft.com/en-us/library/ms175972.aspx Jeff> * RHS only: Jeff> with recursiveJeff> foo(i) as (select i+1 from foo where i < 10 union all values(1))Jeff> select * from foo;Jeff> ERROR: Left hand side of UNION ALL must be a non-recursive term in aJeff> recursive query Jeff> The standard does not require that the recursive term be onJeff> the RHS. Again, the standard may not, but existing implementations do: MSSQL: "The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be put before the first recursive member definition. All CTE query definitions are anchor members unless they reference the CTE itself. " DB2: "The following restrictions apply to a recursive common-table-expression: * A list of column-names must be specified followingthe table-identifier. * The UNION ALL set operator must be specified. * The first fullselect of the firstunion (the initialization fullselect) must not include a reference to the common-table-expression itself inany FROM clause. " Jeff> * UNION ALL only: Jeff> with recursiveJeff> foo(i) as (values(1) union select i+1 from foo where i < 10)Jeff> select * from foo;Jeff> ERROR: non-recursive term and recursive term must be combined withJeff> UNION ALL Jeff> The standard seems to allow UNION ALL, UNION, INTERSECT, andJeff> EXCEPT (when the recursive term is not on theRHS of theJeff> EXCEPT). Again, existing implementations disagree. See above for DB2, and for MSSQL: "Anchor members must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator allowed between the last anchor member and first recursive member, and when combining multiple recursive members." Jeff> * Binary recursion and subselect strangeness: Jeff> with recursive foo(i) asJeff> (values (1)Jeff> union allJeff> select * fromJeff> (select i+1 fromfoo where i < 10Jeff> union allJeff> select i+1 from foo where i < X) t)Jeff> select * from foo; Jeff> Produces 10 rows of output regardless of what "X" is. ThisJeff> should be fixed for 8.4. Also, this is non-linearrecursion,Jeff> which the standard seems to disallow. That looks like it should be disallowed somehow. Jeff> * Multiple recursive references:[snip] Jeff> If we're going to allow non-linear recursion (which theJeff> standard does not), this seems like it should be avalidJeff> case. We probably shouldn't allow it (as above). [snip * Strange result with except: which looks like a bug] Jeff> * Aggregates allowed: which Jeff> with recursive foo(i) asJeff> (values(1)Jeff> union allJeff> select max(i)+1 from foo where i < 10)Jeff> select * from foo; Jeff> Aggregates should be blocked according to the standard.Jeff> Also, causes an infinite loop. This should be fixedfor 8.4. Does the standard require anywhere that non-conforming statements must be diagnosed? (seems impractical, since it would forbid extensions) Jeff> * DISTINCT should supress duplicates: Jeff> with recursive foo(i) asJeff> (select distinct * from (values(1),(2)) tJeff> union allJeff> select distincti+1 from foo where i < 10)Jeff> select * from foo; Jeff> This outputs a lot of duplicates, but they should beJeff> supressed according to the standard. This query is essentiallyJeff>the same as supporting UNION for recursive queries, so weJeff> should either fix both for 8.4 or block bothfor consistency. Yeah, though the standard's use of DISTINCT in this way is something of a violation of the POLA. Jeff> * outer joins on a recursive reference should be blocked: Jeff> with recursive foo(i) asJeff> (values(1)Jeff> union allJeff> select i+1 from foo left join (values(1))t on (i=column1))Jeff> select * from foo; Jeff> Causes an infinite loop, but the standard says using an outerJeff> join in this situation should be prohibited.This should beJeff> fixed for 8.4. No. This has already been discussed; it's neither possible nor desirable to diagnose all cases which can result in infinite loops, and there are important types of queries which would be unnecessarily forbidden. Besides, you've misread the spec here: it prohibits the recursive reference ONLY on the nullable side of the join. You cite: Jeff> outer joins not allowed to join recursive references:Jeff> 7.13: Syntax Rules: 2.g.iii.6Jeff> 7.13: Syntax Rules:2.g.iii.7 6) WQEi shall not contain a <qualified join> QJ in which: A) QJ immediately contains a <join type> that specifies FULL and a <table reference> or <table factor> that containsa <query name> referencing WQNj. [no recursive FULL JOIN at all] B) QJ immediately contains a <join type> that specifies LEFT and a <table factor> following the <join type> that containsa <query name> referencing WQNj. [note "following the <join type>", so WQNj LEFT JOIN foo is allowed, but foo LEFT JOIN WQNj is not] C) QJ immediately contains a <join type> that specifies RIGHT and a <table reference> preceding the <join type> thatcontains a <query name> referencing WQNj. [note "preceding the <join type>", so foo RIGHT JOIN WQNj is allowed, but WQNj RIGHT JOIN foo is not] para (7) is identical but for natural rather than qualified joins. Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursiveJeff> queries. The standard does not seem to say that theseshould beJeff> rejected. Note that supporting those in subqueries (including CTEs) is a separate optional feature of the standard. -- Andrew (irc:RhodiumToad)
Jeff Davis <pgsql@j-davis.com> writes: > * Mutual Recursion: > > with recursive > foo(i) as (values(1) union all select i+1 from bar where i < 10), > bar(i) as (values(1) union all select i+1 from foo where i < 10) > select * from foo; > ERROR: mutual recursive call is not supported > > The standard allows mutual recursion. This seems to be a point of confusion. I originally read the standard and concluded that mutual recursion was an optional feature. Itagaki-san showed me a copy of the spec where it seemed there was a clear blanket prohibition on mutually recursive queries and in fact anything but simple linearly expandable queries. I wonder if there are different versions of the spec floating around on this point. Take a second look at your spec and read on to where it defines "linear" and "expandable". If it doesn't define those terms then it's definitely different from what I read. If it does, read on to see what it does with them. The main reason to define them appeared to be to use them to say that supporting mutual recursion is not required. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
On Mon, 2008-09-08 at 21:13 +0100, Gregory Stark wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > > * Mutual Recursion: > > > > with recursive > > foo(i) as (values(1) union all select i+1 from bar where i < 10), > > bar(i) as (values(1) union all select i+1 from foo where i < 10) > > select * from foo; > > ERROR: mutual recursive call is not supported > > > > The standard allows mutual recursion. > > This seems to be a point of confusion. I originally read the standard and > concluded that mutual recursion was an optional feature. Itagaki-san showed me > a copy of the spec where it seemed there was a clear blanket prohibition on > mutually recursive queries and in fact anything but simple linearly expandable > queries. I wonder if there are different versions of the spec floating around > on this point. > > Take a second look at your spec and read on to where it defines "linear" and > "expandable". If it doesn't define those terms then it's definitely different > from what I read. If it does, read on to see what it does with them. The main > reason to define them appeared to be to use them to say that supporting mutual > recursion is not required. I think we're reading the same version of the spec. I'm reading 200n. My interpretation (Syntax Rule 2.h) is that expandable is only used to determine whether it can contain a <search or cycle>. That being said, I don't think it should be a requirement for 8.4. The CTE patch is an important feature, and we shouldn't hold it up over something comparatively obscure like mutual recursion. As Andrew Gierth cited, other systems don't support it anyway. It should be kept in mind for future work though. Regards,Jeff Davis
On Mon, 2008-09-08 at 18:08 +0100, Andrew Gierth wrote: > Jeff> * Mutual Recursion: > > This limitation isn't at all uncommon in other implementations; DB2 > docs for example say: As with some other things in my list, this doesn't need to be supported in 8.4. I just wanted to lay out my interpretation of the standard, and places that we might (currently) fall short of it. The fact that other DBMSs don't support mutual recursion is a good indication that it's not important immediately. > Jeff> The standard does not require that the recursive term be on > Jeff> the RHS. > > Again, the standard may not, but existing implementations do: > Again, I don't think we need this for 8.4. However, I think it's probably more important than mutual recursion. > Jeff> * UNION ALL only: > > Jeff> with recursive > Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10) > Jeff> select * from foo; > Jeff> ERROR: non-recursive term and recursive term must be combined with > Jeff> UNION ALL > > Jeff> The standard seems to allow UNION ALL, UNION, INTERSECT, and > Jeff> EXCEPT (when the recursive term is not on the RHS of the > Jeff> EXCEPT). > > Again, existing implementations disagree. See above for DB2, and for > MSSQL: > And again, I agree that it's not important for 8.4. At some point we need to determine what the goalposts are though. Are we copying existing implementations, or are we implementing the standard? > Jeff> Produces 10 rows of output regardless of what "X" is. This > Jeff> should be fixed for 8.4. Also, this is non-linear recursion, > Jeff> which the standard seems to disallow. > > That looks like it should be disallowed somehow. Agreed. I think it should just throw an error, probably. > [snip * Strange result with except: which looks like a bug] > > Jeff> * Aggregates allowed: which > > Jeff> with recursive foo(i) as > Jeff> (values(1) > Jeff> union all > Jeff> select max(i)+1 from foo where i < 10) > Jeff> select * from foo; > > Jeff> Aggregates should be blocked according to the standard. > Jeff> Also, causes an infinite loop. This should be fixed for 8.4. > > Does the standard require anywhere that non-conforming statements must > be diagnosed? (seems impractical, since it would forbid extensions) > 2.g.iii.4.B explicitly says aggregates should be rejected, unless I have misinterpreted. > > Yeah, though the standard's use of DISTINCT in this way is something > of a violation of the POLA. > I agree that's kind of a funny requirement. But that's pretty typical of the SQL standard. If DB2 or SQL Server follow the standard here, we should, too. If not, it's open for discussion. > No. This has already been discussed; it's neither possible nor desirable > to diagnose all cases which can result in infinite loops, and there are > important types of queries which would be unnecessarily forbidden. I didn't say we should forbid all infinite loops. But we should forbid ones that the standard tells us to forbid. > Besides, you've misread the spec here: it prohibits the recursive > reference ONLY on the nullable side of the join. You cite: > Thank you for the correction. It does properly reject the outer joins that the standard says should be rejected. > Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive > Jeff> queries. The standard does not seem to say that these should be > Jeff> rejected. > > Note that supporting those in subqueries (including CTEs) is a separate > optional feature of the standard. > I don't feel strongly about this either way, but I prefer that we are consistent when possible. We do support these things in a subquery, so shouldn't we support them in all subqueries? Regards,Jeff Davis
>>>>> "Jeff" == Jeff Davis <jdavis@truviso.com> writes: Jeff> Aggregates should be blocked according to the standard.Jeff> Also, causes an infinite loop. This should be fixed for8.4. >> Does the standard require anywhere that non-conforming statements>> must be diagnosed? (seems impractical, since it wouldforbid>> extensions) Jeff> 2.g.iii.4.B explicitly says aggregates should be rejected,Jeff> unless I have misinterpreted. Yes, you've misinterpreted. When the spec says that a query "shall not" do such-and-such, it means that a conforming client isn't allowed to do that, it does NOT mean that the server is required to produce an error when it sees it. Chapter and verse on this is given in the Framework doc, at 6.3.3.2: In the Syntax Rules, the term "shall" defines conditions that are required to be true of syntactically conforming SQL language.When such conditions depend on the contents of one or more schemas, they are required to be true just before theactions specified by the General Rules are performed. The treatment of language that does not conform to the SQL Formatsand Syntax Rules is implementation-dependent. If any condition required by Syntax Rules is not satisfied when theevaluation of Access or General Rules is attempted and the implementation is neither processing non-conforming SQL languagenor processing conforming SQL language in a non-conforming manner, then an exception condition is raised: "syntaxerror or access rule violation". Including an aggregate violates a "shall" in a syntax rule, therefore the query is non-conforming, therefore the server can either process it in an implementation-dependent manner or reject it with an exception. >> Yeah, though the standard's use of DISTINCT in this way is something>> of a violation of the POLA. Jeff> I agree that's kind of a funny requirement. But that's prettyJeff> typical of the SQL standard. If DB2 or SQL Serverfollow theJeff> standard here, we should, too. If not, it's open for discussion. MSSQL does not: "The following items are not allowed in the CTE_query_definition of arecursive member: * SELECT DISTINCT * GROUP BY * HAVING * Scalar aggregation * TOP * LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed) * Subqueries * A hint applied to a recursive reference to a CTE inside a CTE_query_definition. " For DB2 the docs do not appear to specify either way; they don't seem to forbid the use of SELECT DISTINCT inside a recursive CTE, but neither do they seem to mention any unusual effect of including it. Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursiveJeff> queries. The standard does not seem to say that theseshould beJeff> rejected.>> Note that supporting those in subqueries (including CTEs) is a>> separate optional featureof the standard. Jeff> I don't feel strongly about this either way, but I prefer that weJeff> are consistent when possible. We do supportthese things in aJeff> subquery, so shouldn't we support them in all subqueries? Ideally we should. DB2 appears to (other than OFFSET which it doesn't seem to have at all). But it's not at all clear that it would be either useful or easy to do so. -- Andrew.
On Mon, 2008-09-08 at 22:53 +0100, Andrew Gierth wrote: > Yes, you've misinterpreted. When the spec says that a query "shall > not" do such-and-such, it means that a conforming client isn't allowed > to do that, it does NOT mean that the server is required to produce an > error when it sees it. > Interesting. Thanks for the clear reference. So, we either need to reject it or define some implementation-dependent behavior, right? The patch currently rejects HAVING, which means that it's a little difficult to use an aggregate effectively. I lean toward rejecting aggregates if we reject HAVING, for consistency. If we allow it, we should allow HAVING as well. > Jeff> I agree that's kind of a funny requirement. But that's pretty > Jeff> typical of the SQL standard. If DB2 or SQL Server follow the > Jeff> standard here, we should, too. If not, it's open for discussion. > > MSSQL does not: Thanks again for a reference. If MSSQL rejects SELECT DISTINCT for a recursive <query expression>, then I think we should reject it. Right now the patch allows it but provides a result that is inconsistent with the standard. If we reject SELECT DISTINCT for recursive queries now, we can always meet the standard later, or decide that the standard behavior is too likely to cause confusion and just continue to block it. > Ideally we should. DB2 appears to (other than OFFSET which it doesn't > seem to have at all). But it's not at all clear that it would be either > useful or easy to do so. Ok. If it's easy to support it should probably be done. As an aside, you seem to be an expert with the standard, have you had a chance to look at the question I asked earlier?: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00487.php Regards,Jeff Davis
Thanks for the review. [I dropped y-asaba@sraoss.co.jp from the Cc list since he has left our company and the email address is being deleted.] I'm going to look into issues which are seem to be bug (of course if you know what to fix, patches are always welcome:-). > These are my initial comments about the Common Table Expressions (CTE) > patch, also known as WITH [RECURSIVE]. These comments are based on the > patch here: > > http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php > > This is a semantically complex feature, and the standard is fairly > complex as well. So I'm approaching this by making my own > interpretations from the standard first (I included my interpretations > and section references at the end of this email) and comparing to the > behavior of the patch. > > The following examples may be inconsistent with the standard. Some > have already been mentioned, and I don't think they all need to be > fixed for 8.4, but I mention them here for completeness. > > * Mutual Recursion: > > with recursive > foo(i) as (values(1) union all select i+1 from bar where i < 10), > bar(i) as (values(1) union all select i+1 from foo where i < 10) > select * from foo; > ERROR: mutual recursive call is not supported > > The standard allows mutual recursion. The discussion seems to agree that let it leave for post 8.4. > * Single Evaluation: > > with > foo(i) as (select random() as i) > select * from foo union all select * from foo; > i > ------------------- > 0.233165248762816 > 0.62126633618027 > (2 rows) > > The standard specifies that non-recursive WITH should be evaluated > once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? > * RHS only: > > with recursive > foo(i) as (select i+1 from foo where i < 10 union all values(1)) > select * from foo; > ERROR: Left hand side of UNION ALL must be a non-recursive term in a > recursive query > > The standard does not require that the recursive term be on the RHS. The discussion seems to agree that let it leave for post 8.4. > * UNION ALL only: > > with recursive > foo(i) as (values(1) union select i+1 from foo where i < 10) > select * from foo; > ERROR: non-recursive term and recursive term must be combined with > UNION ALL > > The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT > (when the recursive term is not on the RHS of the EXCEPT). The discussion seems to agree that let it leave for post 8.4. > * Binary recursion and subselect strangeness: > > with recursive foo(i) as > (values (1) > union all > select * from > (select i+1 from foo where i < 10 > union all > select i+1 from foo where i < X) t) > select * from foo; > > Produces 10 rows of output regardless of what "X" is. This should be > fixed for 8.4. > Also, this is non-linear recursion, which the standard seems to > disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. > * Multiple recursive references: > > with recursive foo(i) as > (values (1) > union all > select i+1 from foo where i < 10 > union all > select i+1 from foo where i < 20) > select * from foo; > ERROR: Left hand side of UNION ALL must be a non-recursive term in a > recursive query > > If we're going to allow non-linear recursion (which the standard > does not), this seems like it should be a valid case. I will try to disallow this. > * Strange result with except: > > with recursive foo(i) as > (values (1) > union all > select * from > (select i+1 from foo where i < 10 > except > select i+1 from foo where i < 5) t) > select * from foo; > ERROR: table "foo" has 0 columns available but 1 columns specified > > This query works if you replace "except" with "union". This should be > fixed for 8.4. I will try to fix this. > * Aggregates allowed: > > with recursive foo(i) as > (values(1) > union all > select max(i)+1 from foo where i < 10) > select * from foo; > > Aggregates should be blocked according to the standard. > Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. > * DISTINCT should supress duplicates: > > with recursive foo(i) as > (select distinct * from (values(1),(2)) t > union all > select distinct i+1 from foo where i < 10) > select * from foo; > > This outputs a lot of duplicates, but they should be supressed > according to the standard. This query is essentially the same as > supporting UNION for recursive queries, so we should either fix both for > 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. > * outer joins on a recursive reference should be blocked: > > with recursive foo(i) as > (values(1) > union all > select i+1 from foo left join (values(1)) t on (i=column1)) > select * from foo; > > Causes an infinite loop, but the standard says using an outer join > in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. > * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The > standard does not seem to say that these should be rejected. > > > The following are my interpretations of relevant parts of the SQL > standard (200n), and the associated sections. These are only my > interpretation, so let me know if you interpret the standard > differently. > > Non-linear recursion forbidden: > 7.13: Syntax Rules: 2.g.ii > My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the > same <query name>. > 7.13: Syntax Rules: 2.g.iv > > EXCEPT can't be used for recursive queries if a recursive reference > appears on the RHS: > 7.13: Syntax Rules: 2.g.iii.1 > > INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: > 7.13: Syntax Rules: 2.g.iii.5 > > recursive references must appear in FROM clause: > 7.13: Syntax Rules: 2.g.iii.3 > My interpretation of this rule is that it does not allow a > recursive reference in a subquery in the targetlist or a subquery > in the where clause. > > stratum defined: > 7.13: Syntax Rules: 2.f > 7.13: Syntax Rules: 2.g.i.1 > > recursive query must have anchor for every stratum: > 7.13: Syntax Rules: 2.g.i.3 > > outer joins not allowed to join recursive references: > 7.13: Syntax Rules: 2.g.iii.6 > 7.13: Syntax Rules: 2.g.iii.7 > > Aggregates/HAVING disallowed: > 7.13: Syntax Rules: 2.g.iii.4.B > > Mutual recursion defined: > 7.13: Syntax Rules: 2.g.i.1 > > Evaluate each WITH entry once, even if it's referenced multiple times: > 7.13: General Rules: 1 > 7.13: General Rules: 2.b > See also: > http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > Evaluation order with mutual recursion: > 7.13: General Rules: 2.a > 7.13: General Rules: 2.b > > Evaluation semantics: > 7.13: General Rules: 2.c > > DISTINCT: > 7.13: General Rules: 2.c.iv > 7.13: General Rules: 2.c.ix.3.A > > I will provide comments about the code and documentation soon. This is a > very useful feature. Thanks. Enclosed is the latest patch to adopt CVS HEAD. -- Tatsuo Ishii SRA OSS, Inc. Japan
Thanks for the review. [I dropped y-asaba@sraoss.co.jp from the Cc list since he has left our company and the email address is being deleted.] I'm going to look into issues which are seem to be bug (of course if you know what to fix, patches are always welcome:-). > These are my initial comments about the Common Table Expressions (CTE) > patch, also known as WITH [RECURSIVE]. These comments are based on the > patch here: > > http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php > > This is a semantically complex feature, and the standard is fairly > complex as well. So I'm approaching this by making my own > interpretations from the standard first (I included my interpretations > and section references at the end of this email) and comparing to the > behavior of the patch. > > The following examples may be inconsistent with the standard. Some > have already been mentioned, and I don't think they all need to be > fixed for 8.4, but I mention them here for completeness. > > * Mutual Recursion: > > with recursive > foo(i) as (values(1) union all select i+1 from bar where i < 10), > bar(i) as (values(1) union all select i+1 from foo where i < 10) > select * from foo; > ERROR: mutual recursive call is not supported > > The standard allows mutual recursion. The discussion seems to agree that let it leave for post 8.4. > * Single Evaluation: > > with > foo(i) as (select random() as i) > select * from foo union all select * from foo; > i > ------------------- > 0.233165248762816 > 0.62126633618027 > (2 rows) > > The standard specifies that non-recursive WITH should be evaluated > once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? > * RHS only: > > with recursive > foo(i) as (select i+1 from foo where i < 10 union all values(1)) > select * from foo; > ERROR: Left hand side of UNION ALL must be a non-recursive term in a > recursive query > > The standard does not require that the recursive term be on the RHS. The discussion seems to agree that let it leave for post 8.4. > * UNION ALL only: > > with recursive > foo(i) as (values(1) union select i+1 from foo where i < 10) > select * from foo; > ERROR: non-recursive term and recursive term must be combined with > UNION ALL > > The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT > (when the recursive term is not on the RHS of the EXCEPT). The discussion seems to agree that let it leave for post 8.4. > * Binary recursion and subselect strangeness: > > with recursive foo(i) as > (values (1) > union all > select * from > (select i+1 from foo where i < 10 > union all > select i+1 from foo where i < X) t) > select * from foo; > > Produces 10 rows of output regardless of what "X" is. This should be > fixed for 8.4. > Also, this is non-linear recursion, which the standard seems to > disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. > * Multiple recursive references: > > with recursive foo(i) as > (values (1) > union all > select i+1 from foo where i < 10 > union all > select i+1 from foo where i < 20) > select * from foo; > ERROR: Left hand side of UNION ALL must be a non-recursive term in a > recursive query > > If we're going to allow non-linear recursion (which the standard > does not), this seems like it should be a valid case. I will try to disallow this. > * Strange result with except: > > with recursive foo(i) as > (values (1) > union all > select * from > (select i+1 from foo where i < 10 > except > select i+1 from foo where i < 5) t) > select * from foo; > ERROR: table "foo" has 0 columns available but 1 columns specified > > This query works if you replace "except" with "union". This should be > fixed for 8.4. I will try to fix this. > * Aggregates allowed: > > with recursive foo(i) as > (values(1) > union all > select max(i)+1 from foo where i < 10) > select * from foo; > > Aggregates should be blocked according to the standard. > Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. > * DISTINCT should supress duplicates: > > with recursive foo(i) as > (select distinct * from (values(1),(2)) t > union all > select distinct i+1 from foo where i < 10) > select * from foo; > > This outputs a lot of duplicates, but they should be supressed > according to the standard. This query is essentially the same as > supporting UNION for recursive queries, so we should either fix both for > 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. > * outer joins on a recursive reference should be blocked: > > with recursive foo(i) as > (values(1) > union all > select i+1 from foo left join (values(1)) t on (i=column1)) > select * from foo; > > Causes an infinite loop, but the standard says using an outer join > in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. > * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The > standard does not seem to say that these should be rejected. > > > The following are my interpretations of relevant parts of the SQL > standard (200n), and the associated sections. These are only my > interpretation, so let me know if you interpret the standard > differently. > > Non-linear recursion forbidden: > 7.13: Syntax Rules: 2.g.ii > My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the > same <query name>. > 7.13: Syntax Rules: 2.g.iv > > EXCEPT can't be used for recursive queries if a recursive reference > appears on the RHS: > 7.13: Syntax Rules: 2.g.iii.1 > > INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: > 7.13: Syntax Rules: 2.g.iii.5 > > recursive references must appear in FROM clause: > 7.13: Syntax Rules: 2.g.iii.3 > My interpretation of this rule is that it does not allow a > recursive reference in a subquery in the targetlist or a subquery > in the where clause. > > stratum defined: > 7.13: Syntax Rules: 2.f > 7.13: Syntax Rules: 2.g.i.1 > > recursive query must have anchor for every stratum: > 7.13: Syntax Rules: 2.g.i.3 > > outer joins not allowed to join recursive references: > 7.13: Syntax Rules: 2.g.iii.6 > 7.13: Syntax Rules: 2.g.iii.7 > > Aggregates/HAVING disallowed: > 7.13: Syntax Rules: 2.g.iii.4.B > > Mutual recursion defined: > 7.13: Syntax Rules: 2.g.i.1 > > Evaluate each WITH entry once, even if it's referenced multiple times: > 7.13: General Rules: 1 > 7.13: General Rules: 2.b > See also: > http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > Evaluation order with mutual recursion: > 7.13: General Rules: 2.a > 7.13: General Rules: 2.b > > Evaluation semantics: > 7.13: General Rules: 2.c > > DISTINCT: > 7.13: General Rules: 2.c.iv > 7.13: General Rules: 2.c.ix.3.A > > I will provide comments about the code and documentation soon. This is a > very useful feature. Thanks. Enclosed is the latest patch to adopt CVS HEAD. -- Tatsuo Ishii SRA OSS, Inc. Japan
>> * Single Evaluation: >> >> with >> foo(i) as (select random() as i) >> select * from foo union all select * from foo; >> i >> ------------------- >> 0.233165248762816 >> 0.62126633618027 >> (2 rows) >> >> The standard specifies that non-recursive WITH should be evaluated >> once. > > What shall we do? I don't think there's a easy way to fix this. Maybe > we should not allow WITH clause without RECURISVE? ISTM that kind of misses the point. Even if it's WITH RECURSIVE rather than simply WITH, one wouldn't expect multiple evaluations of any non-recursive portion of the query. ...Robert
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: > Thanks for the review. > > > The standard specifies that non-recursive WITH should be evaluated > > once. > > What shall we do? I don't think there's a easy way to fix this. Maybe > we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. "I think this is a "must fix" because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time." I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? > I will try to fix this. However detecting the query being not a > non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? > > * DISTINCT should supress duplicates: > > > > with recursive foo(i) as > > (select distinct * from (values(1),(2)) t > > union all > > select distinct i+1 from foo where i < 10) > > select * from foo; > > > > This outputs a lot of duplicates, but they should be supressed > > according to the standard. This query is essentially the same as > > supporting UNION for recursive queries, so we should either fix both for > > 8.4 or block both for consistency. > > I'm not sure if it's possible to fix this. Will look into. > Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? > > * outer joins on a recursive reference should be blocked: > > > > with recursive foo(i) as > > (values(1) > > union all > > select i+1 from foo left join (values(1)) t on (i=column1)) > > select * from foo; > > > > Causes an infinite loop, but the standard says using an outer join > > in this situation should be prohibited. This should be fixed for 8.4. > > Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards,Jeff Davis
> 3. "I think this is a "must fix" because of the point about volatile > functions --- changing it later will result in user-visible semantics > changes, so we have to get it right the first time." > > I don't entirely agree with #3. It is user-visible, but only in the > sense that someone is depending on undocumented multiple-evaluation > behavior. What makes you think it's going to be undocumented? Single versus multiple evaluation is a keep aspect of this feature and certainly needs to be documented one way or the other. I can't understand why we would introduce a standard syntax with non-standard behavior, but if we do, it certainly had better be mentioned in the documentation. I think that the most likely result of a CTE implementation that doesn't guarantee single evaluation is that people simply won't use it. But anyone who does will expect that their queries will return the same results in release N and release N+1, for all values of N. The only way that an incompatible change of this type won't break people's applications is if they're not using the feature in the first place, in which case there is no point in committing it anyway. I wonder if the whole approach to this patch is backward. Instead of worrying about how to implement WITH RECURSIVE, maybe it would be better to implement a really solid, spec-compliant version of WITH, and add the RECURSIVE functionality in a later patch/release. ...Robert
> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: > > Thanks for the review. > > > > > The standard specifies that non-recursive WITH should be evaluated > > > once. > > > > What shall we do? I don't think there's a easy way to fix this. Maybe > > we should not allow WITH clause without RECURISVE? > > My interpretation of 7.13: General Rules: 2.b is that it should be > single evaluation, even if RECURSIVE is present. > > The previous discussion was here: > > http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > The important arguments in the thread seemed to be: > > 1. People will generally expect single evaluation, so might be > disappointed if they can't use this feature for that purpose. > > 2. It's a spec violation in the case of volatile functions. > > 3. "I think this is a "must fix" because of the point about volatile > functions --- changing it later will result in user-visible semantics > changes, so we have to get it right the first time." > > I don't entirely agree with #3. It is user-visible, but only in the > sense that someone is depending on undocumented multiple-evaluation > behavior. > > Tom Lane said that multiple evaluation is grounds for rejection: > http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php > > Is there hope of correcting this before November? According to Tom, to implement "single evaluation" we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. So as long as Tom and other people think that is a "must fix", there seems no hope probably. Anyway I will continue to work on existing patches... -- Tatsuo Ishii SRA OSS, Inc. Japan > > I will try to fix this. However detecting the query being not a > > non-linear one is not so easy. > > If we don't allow mutual recursion, the only kind of non-linear > recursion that might exist would be multiple references to the same > recursive query name in a recursive query, is that correct? > > > > * DISTINCT should supress duplicates: > > > > > > with recursive foo(i) as > > > (select distinct * from (values(1),(2)) t > > > union all > > > select distinct i+1 from foo where i < 10) > > > select * from foo; > > > > > > This outputs a lot of duplicates, but they should be supressed > > > according to the standard. This query is essentially the same as > > > supporting UNION for recursive queries, so we should either fix both for > > > 8.4 or block both for consistency. > > > > I'm not sure if it's possible to fix this. Will look into. > > > > Can't we just reject queries with top-level DISTINCT, similar to how > UNION is rejected? > > > > * outer joins on a recursive reference should be blocked: > > > > > > with recursive foo(i) as > > > (values(1) > > > union all > > > select i+1 from foo left join (values(1)) t on (i=column1)) > > > select * from foo; > > > > > > Causes an infinite loop, but the standard says using an outer join > > > in this situation should be prohibited. This should be fixed for 8.4. > > > > Not an issue, I think. > > Agreed, Andrew Gierth corrected me here. > > Regards, > Jeff Davis > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Hello 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>: >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: >> > Thanks for the review. >> > >> > > The standard specifies that non-recursive WITH should be evaluated >> > > once. >> > >> > What shall we do? I don't think there's a easy way to fix this. Maybe >> > we should not allow WITH clause without RECURISVE? >> >> My interpretation of 7.13: General Rules: 2.b is that it should be >> single evaluation, even if RECURSIVE is present. >> >> The previous discussion was here: >> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php >> >> The important arguments in the thread seemed to be: >> >> 1. People will generally expect single evaluation, so might be >> disappointed if they can't use this feature for that purpose. >> >> 2. It's a spec violation in the case of volatile functions. >> >> 3. "I think this is a "must fix" because of the point about volatile >> functions --- changing it later will result in user-visible semantics >> changes, so we have to get it right the first time." >> >> I don't entirely agree with #3. It is user-visible, but only in the >> sense that someone is depending on undocumented multiple-evaluation >> behavior. >> >> Tom Lane said that multiple evaluation is grounds for rejection: >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php >> >> Is there hope of correcting this before November? > > According to Tom, to implement "single evaluation" we need to make big > infrastructure enhancement which is likely slip the schedule for 8.4 > release which Tom does not want. why? why don't use a materialisation? > > So as long as Tom and other people think that is a "must fix", there > seems no hope probably. > > Anyway I will continue to work on existing patches... > -- I would to see your patch in core early. I am working on grouping sets and I cannot finish my patch before your patch will be commited. Regards Pavel Stehule > Tatsuo Ishii > SRA OSS, Inc. Japan > >> > I will try to fix this. However detecting the query being not a >> > non-linear one is not so easy. >> >> If we don't allow mutual recursion, the only kind of non-linear >> recursion that might exist would be multiple references to the same >> recursive query name in a recursive query, is that correct? >> >> > > * DISTINCT should supress duplicates: >> > > >> > > with recursive foo(i) as >> > > (select distinct * from (values(1),(2)) t >> > > union all >> > > select distinct i+1 from foo where i < 10) >> > > select * from foo; >> > > >> > > This outputs a lot of duplicates, but they should be supressed >> > > according to the standard. This query is essentially the same as >> > > supporting UNION for recursive queries, so we should either fix both for >> > > 8.4 or block both for consistency. >> > >> > I'm not sure if it's possible to fix this. Will look into. >> > >> >> Can't we just reject queries with top-level DISTINCT, similar to how >> UNION is rejected? >> >> > > * outer joins on a recursive reference should be blocked: >> > > >> > > with recursive foo(i) as >> > > (values(1) >> > > union all >> > > select i+1 from foo left join (values(1)) t on (i=column1)) >> > > select * from foo; >> > > >> > > Causes an infinite loop, but the standard says using an outer join >> > > in this situation should be prohibited. This should be fixed for 8.4. >> > >> > Not an issue, I think. >> >> Agreed, Andrew Gierth corrected me here. >> >> Regards, >> Jeff Davis >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
> Hello > > 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>: > >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: > >> > Thanks for the review. > >> > > >> > > The standard specifies that non-recursive WITH should be evaluated > >> > > once. > >> > > >> > What shall we do? I don't think there's a easy way to fix this. Maybe > >> > we should not allow WITH clause without RECURISVE? > >> > >> My interpretation of 7.13: General Rules: 2.b is that it should be > >> single evaluation, even if RECURSIVE is present. > >> > >> The previous discussion was here: > >> > >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > >> > >> The important arguments in the thread seemed to be: > >> > >> 1. People will generally expect single evaluation, so might be > >> disappointed if they can't use this feature for that purpose. > >> > >> 2. It's a spec violation in the case of volatile functions. > >> > >> 3. "I think this is a "must fix" because of the point about volatile > >> functions --- changing it later will result in user-visible semantics > >> changes, so we have to get it right the first time." > >> > >> I don't entirely agree with #3. It is user-visible, but only in the > >> sense that someone is depending on undocumented multiple-evaluation > >> behavior. > >> > >> Tom Lane said that multiple evaluation is grounds for rejection: > >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php > >> > >> Is there hope of correcting this before November? > > > > According to Tom, to implement "single evaluation" we need to make big > > infrastructure enhancement which is likely slip the schedule for 8.4 > > release which Tom does not want. > > why? why don't use a materialisation? See: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > > > So as long as Tom and other people think that is a "must fix", there > > seems no hope probably. > > > > Anyway I will continue to work on existing patches... > > -- > > I would to see your patch in core early. I am working on grouping sets > and I cannot finish my patch before your patch will be commited. > > Regards > Pavel Stehule > > > Tatsuo Ishii > > SRA OSS, Inc. Japan > > > >> > I will try to fix this. However detecting the query being not a > >> > non-linear one is not so easy. > >> > >> If we don't allow mutual recursion, the only kind of non-linear > >> recursion that might exist would be multiple references to the same > >> recursive query name in a recursive query, is that correct? > >> > >> > > * DISTINCT should supress duplicates: > >> > > > >> > > with recursive foo(i) as > >> > > (select distinct * from (values(1),(2)) t > >> > > union all > >> > > select distinct i+1 from foo where i < 10) > >> > > select * from foo; > >> > > > >> > > This outputs a lot of duplicates, but they should be supressed > >> > > according to the standard. This query is essentially the same as > >> > > supporting UNION for recursive queries, so we should either fix both for > >> > > 8.4 or block both for consistency. > >> > > >> > I'm not sure if it's possible to fix this. Will look into. > >> > > >> > >> Can't we just reject queries with top-level DISTINCT, similar to how > >> UNION is rejected? > >> > >> > > * outer joins on a recursive reference should be blocked: > >> > > > >> > > with recursive foo(i) as > >> > > (values(1) > >> > > union all > >> > > select i+1 from foo left join (values(1)) t on (i=column1)) > >> > > select * from foo; > >> > > > >> > > Causes an infinite loop, but the standard says using an outer join > >> > > in this situation should be prohibited. This should be fixed for 8.4. > >> > > >> > Not an issue, I think. > >> > >> Agreed, Andrew Gierth corrected me here. > >> > >> Regards, > >> Jeff Davis > >> > >> > >> -- > >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > >> To make changes to your subscription: > >> http://www.postgresql.org/mailpref/pgsql-hackers > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers > >
> Hello > > 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>: > >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: > >> > Thanks for the review. > >> > > >> > > The standard specifies that non-recursive WITH should be evaluated > >> > > once. > >> > > >> > What shall we do? I don't think there's a easy way to fix this. Maybe > >> > we should not allow WITH clause without RECURISVE? > >> > >> My interpretation of 7.13: General Rules: 2.b is that it should be > >> single evaluation, even if RECURSIVE is present. > >> > >> The previous discussion was here: > >> > >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > >> > >> The important arguments in the thread seemed to be: > >> > >> 1. People will generally expect single evaluation, so might be > >> disappointed if they can't use this feature for that purpose. > >> > >> 2. It's a spec violation in the case of volatile functions. > >> > >> 3. "I think this is a "must fix" because of the point about volatile > >> functions --- changing it later will result in user-visible semantics > >> changes, so we have to get it right the first time." > >> > >> I don't entirely agree with #3. It is user-visible, but only in the > >> sense that someone is depending on undocumented multiple-evaluation > >> behavior. > >> > >> Tom Lane said that multiple evaluation is grounds for rejection: > >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php > >> > >> Is there hope of correcting this before November? > > > > According to Tom, to implement "single evaluation" we need to make big > > infrastructure enhancement which is likely slip the schedule for 8.4 > > release which Tom does not want. > > why? why don't use a materialisation? See: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > So as long as Tom and other people think that is a "must fix", there > > seems no hope probably. > > > > Anyway I will continue to work on existing patches... > > -- > > I would to see your patch in core early. I am working on grouping sets > and I cannot finish my patch before your patch will be commited. > > Regards > Pavel Stehule > > > Tatsuo Ishii > > SRA OSS, Inc. Japan > > > >> > I will try to fix this. However detecting the query being not a > >> > non-linear one is not so easy. > >> > >> If we don't allow mutual recursion, the only kind of non-linear > >> recursion that might exist would be multiple references to the same > >> recursive query name in a recursive query, is that correct? > >> > >> > > * DISTINCT should supress duplicates: > >> > > > >> > > with recursive foo(i) as > >> > > (select distinct * from (values(1),(2)) t > >> > > union all > >> > > select distinct i+1 from foo where i < 10) > >> > > select * from foo; > >> > > > >> > > This outputs a lot of duplicates, but they should be supressed > >> > > according to the standard. This query is essentially the same as > >> > > supporting UNION for recursive queries, so we should either fix both for > >> > > 8.4 or block both for consistency. > >> > > >> > I'm not sure if it's possible to fix this. Will look into. > >> > > >> > >> Can't we just reject queries with top-level DISTINCT, similar to how > >> UNION is rejected? > >> > >> > > * outer joins on a recursive reference should be blocked: > >> > > > >> > > with recursive foo(i) as > >> > > (values(1) > >> > > union all > >> > > select i+1 from foo left join (values(1)) t on (i=column1)) > >> > > select * from foo; > >> > > > >> > > Causes an infinite loop, but the standard says using an outer join > >> > > in this situation should be prohibited. This should be fixed for 8.4. > >> > > >> > Not an issue, I think. > >> > >> Agreed, Andrew Gierth corrected me here. > >> > >> Regards, > >> Jeff Davis > >> > >> > >> -- > >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > >> To make changes to your subscription: > >> http://www.postgresql.org/mailpref/pgsql-hackers > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers > >
> > * Aggregates allowed: > > > > with recursive foo(i) as > > (values(1) > > union all > > select max(i)+1 from foo where i < 10) > > select * from foo; > > > > Aggregates should be blocked according to the standard. > > Also, causes an infinite loop. This should be fixed for 8.4. > > I will try to fix this. We already reject: select max(i) from foo where i < 10) But max(i)+1 seems to slip the check. I looked into this I found the patch tried to detect the case before analyzing(see parser/parse_cte.c) which is not a right thing I think. I think we could detect the case by adding more checking in parseCheckAggregates(): /* * Check if there's aggregate function in a recursive term. */foreach(l, qry->rtable){ RangeTblEntry *rte = (RangeTblEntry*) lfirst(l); if (qry->hasAggs && rte->rtekind == RTE_RECURSIVE && rte->self_reference) { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("aggregate functions in a recursive term not allowed"))); }} What do you think? -- Tatsuo Ishii SRA OSS, Inc. Japan
2008/9/9 Tatsuo Ishii <ishii@sraoss.co.jp>: >> Hello >> >> 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>: >> >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: >> >> > Thanks for the review. >> >> > >> >> > > The standard specifies that non-recursive WITH should be evaluated >> >> > > once. >> >> > >> >> > What shall we do? I don't think there's a easy way to fix this. Maybe >> >> > we should not allow WITH clause without RECURISVE? >> >> >> >> My interpretation of 7.13: General Rules: 2.b is that it should be >> >> single evaluation, even if RECURSIVE is present. >> >> >> >> The previous discussion was here: >> >> >> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php >> >> I am blind, I didn't find any reason, why materialisation isn't useable. Regards Pavel >> >> The important arguments in the thread seemed to be: >> >> >> >> 1. People will generally expect single evaluation, so might be >> >> disappointed if they can't use this feature for that purpose. >> >> >> >> 2. It's a spec violation in the case of volatile functions. >> >> >> >> 3. "I think this is a "must fix" because of the point about volatile >> >> functions --- changing it later will result in user-visible semantics >> >> changes, so we have to get it right the first time." >> >> >> >> I don't entirely agree with #3. It is user-visible, but only in the >> >> sense that someone is depending on undocumented multiple-evaluation >> >> behavior. >> >> >> >> Tom Lane said that multiple evaluation is grounds for rejection: >> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php >> >> >> >> Is there hope of correcting this before November? >> > >> > According to Tom, to implement "single evaluation" we need to make big >> > infrastructure enhancement which is likely slip the schedule for 8.4 >> > release which Tom does not want. >> >> why? why don't use a materialisation? > > See: > http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > >> > >> > So as long as Tom and other people think that is a "must fix", there >> > seems no hope probably. >> > >> > Anyway I will continue to work on existing patches... >> > -- >> >> I would to see your patch in core early. I am working on grouping sets >> and I cannot finish my patch before your patch will be commited. >> >> Regards >> Pavel Stehule >> >> > Tatsuo Ishii >> > SRA OSS, Inc. Japan >> > >> >> > I will try to fix this. However detecting the query being not a >> >> > non-linear one is not so easy. >> >> >> >> If we don't allow mutual recursion, the only kind of non-linear >> >> recursion that might exist would be multiple references to the same >> >> recursive query name in a recursive query, is that correct? >> >> >> >> > > * DISTINCT should supress duplicates: >> >> > > >> >> > > with recursive foo(i) as >> >> > > (select distinct * from (values(1),(2)) t >> >> > > union all >> >> > > select distinct i+1 from foo where i < 10) >> >> > > select * from foo; >> >> > > >> >> > > This outputs a lot of duplicates, but they should be supressed >> >> > > according to the standard. This query is essentially the same as >> >> > > supporting UNION for recursive queries, so we should either fix both for >> >> > > 8.4 or block both for consistency. >> >> > >> >> > I'm not sure if it's possible to fix this. Will look into. >> >> > >> >> >> >> Can't we just reject queries with top-level DISTINCT, similar to how >> >> UNION is rejected? >> >> >> >> > > * outer joins on a recursive reference should be blocked: >> >> > > >> >> > > with recursive foo(i) as >> >> > > (values(1) >> >> > > union all >> >> > > select i+1 from foo left join (values(1)) t on (i=column1)) >> >> > > select * from foo; >> >> > > >> >> > > Causes an infinite loop, but the standard says using an outer join >> >> > > in this situation should be prohibited. This should be fixed for 8.4. >> >> > >> >> > Not an issue, I think. >> >> >> >> Agreed, Andrew Gierth corrected me here. >> >> >> >> Regards, >> >> Jeff Davis >> >> >> >> >> >> -- >> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> >> To make changes to your subscription: >> >> http://www.postgresql.org/mailpref/pgsql-hackers >> > >> > -- >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> > To make changes to your subscription: >> > http://www.postgresql.org/mailpref/pgsql-hackers >> > >
>>> >> My interpretation of 7.13: General Rules: 2.b is that it should be >>> >> single evaluation, even if RECURSIVE is present. >>> >> >>> >> The previous discussion was here: >>> >> >>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php > > I am blind, I didn't find any reason, why materialisation isn't useable. I believe it's because of these two (closely related) problems: # The basic # implementation clearly ought to be to dump the result of the subquery # into a tuplestore and then have the upper level read out from that. # However, we don't have any infrastructure for having multiple # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan # infrastructure could be enhanced in that direction, but it's not ready # for non-scalar outputs today.) Also, I think we'd have to teach # tuplestore how to support multiple readout cursors. For example, # consider # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... # If the planner chooses to do the join as a nested loop then each # Scan node needs to keep track of its own place in the tuplestore, # concurrently with the other node having a different place. ...Robert
2008/9/9 Robert Haas <robertmhaas@gmail.com>: >>>> >> My interpretation of 7.13: General Rules: 2.b is that it should be >>>> >> single evaluation, even if RECURSIVE is present. >>>> >> >>>> >> The previous discussion was here: >>>> >> >>>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php >> >> I am blind, I didn't find any reason, why materialisation isn't useable. > > I believe it's because of these two (closely related) problems: > > # The basic > # implementation clearly ought to be to dump the result of the subquery > # into a tuplestore and then have the upper level read out from that. > # However, we don't have any infrastructure for having multiple > # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan > # infrastructure could be enhanced in that direction, but it's not ready > # for non-scalar outputs today.) Also, I think we'd have to teach > # tuplestore how to support multiple readout cursors. For example, > # consider > # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... > # If the planner chooses to do the join as a nested loop then each > # Scan node needs to keep track of its own place in the tuplestore, > # concurrently with the other node having a different place. > hmm. I solve similar problem in grouping sets :( etc SELECT ... FROM ... GROUP BY GROUPING SETS (a,b) is almost same as With foo AS (SELECT ... FROM) SELECT ... FROM foo GROUP BY a UNION ALL SELECT ... FROM foo GROUP BY b; Regards Pavel > ...Robert >
On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote: > hmm. I solve similar problem in grouping sets :( etc > How did you solve it? Regards,Jeff Davis
2008/9/9 Jeff Davis <pgsql@j-davis.com>: > On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote: >> hmm. I solve similar problem in grouping sets :( etc >> I have special executor node - feeder, it hold one tuple and others nodes read from this node. It's usable for hash aggregates. Pavel plan is like: grouping sets --> seq scan ... --> hash aggg --> feeder --> hash agg --> feeder > > How did you solve it? > > Regards, > Jeff Davis > >
"Robert Haas" <robertmhaas@gmail.com> writes: >> I am blind, I didn't find any reason, why materialisation isn't useable. > I believe it's because of these two (closely related) problems: > # The basic > # implementation clearly ought to be to dump the result of the subquery > # into a tuplestore and then have the upper level read out from that. > # However, we don't have any infrastructure for having multiple > # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan > # infrastructure could be enhanced in that direction, but it's not ready > # for non-scalar outputs today.) Also, I think we'd have to teach > # tuplestore how to support multiple readout cursors. For example, > # consider > # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... > # If the planner chooses to do the join as a nested loop then each > # Scan node needs to keep track of its own place in the tuplestore, > # concurrently with the other node having a different place. The amount of new code needed for that seems a pittance compared to the size of the patch already, so I'm not seeing why Tatsuo-san considers it infeasible. regards, tom lane
On Tue, 2008-09-09 at 09:47 -0400, Robert Haas wrote: > > 3. "I think this is a "must fix" because of the point about volatile > > functions --- changing it later will result in user-visible semantics > > changes, so we have to get it right the first time." > > > > I don't entirely agree with #3. It is user-visible, but only in the > > sense that someone is depending on undocumented multiple-evaluation > > behavior. > > What makes you think it's going to be undocumented? Single versus > multiple evaluation is a keep aspect of this feature and certainly > needs to be documented one way or the other. I can't understand why > we would introduce a standard syntax with non-standard behavior, but > if we do, it certainly had better be mentioned in the documentation. > I meant that -- hypothetically if we did accept the feature as-is -- the number of evaluations would be documented to be undefined, not N. That would avoid the backwards-compatibility problem. This one point is probably not worth discussing now, because argument #1 and #2 stand on their own. Regards,Jeff Davis
> I meant that -- hypothetically if we did accept the feature as-is -- the > number of evaluations would be documented to be undefined, not N. That > would avoid the backwards-compatibility problem. > > This one point is probably not worth discussing now, because argument > #1 and #2 stand on their own. Agreed. Plus, both Tom and Pavel seem to think this is a relatively solvable problem. ...Robert
> > * Single Evaluation: > > > > with > > foo(i) as (select random() as i) > > select * from foo union all select * from foo; > > i > > ------------------- > > 0.233165248762816 > > 0.62126633618027 > > (2 rows) > > > > The standard specifies that non-recursive WITH should be evaluated > > once. > > What shall we do? I don't think there's an easy way to fix this as Tom > suggested. Maybe we should not allow WITH clause without RECURISVE for > 8.4? This is a still remaing issue... > > * Binary recursion and subselect strangeness: > > > > with recursive foo(i) as > > (values (1) > > union all > > select * from > > (select i+1 from foo where i < 10 > > union all > > select i+1 from foo where i < X) t) > > select * from foo; > > > > Produces 10 rows of output regardless of what "X" is. This should be > > fixed for 8.4. > > Also, this is non-linear recursion, which the standard seems to > > disallow. > > I will try to fix this. However detecting the query being not a > non-linear one is not so easy. I have implemented rejection of non-linear recursion and now this type of query will not be executed anyway. > > * Multiple recursive references: > > > > with recursive foo(i) as > > (values (1) > > union all > > select i+1 from foo where i < 10 > > union all > > select i+1 from foo where i < 20) > > select * from foo; > > ERROR: Left hand side of UNION ALL must be a non-recursive term in a > > recursive query > > > > If we're going to allow non-linear recursion (which the standard > > does not), this seems like it should be a valid case. > > I will try to disallow this. Non-linear recursion is not allowed now. > > * Strange result with except: > > > > with recursive foo(i) as > > (values (1) > > union all > > select * from > > (select i+1 from foo where i < 10 > > except > > select i+1 from foo where i < 5) t) > > select * from foo; > > ERROR: table "foo" has 0 columns available but 1 columns specified > > > > This query works if you replace "except" with "union". This should be > > fixed for 8.4. > > I will try to fix this. This is a non-linear recursion too and will not be executed anyway. > > * Aggregates allowed: > > > > with recursive foo(i) as > > (values(1) > > union all > > select max(i)+1 from foo where i < 10) > > select * from foo; > > > > Aggregates should be blocked according to the standard. > > Also, causes an infinite loop. This should be fixed for 8.4. > > I will try to fix this. Fixed. > > * DISTINCT should supress duplicates: > > > > with recursive foo(i) as > > (select distinct * from (values(1),(2)) t > > union all > > select distinct i+1 from foo where i < 10) > > select * from foo; > > > > This outputs a lot of duplicates, but they should be supressed > > according to the standard. This query is essentially the same as > > supporting UNION for recursive queries, so we should either fix both for > > 8.4 or block both for consistency. > > I'm not sure if it's possible to fix this. Will look into. Ok, now this type of DISTINCT is not allowed. Included is the latest patches against CVS HEAD. -- Tatsuo Ishii SRA OSS, Inc. Japan
On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote: > > > * Single Evaluation: > > > > > > with > > > foo(i) as (select random() as i) > > > select * from foo union all select * from foo; > > > i > > > ------------------- > > > 0.233165248762816 > > > 0.62126633618027 > > > (2 rows) > > > > > > The standard specifies that non-recursive WITH should be > > > evaluated once. > > > > What shall we do? I don't think there's an easy way to fix this as > > Tom suggested. Maybe we should not allow WITH clause without > > RECURISVE for 8.4? > > This is a still remaing issue... I don't think that the spec explicitly forbids this. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 2008-09-15 at 22:41 -0700, David Fetter wrote: > On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote: > > > > * Single Evaluation: > > > > > > > > with > > > > foo(i) as (select random() as i) > > > > select * from foo union all select * from foo; > > > > i > > > > ------------------- > > > > 0.233165248762816 > > > > 0.62126633618027 > > > > (2 rows) > > > > > > > > The standard specifies that non-recursive WITH should be > > > > evaluated once. > > > > > > What shall we do? I don't think there's an easy way to fix this as > > > Tom suggested. Maybe we should not allow WITH clause without > > > RECURISVE for 8.4? > > > > This is a still remaing issue... > > I don't think that the spec explicitly forbids this. > This has been discussed before. Regardless of the nuances of the spec (and whether it says so explicitly or implicitly), there are people in the community that see single evaluation as important, and important enough to be a showstopper. Tom Lane has suggested that this is a reasonable amount of work to complete, and minor in comparison to the rest of the patch. I think the right approach is to try to complete it so that everyone is happy. I will work on this, but unfortunately I don't have a lot of time right now, so I can't make any promises. The rest of the patch looks good so far (there are still a few things I want to look at), so I think this is the biggest open issue. Regards,Jeff Davis
Tatsuo Ishii <ishii@postgresql.org> writes: > Included is the latest patches against CVS HEAD. I spent some time reading this patch. Here are a few comments in no particular order: RangeRecursive node lacks copyfuncs/equalfuncs support. Query.recursive is missed in equalfuncs.c. But rather than fix that, get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe, and what is the value of that test? The callers know perfectly well whether they are operating on a recursive RTE or not. You might as well just delete all the useless qual-pushdown-attempt code from set_recursion_pathlist, and not need to touch qual_is_pushdown_safe at all. Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. I dislike putting state into ExecutorState; that makes it impossible to have multiple recursion nodes in one plan tree. It would probably be better for the Recursion and RecursiveScan nodes to talk to each other directly (compare HashJoin/Hash); although since they are not adjacent in the plan tree I admit I'm not sure how to do that. es_disallow_tuplestore doesn't seem to need to be in ExecutorState at all, it could as well be in RecursionState. I don't really like the way that Append nodes are being abused here. It would be better to allow nodeRecursion.c to duplicate a little code from nodeAppend.c, and have the child plans be direct children of the Recursion node. BTW, is it actually possible to have more than two children? I didn't spend enough time analyzing the restrictions in parse_cte to be sure. If there are always just two then you could simplify the representation by treating it like a join node instead of an append. (The RTE_RECURSIVE representation sure makes it look like there can be only two...) Mark/restore support seems useless ... note the comment on ExecSupportsMarkRestore (which should be updated if this code isn't removed). RecursiveScan claims to support backwards fetch, but does not in fact contain code to do so. (Given that it will always be underneath Recursion, which can't do backwards fetch, I see little point in adding such code; fix execAmi.c instead.) ExecInitRecursion doesn't seem to be on the same page about whether it supports backward scan as execAmi.c, either. This comment in nodeRecursivescan.c seems bogus:/* * Do not initialize scan tuple type, result tuple type and * projectioninfo in ExecInitRecursivescan. These types are * initialized after initializing Recursion node. */ because the code seems to be doing exactly what the comment says it doesn't. Numerous comments appear to have been copied-and-pasted and not modified from the original. Somebody will have to go over all that text. ruleutils.c fails completely for non-recursive WITH. It *must* regenerate such a query with a WITH clause, not as a flattened subquery which is what you seem to be doing. This isn't negotiable because the semantics are different. This will mean at least some change in the parsetree representation. Perhaps we could add a bool to subquery RTEs to mark them as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE seem a bit ugly too. If it thinks that can't happen it should Assert so, not just fall through silently. commentary for ParseState.p_ctenamespace is gratuitously unlike the comment style for the other fields, and p_recursive_namespace isn't documented at all. ParseState.p_in_with_clause is unused, should be removed. The WithClause struct definition is poorly commented. It should be stated that it is used only pre-parse-analysis (assuming that continues to be true after you get done fixing ruleutils.c...), and it doesn't say what the elements of the subquery list are (specifically, what node type). A lot of the other added structs and fields could use better commenting too. For that matter "subquery" is a poor name for WithClause's list of CTEs, especially so since it's hard to search for. It should be a plural name and I'd be inclined to use something like "ctes" not "subqueries". The term "subquery" is too overloaded already, so any place you can refer to a WITH-list member as a CTE you should do so. WithClause node may need a location field, and almost certainly has to be handled somehow in exprLocation(). The error reports in parse_cte.c *desperately* need error locations. Why does transformWithClause do parse_sub_analyze twice? I'm not sure that's even safe, and it's surely unnecessary. Also, what happens if a subquery isn't a SelectStmt? Silently doing nothing doesn't seem like a good plan there. Why are we putting essentially the same information into both p_recursive_namespace and p_ctenamespace? Is there really a need for both lists? The code added to transformFromClauseItem seems quite wrong since it searches both lists even if it found a match in the first one. This whole area looks like it needs refactoring. Costing is all bogus, but we knew that... Why does set_recursion_pathlist think that the subquery might have useful pathkeys? We know it must always be a UNION ALL, no? PlanState.has_recursivescan seems like a complete kluge. Can't it just be removed? It looks to me like it is working around bugs that hopefully aren't there anymore. There is certainly no reason why a recursive CTE should be more in need of rescanning than any other kind of plan. If it is needed then the current implementation is completely broken anyway, since it would only detect a RecursiveScan node that is directly underneath an agg or hash node. Please pay some attention to keeping things in logical, consistent orders. For instance the withClause field was inserted into _copySelectStmt() in a different place from where it was inserted in the actual struct declaration, which is confusing. parseTypeString() ought to check for null withClause. expression_tree_walker/mutator support seems entirely broken for RTE_RECURSIVE RTEs. Shouldn't it be recursing into the subquery? Missed adding non_recursive_query to the "zap unneeded substructure" part of set_plan_references (assuming it really is unneeded). There seem to be quite a number of places where RTE_SUBQUERY RTEs are handled but the patch fails to add RTE_RECURSIVE handling ... It's a really bad idea to use RTE subquery field over again for RTE_RECURSIVE, especially without any comment saying you did that. I would suggest two pointers in the RTE_RECURSIVE field list instead. Do we really have to make RECURSIVE a fully reserved keyword? (Actually, the patch makes it worse than reserved, by failing to add it to the reserved_keywords list.) checkCteTargetList is completely broken: it will only notice illegal sublinks that are at the very top level of a targetlist expression. checkWhereClause is very far short of adequate as well. Need to recurse here, or find some other way. Given that the subexpressions haven't been analyzed yet, this seems a bit messy --- expression_tree_walker doesn't know about pre-analysis node trees, so you can't use it. I'd suggest replacing this whole set of routines with just one recursive routine that doesn't make pre-assumptions about which node types can be found where. Alternatively, is there any way of delaying the validity checks until *after* parse analysis of the expressions, so that you could use expression_tree_walker et al? BTW, it seems like a lot of the logic there could be simplified by depending on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE. There are a number of places that are taking the larger of two values in baroque, hard-to-follow ways. I wonder if checkCteSelectStmt is detecting nonlinearity correctly. Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to miss the problem in something like (self union (self union other)) ? Maybe what you really need is a bitmask: NON_RECURSIVE = 0, RECURSIVE_SELF = 1, RECURSIVE_OTHER = 2, RECURSIVE_BOTH = 3 /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */ and then you can merge two values via OR instead of MAX. regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes: > I think the right approach is to try to complete it so that everyone is > happy. I will work on this, but unfortunately I don't have a lot of time > right now, so I can't make any promises. I think there are two significant bits there: * Fixing the parsetree representation so that the distinction between a CTE and an RTE that references the CTE is preserved. * Implementing some kind of "multiple readout tuplestore" to permit multiple RTEs to scan a CTE plan without causing any row to be evaluated more than once. The first of these seems relatively straightforward: the WITH clause has to be preserved explicitly in the Query representation, and RTEs for CTEs should just carry indexes into the WITH list, not copies of the subqueries. (Hm, we might need both an index and a levelsup counter, to deal with nested queries...) This would solve ruleutils' problem, too. I haven't thought much about the multiple readout tuplestore though. Anyone have a clear idea how to do that? In particular, can the existing tuplestore code be enhanced to do it (without sacrificing performance in the simple case), or do we need something new? regards, tom lane
I had either on that a while back. I think the existing tuplestore can be made to work without bad performance penaltiesbin the usual case. The trick was to do it without making the code unreadable. I can send the code I had but I doubt you want to use it as-is. -- greg
Greg Stark <greg.stark@enterprisedb.com> writes: > I had either on that a while back. I think the existing tuplestore can > be made to work without bad performance penaltiesbin the usual case. > The trick was to do it without making the code unreadable. > I can send the code I had but I doubt you want to use it as-is. Sure. Maybe someone else will see a way to make it more readable. regards, tom lane
> Do we really have to make RECURSIVE a fully reserved keyword? According to the standard, RECURSIVE is a reserved keyword, I believe. > (Actually, the patch makes it worse than reserved, by failing to > add it to the reserved_keywords list.) Yes, it's a bug. Will fix... -- Tatsuo Ishii SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes: >> Do we really have to make RECURSIVE a fully reserved keyword? > According to the standard, RECURSIVE is a reserved keyword, I believe. Sure, but our general rule is to make keywords no more reserved than is absolutely necessary to make the bison grammar unambiguous. I haven't tested, but I'm thinking that if WITH is fully reserved then RECURSIVE shouldn't have to be. regards, tom lane
2008/9/17 Tom Lane <tgl@sss.pgh.pa.us>: > Tatsuo Ishii <ishii@postgresql.org> writes: >>> Do we really have to make RECURSIVE a fully reserved keyword? > >> According to the standard, RECURSIVE is a reserved keyword, I believe. > > Sure, but our general rule is to make keywords no more reserved than > is absolutely necessary to make the bison grammar unambiguous. I > haven't tested, but I'm thinking that if WITH is fully reserved then > RECURSIVE shouldn't have to be. I am not sure, if these rule is good. Somebody who develop on postgresql should have a problems when they will be port to other databases in future. Reserved words in standards should be respected. regards Pavel Stehule > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
> I am not sure, if these rule is good. Somebody who develop on > postgresql should have a problems when they will be port to other > databases in future. Reserved words in standards should be respected. I disagree. I have never ported an app written for PostgreSQL to another database system, and have no plans to start. The fact that some other database system might barf on a particular bit of SQL is insufficient reason for PostgreSQL to do the same thing. If people want to write code that will work on multiple databases, they should of course avoid using any SQL reserved words for anything other than their reserved purposes. But there is no reason for the database system to unilaterally shove that down everyone's throat. It is very easy to overdo the idea of protecting users from themselves. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: >> I am not sure, if these rule is good. Somebody who develop on >> postgresql should have a problems when they will be port to other >> databases in future. Reserved words in standards should be respected. > I disagree. I have never ported an app written for PostgreSQL to > another database system, and have no plans to start. The fact that > some other database system might barf on a particular bit of SQL is > insufficient reason for PostgreSQL to do the same thing. > If people want to write code that will work on multiple databases, > they should of course avoid using any SQL reserved words for anything > other than their reserved purposes. More than that, they have to actually test their SQL on each target DB. Every DB (including us) is going to have some reserved words that are not in the standard; so imagining that Postgres can all by itself protect you from this type of problem is doomed to failure anyway. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Robert Haas" <robertmhaas@gmail.com> writes: >>> I am not sure, if these rule is good. Somebody who develop on >>> postgresql should have a problems when they will be port to other >>> databases in future. Reserved words in standards should be respected. > >> If people want to write code that will work on multiple databases, >> they should of course avoid using any SQL reserved words for anything >> other than their reserved purposes. > > More than that, they have to actually test their SQL on each target DB. > Every DB (including us) is going to have some reserved words that are > not in the standard; so imagining that Postgres can all by itself > protect you from this type of problem is doomed to failure anyway. If someone wants portable code, they can use a development tool which wraps ALL identifiers in quotes, every time. That's what we do. The important thing is that, to the extent practicable, standard SQL code is accepted and behaves in compliance with the standard. I don't see that it does anything to compromise that if you support additional, non-standard syntax for extensions. -Kevin
> Is physical_tlist optimization sensible for RecursiveScan? We seem > to use it for every other Scan node type. To enable physical_tlist optimization, it seems build_physical_tlist, use_physical_tlist and disuse_physical_tlist need to be changed. build_physical_tlist and use_physical_tlist have been already patched and only disuse_physical_tlist needs to be patched. Any other place I miss to enable the optimization? -- Tatsuo Ishii SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes: >> Is physical_tlist optimization sensible for RecursiveScan? We seem >> to use it for every other Scan node type. > To enable physical_tlist optimization, it seems build_physical_tlist, > use_physical_tlist and disuse_physical_tlist need to be > changed. build_physical_tlist and use_physical_tlist have been already > patched and only disuse_physical_tlist needs to be patched. Any other > place I miss to enable the optimization? IIRC, the comment for build_physical_tlist hadn't been patched, but yeah that seems like about it. regards, tom lane
> > To enable physical_tlist optimization, it seems build_physical_tlist, > > use_physical_tlist and disuse_physical_tlist need to be > > changed. build_physical_tlist and use_physical_tlist have been already > > patched and only disuse_physical_tlist needs to be patched. Any other > > place I miss to enable the optimization? > > IIRC, the comment for build_physical_tlist hadn't been patched, but > yeah that seems like about it. Yeah, I need to fix sloppy comments in the existing patches all over the places:-) -- Tatsuo Ishii SRA OSS, Inc. Japan
Tom, thanks for the review. Here is an in-progress report. Patches against CVS HEAD attached. (uncommented items are work-in-progress). -- Tatsuo Ishii SRA OSS, Inc. Japan > Tatsuo Ishii <ishii@postgresql.org> writes: > > Included is the latest patches against CVS HEAD. > > I spent some time reading this patch. Here are a few comments in > no particular order: > > RangeRecursive node lacks copyfuncs/equalfuncs support. Functions added. > Query.recursive is missed in equalfuncs.c. But rather than fix that, > get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe, > and what is the value of that test? The callers know perfectly well > whether they are operating on a recursive RTE or not. You might as well > just delete all the useless qual-pushdown-attempt code from > set_recursion_pathlist, and not need to touch qual_is_pushdown_safe > at all. Query.recursive removed and qual_is_pushdown_safe is untouched. > Is physical_tlist optimization sensible for RecursiveScan? We seem > to use it for every other Scan node type. Fixed and physical_tlist optimization is enabled for RecursiveScan I believe. > I dislike putting state into ExecutorState; that makes it impossible > to have multiple recursion nodes in one plan tree. It would probably > be better for the Recursion and RecursiveScan nodes to talk to each > other directly (compare HashJoin/Hash); although since they are not > adjacent in the plan tree I admit I'm not sure how to do that. > > es_disallow_tuplestore doesn't seem to need to be in ExecutorState > at all, it could as well be in RecursionState. It was for a workaround to avoid an infinit recursion in some cases. Discussion came to the conclusion that it's user's responsibilty to avoid such that case (otherwise the semantics of our recursive query becomes to be different from the one defined in the standard) I believe. es_disallow_tuplestore removed. > I don't really like the way that Append nodes are being abused here. > It would be better to allow nodeRecursion.c to duplicate a little code > from nodeAppend.c, and have the child plans be direct children of > the Recursion node. BTW, is it actually possible to have more than > two children? I didn't spend enough time analyzing the restrictions > in parse_cte to be sure. If there are always just two then you could > simplify the representation by treating it like a join node instead > of an append. (The RTE_RECURSIVE representation sure makes it look > like there can be only two...) > > Mark/restore support seems useless ... note the comment on > ExecSupportsMarkRestore (which should be updated if this code > isn't removed). > > RecursiveScan claims to support backwards fetch, but does not in fact > contain code to do so. (Given that it will always be underneath > Recursion, which can't do backwards fetch, I see little point in adding > such code; fix execAmi.c instead.) > > ExecInitRecursion doesn't seem to be on the same page about whether > it supports backward scan as execAmi.c, either. > > This comment in nodeRecursivescan.c seems bogus: > /* > * Do not initialize scan tuple type, result tuple type and > * projection info in ExecInitRecursivescan. These types are > * initialized after initializing Recursion node. > */ > because the code seems to be doing exactly what the comment says it > doesn't. > > Numerous comments appear to have been copied-and-pasted and not modified > from the original. Somebody will have to go over all that text. > > ruleutils.c fails completely for non-recursive WITH. It *must* regenerate > such a query with a WITH clause, not as a flattened subquery which is what > you seem to be doing. This isn't negotiable because the semantics are > different. This will mean at least some change in the parsetree > representation. Perhaps we could add a bool to subquery RTEs to mark them > as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE > seem a bit ugly too. If it thinks that can't happen it should Assert so, > not just fall through silently. > > commentary for ParseState.p_ctenamespace is gratuitously unlike the > comment style for the other fields, and p_recursive_namespace isn't > documented at all. > > ParseState.p_in_with_clause is unused, should be removed. Done. > The WithClause struct definition is poorly commented. It should be > stated that it is used only pre-parse-analysis (assuming that continues > to be true after you get done fixing ruleutils.c...), and it doesn't > say what the elements of the subquery list are (specifically, what > node type). A lot of the other added structs and fields could use > better commenting too. > > For that matter "subquery" is a poor name for WithClause's list of CTEs, > especially so since it's hard to search for. It should be a plural name > and I'd be inclined to use something like "ctes" not "subqueries". > The term "subquery" is too overloaded already, so any place you can > refer to a WITH-list member as a CTE you should do so. > > WithClause node may need a location field, and almost certainly has to > be handled somehow in exprLocation(). > > The error reports in parse_cte.c *desperately* need error locations. > > Why does transformWithClause do parse_sub_analyze twice? > I'm not sure that's even safe, and it's surely unnecessary. > Also, what happens if a subquery isn't a SelectStmt? Silently > doing nothing doesn't seem like a good plan there. > > Why are we putting essentially the same information into both > p_recursive_namespace and p_ctenamespace? Is there really a need > for both lists? The code added to transformFromClauseItem > seems quite wrong since it searches both lists even if it found a > match in the first one. This whole area looks like it needs > refactoring. > > Costing is all bogus, but we knew that... > > Why does set_recursion_pathlist think that the subquery might have > useful pathkeys? We know it must always be a UNION ALL, no? > > PlanState.has_recursivescan seems like a complete kluge. Can't it just be > removed? It looks to me like it is working around bugs that hopefully aren't > there anymore. There is certainly no reason why a recursive CTE should be > more in need of rescanning than any other kind of plan. If it is needed then > the current implementation is completely broken anyway, since it would only > detect a RecursiveScan node that is directly underneath an agg or hash node. > > Please pay some attention to keeping things in logical, consistent orders. > For instance the withClause field was inserted into _copySelectStmt() > in a different place from where it was inserted in the actual struct > declaration, which is confusing. > > parseTypeString() ought to check for null withClause. Done. > expression_tree_walker/mutator support seems entirely broken for > RTE_RECURSIVE RTEs. Shouldn't it be recursing into the subquery? > > Missed adding non_recursive_query to the "zap unneeded substructure" part > of set_plan_references (assuming it really is unneeded). > > There seem to be quite a number of places where RTE_SUBQUERY RTEs > are handled but the patch fails to add RTE_RECURSIVE handling ... > > It's a really bad idea to use RTE subquery field over again for > RTE_RECURSIVE, especially without any comment saying you did that. > I would suggest two pointers in the RTE_RECURSIVE field list instead. > > Do we really have to make RECURSIVE a fully reserved keyword? > (Actually, the patch makes it worse than reserved, by failing to > add it to the reserved_keywords list.) Changed to unreserved keyword. > checkCteTargetList is completely broken: it will only notice illegal > sublinks that are at the very top level of a targetlist expression. > checkWhereClause is very far short of adequate as well. Need to recurse > here, or find some other way. Given that the subexpressions haven't been > analyzed yet, this seems a bit messy --- expression_tree_walker doesn't > know about pre-analysis node trees, so you can't use it. I'd suggest > replacing this whole set of routines with just one recursive routine that > doesn't make pre-assumptions about which node types can be found where. > Alternatively, is there any way of delaying the validity checks until > *after* parse analysis of the expressions, so that you could use > expression_tree_walker et al? > > BTW, it seems like a lot of the logic there could be simplified by depending > on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE. > There are a number of places that are taking the larger of two values > in baroque, hard-to-follow ways. > > I wonder if checkCteSelectStmt is detecting nonlinearity correctly. > Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to > miss the problem in something like (self union (self union other)) ? > Maybe what you really need is a bitmask: > NON_RECURSIVE = 0, > RECURSIVE_SELF = 1, > RECURSIVE_OTHER = 2, > RECURSIVE_BOTH = 3 /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */ > and then you can merge two values via OR instead of MAX. > > regards, tom lane
> Why does set_recursion_pathlist think that the subquery might have > useful pathkeys? We know it must always be a UNION ALL, no? Right. But someday we might implement "UNION" (without ALL) then we have useful pathkeys... Or shall I completely remove the step to generate patheys and do not pass pathkeys to create_recursion_path? pathkeys = convert_subquery_pathkeys(root, rel, subroot->query_pathkeys); -- Tatsuo Ishii SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes: >> Why does set_recursion_pathlist think that the subquery might have >> useful pathkeys? We know it must always be a UNION ALL, no? > Right. But someday we might implement "UNION" (without ALL) then we > have useful pathkeys... Good point. Might as well leave it in. regards, tom lane
> PlanState.has_recursivescan seems like a complete kluge. Can't it just be > removed? It looks to me like it is working around bugs that hopefully aren't > there anymore. There is certainly no reason why a recursive CTE should be > more in need of rescanning than any other kind of plan. I don't think so. Recursion plan needs the hash table used by sublan be re-created at each recursion loop stage. Remember that in each evaluation of recursive plan, the recursive name is replaced by a working table which is holding previous evalution result of recursion stage. Thus the hash table corresponding to the work table needs to be re-created. > If it is needed then > the current implementation is completely broken anyway, since it would only > detect a RecursiveScan node that is directly underneath an agg or hash node. Yeah, that's right. What I have in my mind is to implement something similar to UpdateChangedParamSet family like mechanism which will inherit working table change event to child node. -- Tatsuo Ishii SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes: >> PlanState.has_recursivescan seems like a complete kluge. Can't it just be >> removed? It looks to me like it is working around bugs that hopefully aren't >> there anymore. There is certainly no reason why a recursive CTE should be >> more in need of rescanning than any other kind of plan. > I don't think so. Recursion plan needs the hash table used by sublan > be re-created at each recursion loop stage. Remember that in each > evaluation of recursive plan, the recursive name is replaced by a > working table which is holding previous evalution result of recursion > stage. Thus the hash table corresponding to the work table needs to > be re-created. Oh, I see. I keep getting confused about whether RecursiveScan is at the top or the bottom of the recursion plan tree :-(. Maybe it would help to use a different name for it? RecursionInjector or something like that? >> If it is needed then >> the current implementation is completely broken anyway, since it would only >> detect a RecursiveScan node that is directly underneath an agg or hash node. > Yeah, that's right. What I have in my mind is to implement something > similar to UpdateChangedParamSet family like mechanism which will > inherit working table change event to child node. I think it could actually *be* UpdateChangedParamSet, if you just associate some otherwise-unused Param with each RecursiveScan node, and have the Recursion node signal a change of that Param when it revises the work table. In fact, why not combine that with getting rid of the klugy addition to ExecutorState? Make the param be actually useful: it could contain some internal datastructure that passes the work table down to the RecursiveScan node. regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes: > Here is a patch that is an initial attempt to reorganize the parse tree > representation. Oh dear, we seem to have spent yesterday doing the same work :-( I'll go over this and try to merge it with my own WIP. > * There are a couple of other rough points in places where it's hard to > traverse up the parse tree or query tree. Yeah, I'd been running into that issue too. Adding an explicit pointer to the CTE into the RTE doesn't work because it renders the parse tree un-copiable (at least without something a lot more sophisticated than copyObject; and saving/loading rule parsetrees would be tough too). What I've got at the moment is that creation of an RTE_CTE RTE copies the CTE's lists of output column types/typmods into the RTE. This eliminates the need for expandRTE and a couple of other places to be able to find the CTE; everything they need is in the RTE. So far as I can see, everyplace else that might need to find the CTE from the RTE is in places that either have a ParseState available, or have some comparable structure that could provide a way to search upwards for CTEs (eg, in ruleutils the "deparse context" will need to track uplevel CTE lists as well as rtables). It is a bit tedious though. Can anyone think of another way that would still preserve the notion of multiple RTEs being links to the same CTE rather than independent subqueries? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Yeah, I'd been running into that issue too. Adding an explicit pointer > to the CTE into the RTE doesn't work because it renders the parse tree > un-copiable (at least without something a lot more sophisticated than > copyObject; and saving/loading rule parsetrees would be tough too). Well the alternative to direct pointers is as you did with subqueries, turning the set into a flat array and storing indexes into the array. I'm not sure if that applies here or not. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Gregory Stark <stark@enterprisedb.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Yeah, I'd been running into that issue too. Adding an explicit pointer >> to the CTE into the RTE doesn't work because it renders the parse tree >> un-copiable (at least without something a lot more sophisticated than >> copyObject; and saving/loading rule parsetrees would be tough too). > Well the alternative to direct pointers is as you did with subqueries, turning > the set into a flat array and storing indexes into the array. I'm not sure if > that applies here or not. I think that just changes the problem into "where can I find the array?" ... The real issue here is that simplicity of copying etc requires that child nodes in a parse tree never have back-links leading up to their parent. If we were willing to drop that requirement for Query then we'd not need any auxiliary data structures to chase up to upper-level rtables or CTEs. I'm not sure this cure is better than the disease though. regards, tom lane
Attached is the result of a couple of days hacking on the WITH RECURSIVE patch. This moves us a lot closer to having sanity in the parsing phase, though I'm still quite unhappy with the second half of the processing in parse_cte.c. I added some infrastructure to make the first half's search for CTE references reasonably bulletproof, but the second half needs to be converted to use the same infrastructure, and I didn't do that yet because I didn't understand what it was doing. In particular, what the heck is the exception in findCteName that allows some other CTE's non_recursive_term to be stored into the non_recursive_term for the current one? That seems mighty broken. There are a number of unfinished corner cases (look for XXX in the patch) but they aren't in the way of further progress. The next big thing seems to be to figure out exactly how to do multiple references to CTE outputs, so that we can de-bogotify the planner. regards, tom lane
Attachment
I am re-sending this message to -hackers from yesterday, because the first time it didn't appear to make it through. This time I gzipped the patch. This is just for the archives (and to give context to the replies), and this message is superseded by Tom's patch here: http://archives.postgresql.org/pgsql-hackers/2008-09/msg01521.php On Thu, 2008-09-18 at 12:55 +0900, Tatsuo Ishii wrote: > Tom, thanks for the review. > > Here is an in-progress report. Patches against CVS HEAD attached. > (uncommented items are work-in-progress). Here is a patch that is an initial attempt to reorganize the parse tree representation. The goal of this patch is to separate the RTEs from the CTEs, so that we can, for example, have multiple RTEs refer to the same CTE. This will hopefully allow us to materialize a volatile query once, and have several RTEs refer to that same value, which will meet the SQL standard. Notes: * It makes a p_cte_table in the ParseState, which is a list of CteTblEntries. This replaces p_ctenamespace, which was a list of RangeSubselects. * It copies the p_cte_table into Query.cte_table * I introduced a new type of RTE, RTE_CTE, which holds a cte_index and cte_levelsup. This is used to find the CTE that the RTE references. Weak points: * It does not change the behavior of recursive queries. That is a little more complicated, so I wanted to wait for feedback on my patch so far. * I don't understand set_subquery_pathlist, or that general area of the code. I made a new set_cte_pathlist, that is basically the same thing, except I used a hack "dummy_subquery" variable in the RTE to pass along a pointer to the subquery of the CTE. I think this dummy variable can be removed, but I just don't understand that part of the code well enough to know what should happen. And if it does require a subquery at that point, I'll need to find a way of locating the right cte_table from inside that function. Any advice here would be appreciated. * There are a couple of other rough points in places where it's hard to traverse up the parse tree or query tree. I can probably work around these weak points, but I wanted to send the patch to avoid a lot of conflicts or problems later. Tell me whether you think this is moving in the right direction. Regards, Jeff Davis
Attachment
greg On 24 Sep 2008, at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The next big > thing seems to be to figure out exactly how to do multiple references > to CTE outputs, so that we can de-bogotify the planner. I've looked and don't seem to still have the source tree where I worked on this. I remember how I made the changes to tuplestore which was mostly mechanical. The trick I think will be in adding a special purpose executor method which passes the call site to the node below. This depends on the observation that if we always memoize results then each call site can only have one active call. That is, we don't need to maintain a full stack tree.
Tom, > > WithClause node may need a location field, and almost certainly has to > > be handled somehow in exprLocation(). > > > > The error reports in parse_cte.c *desperately* need error locations. Included is a patch for this against your cte-0923.patch.gz. Most errors now have error locations, but some do not. I'm going to think more to enhance this. -- Tatsuo Ishii SRA OSS, Inc. Japan *** pgsql/src/backend/parser/parse_cte.c 2008-09-25 11:06:12.000000000 +0900 --- pgsql.patched/src/backend/parser/parse_cte.c 2008-09-25 10:46:41.000000000 +0900 *************** *** 239,245 **** if (query->intoClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("subquery in WITH cannot have SELECT INTO"))); /* Compute the derived fields if not done yet*/ if (!cte->cterecursive) --- 239,247 ---- if (query->intoClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("subquery in WITH cannot have SELECT INTO"), ! parser_errposition(pstate, ! exprLocation((Node *) query->intoClause)))); /* Compute the derived fields ifnot done yet */ if (!cte->cterecursive) *************** *** 561,625 **** lresult != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("Left hand side of UNION ALL must be a non-recursive term in a recursive query"))); else if (stmt->op == SETOP_INTERSECT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must not be combined with INTERSECT"))); else if (stmt->op == SETOP_EXCEPT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must not be combined with EXCEPT"))); else if (stmt->op == SETOP_UNION && stmt->all != true) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must be combined with UNION ALL"))); else if (stmt->op == SETOP_UNION && stmt->all == true && rarg->op == SETOP_UNION) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("Right hand side of UNION ALL must not contain UNION operation"))); else if(stmt->sortClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("ORDER BY in a recursive query not allowed"))); else if (stmt->limitOffset|| stmt->limitCount) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("LIMIT OFFSET in a recursive query not allowed"))); else if (stmt->lockingClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("FOR UPDATE in a recursive query not allowed"))); else if (lresult == NON_RECURSIVE&& rresult == RECURSIVE_SELF) { if (larg->distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("DISTINCT in a non recursive term not allowed"))); if (rarg->distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("DISTINCT in a recursive term not allowed"))); if (rarg->groupClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("GROUP BY in a recursive term not allowed"))); if (rarg->havingClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("HAVING in a recursive term not allowed"))); /* *Save non_recursive_term. --- 563,646 ---- lresult != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("Left hand side of UNION ALL must be a non-recursive term in a recursive query"), ! parser_errposition(cstate->pstate, cte->location))); else if (stmt->op == SETOP_INTERSECT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must not be combined with INTERSECT"), ! parser_errposition(cstate->pstate, cte->location))); else if (stmt->op == SETOP_EXCEPT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must not be combined with EXCEPT"), ! parser_errposition(cstate->pstate, cte->location))); else if (stmt->op == SETOP_UNION&& stmt->all != true) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("non-recursive term and recursive term must be combined with UNION ALL"), ! parser_errposition(cstate->pstate, cte->location))); else if (stmt->op == SETOP_UNION&& stmt->all == true && rarg->op == SETOP_UNION) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("Right hand side of UNION ALL must not contain UNION operation"), ! parser_errposition(cstate->pstate, cte->location))); else if (stmt->sortClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("ORDER BY in a recursive query not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) stmt->sortClause)))); else if (stmt->limitOffset|| stmt->limitCount) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("LIMIT OFFSET in a recursive query not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) stmt->limitCount)))); else if (stmt->lockingClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("FOR UPDATE in a recursive query not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) stmt->lockingClause)))); else if (lresult== NON_RECURSIVE && rresult == RECURSIVE_SELF) { if (larg->distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("DISTINCT in a non recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) larg->distinctClause)))); if (rarg->distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("DISTINCT in a recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) rarg->distinctClause)))); if (rarg->groupClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("GROUP BY in a recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) rarg->groupClause)))); if (rarg->havingClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("HAVING in a recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) rarg->havingClause)))); /* * Save non_recursive_term. *************** *** 668,674 **** if (checkCteTargetList(cstate, n->targetList, myindex) != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! (errmsg("target list having subquery which uses recursive name in a recursive term not allowed")))); } if (n->fromClause) --- 689,697 ---- if (checkCteTargetList(cstate, n->targetList, myindex) != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! (errmsg("target list having subquery which uses recursive name in a recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) n->targetList))))); } if (n->fromClause) *************** *** 680,687 **** if (checkCteWhereClause(cstate, n->whereClause, myindex) != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! (errmsg("WHERE clause having subqury which uses recursive name in a recursive term not allowed")))); ! } return r; --- 703,711 ---- if (checkCteWhereClause(cstate, n->whereClause, myindex) != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! (errmsg("WHERE clause having subqury which uses recursive name in a recursive term not allowed"), ! parser_errposition(cstate->pstate, ! exprLocation((Node *) n->whereClause))))); } return r; *** pgsql/src/test/regress/expected/recursive.out 2008-09-25 11:06:12.000000000 +0900 --- pgsql.patched/src/test/regress/expected/recursive.out 2008-09-25 11:11:47.000000000 +0900 *************** *** 404,423 **** --- 404,433 ---- WITH RECURSIVE x(n) AS (SELECT 1 UNION SELECT n+1 FROM x) SELECT * FROM x; ERROR: non-recursive termand recursive term must be combined with UNION ALL + LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 UNION SELECT n+1 FROM x) + ^ -- INTERSECT WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT SELECT n+1 FROM x) SELECT * FROM x;ERROR: non-recursive term and recursive term must not be combined with INTERSECT + LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT SELECT n+1 FROM x... + ^ WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT ALL SELECT n+1 FROM x) SELECT * FROM x; ERROR: non-recursive term and recursive term must not be combined with INTERSECT + LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT ALL SELECT n+1 FR... + ^ -- EXCEPT WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT SELECT n+1 FROM x) SELECT * FROM x; ERROR: non-recursive term and recursive term must not be combined with EXCEPT + LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT SELECT n+1 FROM x) + ^ WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT ALL SELECT n+1 FROM x) SELECT * FROM x; ERROR: non-recursiveterm and recursive term must not be combined with EXCEPT + LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT ALL SELECT n+1 FROM ... + ^ -- no non-recursive term WITH RECURSIVE x(n) AS (SELECT n FROM x) SELECT * FROM x; *************** *** 428,433 **** --- 438,445 ---- WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1) SELECT * FROM x; ERROR: Left hand sideof UNION ALL must be a non-recursive term in a recursive query + LINE 1: WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1) + ^ CREATE TEMPORARY TABLE y (a INTEGER); INSERT INTO y SELECT generate_series(1, 10); -- LEFT JOIN *************** *** 453,470 **** --- 465,490 ---- WHERE n IN (SELECT * FROM x)) SELECT * FROM x; ERROR: WHERE clause having subqurywhich uses recursive name in a recursive term not allowed + LINE 2: WHERE n IN (SELECT * FROM x)) + ^ WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x WHERE n = 1 AND n IN (SELECT * FROM x)) SELECT * FROM x; ERROR: WHERE clause having subqury which uses recursivename in a recursive term not allowed + LINE 2: WHERE n = 1 AND n IN (SELECT * FRO... + ^ -- GROUP BY WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x GROUPBY n) SELECT * FROM x; ERROR: GROUP BY in a recursive term not allowed + LINE 1: ...VE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x GROUP BY n) + ^ -- HAVING WITH RECURSIVE x(n) AS (SELECT 1 UNIONALL SELECT n+1 FROM x HAVING n < 10) SELECT * FROM x; ERROR: HAVING in a recursive term not allowed + LINE 1: ...x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x HAVING n < 10) + ^ -- aggregate functions WITH RECURSIVE x(n) AS (SELECT1 UNION ALL SELECT count(*) FROM x) SELECT * FROM x; *************** *** 485,494 **** --- 505,518 ---- WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x ORDER BY 1) SELECT * FROM x; ERROR: ORDERBY in a recursive query not allowed + LINE 1: ...VE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x ORDER BY 1) + ^ -- LIMIT/OFFSET WITH RECURSIVE x(n) AS (SELECT 1UNION ALL SELECT n+1 FROM x LIMIT 10 OFFSET 1) SELECT * FROM x; ERROR: LIMIT OFFSET in a recursive query not allowed + LINE 1: ...n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x LIMIT 10 OFFSET ... + ^ -- FOR UPDATE WITH RECURSIVE x(n) AS (SELECT 1 UNION ALLSELECT n+1 FROM x FOR UPDATE) SELECT * FROM x; *************** *** 499,504 **** --- 523,530 ---- SELECT (SELECT * FROM x) FROM x WHERE id < 5 ) SELECT * FROM x; ERROR: target list having subquerywhich uses recursive name in a recursive term not allowed + LINE 3: SELECT (SELECT * FROM x) FROM x WHERE id < 5 + ^ -- mutual recursive query WITH RECURSIVE x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHEREid < 5),
Greg Stark <greg.stark@enterprisedb.com> writes: > On 24 Sep 2008, at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The next big >> thing seems to be to figure out exactly how to do multiple references >> to CTE outputs, so that we can de-bogotify the planner. > I've looked and don't seem to still have the source tree where I > worked on this. I remember how I made the changes to tuplestore which > was mostly mechanical. The trick I think will be in adding a special > purpose executor method which passes the call site to the node below. > This depends on the observation that if we always memoize results then > each call site can only have one active call. That is, we don't need > to maintain a full stack tree. I looked at the tuplestore code a bit and decided that this actually doesn't need to be hard at all. Tuplestore already has a notion of a write position and an independent read position, and we don't need more than one write position. So what I think we should do is add "get read position" and "set read position" functions to tuplestore.c, and have each of the reader nodes remember its own read position. That is, each reader has to do set_read_position(tupstore, &local_read_position);tuple = tuplestore_gettuple(tupstore, ...);get_read_position(tupstore,&local_read_position); rather than just tuplestore_gettuple. The set/get functions will be cheap enough that this is no big deal. (Or maybe we should just provide a wrapper function that does this sequence?) regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Le 30 sept. 08 à 20:03, Tom Lane a écrit : > set_read_position(tupstore, &local_read_position); > tuple = tuplestore_gettuple(tupstore, ...); > get_read_position(tupstore, &local_read_position); > > rather than just tuplestore_gettuple. The set/get functions will be > cheap enough that this is no big deal. (Or maybe we should just > provide a wrapper function that does this sequence?) It seems to me to share some ideas with the MemoryContext concept: what about a TupstoreContext associated with tuplestore, you get a common default one if you don't register your own, and usetuplestore_gettuple(MyTupstoreContext, ...); Maybe some other API would benefit from the idea? Regards, - -- dim -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkjigF4ACgkQlBXRlnbh1bkycQCgqs/+JBOd0SiN4xvKwLgEgi9F BOYAoLm0Se6zs8cEAnoTlH6de7pLLh/l =kzm1 -----END PGP SIGNATURE-----
Hi, 2008/10/1 Dimitri Fontaine <dfontaine@hi-media.com>: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, > > Le 30 sept. 08 à 20:03, Tom Lane a écrit : >> >> set_read_position(tupstore, &local_read_position); >> tuple = tuplestore_gettuple(tupstore, ...); >> get_read_position(tupstore, &local_read_position); >> >> rather than just tuplestore_gettuple. The set/get functions will be >> cheap enough that this is no big deal. (Or maybe we should just >> provide a wrapper function that does this sequence?) > > It seems to me to share some ideas with the MemoryContext concept: what > about a TupstoreContext associated with tuplestore, you get a common default > one if you don't register your own, and use > tuplestore_gettuple(MyTupstoreContext, ...); > > Maybe some other API would benefit from the idea? > I'm just working on tuplestore recording multiple positions for my window function project. Attached patch is still in progress but seems it works in a situation. >From my work, the setting/getting read position and delegate savig positions to the caller will probably have problems, because of memory control for saving positions and tuplestore status changing (memory -> BufFile). Instead, I decided it'd better that we can indicate the row number by integer. Regards, -- Hitoshi Harada
Attachment
Here are the results of a couple more days' hacking on the CTE patch. * I cleaned up the processing in the second part of parse_cte.c, where we are trying to check for validity of a recursive query. The conditions that it's checking for are not exactly the same as what was being looked for previously, so this could do with a bit of review. * I got rid of the kluges in the executor in favor of treating the working table as a PARAM_EXEC Param. Also renamed the plan node types to RecursiveUnion and WorkTableScan --- I'm not wedded to these choices, but they seemed more transparent than the former names. * I have not yet tackled the problem of ensuring single evaluation of CTEs, but there's a few bits of infrastructure for it. There are various small loose ends denoted by XXX in the patch, but the main remaining issue is definitely the single-evaluation business. regards, tom lane
Attachment
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> It seems to me to share some ideas with the MemoryContext concept: what >> about a TupstoreContext associated with tuplestore, you get a common default >> one if you don't register your own, and use >> tuplestore_gettuple(MyTupstoreContext, ...); > I'm just working on tuplestore recording multiple positions for my > window function project. Attached patch is still in progress but seems > it works in a situation. > From my work, the setting/getting read position and delegate savig > positions to the caller will probably have problems, because of memory > control for saving positions and tuplestore status changing (memory -> > BufFile). Instead, I decided it'd better that we can indicate the row > number by integer. That seems unreasonably inefficient --- it takes lots more file I/O, and there's really no need for random access, at least not in what I'm doing. I looked at the tuplestore code some more, and my idea of allowing callers to store their current position externally to tuplestore indeed doesn't work. The problem is that if dumptuples() happens there's no way to convert an index-based position to a file-based one. The current code can handle it because it has access to all the positions (the read, write, and mark pointers) but any positions stored externally wouldn't get converted. So it seems like the appropriate generalization is to have an array of read positions inside the tuplestore and allow callers to say "read using position N", plus some API to allow positions to be allocated to different requestors. We could get rid of the separate mark pointer by implementing an API that allows position X to be copied to position Y. But the actual value of a position (a tuple number or file position info) would never be exposed to callers. Comments? regards, tom lane
"Greg Stark" <stark@enterprisedb.com> writes: > On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So it seems like the appropriate generalization is to have an array of >> read positions inside the tuplestore and allow callers to say "read >> using position N", plus some API to allow positions to be allocated to >> different requestors. > One other reason the tuplestore should know the position of all the > readers is that ideally it would want to be able to discard any tuples > older than the oldest read position. That also means it needs to know > when all the call sites have allocated their position and don't need > to reset it. Good point. So we'd need per-position capability flags, not per-tuplestore. I hadn't realized that this would be relevant to window functions. Now that I know that, I propose fixing tuplestore for multiple positions and committing it separately, before I go back to the CTE patch. Then Hitoshi-san will have something he can work with too. regards, tom lane
On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So it seems like the appropriate generalization is to have an array of > read positions inside the tuplestore and allow callers to say "read > using position N", plus some API to allow positions to be allocated to > different requestors. We could get rid of the separate mark pointer by > implementing an API that allows position X to be copied to position Y. > But the actual value of a position (a tuple number or file position > info) would never be exposed to callers. That's basicaly what had done (though i had n "readers" which encapsulated the current pointer and mark). One other reason the tuplestore should know the position of all the readers is that ideally it would want to be able to discard any tuples older than the oldest read position. That also means it needs to know when all the call sites have allocated their position and don't need to reset it. -- greg
>> One other reason the tuplestore should know the position of all the >> readers is that ideally it would want to be able to discard any tuples >> older than the oldest read position. That also means it needs to know >> when all the call sites have allocated their position and don't need >> to reset it. > > Good point. So we'd need per-position capability flags, not > per-tuplestore. > > I hadn't realized that this would be relevant to window functions. > Now that I know that, I propose fixing tuplestore for multiple > positions and committing it separately, before I go back to the CTE > patch. Then Hitoshi-san will have something he can work with too. > Yes, tuplestore multiple positioning will give the greate help to the window function. Ideally, it is better that tuplestore'd have all the positions and have some kind of capability to discard old rows so that it can stay in TSS_MEM, which helps window function's sliding frame. But it isn't really critical, just performance matter. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> I hadn't realized that this would be relevant to window functions. >> Now that I know that, I propose fixing tuplestore for multiple >> positions and committing it separately, before I go back to the CTE >> patch. Then Hitoshi-san will have something he can work with too. > Yes, tuplestore multiple positioning will give the greate help to the > window function. Ideally, it is better that tuplestore'd have all the > positions and have some kind of capability to discard old rows so that > it can stay in TSS_MEM, which helps window function's sliding frame. Okay, there's a patch in CVS HEAD that works this way. Let me know if it needs further tweaking for your purposes. regards, tom lane
2008/10/2 Tom Lane <tgl@sss.pgh.pa.us>: > "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >>> I hadn't realized that this would be relevant to window functions. >>> Now that I know that, I propose fixing tuplestore for multiple >>> positions and committing it separately, before I go back to the CTE >>> patch. Then Hitoshi-san will have something he can work with too. > >> Yes, tuplestore multiple positioning will give the greate help to the >> window function. Ideally, it is better that tuplestore'd have all the >> positions and have some kind of capability to discard old rows so that >> it can stay in TSS_MEM, which helps window function's sliding frame. > > Okay, there's a patch in CVS HEAD that works this way. Let me know if > it needs further tweaking for your purposes. > > regards, tom lane > Hmm, I've looked over the patch. Logically window functions can access arbitrary rows that have been stored in a frame. Thus I had thought tuplestore should hold all the positions and allow arbitrary random access indicated by integer. Maybe those functionalities can be abstracted by the window function API itself. For this matter it seems that you'd better to look at my future patch. Everything else is great deal. By improving tuplestore_trim(), sliding frame will be performed better than I'd thought. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >> Okay, there's a patch in CVS HEAD that works this way. Let me know if >> it needs further tweaking for your purposes. > Hmm, I've looked over the patch. Logically window functions can access > arbitrary rows that have been stored in a frame. Thus I had thought > tuplestore should hold all the positions and allow arbitrary random > access indicated by integer. Maybe those functionalities can be > abstracted by the window function API itself. For this matter it seems > that you'd better to look at my future patch. Well, the problem with defining it as "arbitrary" random access is that there's no way for the tuplestore to throw away old data. The scheme that I have in mind here is that you keep (at least) two read pointers, one that is where you're actually reading the data and one that is nailing down the oldest point you might need to return to. This is a generalization of the previous mark/restore logic to allow any number of pairs of mark and restore points. regards, tom lane
On 2 Oct 2008, at 05:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Hitoshi Harada" <umi.tanuki@gmail.com> writes: > >> Hmm, I've looked over the patch. Logically window functions can >> access >> arbitrary rows that have been stored in a frame. Thus I had thought >> tuplestore should hold all the positions and allow arbitrary random >> access indicated by integer. Maybe those functionalities can be >> abstracted by the window function API itself. For this matter it >> seems >> that you'd better to look at my future patch. > > Well, the problem with defining it as "arbitrary" random access is > that > there's no way for the tuplestore to throw away old data. And that there's no way to make it work if the tuplestore has spilled to disk.
2008/10/2 Greg Stark <greg.stark@enterprisedb.com>: > > > On 2 Oct 2008, at 05:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> >>> Hmm, I've looked over the patch. Logically window functions can access >>> arbitrary rows that have been stored in a frame. Thus I had thought >>> tuplestore should hold all the positions and allow arbitrary random >>> access indicated by integer. Maybe those functionalities can be >>> abstracted by the window function API itself. For this matter it seems >>> that you'd better to look at my future patch. >> >> Well, the problem with defining it as "arbitrary" random access is that >> there's no way for the tuplestore to throw away old data. > > And that there's no way to make it work if the tuplestore has spilled to > disk. > In my purpose the "old data" can always be indicated by an integer row position. So the real problem here is how you store all the row positions for arbitrary random access closer to O(1). Yes, you can go to certain row by fetching tuples manytimes from a marked row but it's inefficient. I know my patch sent before is also inefficent. But essentially the window function needs high cost. I'll try to process my work with the patch but maybe more work on tuplestore will be needed after we see the real problem that I don't see now either. Regards, -- Hitoshi Harada
I've successfully taught the WITH patch to do single evaluation of WITH queries. I've also been through all of the planner and executor code for it and am now feeling pretty happy with the whole thing. There are still a small number of loose ends (see XXX in the patch), but I don't believe any of them represent significant work --- I just left them undone because they weren't in the way of testing anything else. One point of interest (at least to Hitoshi-san) is that my first cut at the multiple-readout tuplestore turned out to be not quite the right thing. In the original code, if the read pointer was "at eof" (meaning the previous gettuple call had returned null), then a puttuple call would move the read pointer forward over the added tuple, keeping it "at eof". This was done because nodeMaterial.c wanted it and no other callers cared particularly. I had generalized that to the idea that *all* read pointers that are "at eof" should get moved; but this turns out to be a really bad idea, at least for nodeCtescan's usage. What seems to be the right thing is for only the "active" read pointer to be moved forward, with inactive ones dropping out of "at eof" state. This seems reasonable because the point is to not have to reprocess a tuple you know you just stuck into the tuplestore --- but the other readers of the tuplestore won't know that, and need to see the tuple you added. But it might be that we need to make the behavior configurable somehow, if the window-functions patch turns out to need something different. Barring surprises in the loose ends, I expect to be able to commit this in a couple of days. regards, tom lane
Attachment
On Fri, Oct 03, 2008 at 03:55:56PM -0400, Tom Lane wrote: > I've successfully taught the WITH patch to do single evaluation of > WITH queries. I've also been through all of the planner and > executor code for it and am now feeling pretty happy with the whole > thing. There are still a small number of loose ends (see XXX in the > patch), but I don't believe any of them represent significant work > --- I just left them undone because they weren't in the way of > testing anything else. Great! How hard would it be to add the infrastructure for CYCLE? Here's the kind of awful hack I've been doing lately in order to detect and prevent cycles: CREATE TABLE adjacency( id INTEGER NOT NULL, parent_id INTEGER ); INSERT INTO adjacency VALUES(1,NULL), (2,1),(3,1),(4,1), (5,2),(6,2),(7,2),(8,3),(9,3),(10,4), (11,5),(12,5),(13,6),(14,7),(15,8), (9,1); /* Cycle! */ WITH RECURSIVE t(node, path) AS ( SELECT id, ARRAY[id] FROM adjacency WHERE parent_id IS NULL UNION ALL SELECT a1.id, t.path || a1.id FROM adjacency a1 JOIN t ON (a1.parent_id = t.node) WHERE a1.id <> ANY(t.path)/* Remove cycle using awful hack :P */ ) SELECT CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END || REPEAT('--', array_upper(path,1)-2) || node AS "Branch" FROM t ORDER BY path; I suspect that some kind of hash structure, instantiated only when a CYCLE clause is specified, could help out with a much, much more efficient implementation of cycle prevention. Adding SEARCH will be a lot more complicated, as DEPTH FIRST is completely different from how the implementation works now. Any ideas on how this might be approached? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > How hard would it be to add the infrastructure for CYCLE? Personally I'm more interested in getting the recursive UNION (without ALL) case to work. I think that this might just be a matter of drawing on support that's already there: have RecursiveUnion maintain a hash table of rows it's already seen, and discard anything coming from either the non-recursive term or the recursive term that is found to be already present in the hash table. Two objections to this are (a) it doesn't work for datatypes without hash equality support, and (b) it would fail for recursion output exceeding available memory. However, the alternative of using sort-and-uniq to detect duplicates seems pretty horrid in this situation --- you'd need a fresh sort and mergejoin-like scan for every iteration of the recursive term. And it would become impractical to return rows immediately after they're emitted by the subqueries. So I'd be willing to live with only supporting the hash implementation. If no objections, I'll look at that next week ... regards, tom lane