Re: Common Table Expressions (WITH RECURSIVE) patch - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Common Table Expressions (WITH RECURSIVE) patch
Date
Msg-id 871vzumuoo.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Common Table Expressions (WITH RECURSIVE) patch  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Common Table Expressions (WITH RECURSIVE) patch  (Jeff Davis <jdavis@truviso.com>)
List pgsql-hackers
>>>>> "Jeff" == Jeff Davis <pgsql@j-davis.com> writes:
Jeff> * Mutual Recursion:

This limitation isn't at all uncommon in other implementations; DB2
docs for example say:

"If more than one common table expression is defined in the same
statement, cyclic references between the common table expressions are
not permitted. A cyclic reference occurs when two common table
expressions dt1 and dt2 are created such that dt1 refers to dt2 and
dt2 refers to dt1."


http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm

MSSQL's documentation is less clear, but it also appears not to allow
mutual recursion (not allowing forward references to CTEs).

"A CTE can reference itself and previously defined CTEs in the same
WITH clause. Forward referencing is not allowed."

http://msdn.microsoft.com/en-us/library/ms175972.aspx
Jeff> * RHS only:
Jeff>   with recursiveJeff>     foo(i) as (select i+1 from foo where i < 10 union all values(1))Jeff>   select * from
foo;Jeff>  ERROR:  Left hand side of UNION ALL must be a non-recursive term in aJeff> recursive query
 
Jeff>   The standard does not require that the recursive term be onJeff>   the RHS.

Again, the standard may not, but existing implementations do:

MSSQL:

"The recursive CTE definition must contain at least two CTE query
definitions, an anchor member and a recursive member. Multiple anchor
members and recursive members can be defined; however, all anchor
member query definitions must be put before the first recursive member
definition. All CTE query definitions are anchor members unless they
reference the CTE itself. "

DB2:

"The following restrictions apply to a recursive common-table-expression:   * A list of column-names must be specified
followingthe     table-identifier.   * The UNION ALL set operator must be specified.   * The first fullselect of the
firstunion (the initialization     fullselect) must not include a reference to the     common-table-expression itself
inany FROM clause.
 
"
Jeff> * UNION ALL only:
Jeff>   with recursiveJeff>     foo(i) as (values(1) union select i+1 from foo where i < 10)Jeff>   select * from
foo;Jeff>  ERROR:  non-recursive term and recursive term must be combined withJeff> UNION ALL
 
Jeff>   The standard seems to allow UNION ALL, UNION, INTERSECT, andJeff>   EXCEPT (when the recursive term is not on
theRHS of theJeff>   EXCEPT).
 

Again, existing implementations disagree. See above for DB2, and for
MSSQL:

"Anchor members must be combined by one of these set operators: UNION
ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator
allowed between the last anchor member and first recursive member, and
when combining multiple recursive members."
Jeff> * Binary recursion and subselect strangeness:
Jeff>   with recursive foo(i) asJeff>     (values (1)Jeff>     union allJeff>     select * fromJeff>       (select i+1
fromfoo where i < 10Jeff>       union allJeff>       select i+1 from foo where i < X) t)Jeff>   select * from foo;
 
Jeff>   Produces 10 rows of output regardless of what "X" is. ThisJeff> should be fixed for 8.4.  Also, this is
non-linearrecursion,Jeff> which the standard seems to disallow.
 

That looks like it should be disallowed somehow.
Jeff> * Multiple recursive references:[snip]
Jeff>   If we're going to allow non-linear recursion (which theJeff>   standard does not), this seems like it should be
avalidJeff>   case.
 

We probably shouldn't allow it (as above).

[snip * Strange result with except: which looks like a bug]
Jeff> * Aggregates allowed: which
Jeff>   with recursive foo(i) asJeff>     (values(1)Jeff>     union allJeff>     select max(i)+1 from foo where i <
10)Jeff>  select * from foo;
 
Jeff>   Aggregates should be blocked according to the standard.Jeff>   Also, causes an infinite loop. This should be
fixedfor 8.4.
 

Does the standard require anywhere that non-conforming statements must
be diagnosed? (seems impractical, since it would forbid extensions)
Jeff> * DISTINCT should supress duplicates:
Jeff>   with recursive foo(i) asJeff>     (select distinct * from (values(1),(2)) tJeff>     union allJeff>     select
distincti+1 from foo where i < 10)Jeff>   select * from foo;
 
Jeff>   This outputs a lot of duplicates, but they should beJeff> supressed according to the standard. This query is
essentiallyJeff>the same as supporting UNION for recursive queries, so weJeff> should either fix both for 8.4 or block
bothfor consistency.
 

Yeah, though the standard's use of DISTINCT in this way is something
of a violation of the POLA.
Jeff> * outer joins on a recursive reference should be blocked:
Jeff>   with recursive foo(i) asJeff>     (values(1)Jeff>     union allJeff>     select i+1 from foo left join
(values(1))t on (i=column1))Jeff>   select * from foo;
 
Jeff>   Causes an infinite loop, but the standard says using an outerJeff>   join in this situation should be
prohibited.This should beJeff>   fixed for 8.4.
 

No. This has already been discussed; it's neither possible nor desirable
to diagnose all cases which can result in infinite loops, and there are
important types of queries which would be unnecessarily forbidden.

Besides, you've misread the spec here: it prohibits the recursive
reference ONLY on the nullable side of the join. You cite:
Jeff> outer joins not allowed to join recursive references:Jeff>   7.13: Syntax Rules: 2.g.iii.6Jeff>   7.13: Syntax
Rules:2.g.iii.7
 

6) WQEi shall not contain a <qualified join> QJ in which:
  A) QJ immediately contains a <join type> that specifies FULL and a     <table reference> or <table factor> that
containsa <query name>     referencing WQNj.
 

[no recursive FULL JOIN at all]
  B) QJ immediately contains a <join type> that specifies LEFT and a     <table factor> following the <join type> that
containsa <query     name> referencing WQNj.
 

[note "following the <join type>", so  WQNj LEFT JOIN foo is allowed,
but foo LEFT JOIN WQNj is not]
  C) QJ immediately contains a <join type> that specifies RIGHT and a     <table reference> preceding the <join type>
thatcontains a     <query name> referencing WQNj.
 

[note "preceding the <join type>", so  foo RIGHT JOIN WQNj is allowed,
but WQNj RIGHT JOIN foo is not]

para (7) is identical but for natural rather than qualified joins.
Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursiveJeff> queries. The standard does not seem to say that
theseshould beJeff> rejected.
 

Note that supporting those in subqueries (including CTEs) is a separate
optional feature of the standard.

-- 
Andrew (irc:RhodiumToad)


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: sql2008 diff sql2003
Next
From: "Brendan Jurd"
Date:
Subject: Re: [PATCHES] to_date() validation