Thread: SQL standard question about Common Table Expressions

SQL standard question about Common Table Expressions

From
Jeff Davis
Date:
I am looking into the SQL standard to try to determine precisely how the
CTE feature should behave.

Taking a simple case like:
 with recursive   foo(i) as     (values(1)     union all     select i+1 from foo where i < 5) select * from foo;

And looking at the SQL standard 200n 7.13: General Rules: 2.c, it
provides an algorithm for evaluating the recursive query.

In this algorithm, AQEk is a <query expression>. Syntactically, I only
see two <query expression>s, and one is the entire query. The other is:
"(values(1) union all select i+1 from foo where i < 5)", so I'll assume
that AQEk must be equal to that*.

The confusing thing to me is step 2.c.ix.3.B. If the query expression
AQEk is equal to the WQEk, step 2.c.ix.3.B will always set the working
table WTk to some kind of non-empty value, because the "values(1) union
all..." will always return at least one row. This will then cause it to
loop forever.

Where am I going wrong?

Also, 2.c.ii says "If AQEk is immediately contained in some WQEi...". In
the 200n standard, it appears that it's impossible for a <query
expression> to immediately contain another <query expression>. In the
2003 standard it can, but they added another level of indirection in the
200n standard by using an intervening <table subquery>. I'm not an
authority, but I believe this is a mistake.

Regards,Jeff Davis

* Having AQEk = WQEi disturbs me, too, because in the "Framework" part
of the standard, section 6.3.3.1, the definition of contains does not
seem to allow for them to be equal.



Re: SQL standard question about Common Table Expressions

From
Andrew Gierth
Date:
>>>>> "Jeff" == Jeff Davis <pgsql@j-davis.com> writes:
Jeff> I am looking into the SQL standard to try to determineJeff> precisely how the CTE feature should behave.
Jeff> Taking a simple case like:
Jeff>   with recursiveJeff>     foo(i) asJeff>       (values(1)Jeff>       union allJeff>       select i+1 from foo
wherei < 5)Jeff>   select * from foo;
 
Jeff> And looking at the SQL standard 200n 7.13: General Rules: 2.c,Jeff> it provides an algorithm for evaluating the
recursivequery.
 
Jeff> In this algorithm, AQEk is a <query expression>. Syntactically,Jeff> I only see two <query expression>s, and one
isthe entireJeff> query. The other is: "(values(1) union all select i+1 from fooJeff> where i < 5)", so I'll assume
thatAQEk must be equal to that*.
 

The "contains" language in the spec is tricky. And I think there's some
issue here with the spec confusing <query expression> and <query
expression body>; some of the language refers to "If a <query expression>
AQEk not marked as recursive is immediately contained in a <query
expression body>" which is not possible as the syntax production for
<query expression body> does not contain <query expression>.

Working through the spec, we have here only one WLE and therefore only
one partition. So WLE1 is the only element, and WQN1 is "foo" and
WQE1 is "(values(1) union all select i+1 from foo where i < 5)". The
way I suspect it's _intended_ to work is like this (following the order
of steps in the spec):

AQE1 = "values(1) union all select i+1 from foo where i < 5"
AQE2 = "values (1)"
AQE3 = "select i+1 from foo where i < 5"

AQE1 and AQE3 are recursive but AQE2 is not.

AQE1 is associated with WQE1 and therefore RT1 and WT1 are the result
and working tables for WQN1.

AQE2 is not marked as recursive and is contained in a recursive union,
therefore it is marked iteration-ignorable.

None of the AQEn suppress duplicates.

Let RT2 and WT2 be the result of AQE2, so both now contain (1)

AQE2 now becomes "TABLE RTN2" (and hence AQE1/WQE1 are now "TABLE RTN2
union all select i+1 from foo where i < 5"

Evaluate WQE1:
 RT1 and WT1 become (1) RT2/WT2 unchanged RT3 and WT3 remain empty

AQE2 is iteration ignorable, so RT2 is emptied.

WT1 and WT2 are nonempty, so:

associate "foo" with WT1

evaluate WQE1:
 WT1 becomes (2) WT2 is empty WT3 becomes (2)
 RT1 becomes (RT1 UNION ALL WT1) i.e. (1),(2) RT3 becomes (RT3 UNION ALL WT3) i.e. (1),(2)

WT1 and WT2 are nonempty, so:

associate "foo" with WT1

evaluate WQE1:
 WT1 becomes (3) WT2 is empty WT3 becomes (3)
 RT1 becomes (RT1 UNION ALL WT1) i.e. (1),(2),(3) RT3 becomes (RT3 UNION ALL WT3) i.e. (1),(2),(3)

etc.

[The main differences between this and the logic used by the existing
patch are the fact that the spec's version is keeping around many more
working and result tables in order to do the duplicate-suppression logic,
which requires knowing not just whether you've already produced a given
row, but also _which branch of the query_ produced it. Within the
constraints of the patch, it's not possible to trigger the duplicate
elimination, and therefore RT2/WT2 and RT3/WT3 become redundant; also all
non-recursive clauses are iteration-ignorable.]

-- 
Andrew (irc:RhodiumToad)


Re: SQL standard question about Common Table Expressions

From
Jeff Davis
Date:
On Tue, 2008-09-09 at 01:45 +0100, Andrew Gierth wrote:
> The "contains" language in the spec is tricky. And I think there's some
> issue here with the spec confusing <query expression> and <query
> expression body>; some of the language refers to "If a <query expression>
> AQEk not marked as recursive is immediately contained in a <query
> expression body>" which is not possible as the syntax production for
> <query expression body> does not contain <query expression>.

Now that you point that out, I think that is a mistake in the spec. Your
version makes a lot more sense.

Thank you for going into the gory details of the algorithm, there were a
few other things tripping me up that you clarified.

I was going crazy yesterday trying to piece together what is supposed to
happen for, e.g., using INTERSECT to connect the recursive and the
non-recursive parts, and this answers it. I'm not suggesting that we
require support for INTERSECT, but my curiosity got the best of me.

Regards,Jeff Davis