Thread: WITH RECURSIVE patches 0818

WITH RECURSIVE patches 0818

From
Tatsuo Ishii
Date:
Hi,

Here is the latest WITH RECURSIVE patches against CVS HEAD. Besides
syncing to CVS HEAD, followings are main differences from previous
one:

1) Enhance sgml docs. Patches contributed by Jeff Davis.

2) Fix bug with plans using HashAggregate. Hash tables should be
   recreated if recursive scan is used. Otherwise goes into infinite
   recursion.

Also I include an implementation doc slightly enhanced previous one.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
1. What is WITH [RECURSIVE] queries?

WITH clause allows to define a table expression being valid within a
same SELECT statement. The table expression is called "Common Table
Expression"(CTE). The use case for CTE is similar to VIEW but it is
more handy than VIEW. Unlike VIEW you do not need to define a VIEW
beforehand.

CTE allows to use recursive quries. To use a recursive query, you need
to add RECURSIVE keyword. Here is an example for WITH RECURSIVE clause
usage. Table "department" represents the structure of an organization.

CREATE TABLE department (
    id INTEGER PRIMARY KEY,  -- department ID
    parent_department INTEGER REFERENCES department, -- upper department ID
    name TEXT -- department name
);

INSERT INTO department VALUES (0, NULL, 'ROOT');
INSERT INTO department VALUES (1, 0, 'A');
INSERT INTO department VALUES (2, 1, 'B');
INSERT INTO department VALUES (3, 2, 'C');
INSERT INTO department VALUES (4, 2, 'D');
INSERT INTO department VALUES (5, 0, 'E');
INSERT INTO department VALUES (6, 4, 'F');
INSERT INTO department VALUES (7, 5, 'G');

-- department structure represented here is as follows:
--
-- ROOT-+->A-+->B-+->C
--      |         |
--      |         +->D-+->F
--      +->E-+->G

To extract all departments under A, following recursive query can be
used:

WITH RECURSIVE subdepartment AS
(
    -- non recursive term
    SELECT * FROM department WHERE name = 'A'

    UNION ALL

    -- recursive term
    SELECT d.* FROM department AS d, subdepartment AS sd
        WHERE d.parent_department = sd.id
)
SELECT * FROM subdepartment ORDER BY name;

The meaning of the query above can be explained as follows.

Let IT be an intermediate table. Let WT be a work table. Let RT be a
result table.

1) initialize

Execute non recursive term (SELECT * FROM department WHERE name = 'A')
and assign the result to RT and WT.

RT = WT = ('A')

Make IT empty.

IT = ()

2) execute recursive query

Replace subdepartment with WT and execute recursive term (SELECT d.*
FROM WT AS d, subdepartment AS sd WHERE d.parent_department =
sd.id) and assign the result to IT.
Add IT to RT. Assign IT to WT. Make IT empty.

3) Check if recursion is ended

If IT is empty in 2), execution of recursion is ended. Return RT.

Otherwise go to 2).

"subdepartment" is a CTE including a recursive expression since it
referes itself within its definition. In the above query first non
recursive term is evaluted. Then recursive term is evaluated and the
result is added to the prevoius result set over and over until there's
no more data to process. Finally the last SELECT is excuted and data
is extracted from the result set.

2. Implementation details

- Parsing recursive queries

To represent CTE, new structure member "withClause" is added to
SelectStmt structure which is used to store parsed information of a
SELECT stament. The definition of withClause is as follows:

/*
 * WithClause -
 *     reprensentation of WITH clause
 */
typedef struct WithClause
{
    NodeTag        type;            /* T_WithClause */
    List        *subquery;        /* subquery list (list of RangeSubselect) */
    bool        recursive;        /* true = WITH RECURSIVE */
} WithClause;

If RECURSIVE key word is not given, CTE is just converted to a
WithClause.subquery and WithClause.recursive is set to false.

If RECURSIVE key word is given, the raw parse tree stored in
WithClause.subquery represents a subquery which is an UNION ALL of a
non recursive term and a recursive term (set operation other than
UNION ALL between a non recursive term and a recursive term is not
allowed).  WithClause.subquery.subquery.larg represents "SELECT * FROM
department WHERE name = 'A'" and WithClause.subquery.subquery.rarg
represents "SELECT d.* FROM department AS d, subdepartment AS sd WHERE
d.parent_department = sd.id" in the example above.

- Analyzing recursive queries

To analyze CTEs, we add following new members "p_ctenamespace",
"p_recursive_namespace", and "p_in_with_clause" to ParseState structure.

p_ctenamespace is a namespace for non recursive CTEs and a list of
RangeSubselects. Here is an excerption from comments of ParseState:

 * [1] Note that p_ctenamespace is a namespace for "relations" but distinct
 *     from p_relnamespace. p_ctenamespace is a list of relations that can be
 *     referred to in a FROM or JOIN clause (in addition to normal tables and
 *     views). p_relnamespace is the list of relations which already have been
 *     listed in such clauses and therefore can be referred to in qualified
 *     variable references. Also, note that p_ctenamespace is a list of
 *     RangeSubselects, not a list of range table entries.

p_recursive_namespace is a namespace for recursive CTEs and is a list
of RangeRecursive. RangeRecursive is a newly introduced strucure:

/*
 * RangeRecursive - recursive call appearing in a FROM clause
 */
typedef struct RangeRecursive
{
    NodeTag        type;
    Node       *subquery;        /* the untransformed sub-select clause */
    Alias       *alias;            /* table alias & optional column aliases */
    List       *coldeflist;        /* list of ColumnDef nodes to describe result */
    Node        *non_recursive_term;    /* analyze result of non recusive term */
    bool        recursive;        /* true if recursive query */
} RangeRecursive;

ParseState.p_rtable is a list of RangeTblEntry. To process CTEs, new
members "self_reference" and "non_recursive_query" are added to
RangeTblEntry.

    /*
     * Fields valid for a RECURSIVE CTE (else NIL)
     */
    bool        self_reference; /* delay analyzing SelectStmt */
    Query        *non_recursive_query; /* non-recursive term */

If a CTE using "RECURSIVE" keyword is not actually recursive,
"recursive" is set to false and the CTE is treated as a subquery and
added to ParseState.p_rtable.

If the CTE referes to itself, analyzing will be delayed,
self_reference is set to true, and non_recurisve_term are filled.

It is possible that more than one CTE elements (query names) exist.

WITH RECURSIVE x AS (SELECT * from y),
                y AS (SELECT * FROM pg_class)
  SELECT * FROM x;

In this case we should analyze y first since x depends on y. To find
such dependency information, a topological sort is performed.
See makeDependencyGraph() for more details.

Next we check if the recursive query is follow the SQL standard.
Following example queries are not allowed by the standard.

- non-recursive term and recursive term must be combined with UNION ALL
WITH RECURSIVE x(n) AS (SELECT 1 UNION SELECT n+1 FROM x)
    SELECT * FROM x;

WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT SELECT n+1 FROM x)
    SELECT * FROM x;

- non-recursive term does not exist

WITH RECURSIVE x(n) AS (SELECT n FROM x)
    SELECT * FROM x;

-  recursive term must be in the left hand side
WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1)
    SELECT * FROM x;

- LEFT JOIN in the right hand side of recursive term not allowed
WITH RECURSIVE x(n) AS (SELECT a FROM y WHERE a = 1
    UNION ALL
    SELECT x.n+1 FROM y LEFT JOIN x ON x.n = y.a WHERE n < 10)
SELECT * FROM x;

- RIGHT JOIN in the left hand side of recursive term not allowed
WITH RECURSIVE x(n) AS (SELECT a FROM y WHERE a = 1
    UNION ALL
    SELECT x.n+1 FROM x RIGHT JOIN y ON x.n = y.a WHERE n < 10)
SELECT * FROM x;

- FULL JOIN in a recursive term not allowed
WITH RECURSIVE x(n) AS (SELECT a FROM y WHERE a = 1
    UNION ALL
    SELECT x.n+1 FROM x FULL JOIN y ON x.n = y.a WHERE n < 10)
SELECT * FROM x;

- WHERE clause having subqury which uses recursive name in a recursive
  term not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x
                          WHERE n IN (SELECT * FROM x))
  SELECT * FROM x;

- GROUP BY, HAVING in a recursive term not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x GROUP BY n)
  SELECT * FROM x;

-- aggregate functions a recursive term not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT count(*) FROM x)
  SELECT * FROM x;

In addition to above restrictions, we add more restrictions:

- ORDER BY in a recursive query not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x ORDER BY 1)
  SELECT * FROM x;

- LIMIT OFFSET in a recursive query not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x LIMIT 10 OFFSET 1)
  SELECT * FROM x;

- FOR UPDATE in a recursive query not allowed
WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x FOR UPDATE)
  SELECT * FROM x;

- target list having subquery which uses recursive name in a recursive term not allowed
WITH RECURSIVE x(id) AS (values (1)
    UNION ALL
    SELECT (SELECT * FROM x) FROM x WHERE id < 5
) SELECT * FROM x;

- mutual recursive call is not supported
WITH RECURSIVE
  x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id < 5),
  y (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM x WHERE id < 5)
SELECT * FROM x;

Note that SQL Server and Firebird do not allow mutual recursive query
either (Oracle does not support WITH RECURSIVE at this point by the way).

Tables defined in WITH RECURSIVE clause are identified as
RTE_RECURSIVE out of RTEKind in the RangeTblEntry in the parse tree
nodes.  A name space "p_recursive_namespace", whose structure type is
RangeRecursive, is added to in ParseState structure.

typedef struct RangeRecursive
{
    NodeTag        type;
    Node       *subquery; /* whole subquery */
    Alias       *alias;            /* table alias & optional column aliases */
    List       *coldeflist;        /* list of ColumnDef nodes */
    Node       *non_recursive_term; /* analyze result of non recursive term */
    bool       recursive; /* true if recursive */
} RangeRecursive;

non_recursive_term keeps the result of analyzing on non recursive
term.  Using this, the type of recursive query is inferenced.


- Rewriter

The changes applied to the rewriter is small. fireRIRrules()
recursively expand subqueries. We just do the same thing if it's a
recurisve query.

- Generating a plan

New plan node "Recursion" and "Recursive scan" are added.  Recusion
plan is the top level plan for execution of WITH RECURSIVE query. In
the example below, Recursion plan represents the execution plan for
"SELECT * FROM subdepartment" part. Recursion plan always has "Append"
subplan which represents the execution plan for "UNION ALL" for non
recursive part and recursive part. Plan for non recursive part is
nothing special and is an ordinary plan generated according to the non
recursive query part. Recursive scan represents the plan for recursive
part.

Below is an example EXPLAIN output.

EXPLAIN WITH RECURSIVE subdepartment AS
(
    -- non recursive term
    SELECT * FROM department WHERE name = 'A'

    UNION ALL

    -- recursive term
    SELECT d.* FROM department AS d, subdepartment AS sd
        WHERE d.parent_department = sd.id
)
SELECT * FROM subdepartment;

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Recursion on subdepartment  (cost=0.00..50.76 rows=12 width=40)
   ->  Append  (cost=0.00..50.64 rows=12 width=40)
         ->  Seq Scan on department  (cost=0.00..24.50 rows=6 width=40)
               Filter: (name = 'A'::text)
         ->  Hash Join  (cost=0.01..26.02 rows=6 width=40)
               Hash Cond: (d.parent_department = sd.id)
               ->  Seq Scan on department d  (cost=0.00..21.60 rows=1160 width=40)
               ->  Hash  (cost=0.00..0.00 rows=1 width=4)
                     ->  Recursive Scan on sd  (cost=0.00..0.00 rows=1 width=4)

The structure which represents Recursion plan is as follows:

typedef struct Recursion
{
    Scan    scan;
    Plan    *subplan;
    List    *subrtable;        /* temporary workspace for planner */
} Recursion;

The structure which represents RecursiveScan plan is as follows:

typedef struct RecursiveScan
{
    Scan        scan;
} RecursiveScan;


- Executor

To execute plans for CTEs, we add new members "es_tuplestorestate",
"es_rscan_tupledesc" and "es_disallow_tuplestore". es_tuplestorestate
is used to remember TupleStoreState of the working
table(WT). es_rscan_tupledesc is used to remeber the type information
of the tuples returned by the CTEs. es_disallow_tuplestore is not
actually used.

To manage recursive query execution state following structures are added.

typedef struct RecursionState
{
    ScanState    ss;
    PlanState  *subplan;        /* plan for non recursive query */
    bool recursive_empty; /* if recursive term does exist, true */
    Tuplestorestate *intermediate_tuplestorestate; /* intermediate table (IT) */
    Tuplestorestate *working_table;        /* working table (WT) */
    bool init_done; /* initialization flag */
} RecursionState;

typedef struct RecursiveScanState
{
    ScanState    ss;
    TupleDesc    tupdesc;
    Tuplestorestate **working_table;    /* working table */
} RecursiveScanState;

- Initializing Recursion Plan

This is done by ExecInitRecursion(). The working table (WT) is created
and its pointer is stored in RecusionState.working_table. The
imtermidiate table (IT) is created and its pointer is stored in
RecusionState.intermediate_tuplestorestate.

While the initialization WT created above must be used and pointer to
WT is stored in es_tuplestorestate in the master executor state
(Estate).

Type info for scan is taken from the non recursive query (saved in
RecursionState.subquery) and stored in RecursionState.ss and
Estate/es_rscan_tupledesc.

- Executing Recursion Plan

Recursion Plan is executed by ExecRecursion(). The work horse is
RecursionNext(). First it execute a plan for non recursive term and
the result is stored in WT. Then execute a plan for recursive term,
which is a subplan of the recursion plan. If the plan returns one or
more rows, they are store in IT. IT and WT are swapped and recreate
IT. If no row is returned, the recursion is ended and ExecRecursion()
returns NULL.

- Initializing RecursiveScan plan

This is very much same as the initialization of RecursionScan execpt
that type info for scan is taken from Estate.es_rscan_tupledesc which
is initialized by ExecInitRecursion().

- Executing RecursiveScan plan

Recursive scan is executed by ExecRecursiveScan(). The work horse is
RecursivescanNext().  It just eturns tuples store in WT. Work table is
stored in Estate structure.

- Miscellaneous changes in Executor

1) Hash join plan does not always recreate hash until hash join
   ended. If hash join has Recurisve Scan as its subplan, the hash
   needs to recreate. To solve the problem, "has_recursivescan" is
   added to PlanState structure, which is true if upper plan has
   RecursiveScan as a subplan and hash is recreated.

2) exec_append_initialize_next() is executed while processing
   recurision. The function is now a global one.

3. Limitations

1) SEARCH and CYCLE clauses are not implemented

2) Mutual recursion is not allowed

3) Only the last SELECT of UNION ALL can include self recursion name.

4) Cost of Recursion and Recursivescan plan are always 0.

Attachment

Re: [HACKERS] WITH RECURSIVE patches 0818

From
David Fetter
Date:
On Mon, Aug 18, 2008 at 04:38:52PM +0900, Tatsuo Ishii wrote:
> Hi,
>
> Here is the latest WITH RECURSIVE patches against CVS HEAD. Besides
> syncing to CVS HEAD, followings are main differences from previous
> one:

Thanks for the new patch :)

I think I may have found another bug:

WITH RECURSIVE t(i,j) AS (
    VALUES (1,2)
UNION ALL
    SELECT t2.i, t.j
    FROM (
        SELECT 2 AS i
    UNION ALL               /* Wrongly getting detected, I think */
        SELECT 3 AS i
    ) AS t2
    JOIN
        t
        ON (t2.i = t.i)
)
SELECT * FROM t;
ERROR:  attribute number 2 exceeds number of columns 1

Is there some way to ensure that in the case of WITH RECURSIVE, the
query to the right of UNION ALL follows only the SQL:2008 rules about
not having outer JOINs, etc. in it, but otherwise make it opaque to
the error-checking code?

I know I didn't explain that well, but the above SQL should work and
the error appears to stem from the parser's looking at the innermost
UNION ALL instead of the outermost.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] WITH RECURSIVE patches 0818

From
Tatsuo Ishii
Date:
> I think I may have found another bug:
>
> WITH RECURSIVE t(i,j) AS (
>     VALUES (1,2)
> UNION ALL
>     SELECT t2.i, t.j
>     FROM (
>         SELECT 2 AS i
>     UNION ALL               /* Wrongly getting detected, I think */
>         SELECT 3 AS i
>     ) AS t2
>     JOIN
>         t
>         ON (t2.i = t.i)
> )
> SELECT * FROM t;
> ERROR:  attribute number 2 exceeds number of columns 1
>
> Is there some way to ensure that in the case of WITH RECURSIVE, the
> query to the right of UNION ALL follows only the SQL:2008 rules about
> not having outer JOINs, etc. in it, but otherwise make it opaque to
> the error-checking code?
>
> I know I didn't explain that well, but the above SQL should work and
> the error appears to stem from the parser's looking at the innermost
> UNION ALL instead of the outermost.

Thanks for the report. I will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: [HACKERS] WITH RECURSIVE patches 0818

From
Tatsuo Ishii
Date:
> I think I may have found another bug:
>
> WITH RECURSIVE t(i,j) AS (
>     VALUES (1,2)
> UNION ALL
>     SELECT t2.i, t.j
>     FROM (
>         SELECT 2 AS i
>     UNION ALL               /* Wrongly getting detected, I think */
>         SELECT 3 AS i
>     ) AS t2
>     JOIN
>         t
>         ON (t2.i = t.i)
> )
> SELECT * FROM t;
> ERROR:  attribute number 2 exceeds number of columns 1
>
> Is there some way to ensure that in the case of WITH RECURSIVE, the
> query to the right of UNION ALL follows only the SQL:2008 rules about
> not having outer JOINs, etc. in it, but otherwise make it opaque to
> the error-checking code?
>
> I know I didn't explain that well, but the above SQL should work and
> the error appears to stem from the parser's looking at the innermost
> UNION ALL instead of the outermost.

Here is new patches fixing the bug you pointed out (patches was
created by Yoshiyuki). Also I added your SQL to the regression test,
and now the patches is against CVS HEAD. For your convenience I also
include patches against the previous version.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
*** pgsql/src/backend/executor/nodeAppend.c    2008-08-18 16:20:40.000000000 +0900
--- pgsql.patched/src/backend/executor/nodeAppend.c    2008-08-23 07:37:29.000000000 +0900
***************
*** 143,152 ****
--- 143,156 ----
      int            nplans;
      int            i;
      Plan       *initNode;
+     TupleDesc  save_tupledesc;

      /* check for unsupported flags */
      Assert(!(eflags & EXEC_FLAG_MARK));

+     /* Save tuple desc */
+     save_tupledesc = estate->es_rscan_tupledesc;
+
      /*
       * Set up empty vector of subplan states
       */
***************
*** 232,237 ****
--- 236,243 ----
      appendstate->as_whichplan = appendstate->as_firstplan;
      exec_append_initialize_next(appendstate);

+     /* Restore tuple desc */
+     estate->es_rscan_tupledesc = save_tupledesc;
+
      return appendstate;
  }

Attachment

Re: [HACKERS] WITH RECURSIVE patches 0818

From
David Fetter
Date:
On Sat, Aug 23, 2008 at 11:33:13AM +0900, Tatsuo Ishii wrote:
> > I think I may have found another bug:
> >
> > WITH RECURSIVE t(i,j) AS (
> >     VALUES (1,2)
> > UNION ALL
> >     SELECT t2.i, t.j
> >     FROM (
> >         SELECT 2 AS i
> >     UNION ALL               /* Wrongly getting detected, I think */
> >         SELECT 3 AS i
> >     ) AS t2
> >     JOIN
> >         t
> >         ON (t2.i = t.i)
> > )
> > SELECT * FROM t;
> > ERROR:  attribute number 2 exceeds number of columns 1
> >
> > Is there some way to ensure that in the case of WITH RECURSIVE,
> > the query to the right of UNION ALL follows only the SQL:2008
> > rules about not having outer JOINs, etc. in it, but otherwise make
> > it opaque to the error-checking code?
> >
> > I know I didn't explain that well, but the above SQL should work
> > and the error appears to stem from the parser's looking at the
> > innermost UNION ALL instead of the outermost.
>
> Here is new patches fixing the bug you pointed out (patches was
> created by Yoshiyuki). Also I added your SQL to the regression test,
> and now the patches is against CVS HEAD. For your convenience I also
> include patches against the previous version.

Thanks :)

Any progress on the READMEs for this?

Also, now that we are into August, would Asaba-san and whomever else
like to try out the git repository?  To do so, I just need a login
name and a public key.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] WITH RECURSIVE patches 0818

From
Tatsuo Ishii
Date:
> > Here is new patches fixing the bug you pointed out (patches was
> > created by Yoshiyuki). Also I added your SQL to the regression test,
> > and now the patches is against CVS HEAD. For your convenience I also
> > include patches against the previous version.
>
> Thanks :)
>
> Any progress on the READMEs for this?

I have posted kind of README (implementation.txt) along with patches
on Aug 18. Have you read it?

> Also, now that we are into August, would Asaba-san and whomever else
> like to try out the git repository?  To do so, I just need a login
> name and a public key.

I will send you later.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: [HACKERS] WITH RECURSIVE patches 0818

From
David Fetter
Date:
On Sat, Aug 23, 2008 at 03:35:52PM +0900, Tatsuo Ishii wrote:
> > > Here is new patches fixing the bug you pointed out (patches was
> > > created by Yoshiyuki). Also I added your SQL to the regression
> > > test, and now the patches is against CVS HEAD. For your
> > > convenience I also include patches against the previous version.
> >
> > Thanks :)
> >
> > Any progress on the READMEs for this?
>
> I have posted kind of README (implementation.txt) along with patches
> on Aug 18. Have you read it?

Oops.  I've now put it up with some minor edits and formatting at
<http://wiki.postgresql.org/wiki/CTEReadme>.  It is linked from the
Commitfest page.

> > Also, now that we are into August, would Asaba-san and whomever else
> > like to try out the git repository?  To do so, I just need a login
> > name and a public key.
>
> I will send you later.

Thanks :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate