Thread: RFP: Recursive query in 8.4

RFP: Recursive query in 8.4

From
Tatsuo Ishii
Date:
Hi,

As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.

The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).

1. Overview

We propose to implement the recursive query (WITH RECURSIVE clause)
defined in SQL:1999 and later. With the recursive query, one can
easily inquire the data expressed as tree and graph structures.  The
actual syntax we prefer is the one defined in SQL:2008 (it's not
published yet, but I have a closest draft).

We do not propose the WITH clause without RECURSIVE key word here
since someone else has already made a proposal for this.
(http://archives.postgresql.org/pgsql-patches/2008-01/msg00105.php)

2. Example

For those who are not familiar with the recursive query, I include an
example:

CREATE TABLE department ( id INT PRIMARY KEY, parent_department INT REFERENCES department, name TEXT
);

INSERT INTO department VALUES (0, NULL, 'ROOT');
INSERT INTO department VALUES (1, 0, 'A');
INSERT INTO department VALUES (2, 1, 'B');
INSERT INTO department VALUES (3, 2, 'C');
INSERT INTO department VALUES (4, 2, 'D');
INSERT INTO department VALUES (5, 0, 'E');
INSERT INTO department VALUES (6, 4, 'F');
INSERT INTO department VALUES (7, 4, 'G');

This will represent a tree structure of an organization:
 ROOT ---> A ---> B ---> C ---> F   |              |   |              +----> D   |   +-----> E ---> G

If you want to extract all departments "under" A, you could use a
recursive query:

WITH RECURSIVE subdepartment AS
( --  SELECT * FROM department WHERE id = 'A'
 UNION ALL
 -- recursive term referring to "subdepartment" SELECT d.* FROM department AS d, subdepartment AS sd   WHERE d.id =
sd.parent_department
)
SELECT * FROM subdepartment;

This will return A, B, C, D and F.

2. The syntax

As described above, we refers to the SQL:2008's WITH RECURSIVE clause syntax.

WITH RECURSIVE clause ::=

WITH RECURSIVE <query name> AS ( <table subquery> )
[ SEARCH clause | CYCLE clause ]
<SELECT body>

In the example above, <query name> is "subdepartment" and <table
subquery> is SELECTs in pareses.

<table subquery> must have one or more "anchor" expressions. This is
required by the standard.

The anchor expressions are consisted with "none recursive term"
(SELECT * FROM department WHERE id = 'A') + UNION ALL + "recursive
term" (SELECT d.* FROM department AS d, subdepartment AS sd WHERE d.id
= sd.parent_department). <SELECT body> is "SELECT * FROM subdepartment".

Note that the standard allows to use an UNION without ALL. However
this proposal only allow UNION ALL due to an implementation reason.

Other limitations required by the standard include:

- aggregate functions are not allowed in the recursive term

- GROUP BY is not allowed in the recursive term

- outer joins are not allowed in the recursive term

3. Processing a recursive query

If a WITH clause includes a recursive referencing cycle, we call the
set of <with list elements> as "partition". In the example above,
there is a partition in which subdepartment referees to itself.  We
limit number of list elements in a partition up to 1, which means it
should be a self reference.

While processing a recursive query, we start with a partition which
does not depend on any other partitions.

There is a working table WT and an intermediate table RT to evaluate a
partition. We implement WT and RT using tuplestore.

The algorithm is shown below.

[using the width first search]

1. evaluate non recursive term or partition depending on other  partitions and assign the result to RT

2. execute recursive terms

2.1 WT := RT
2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
2.3 replace the name of recursive term with WT
2.4 evaluate the recursive term and store into WT
2.5 append WT to RT
2.6 go back to 2.2

Pseudo code shown below.

1. RT := none recursive query result
2.  for i = 1..N (N = number of partitions)
2.1   WT := RT
2.2   while !empty(WT); do
2.3     subdepartment := WT
2.4     WT := result of recursive term
2.5     RT := WT UNION ALL RT
2.6   done

Execution trace shown below.

WITH RECURSIVE subdepartment AS
( -- non recursive term SELECT * FROM department WHERE id = 'A'
 UNION ALL
 -- recursive term referring to "subdepartment" SELECT d.* FROM department AS d, subdepartment AS sd   WHERE d.id =
sd.parent_department
)
SELECT * FROM subdepartment;

1) 
RT = {'A'}  WT = {'A'}

2) 
SELECT d.* FROM department AS d, WT({'A'}) AS sd  WHERE d.id =  sd.parent_id
 WT = {'B'}

RT = RT({'A'}) UNION ALL(*) WT({'B'}) => RT = {'A', 'B'}

3) 
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd  WHERE d.id =  sd.parent_id
 => WT = {'C', 'D'}

RT = RT({'A', 'B'}) UNION ALL WT({'C', 'D'}) => RT = {'A', 'B', 'C', 'D'}

4) 
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd  WHERE d.id =  sd.parent_id
 => WT = {'F'}

RT = RT({'A', 'B', 'C', 'D'}) UNION ALL WT({'F'}) => RT = {'A', 'B', 'C', 'D', 'F'}

5) 
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd  WHERE d.id =  sd.parent_id
 => WT = {}

RT = RT({'A', 'B', 'C', 'D', 'F'}) UNION ALL WT({}) <--(1) => RT = {'A', 'B', 'C', 'D', 'F'}

6) 
Since WT is empty, the execution stops. RT = {'A', 'B', 'C', 'D', F'} (the result)

(1) Actually we do not execute UNION ALL. We use tuplestore_puttuple()   to add results to RT.

4.  About GUC parameters

We do not add new GUC parameters.

5. Limitation with PostgreSQL

1) we do not implement SEARCH clause and CYCLE clause. This is because  we need array of rows to implement them. Note
thatthere's no  support for array of rows in PostgreSQL.
 

2) we only allow UNION ALL while appending none recursive term and  recursive terms. This is because it's difficult to
remove duplications using tuplestore. Note that Firebird and MS SQL support  only UNION as well.
 
  [1] http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte  [2]
http://msdn2.microsoft.com/en-us/library/ms186243.aspx

3) no support for mutually recursive queries
  In the parser we throw an error if there's a mutually recursive query.
 rec1 -> rec2 -> rec1 -> ...
 Note that Firebird and MS SQL do not support mutually recursive queries either.

6. Problems

1) while processing recursive queries, we repeat JOIN operations many  times. JOIN methods can be nested loop, merge
join,or hash join.  Depending on the number of rows and etc., the optimal join method  may vary. However we have no way
tochange the join method  dynamically according to increasing the number of rows.
 

2) it's not clear for the planner how to estimate the cost of  recursive queries.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: RFP: Recursive query in 8.4

From
ITAGAKI Takahiro
Date:
Tatsuo Ishii <ishii@postgresql.org> wrote:

> 5. Limitation with PostgreSQL
> 
> 1) we do not implement SEARCH clause and CYCLE clause. This is because
>    we need array of rows to implement them. Note that there's no
>    support for array of rows in PostgreSQL.

What is difference between "array of rows" and
Arrays of composite types, that is new feature in 8.3 ?

=# CREATE TABLE foo (i integer);
CREATE TABLE
=# CREATE TABLE bar (foos foo[]);  -- *here*
CREATE TABLE

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: RFP: Recursive query in 8.4

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <ishii@postgresql.org> wrote:
> 
> > 5. Limitation with PostgreSQL
> > 
> > 1) we do not implement SEARCH clause and CYCLE clause. This is because
> >    we need array of rows to implement them. Note that there's no
> >    support for array of rows in PostgreSQL.
> 
> What is difference between "array of rows" and
> Arrays of composite types, that is new feature in 8.3 ?
> 
> =# CREATE TABLE foo (i integer);
> CREATE TABLE
> =# CREATE TABLE bar (foos foo[]);  -- *here*
> CREATE TABLE

Will check. Thanks for pointing it out.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: RFP: Recursive query in 8.4

From
"Merlin Moncure"
Date:
On Tue, Feb 19, 2008 at 3:36 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>  We propose to implement the recursive query (WITH RECURSIVE clause)
>  defined in SQL:1999 and later. With the recursive query, one can
>  easily inquire the data expressed as tree and graph structures.  The
>  actual syntax we prefer is the one defined in SQL:2008 (it's not
>  published yet, but I have a closest draft).

I am sure you are aware of various ad hoc approaches that are
currently possible.  The recursive clause seems to generalize these
approaches.

Do you expect that your proposed solution will have performance
advantages over solutions like using recursive functions and (for tree
organized data) arrays?

merlin


Re: RFP: Recursive query in 8.4

From
Tatsuo Ishii
Date:
> On Tue, Feb 19, 2008 at 3:36 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> >  We propose to implement the recursive query (WITH RECURSIVE clause)
> >  defined in SQL:1999 and later. With the recursive query, one can
> >  easily inquire the data expressed as tree and graph structures.  The
> >  actual syntax we prefer is the one defined in SQL:2008 (it's not
> >  published yet, but I have a closest draft).
> 
> I am sure you are aware of various ad hoc approaches that are
> currently possible.  The recursive clause seems to generalize these
> approaches.
> 
> Do you expect that your proposed solution will have performance
> advantages over solutions like using recursive functions and (for tree
> organized data) arrays?

I hope so. But the first thing I would like to do is, to implement the
right thing (i.e. following the standard).

I don't see any reason that the proposal gets less performance than
existing functions.  Moreover the proposal could better cooperate with
the optimizer since it can feed more info to it. Any ideas to enhance
the performance are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: RFP: Recursive query in 8.4

From
Gregory Stark
Date:
[This message is mostly for the benefit of the list -- he and I already talked
a bit about this here at FOSDEM. Ishii-san, if you have a chance we should sit
down and talk about this in more detail before we leave!]


Tatsuo Ishii wrote:
>> On Tue, Feb 19, 2008 at 3:36 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>     
> I hope so. But the first thing I would like to do is, to implement the
> right thing (i.e. following the standard).
>
> I don't see any reason that the proposal gets less performance than
> existing functions.  Moreover the proposal could better cooperate with
> the optimizer since it can feed more info to it. Any ideas to enhance
> the performance are welcome.

I agree about following the standard but I think it's true that the standard
creates some challenges for the optimizer.

The standard recursive query syntax is quite general. It can represent
arbitrary non-linear recursive queries including possibly mutually recursive
queries, for example. The challenge is that there are no extra hints when you
have the more usual case of a simple linear recursion.

You really do want to discover such linear recursive structures because you
can use simpler algorithms and recover memory sooner if you know you have a
linear recursive query. You can also support the SEARCH and CYCLE clauses to
do depth-first searches which you can't do for arbitrary recursive queries. I
also don't have much hope for good optimizer estimates for general recursive
queries but for linear recursive queries we can probably do better.

But I think (surprisingly) it's actually easier to implement the general case
than the special nodes to handle the linear case more efficiently. To handle
the general case we need the memoize node to handle recursive loops in the
plan and then we can use otherwise normal plan nodes.

My plan was to implement the general case first, then look for ways to add
intelligence in the planner to discover linearity and add new paths to take
advantage of it.

-- 
greg



Re: RFP: Recursive query in 8.4

From
"Greg Stark"
Date:
Tatsuo Ishii wrote:
>> On Tue, Feb 19, 2008 at 3:36 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>     
> I hope so. But the first thing I would like to do is, to implement the
> right thing (i.e. following the standard).
>
> I don't see any reason that the proposal gets less performance than
> existing functions.  Moreover the proposal could better cooperate with
> the optimizer since it can feed more info to it. Any ideas to enhance
> the performance are welcome.
>   

I agree about following the standard but I think it's true that the 
standard creates some challenges for the optimizer.

The standard recursive query syntax is quite general. It can represent 
arbitrary non-linear recursive queries including possibly mutually 
recursive queries, for example. The challenge is that there are no extra 
hints when you have the more usual case of a simple linear recursion.

You really do want to discover such linear recursive structures because 
you can use simpler algorithms and recover memory sooner if you know you 
have a linear recursive query. You can also support the SEARCH and CYCLE 
clauses to do depth-first searches which you can't do for arbitrary 
recursive queries. I also don't have much hope for good optimizer 
estimates for general recursive queries but for linear recursive queries 
we can probably do better.

But I think it's actually easier to implement the general case than the 
special nodes to handle the linear case more efficiently. To handle the 
general case we need the memoize node to handle recursive loops in the 
plan and then we can use otherwise normal plan nodes.

My plan was to implement the general case first, then look for ways to 
add intelligence in the planner to discover linearity and add new paths 
to take advantage of it.



Re: RFP: Recursive query in 8.4

From
Tatsuo Ishii
Date:
> Tatsuo Ishii wrote:
> >> On Tue, Feb 19, 2008 at 3:36 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> >>     
> > I hope so. But the first thing I would like to do is, to implement the
> > right thing (i.e. following the standard).
> >
> > I don't see any reason that the proposal gets less performance than
> > existing functions.  Moreover the proposal could better cooperate with
> > the optimizer since it can feed more info to it. Any ideas to enhance
> > the performance are welcome.
> >   
> 
> I agree about following the standard but I think it's true that the 
> standard creates some challenges for the optimizer.
> 
> The standard recursive query syntax is quite general. It can represent 
> arbitrary non-linear recursive queries including possibly mutually 
> recursive queries, for example. The challenge is that there are no extra 
> hints when you have the more usual case of a simple linear recursion.

I seems the standard does not allow non-linear recursive queries. In
the SQL:2008 draft pp.380:

7.13 <query expression>

Syntax Rules 2) g) iv)

"If WLE_i is recursive, then WLE_i shall be linearly recursive."

So now the problem is, how to detect the non-linear recursive queries
and reject them.

> You really do want to discover such linear recursive structures because 
> you can use simpler algorithms and recover memory sooner if you know you 
> have a linear recursive query. You can also support the SEARCH and CYCLE 
> clauses to do depth-first searches which you can't do for arbitrary 
> recursive queries. I also don't have much hope for good optimizer 
> estimates for general recursive queries but for linear recursive queries 
> we can probably do better.
>
> But I think it's actually easier to implement the general case than the 
> special nodes to handle the linear case more efficiently. To handle the 
> general case we need the memoize node to handle recursive loops in the 
> plan and then we can use otherwise normal plan nodes.
> 
> My plan was to implement the general case first, then look for ways to 
> add intelligence in the planner to discover linearity and add new paths 
> to take advantage of it.
> 


Re: RFP: Recursive query in 8.4

From
Tatsuo Ishii
Date:
> > Tatsuo Ishii <ishii@postgresql.org> wrote:
> > 
> > > 5. Limitation with PostgreSQL
> > > 
> > > 1) we do not implement SEARCH clause and CYCLE clause. This is because
> > >    we need array of rows to implement them. Note that there's no
> > >    support for array of rows in PostgreSQL.
> > 
> > What is difference between "array of rows" and
> > Arrays of composite types, that is new feature in 8.3 ?
> > 
> > =# CREATE TABLE foo (i integer);
> > CREATE TABLE
> > =# CREATE TABLE bar (foos foo[]);  -- *here*
> > CREATE TABLE
> 
> Will check. Thanks for pointing it out.

Yes, I agree with that the 8.3's new feature might be used to
implement SEARCH clause and CYCLE clause. Problem is, it requres that
the named composite type (the standard's term is "row type") is
previously defined.  Maybe we need "anonymous" row type?
--
Tatsuo Ishii
SRA OSS, Inc. Japan