Home > mailing lists

Thread: WITH RECURSIVE doesn't work properly for me

WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

05 November 2013, 03:36:17

I use following command to get a shortest-path query:

with recursive paths( src_id, dest_id, dist) as(

select n1,n2,1

from nodes

union

select src_id, dest_id, min(dist)

from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist

from paths, nodes

where paths.dest_id=nodes.n1

and paths.src_id<>nodes.n2

) as Temp

group by src_id, dest_id

)

select paths.src_id, paths.dest_id, min(dist)

from paths

group by 1,2;

It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the problem. Could anyone help with it? The attatchment is the test data.

Thank you very much:)

Attachment

data.txt

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

05 November 2013, 14:05:01

Jing Fan wrote:
> I use following command to get a shortest-path query:
> 
> with recursive paths( src_id, dest_id, dist) as(
>         select n1,n2,1
>         from nodes
>         union
>         select src_id, dest_id, min(dist)
>         from (  select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
>             from paths, nodes
>             where paths.dest_id=nodes.n1
>             and paths.src_id<>nodes.n2
>             ) as Temp
>         group by src_id, dest_id
>         )
> select paths.src_id, paths.dest_id, min(dist)
>     from paths
>     group by 1,2;
> 
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.

The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".

The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.

To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);

Your recursive WITH clause (CTE) will now happily produce:
 src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2
   3384 |    6236 |    3
   3384 |    1739 |    4
   3384 |    6236 |    5
   3384 |    1739 |    6
   3384 |    6236 |    7
   3384 |    1739 |    8
   3384 |    6236 |    9
   3384 |    1739 |   10
   3384 |    6236 |   11
and so on to infinity, which is why you will eventually run
out of space.

The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.

One solution I can envision is to put a limit on the distance,
for example the total count of nodes.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

05 November 2013, 17:38:02

I have two group operations.

One is inside the CTE ( union

select src_id, dest_id, min(dist) ),

another is outside the CTE.

Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been calculated?

Thank you very much:)

Best,

Jing

On Tue, Nov 5, 2013 at 5:04 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> I use following command to get a shortest-path query:
>
> with recursive paths( src_id, dest_id, dist) as(
> select n1,n2,1
> from nodes
> union
> select src_id, dest_id, min(dist)
> from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
> from paths, nodes
> where paths.dest_id=nodes.n1
> and paths.src_id<>nodes.n2
> ) as Temp
> group by src_id, dest_id
> )
> select paths.src_id, paths.dest_id, min(dist)
> from paths
> group by 1,2;
>
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.

The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".

The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.

To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);

Your recursive WITH clause (CTE) will now happily produce:
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
and so on to infinity, which is why you will eventually run
out of space.

The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.

One solution I can envision is to put a limit on the distance,
for example the total count of nodes.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

05 November 2013, 17:52:57

Jing Fan wrote:
> I have two group operations.
> 
> One is inside the CTE (   union
>                                    select src_id, dest_id, min(dist) ),
> another is outside the CTE.
> Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
> calculated?

I mean the one outside the CTE.

The one inside does not do anything at all, you could omit it.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

05 November 2013, 18:07:46

Why the one inside does not do anything? It won't be executed?

Best,

Jing

On Tue, Nov 5, 2013 at 8:52 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> I have two group operations.
>
> One is inside the CTE ( union
> select src_id, dest_id, min(dist) ),
> another is outside the CTE.
> Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
> calculated?

I mean the one outside the CTE.

The one inside does not do anything at all, you could omit it.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

05 November 2013, 18:29:10

Jing Fan wrote:
> Why the one inside does not do anything? It won't be executed?

It is executed.

It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

05 November 2013, 21:03:51

If the grouping inside CTE is executed, I don't think it would generate result like

src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11

for we have min(dist),

so it should be like

src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2

other values will be eliminated by min(). It actually generate no new tuples and the iteration should stop.

Best,

Jing

On Tue, Nov 5, 2013 at 9:28 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> Why the one inside does not do anything? It won't be executed?

It is executed.

It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

06 November 2013, 11:13:34

Jing Fan wrote:
> If the grouping inside CTE is executed, I don't think it would generate result like
> 
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
>    3384 |    6236 |    3
>    3384 |    1739 |    4
>    3384 |    6236 |    5
>    3384 |    1739 |    6
>    3384 |    6236 |    7
>    3384 |    1739 |    8
>    3384 |    6236 |    9
>    3384 |    1739 |   10
>    3384 |    6236 |   11
> 
> 
> 
> for we have min(dist),
> so it should be like
> 
> 
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
> 
> 
> 
> other values will be eliminated by min(). It actually generate no new tuples and the iteration should
> stop.

You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.

Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:

WITH RECURSIVE paths (path) AS (
      SELECT ARRAY[src_id, dest_id] FROM edge
   UNION ALL
      SELECT edge.src_id || paths.path
      FROM paths, edge
      WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
        AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;

The whole exercise sounds a bit like homework to me.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

06 November 2013, 16:56:46

I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with recursive works?

In my opinion,

with recursive table as{

seed statement

union

recursive statement

}

In every iteration, It will just generate results from seed statement union recursive statement and put them into a new temporary table, and then compare the results with the former temporary table and check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things about recursive statement?

Thank you very much:)

Best,

Jing

On Wed, Nov 6, 2013 at 2:13 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> If the grouping inside CTE is executed, I don't think it would generate result like
>
> src_id | dest_id | dist
> --------+---------+------
> 3384 | 6236 | 1
> 3384 | 1739 | 2
> 3384 | 6236 | 3
> 3384 | 1739 | 4
> 3384 | 6236 | 5
> 3384 | 1739 | 6
> 3384 | 6236 | 7
> 3384 | 1739 | 8
> 3384 | 6236 | 9
> 3384 | 1739 | 10
> 3384 | 6236 | 11
>
>
>
> for we have min(dist),
> so it should be like
>
>
> src_id | dest_id | dist
> --------+---------+------
> 3384 | 6236 | 1
> 3384 | 1739 | 2
>
>
>
> other values will be eliminated by min(). It actually generate no new tuples and the iteration should
> stop.

You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.

Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:

WITH RECURSIVE paths (path) AS (
SELECT ARRAY[src_id, dest_id] FROM edge
UNION ALL
SELECT edge.src_id || paths.path
FROM paths, edge
WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;

The whole exercise sounds a bit like homework to me.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

06 November 2013, 17:10:34

Jing Fan wrote:
> I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
> recursive works?
> In my opinion,
> with recursive table as{
>     seed statement
>     union
>     recursive statement
> }
> In every iteration, It will just generate results from seed statement union recursive statement and
> put them into a new temporary table, and then compare the results with the former temporary table and
> check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
> about recursive statement?

That is correct.

Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.

Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.

Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).

Then the UNION will take care of the first triple, but the second
one will be added in this iteration.

And so on ad infinitum.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

06 November 2013, 17:23:14

But after this iteration, the paths will be:

A B 1

B C 1

C B 1

A C 2

A B 3

in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the group by, it will still be (A,C,2) and (A,B,3)

so I think it should stop after this iteration.

On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
> recursive works?
> In my opinion,
> with recursive table as{
> seed statement
> union
> recursive statement
> }
> In every iteration, It will just generate results from seed statement union recursive statement and
> put them into a new temporary table, and then compare the results with the former temporary table and
> check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
> about recursive statement?

That is correct.

Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.

Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.

Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).

Then the UNION will take care of the first triple, but the second
one will be added in this iteration.

And so on ad infinitum.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Albe Laurenz

Date:

06 November 2013, 17:40:29

Jing Fan wrote:
> On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Let's assume that we have three nodes A, B and C.
>> Also, A points to B, B points to C and C points to B.
>> 
>> Let's assume that we already generated (A, B, 1) and (A, C, 2)
>> in previous iterations.
>> 
>> Then the "recursive statement" will generate the new
>> rows (A, C, 2) and (A, B, 3).
>> The SELECT ... GROUP BY only surrounds the recursive statement,
>> So the result will still be (A, C, 2) and (A, B, 3).
>> 
>> Then the UNION will take care of the first triple, but the second
>> one will be added in this iteration.
>> 
>> And so on ad infinitum.

> But after this iteration, the paths will be:
> A B 1
> B C 1
> C B 1
> A C 2
> A B 3
> 
> in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
> group by, it will still be (A,C,2) and (A,B,3)
> so I think it should stop after this iteration.

I see, I didn't notice that.

Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:

In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.

So in our case, the working table will look like this:

Initially:
A B 1
B C 1
C B 1

After the first iteration:
A C 2

After the third iteration:
A B 3

After the fourth iteration:
A C 4

... and so on.

Your GROUP BY assumes that the working table contains
all previously generated rows.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From

Jing Fan

Date:

06 November 2013, 17:53:20

Yeah, that can explain why it doesn't work.

Thank you very much:)

On Wed, Nov 6, 2013 at 8:40 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Jing Fan wrote:
> On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Let's assume that we have three nodes A, B and C.
>> Also, A points to B, B points to C and C points to B.
>>
>> Let's assume that we already generated (A, B, 1) and (A, C, 2)
>> in previous iterations.
>>
>> Then the "recursive statement" will generate the new
>> rows (A, C, 2) and (A, B, 3).
>> The SELECT ... GROUP BY only surrounds the recursive statement,
>> So the result will still be (A, C, 2) and (A, B, 3).
>>
>> Then the UNION will take care of the first triple, but the second
>> one will be added in this iteration.
>>
>> And so on ad infinitum.

> But after this iteration, the paths will be:
> A B 1
> B C 1
> C B 1
> A C 2
> A B 3
>
> in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
> group by, it will still be (A,C,2) and (A,B,3)
> so I think it should stop after this iteration.

I see, I didn't notice that.

Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:

In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.

So in our case, the working table will look like this:

Initially:
A B 1
B C 1
C B 1

After the first iteration:
A C 2

After the third iteration:
A B 3

After the fourth iteration:
A C 4

... and so on.

Your GROUP BY assumes that the working table contains
all previously generated rows.

Yours,
Laurenz Albe