Thread: WITH RECURSIVE doesn't work properly for me

WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
I use following command to get a shortest-path query:

with recursive paths( src_id, dest_id, dist) as(
        select n1,n2,1
        from nodes
        union
        select src_id, dest_id, min(dist) 
        from (  select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
            from paths, nodes
            where paths.dest_id=nodes.n1
            and paths.src_id<>nodes.n2
            ) as Temp
        group by src_id, dest_id
        )
select paths.src_id, paths.dest_id, min(dist)
    from paths
    group by 1,2;

It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the problem. Could anyone help with it? The attatchment is the test data.

Thank you very much:)
Attachment

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> I use following command to get a shortest-path query:
> 
> with recursive paths( src_id, dest_id, dist) as(
>         select n1,n2,1
>         from nodes
>         union
>         select src_id, dest_id, min(dist)
>         from (  select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
>             from paths, nodes
>             where paths.dest_id=nodes.n1
>             and paths.src_id<>nodes.n2
>             ) as Temp
>         group by src_id, dest_id
>         )
> select paths.src_id, paths.dest_id, min(dist)
>     from paths
>     group by 1,2;
> 
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.

The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".

The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.

To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);

Your recursive WITH clause (CTE) will now happily produce:
 src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2
   3384 |    6236 |    3
   3384 |    1739 |    4
   3384 |    6236 |    5
   3384 |    1739 |    6
   3384 |    6236 |    7
   3384 |    1739 |    8
   3384 |    6236 |    9
   3384 |    1739 |   10
   3384 |    6236 |   11
and so on to infinity, which is why you will eventually run
out of space.

The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.

One solution I can envision is to put a limit on the distance,
for example the total count of nodes.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
I have two group operations. 
One is inside the CTE (   union
                                   select src_id, dest_id, min(dist) ), 
another is outside the CTE.
Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been calculated?

Thank you very much:)

Best,
Jing




On Tue, Nov 5, 2013 at 5:04 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> I use following command to get a shortest-path query:
>
> with recursive paths( src_id, dest_id, dist) as(
>         select n1,n2,1
>         from nodes
>         union
>         select src_id, dest_id, min(dist)
>         from (  select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
>             from paths, nodes
>             where paths.dest_id=nodes.n1
>             and paths.src_id<>nodes.n2
>             ) as Temp
>         group by src_id, dest_id
>         )
> select paths.src_id, paths.dest_id, min(dist)
>     from paths
>     group by 1,2;
>
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.

The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".

The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.

To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);

Your recursive WITH clause (CTE) will now happily produce:
 src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2
   3384 |    6236 |    3
   3384 |    1739 |    4
   3384 |    6236 |    5
   3384 |    1739 |    6
   3384 |    6236 |    7
   3384 |    1739 |    8
   3384 |    6236 |    9
   3384 |    1739 |   10
   3384 |    6236 |   11
and so on to infinity, which is why you will eventually run
out of space.

The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.

One solution I can envision is to put a limit on the distance,
for example the total count of nodes.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> I have two group operations.
> 
> One is inside the CTE (   union
>                                    select src_id, dest_id, min(dist) ),
> another is outside the CTE.
> Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
> calculated?

I mean the one outside the CTE.

The one inside does not do anything at all, you could omit it.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
Why the one inside does not do anything? It won't be executed?

Best,
Jing


On Tue, Nov 5, 2013 at 8:52 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> I have two group operations.
>
> One is inside the CTE (   union
>                                    select src_id, dest_id, min(dist) ),
> another is outside the CTE.
> Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
> calculated?

I mean the one outside the CTE.

The one inside does not do anything at all, you could omit it.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> Why the one inside does not do anything? It won't be executed?

It is executed.

It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
If the grouping inside CTE is executed, I don't think it would generate result like

src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2
   3384 |    6236 |    3
   3384 |    1739 |    4
   3384 |    6236 |    5
   3384 |    1739 |    6
   3384 |    6236 |    7
   3384 |    1739 |    8
   3384 |    6236 |    9
   3384 |    1739 |   10
   3384 |    6236 |   11

for we have min(dist),
so it should be like

src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2

other values will be eliminated by min(). It actually generate no new tuples and the iteration should stop.

Best,
Jing




On Tue, Nov 5, 2013 at 9:28 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> Why the one inside does not do anything? It won't be executed?

It is executed.

It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> If the grouping inside CTE is executed, I don't think it would generate result like
> 
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
>    3384 |    6236 |    3
>    3384 |    1739 |    4
>    3384 |    6236 |    5
>    3384 |    1739 |    6
>    3384 |    6236 |    7
>    3384 |    1739 |    8
>    3384 |    6236 |    9
>    3384 |    1739 |   10
>    3384 |    6236 |   11
> 
> 
> 
> for we have min(dist),
> so it should be like
> 
> 
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
> 
> 
> 
> other values will be eliminated by min(). It actually generate no new tuples and the iteration should
> stop.

You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.

Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:

WITH RECURSIVE paths (path) AS (
      SELECT ARRAY[src_id, dest_id] FROM edge
   UNION ALL
      SELECT edge.src_id || paths.path
      FROM paths, edge
      WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
        AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;

The whole exercise sounds a bit like homework to me.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with recursive works?
In my opinion,  
with recursive table as{
    seed statement
    union
    recursive statement
}
In every iteration, It will just generate results from seed statement union recursive statement and put them into a new temporary table, and then compare the results with the former temporary table and check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things about recursive statement?

Thank you very much:)

Best,
Jing



On Wed, Nov 6, 2013 at 2:13 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> If the grouping inside CTE is executed, I don't think it would generate result like
>
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
>    3384 |    6236 |    3
>    3384 |    1739 |    4
>    3384 |    6236 |    5
>    3384 |    1739 |    6
>    3384 |    6236 |    7
>    3384 |    1739 |    8
>    3384 |    6236 |    9
>    3384 |    1739 |   10
>    3384 |    6236 |   11
>
>
>
> for we have min(dist),
> so it should be like
>
>
> src_id | dest_id | dist
> --------+---------+------
>    3384 |    6236 |    1
>    3384 |    1739 |    2
>
>
>
> other values will be eliminated by min(). It actually generate no new tuples and the iteration should
> stop.

You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.

Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:

WITH RECURSIVE paths (path) AS (
      SELECT ARRAY[src_id, dest_id] FROM edge
   UNION ALL
      SELECT edge.src_id || paths.path
      FROM paths, edge
      WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
        AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;

The whole exercise sounds a bit like homework to me.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
> recursive works?
> In my opinion,
> with recursive table as{
>     seed statement
>     union
>     recursive statement
> }
> In every iteration, It will just generate results from seed statement union recursive statement and
> put them into a new temporary table, and then compare the results with the former temporary table and
> check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
> about recursive statement?

That is correct.

Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.

Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.

Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).

Then the UNION will take care of the first triple, but the second
one will be added in this iteration.

And so on ad infinitum.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
But after this iteration, the paths will be:
A B 1
B C 1
C B 1
A C 2
A B 3

in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the group by, it will still be (A,C,2) and (A,B,3)
so I think it should stop after this iteration.


On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
> recursive works?
> In my opinion,
> with recursive table as{
>     seed statement
>     union
>     recursive statement
> }
> In every iteration, It will just generate results from seed statement union recursive statement and
> put them into a new temporary table, and then compare the results with the former temporary table and
> check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
> about recursive statement?

That is correct.

Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.

Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.

Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).

Then the UNION will take care of the first triple, but the second
one will be added in this iteration.

And so on ad infinitum.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Albe Laurenz
Date:
Jing Fan wrote:
> On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Let's assume that we have three nodes A, B and C.
>> Also, A points to B, B points to C and C points to B.
>> 
>> Let's assume that we already generated (A, B, 1) and (A, C, 2)
>> in previous iterations.
>> 
>> Then the "recursive statement" will generate the new
>> rows (A, C, 2) and (A, B, 3).
>> The SELECT ... GROUP BY only surrounds the recursive statement,
>> So the result will still be (A, C, 2) and (A, B, 3).
>> 
>> Then the UNION will take care of the first triple, but the second
>> one will be added in this iteration.
>> 
>> And so on ad infinitum.

> But after this iteration, the paths will be:
> A B 1
> B C 1
> C B 1
> A C 2
> A B 3
> 
> in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
> group by, it will still be (A,C,2) and (A,B,3)
> so I think it should stop after this iteration.

I see, I didn't notice that.

Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:

In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.

So in our case, the working table will look like this:

Initially:
A B 1
B C 1
C B 1

After the first iteration:
A C 2

After the third iteration:
A B 3

After the fourth iteration:
A C 4

... and so on.

Your GROUP BY assumes that the working table contains
all previously generated rows.

Yours,
Laurenz Albe

Re: WITH RECURSIVE doesn't work properly for me

From
Jing Fan
Date:
Yeah, that can explain why it doesn't work.

Thank you very much:)


On Wed, Nov 6, 2013 at 8:40 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:
> On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Let's assume that we have three nodes A, B and C.
>> Also, A points to B, B points to C and C points to B.
>>
>> Let's assume that we already generated (A, B, 1) and (A, C, 2)
>> in previous iterations.
>>
>> Then the "recursive statement" will generate the new
>> rows (A, C, 2) and (A, B, 3).
>> The SELECT ... GROUP BY only surrounds the recursive statement,
>> So the result will still be (A, C, 2) and (A, B, 3).
>>
>> Then the UNION will take care of the first triple, but the second
>> one will be added in this iteration.
>>
>> And so on ad infinitum.

> But after this iteration, the paths will be:
> A B 1
> B C 1
> C B 1
> A C 2
> A B 3
>
> in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
> group by, it will still be (A,C,2) and (A,B,3)
> so I think it should stop after this iteration.

I see, I didn't notice that.

Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:

In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.

So in our case, the working table will look like this:

Initially:
A B 1
B C 1
C B 1

After the first iteration:
A C 2

After the third iteration:
A B 3

After the fourth iteration:
A C 4

... and so on.

Your GROUP BY assumes that the working table contains
all previously generated rows.

Yours,
Laurenz Albe