Thread: WITH RECURSIVE doesn't work properly for me
I use following command to get a shortest-path query:
with recursive paths( src_id, dest_id, dist) as(
select n1,n2,1
from nodes
union
select src_id, dest_id, min(dist)
from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
from paths, nodes
where paths.dest_id=nodes.n1
and paths.src_id<>nodes.n2
) as Temp
group by src_id, dest_id
)
select paths.src_id, paths.dest_id, min(dist)
from paths
group by 1,2;
It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the problem. Could anyone help with it? The attatchment is the test data.
Thank you very much:)
Attachment
Jing Fan wrote: > I use following command to get a shortest-path query: > > with recursive paths( src_id, dest_id, dist) as( > select n1,n2,1 > from nodes > union > select src_id, dest_id, min(dist) > from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist > from paths, nodes > where paths.dest_id=nodes.n1 > and paths.src_id<>nodes.n2 > ) as Temp > group by src_id, dest_id > ) > select paths.src_id, paths.dest_id, min(dist) > from paths > group by 1,2; > > It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf > every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the > problem. Could anyone help with it? The attatchment is the test data. The attached test data suggest different table and column names, but I assume that you mean "edge" when you write "nodes" and that columns "n1" and "n2" are really "src_id" and "dest_id". The endless loop occurs because there are loops in your directed graph, but you only exclude circles where beginning is equal to end. To quote three lines from your attachment: INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236); INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739); INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236); Your recursive WITH clause (CTE) will now happily produce: src_id | dest_id | dist --------+---------+------ 3384 | 6236 | 1 3384 | 1739 | 2 3384 | 6236 | 3 3384 | 1739 | 4 3384 | 6236 | 5 3384 | 1739 | 6 3384 | 6236 | 7 3384 | 1739 | 8 3384 | 6236 | 9 3384 | 1739 | 10 3384 | 6236 | 11 and so on to infinity, which is why you will eventually run out of space. The grouping (and any other processing in your main query) takes place only after the CTE has been calculated, so while your query would in theory return the desired result, it does so only after calculating infinitely many intermediate rows. One solution I can envision is to put a limit on the distance, for example the total count of nodes. Yours, Laurenz Albe
I have two group operations.
One is inside the CTE ( union
select src_id, dest_id, min(dist) ),
another is outside the CTE.
Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been calculated?
Thank you very much:)
Best,
Jing
On Tue, Nov 5, 2013 at 5:04 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
The attached test data suggest different table and column names,Jing Fan wrote:
> I use following command to get a shortest-path query:
>
> with recursive paths( src_id, dest_id, dist) as(
> select n1,n2,1
> from nodes
> union
> select src_id, dest_id, min(dist)
> from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
> from paths, nodes
> where paths.dest_id=nodes.n1
> and paths.src_id<>nodes.n2
> ) as Temp
> group by src_id, dest_id
> )
> select paths.src_id, paths.dest_id, min(dist)
> from paths
> group by 1,2;
>
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".
The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.
To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);
Your recursive WITH clause (CTE) will now happily produce:
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
and so on to infinity, which is why you will eventually run
out of space.
The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.
One solution I can envision is to put a limit on the distance,
for example the total count of nodes.
Yours,
Laurenz Albe
Jing Fan wrote: > I have two group operations. > > One is inside the CTE ( union > select src_id, dest_id, min(dist) ), > another is outside the CTE. > Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been > calculated? I mean the one outside the CTE. The one inside does not do anything at all, you could omit it. Yours, Laurenz Albe
Why the one inside does not do anything? It won't be executed?
Best,
Jing
On Tue, Nov 5, 2013 at 8:52 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:I mean the one outside the CTE.
> I have two group operations.
>
> One is inside the CTE ( union
> select src_id, dest_id, min(dist) ),
> another is outside the CTE.
> Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
> calculated?
The one inside does not do anything at all, you could omit it.
Yours,
Laurenz Albe
Jing Fan wrote: > Why the one inside does not do anything? It won't be executed? It is executed. It might filter out the occasional row, but if you look at the example I gave you, you'll see that it won't do anything to keep it from recursing. Yours, Laurenz Albe
If the grouping inside CTE is executed, I don't think it would generate result like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
for we have min(dist),
so it should be like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
other values will be eliminated by min(). It actually generate no new tuples and the iteration should stop.
Best,
Jing
On Tue, Nov 5, 2013 at 9:28 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:It is executed.
> Why the one inside does not do anything? It won't be executed?
It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.
Yours,
Laurenz Albe
Jing Fan wrote: > If the grouping inside CTE is executed, I don't think it would generate result like > > src_id | dest_id | dist > --------+---------+------ > 3384 | 6236 | 1 > 3384 | 1739 | 2 > 3384 | 6236 | 3 > 3384 | 1739 | 4 > 3384 | 6236 | 5 > 3384 | 1739 | 6 > 3384 | 6236 | 7 > 3384 | 1739 | 8 > 3384 | 6236 | 9 > 3384 | 1739 | 10 > 3384 | 6236 | 11 > > > > for we have min(dist), > so it should be like > > > src_id | dest_id | dist > --------+---------+------ > 3384 | 6236 | 1 > 3384 | 1739 | 2 > > > > other values will be eliminated by min(). It actually generate no new tuples and the iteration should > stop. You forget that the grouping query only spans the second branch of the UNION, where you add the new entries. So the new entries and the old entries won't be grouped together, and the new paths that are longer than the old ones won't get removed. Unfortunately you cannot have the UNION in a subquery for recursive CTEs, but you could use arrays to achieve what you want: WITH RECURSIVE paths (path) AS ( SELECT ARRAY[src_id, dest_id] FROM edge UNION ALL SELECT edge.src_id || paths.path FROM paths, edge WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)] AND edge.src_id <> ALL (paths.path) ) SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1)) FROM paths GROUP BY 1, 2; The whole exercise sounds a bit like homework to me. Yours, Laurenz Albe
I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with recursive works?
In my opinion,
with recursive table as{
seed statement
union
recursive statement
}
In every iteration, It will just generate results from seed statement union recursive statement and put them into a new temporary table, and then compare the results with the former temporary table and check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things about recursive statement?
Thank you very much:)
Best,
Jing
On Wed, Nov 6, 2013 at 2:13 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:You forget that the grouping query only spans the second branch
> If the grouping inside CTE is executed, I don't think it would generate result like
>
> src_id | dest_id | dist
> --------+---------+------
> 3384 | 6236 | 1
> 3384 | 1739 | 2
> 3384 | 6236 | 3
> 3384 | 1739 | 4
> 3384 | 6236 | 5
> 3384 | 1739 | 6
> 3384 | 6236 | 7
> 3384 | 1739 | 8
> 3384 | 6236 | 9
> 3384 | 1739 | 10
> 3384 | 6236 | 11
>
>
>
> for we have min(dist),
> so it should be like
>
>
> src_id | dest_id | dist
> --------+---------+------
> 3384 | 6236 | 1
> 3384 | 1739 | 2
>
>
>
> other values will be eliminated by min(). It actually generate no new tuples and the iteration should
> stop.
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.
Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:
WITH RECURSIVE paths (path) AS (
SELECT ARRAY[src_id, dest_id] FROM edge
UNION ALL
SELECT edge.src_id || paths.path
FROM paths, edge
WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;
The whole exercise sounds a bit like homework to me.
Yours,
Laurenz Albe
Jing Fan wrote: > I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with > recursive works? > In my opinion, > with recursive table as{ > seed statement > union > recursive statement > } > In every iteration, It will just generate results from seed statement union recursive statement and > put them into a new temporary table, and then compare the results with the former temporary table and > check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things > about recursive statement? That is correct. Let's assume that we have three nodes A, B and C. Also, A points to B, B points to C and C points to B. Let's assume that we already generated (A, B, 1) and (A, C, 2) in previous iterations. Then the "recursive statement" will generate the new rows (A, C, 2) and (A, B, 3). The SELECT ... GROUP BY only surrounds the recursive statement, So the result will still be (A, C, 2) and (A, B, 3). Then the UNION will take care of the first triple, but the second one will be added in this iteration. And so on ad infinitum. Yours, Laurenz Albe
But after this iteration, the paths will be:
A B 1
B C 1
C B 1
A C 2
A B 3
in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the group by, it will still be (A,C,2) and (A,B,3)
so I think it should stop after this iteration.
On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jing Fan wrote:That is correct.
> I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
> recursive works?
> In my opinion,
> with recursive table as{
> seed statement
> union
> recursive statement
> }
> In every iteration, It will just generate results from seed statement union recursive statement and
> put them into a new temporary table, and then compare the results with the former temporary table and
> check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
> about recursive statement?
Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.
Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.
Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).
Then the UNION will take care of the first triple, but the second
one will be added in this iteration.
And so on ad infinitum.
Yours,
Laurenz Albe
Jing Fan wrote: > On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: >> Let's assume that we have three nodes A, B and C. >> Also, A points to B, B points to C and C points to B. >> >> Let's assume that we already generated (A, B, 1) and (A, C, 2) >> in previous iterations. >> >> Then the "recursive statement" will generate the new >> rows (A, C, 2) and (A, B, 3). >> The SELECT ... GROUP BY only surrounds the recursive statement, >> So the result will still be (A, C, 2) and (A, B, 3). >> >> Then the UNION will take care of the first triple, but the second >> one will be added in this iteration. >> >> And so on ad infinitum. > But after this iteration, the paths will be: > A B 1 > B C 1 > C B 1 > A C 2 > A B 3 > > in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the > group by, it will still be (A,C,2) and (A,B,3) > so I think it should stop after this iteration. I see, I didn't notice that. Actually there is a mistake in my explanation above, see http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT "Recursive Query Evaluation" for a detailed explanation: In step 2b, the "working table" is replaced with the "intermediate table", so the next iteration does not see all previously generated rows, but only the ones that were generated in the previous iteration. So in our case, the working table will look like this: Initially: A B 1 B C 1 C B 1 After the first iteration: A C 2 After the third iteration: A B 3 After the fourth iteration: A C 4 ... and so on. Your GROUP BY assumes that the working table contains all previously generated rows. Yours, Laurenz Albe
Yeah, that can explain why it doesn't work.
Thank you very much:)
On Wed, Nov 6, 2013 at 8:40 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Let's assume that we have three nodes A, B and C.
>> Also, A points to B, B points to C and C points to B.
>>
>> Let's assume that we already generated (A, B, 1) and (A, C, 2)
>> in previous iterations.
>>
>> Then the "recursive statement" will generate the new
>> rows (A, C, 2) and (A, B, 3).
>> The SELECT ... GROUP BY only surrounds the recursive statement,
>> So the result will still be (A, C, 2) and (A, B, 3).
>>
>> Then the UNION will take care of the first triple, but the second
>> one will be added in this iteration.
>>
>> And so on ad infinitum.> But after this iteration, the paths will be:I see, I didn't notice that.
> A B 1
> B C 1
> C B 1
> A C 2
> A B 3
>
> in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
> group by, it will still be (A,C,2) and (A,B,3)
> so I think it should stop after this iteration.
Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:
In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.
So in our case, the working table will look like this:
Initially:A B 1After the first iteration:
B C 1
C B 1
A C 2
After the third iteration:
A B 3
After the fourth iteration:
A C 4
... and so on.
Your GROUP BY assumes that the working table contains
all previously generated rows.
Yours,
Laurenz Albe