Re: WITH RECURSIVE doesn't work properly for me - Mailing list pgsql-general

From Albe Laurenz
Subject Re: WITH RECURSIVE doesn't work properly for me
Date
Msg-id A737B7A37273E048B164557ADEF4A58B17C56588@ntex2010i.host.magwien.gv.at
Whole thread Raw
In response to WITH RECURSIVE doesn't work properly for me  (Jing Fan <fanjing09@gmail.com>)
Responses Re: WITH RECURSIVE doesn't work properly for me
List pgsql-general
Jing Fan wrote:
> I use following command to get a shortest-path query:
> 
> with recursive paths( src_id, dest_id, dist) as(
>         select n1,n2,1
>         from nodes
>         union
>         select src_id, dest_id, min(dist)
>         from (  select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
>             from paths, nodes
>             where paths.dest_id=nodes.n1
>             and paths.src_id<>nodes.n2
>             ) as Temp
>         group by src_id, dest_id
>         )
> select paths.src_id, paths.dest_id, min(dist)
>     from paths
>     group by 1,2;
> 
> It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
> every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
> problem. Could anyone help with it? The attatchment is the test data.

The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".

The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.

To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);

Your recursive WITH clause (CTE) will now happily produce:
 src_id | dest_id | dist
--------+---------+------
   3384 |    6236 |    1
   3384 |    1739 |    2
   3384 |    6236 |    3
   3384 |    1739 |    4
   3384 |    6236 |    5
   3384 |    1739 |    6
   3384 |    6236 |    7
   3384 |    1739 |    8
   3384 |    6236 |    9
   3384 |    1739 |   10
   3384 |    6236 |   11
and so on to infinity, which is why you will eventually run
out of space.

The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.

One solution I can envision is to put a limit on the distance,
for example the total count of nodes.

Yours,
Laurenz Albe

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Keepalive doubt.
Next
From: bsreejithin
Date:
Subject: Re: Junk date getting uploaded into date field