Documentation example for RECURISVE WITH isn't what I would expect - Mailing list pgsql-docs

From Alex Sherwin
Subject Documentation example for RECURISVE WITH isn't what I would expect
Date
Msg-id CAMqGX520K=3onSR45KQb=7iwk9V2Xt43V267K+Lx+XZcA=K0aw@mail.gmail.com
Whole thread Raw
Responses Re: Documentation example for RECURISVE WITH isn't what I would expect  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-docs
The 9.1 docs for RECURSIVE WITH: http://www.postgresql.org/docs/9.1/static/queries-with.html eventually builds up to this example query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (       SELECT g.id, g.link, g.data, 1,         ARRAY[g.id],         false       FROM graph g     UNION ALL       SELECT g.id, g.link, g.data, sg.depth + 1,         path || g.id,         g.id = ANY(path)       FROM graph g, search_graph sg       WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

From reading the docs and this query, it would be my understanding that the recursive portion (after UNION ALL) can refer to itself, which is what it appears this query does with the definition of cycle as "g.id = ANY(path)" then later referring to it in the WHERE clause "ANT NOT cycle", meaning that I would expect this to exclude duplicate traversals.

But, in my test this is not the case, I have to modify the query (complete working example below) as follows to make it work as the docs (I believe) intended:

drop table graph;
create table graph (
id varchar not null primary key,
link varchar null,
data varchar null);

insert into graph (id, link, data) values ('a', null, 'a');
insert into graph (id, link, data) values ('ab', 'a', 'ab');
insert into graph (id, link, data) values ('ac', 'a', 'ac');
insert into graph (id, link, data) values ('aca', 'ac', 'aca');
insert into graph (id, link, data) values ('acb', 'ac', 'acb');
insert into graph (id, link, data) values ('acba', 'acb', 'acba');
insert into graph (id, link, data) values ('acbb', 'acb', 'acbb');
insert into graph (id, link, data) values ('acc', 'ac', 'acc');
insert into graph (id, link, data) values ('ad', 'a', 'ad');

with recursive search_graph(id, link, data, depth, path, cycle) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id], false 
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path || g.id)
  from graph g, search_graph sg
  where g.id=sg.link and not cycle and not g.id=any(path || g.id)
)
select * from search_graph


Without the change "g.id=any(path || g.id)", cycle is never set to true.  If this were the only change, the first level ('a') would still be re-visited, but stop any further duplicate visits from there.  In order to prevent 'a' from being re-visited altogether, the WHERE must be modified as well with "and not g.id=any(path || g.id)", which essentially negates the need for cycle to begin with and can be re-factored as:

with recursive search_graph(id, link, data, depth, path) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id]
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id
  from graph g, search_graph sg
  where g.id=sg.link and not g.id=any(path || g.id)
)
select * from search_graph


Unless I'm missing something here.. but I don't think the example in the docs works correctly.

To apply the docs example to my data set, would look like:

with recursive search_graph(id, link, data, depth, path, cycle) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id], false 
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path)
  from graph g, search_graph sg
  where g.id=sg.link and not cycle
)
select * from search_graph


Which re-visits many rows





--
Alexander Sherwin

pgsql-docs by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Wrong advisory locks docs in pg_locks
Next
From: Tom Lane
Date:
Subject: Re: Documentation example for RECURISVE WITH isn't what I would expect