Hi,
I was asked to post this here for any interested parties -- please Cc me on
any comments/followups as I'm not subscribed to -hackers.
This is a sort-of writeup on how to find the transitive closure[1] of a
directed graph G(V,E), or G+(V,E) if you want to get into notation. (There
are multiple ways to do this, though, and this only describes one of them.
I've been told it could be possible to do this with WITH RECURSIVE, but I've
never used it myself and PostgreSQL doesn't support it, so it's left as an
excercise for the reader :-) )
In short, the transitive closure of a graph is defined as follows: The
transitive closure G+(V,E) contains an edge A -> B if and only If there is a
path from A to B (possibly via other vertices) in the graph G(V,E). In other
words, one question the transitive closure can answer quickly (to other
queries) is "is there a path from A to B?". (We use this in our door card
access system, which supports group inheritance -- a group can be a subgroup
of another group and thus inherits all its door access.)
A relational database is unable to store and operate on a graph directly, but
you can store the vertices and edges separately, as in:
CREATE TABLE vertices ( id INTEGER PRIMARY KEY <any other fields you'd want here> ); CREATE TABLE edges ( upper
INTEGERREFERENCES vertices(id), lower INTEGER REFERENCES vertices(id) );
We'll be using PL/PgSQL, which unfortunately can't handle temporary tables
very well at the moment, so we'll also define a temporary copy we can work on
from our function, identical to edges:
CREATE TABLE temp ( upper INTEGER, lower INTEGER );
The basic idea of the technique is to grow the closure outwards from the
original graph; this makes it somewhat like the Bellman-Ford algorithm
implemented in SQL. It is not particularily effective on dense graphs (since
it generates potentially V^2 records in the temporary table) or in graphs
with very long paths (since it needs one iteration for every step of the
longest path), but works well on sparse graphs without too long paths.
It's also very simple, and tries to leave as much as possible of the work to
PostgreSQL itself. (I implemented a variant using depth-first search and
simulated queues once, and it was 10-20 times slower, possibly due to
PL/PgSQL overhead -- an implementation in C or Perl might fare a lot better.)
So, for every iteration we basically do this: For every edge (A,B) in our
table, find all edges (B,C) that we don't already have found. Since there's a
path between A and C, insert (A,C) as an edge. Repeat until there are no more
edges inserted, and we have our transitive closure.
The function goes as follows:
CREATE FUNCTION transitive_closure() RETURNS SETOF edges AS
'
DECLARE r record;
BEGIN -- Start with the initial set of edges as a base INSERT INTO temp SELECT * FROM edges;
-- Grow the graph until nothing more happens LOOP INSERT INTO temp SELECT
e1.upper,e2.lower FROM temp e1 JOIN temp e2 ON e1.lower=e2.upper WHERE (e1.upper,
e2.lower)NOT IN ( SELECT * FROM temp ); EXIT WHEN NOT FOUND;
END LOOP; -- Return the entire data set FOR r in SELECT * FROM temp LOOP RETURN NEXT r;
ENDLOOP;
-- Clean up TRUNCATE temp; RETURN;
END;
' LANGUAGE plpgsql;
Also note that after this, if there's any edge (A,A) in the graph, there is
a loop somewhere, and the original graph is not a DAG.
A simple example, just for reference:
graph=# select * from vertices; id ---- 1 2 3 4 5 (5 rows)
graph=# select * from edges; upper | lower -------+------- 1 | 2 2 | 3 2 | 4 1 |
5 5 | 4 (5 rows)
The matching graph in ASCII art looks something like (I haven't drawn the
arrows): 1 / \ 2 5 / \ / 3 4
The matching transitive closure then becomes:
mdb2=# select * from transitive_closure(); upper | lower -------+------- 1 | 2 2 | 3 2 |
4 1 | 5 5 | 4 1 | 3 1 | 4 1 | 4 (8 rows)
Or, again in ASCII (a bit harder this time :-) ) --- 1 / / | \ / 2 | 5 |/ \ | / 3 4
Note that there's a duplicate in there (because both the path 1 -> 2 -> 4
and 1 -> 5 -> 4 are added in the same iteration and thus produce the edge
1 -> 4); you might want to add a DISTINCT (either at the end or in each
iteration) if that's a problem.
/* Steinar */
[1] http://en.wikipedia.org/wiki/Transitive_closure
--
Homepage: http://www.sesse.net/