Re: Complex Recursive Query - Mailing list pgsql-general

From matt@byrney.com
Subject Re: Complex Recursive Query
Date
Msg-id 06947bada247128a5e937d15084abeab.squirrel@mail.byrney.com
Whole thread Raw
In response to Complex Recursive Query  (Jim Garrison <jim.garrison@nwea.org>)
List pgsql-general
I wouldn't do this with recursion; plain old iteration is your friend
(yes, WITH RECURSIVE is actually iterative, not recursive...)

The algorithm goes like this:

1. Extend your graph relation to be symmetric and transitive.
2. Assign a integer group id to each node.
3. Repeatedly join the node list to the (extended) relation, updating each
node's group id to be the minimum of the group ids of every node it
touches.
4. Stop when the group ids stop changing.

Here's some example code, using your data:

DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;
SET SEARCH_PATH TO test;

CREATE TABLE graph(key1 TEXT, key2 TEXT);

INSERT INTO graph VALUES
('a', 'x'),
('a', 'y'),
('b', 'w'),
('c', 't'),
('x', 'a'),
('y', 'a'),
('y', 'z'),
('z', 'y'),
('t', 'c'),
('w', 'b'),
('w', 'd'),
('d', 'w');

DO
$$
DECLARE
  prev INT = 0;
  curr INT;
BEGIN
  CREATE TABLE rel AS
  SELECT key1, key2 FROM graph
  UNION
  SELECT key2, key1 FROM graph
  UNION
  SELECT key1, key1 FROM graph
  UNION
  SELECT key2, key2 FROM graph;

  CREATE TABLE group_ids AS
  SELECT
    key,
    ROW_NUMBER() OVER (ORDER BY key) AS group_id
  FROM
    (
      SELECT key1 AS key FROM graph
      UNION
      SELECT key2 FROM graph
    ) _;

  SELECT SUM(group_id) INTO curr FROM group_ids;
  WHILE prev != curr LOOP
    prev = curr;
    DROP TABLE IF EXISTS min_ids;
    CREATE TABLE min_ids AS
    SELECT
      a.key,
      MIN(c.group_id) AS group_id
    FROM
      group_ids a
    INNER JOIN
      rel b
    ON
      a.key = b.key1
    INNER JOIN
      group_ids c
    ON
      b.key2 = c.key
    GROUP BY
      a.key;

    UPDATE
      group_ids
    SET
      group_id = min_ids.group_id
    FROM
      min_ids
    WHERE
      group_ids.key = min_ids.key;

    SELECT SUM(group_id) INTO curr FROM group_ids;
  END LOOP;

  DROP TABLE IF EXISTS rel;
  DROP TABLE IF EXISTS min_ids;
END
$$;

SELECT * FROM group_ids;


Hope it helps,

Matthew




> I have a collection of relationship rows of the form
>
> Table: graph
>     key1 varchar
>     key2 varchar
>
> A row of the form ('a','b') indicates that 'a' and 'b' are related.
> The table contains many relationships between keys, forming several
> disjoint sets. All relationships are bi-directional, and both
> directions are present.  I.e. the table contains a set of disjoint
> graphs specified as node pairs.
>
> For example the set of values
>
>     key1    key2
>     -----   -----
>       a       x
>       a       y
>       b       w
>       c       t
>       x       a
>       y       a
>       y       z
>       z       y
>       t       c
>       w       b
>       w       d
>       d       w
>
> defines three disjoint groups of connected keys:
>
>       a x y z
>       c t
>       b w d
>
> What I would like to achieve is a single SQL query that returns
>
>       group key
>       ----- ---
>         1    a
>         1    x
>         1    y
>         1    z
>         2    c
>         2    t
>         3    b
>         3    w
>         3    d
>
> I don't care about preserving the node-to-node relationships, only
> the group membership for each node.
>
> I've been playing with "WITH RECURSIVE" CTEs but haven't had any
> success.  I'm not really sure how to express what I want in SQL, and
> it's not completely clear to me that recursive CTEs will help here.
> Also I'm not sure how to generate the sequence numbers for the groups
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: event triggers in 9.3.4
Next
From: matt@byrney.com
Date:
Subject: Table checksum proposal