Home > mailing lists

Re: Recursive merging of overlapping arrays in a column - Mailing list pgsql-sql

From	dave
Subject	Re: Recursive merging of overlapping arrays in a column
Date	September 20, 2015 15:57:28
Msg-id	1442764637710-5866579.post@n5.nabble.com Whole thread Raw
In response to	Re: Recursive merging of overlapping arrays in a column (Rob Sargent <robjsargent@gmail.com>)
Responses	Re: Recursive merging of overlapping arrays in a column
List	pgsql-sql

Tree view

Sorry, here is the post again in plain text...

i have the following Table:

CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);


I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{8,12,19}{42,63,102,140,420}


I am not an expert in SQL and it took me a long time to come up with this
solution:

WITH RECURSIVE clusters AS (select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arrfrom arrays a1 cross join
arraysa2 where a1.arr && a2.arr ANDleast(a1.id,a2.id) != greatest(a1.id, a2.id)UNIONselect DISTINCT
uniq(sort_asc(array_cat(a1.arr,a2.arr))) AS arrfrom arrays a1 cross join clusters a2 where a1.arr && a2.arr AND a1.arr
!=a2.arr
 
)

SELECT arr FROM (SELECT * FROM (    SELECT DISTINCT ON (arr[1]) arr     FROM clusters    ORDER BY arr[1],
array_length(arr,1) DESC) AS c
 
UNION
SELECT arr FROM arrays WHERE id NOT IN (    SELECT DISTINCT a1.id    FROM arrays a1 CROSS JOIN arrays a2     WHERE
a1.arr&& a2.arr AND    least(a1.id,a2.id) != greatest(a1.id, a2.id))
 
) AS clustertable
ORDER BY arr;


Which gives me the result:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{3,10,18,40,44}{8,12,19}{42,63,102,140,420}
(5 rows)


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave 



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

pgsql-sql by date:

From: Rob Sargent
Date: 20 September 2015, 14:01:20
Subject: Re: Recursive merging of overlapping arrays in a column

From: "David G. Johnston"
Date: 20 September 2015, 17:43:29
Subject: Re: Recursive merging of overlapping arrays in a column

Re: Recursive merging of overlapping arrays in a column - Mailing list pgsql-sql

Previous

Next