Thread: Recursive merging of overlapping arrays in a column
Hey mailing list, i have the following Table: I want to merge the arrays which have overlapping elements, so that I get the result which doesn't contain overlapping arrays anymore: I am not an expert in SQL and it took me a long time to come up with this solution: Which gives me the result: Result number 3 is contained in number one and shouldn't be in the output anymore, because I only want non overlapping arrays in the result. Another problem I encountered is that the performance of this query seems to be very bad. I tried running it on a larger table (~400000 arrays) and it is still running after ~10h. I would appreciate any input on this problems, so it would be nice if anyone could give me a hint how to get only the merged arrays without overlaps in the resultset and maybe how to build a more elegant and efficient query. Thanks in advance, Dave -- View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
<div class="moz-cite-prefix">On 09/20/2015 05:12 AM, dave wrote:<br /></div><blockquote cite="mid:1442747556700-5866560.post@n5.nabble.com"type="cite"><pre wrap="">Hey mailing list, i have the following Table: I want to merge the arrays which have overlapping elements, so that I get the result which doesn't contain overlapping arrays anymore: I am not an expert in SQL and it took me a long time to come up with this solution: Which gives me the result: Result number 3 is contained in number one and shouldn't be in the output anymore, because I only want non overlapping arrays in the result. Another problem I encountered is that the performance of this query seems to be very bad. I tried running it on a larger table (~400000 arrays) and it is still running after ~10h. I would appreciate any input on this problems, so it would be nice if anyone could give me a hint how to get only the merged arrays without overlaps in the resultset and maybe how to build a more elegant and efficient query. Thanks in advance, Dave -- View this message in context: <a class="moz-txt-link-freetext" href="http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html">http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html</a> Sent from the PostgreSQL - sql mailing list archive at Nabble.com. </pre></blockquote><font face="Courier New, Courier, monospace">I don't see your table def, sql nor result output. Suggestyou re-post with all those as plain text</font><br />
Sorry, here is the post again in plain text... i have the following Table: CREATE TABLE arrays (id SERIAL, arr INT[]); INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]); INSERT INTO arrays (arr) VALUES (ARRAY[2,4]); INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]); INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]); INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]); INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]); INSERT INTO arrays (arr) VALUES (ARRAY[2,7]); INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]); INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]); I want to merge the arrays which have overlapping elements, so that I get the result which doesn't contain overlapping arrays anymore: arr --------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{8,12,19}{42,63,102,140,420} I am not an expert in SQL and it took me a long time to come up with this solution: WITH RECURSIVE clusters AS (select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arrfrom arrays a1 cross join arraysa2 where a1.arr && a2.arr ANDleast(a1.id,a2.id) != greatest(a1.id, a2.id)UNIONselect DISTINCT uniq(sort_asc(array_cat(a1.arr,a2.arr))) AS arrfrom arrays a1 cross join clusters a2 where a1.arr && a2.arr AND a1.arr !=a2.arr ) SELECT arr FROM (SELECT * FROM ( SELECT DISTINCT ON (arr[1]) arr FROM clusters ORDER BY arr[1], array_length(arr,1) DESC) AS c UNION SELECT arr FROM arrays WHERE id NOT IN ( SELECT DISTINCT a1.id FROM arrays a1 CROSS JOIN arrays a2 WHERE a1.arr&& a2.arr AND least(a1.id,a2.id) != greatest(a1.id, a2.id)) ) AS clustertable ORDER BY arr; Which gives me the result: arr --------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{3,10,18,40,44}{8,12,19}{42,63,102,140,420} (5 rows) Result number 3 is contained in number one and shouldn't be in the output anymore, because I only want non overlapping arrays in the result. Another problem I encountered is that the performance of this query seems to be very bad. I tried running it on a larger table (~400000 arrays) and it is still running after ~10h. I would appreciate any input on this problems, so it would be nice if anyone could give me a hint how to get only the merged arrays without overlaps in the resultset and maybe how to build a more elegant and efficient query. Thanks in advance, Dave -- View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
On Sunday, September 20, 2015, dave <audiotecture@web.de> wrote:
Sorry, here is the post again in plain text...
i have the following Table:
CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);
I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:
arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{8,12,19}
{42,63,102,140,420}
I am not an expert in SQL and it took me a long time to come up with this
solution:
WITH RECURSIVE clusters AS (
select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join arrays a2
where a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)
UNION
select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join clusters a2
where a1.arr && a2.arr AND
a1.arr != a2.arr
)
SELECT arr FROM (
SELECT * FROM (
SELECT DISTINCT ON (arr[1]) arr
FROM clusters
ORDER BY arr[1], array_length(arr, 1) DESC
) AS c
UNION
SELECT arr FROM arrays WHERE id NOT IN (
SELECT DISTINCT a1.id
FROM arrays a1 CROSS JOIN arrays a2
WHERE a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)
)
) AS clustertable
ORDER BY arr;
Which gives me the result:
arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{3,10,18,40,44}
{8,12,19}
{42,63,102,140,420}
(5 rows)
Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.
Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.
I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.
Thanks in advance,
Dave
While I've never actually done this personally...
Convert the data into nodes and edges and import it into a graph database.
David J.
Hey David, funnny that you give me that advice because that's exactly where I came from. I tried analyzing a adjacence table with several disjunct graphs without real success, so I tried concatenating the values in the Table to arrays by grouping the ids and then aggregating the overlaping arrays to finally get the whole set of points for each graph as an array. I have never worked with graph functions before, so I might be missing some elegant solution here. My feeling was simply, that it would be easier to merge the arrays like I did in my current approach. Regards Dave -- View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866585.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
dave <audiotecture@web.de> writes: > Sorry, here is the post again in plain text... > > i have the following Table: > > CREATE TABLE arrays (id SERIAL, arr INT[]); > INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]); > INSERT INTO arrays (arr) VALUES (ARRAY[2,4]); > INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]); > INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]); > INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]); > INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]); > INSERT INTO arrays (arr) VALUES (ARRAY[2,7]); > INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]); > INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]); > > > I want to merge the arrays which have overlapping elements, so that I get > the result which doesn't contain overlapping arrays anymore: > > arr > -------------------------- > {1,3,6,9,10,11,18,40,44} > {2,4,7} > {8,12,19} > {42,63,102,140,420} The "intarray" extension (see Appendix F of the fine manual) provides an "overlaps" operator "&&".
Thanks for your reply. I am already using this operator to match the overlapping arrays when I cross join the tables. As far as I know intarray doesn't provide a way to merge all overlapping arrays in a whole column out of the box. -- View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866646.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.