Thread: Recursive merging of overlapping arrays in a column

Recursive merging of overlapping arrays in a column

From
dave
Date:
Hey mailing list,

i have the following Table:



I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:


I am not an expert in SQL and it took me a long time to come up with this
solution:


Which gives me the result:


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: Recursive merging of overlapping arrays in a column

From
Rob Sargent
Date:
<div class="moz-cite-prefix">On 09/20/2015 05:12 AM, dave wrote:<br /></div><blockquote
cite="mid:1442747556700-5866560.post@n5.nabble.com"type="cite"><pre wrap="">Hey mailing list,
 

i have the following Table:



I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:


I am not an expert in SQL and it took me a long time to come up with this
solution:


Which gives me the result:


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave



--
View this message in context: <a class="moz-txt-link-freetext"
href="http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html">http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html</a>
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.


</pre></blockquote><font face="Courier New, Courier, monospace">I don't see your table def, sql nor result output.
Suggestyou re-post with all those as plain text</font><br /> 

Re: Recursive merging of overlapping arrays in a column

From
dave
Date:
Sorry, here is the post again in plain text...

i have the following Table:

CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);


I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{8,12,19}{42,63,102,140,420}


I am not an expert in SQL and it took me a long time to come up with this
solution:

WITH RECURSIVE clusters AS (select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arrfrom arrays a1 cross join
arraysa2 where a1.arr && a2.arr ANDleast(a1.id,a2.id) != greatest(a1.id, a2.id)UNIONselect DISTINCT
uniq(sort_asc(array_cat(a1.arr,a2.arr))) AS arrfrom arrays a1 cross join clusters a2 where a1.arr && a2.arr AND a1.arr
!=a2.arr
 
)

SELECT arr FROM (SELECT * FROM (    SELECT DISTINCT ON (arr[1]) arr     FROM clusters    ORDER BY arr[1],
array_length(arr,1) DESC) AS c
 
UNION
SELECT arr FROM arrays WHERE id NOT IN (    SELECT DISTINCT a1.id    FROM arrays a1 CROSS JOIN arrays a2     WHERE
a1.arr&& a2.arr AND    least(a1.id,a2.id) != greatest(a1.id, a2.id))
 
) AS clustertable
ORDER BY arr;


Which gives me the result:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{3,10,18,40,44}{8,12,19}{42,63,102,140,420}
(5 rows)


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave 



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: Recursive merging of overlapping arrays in a column

From
"David G. Johnston"
Date:
On Sunday, September 20, 2015, dave <audiotecture@web.de> wrote:
Sorry, here is the post again in plain text...

i have the following Table:

CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);


I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:

           arr
--------------------------
 {1,3,6,9,10,11,18,40,44}
 {2,4,7}
 {8,12,19}
 {42,63,102,140,420}


I am not an expert in SQL and it took me a long time to come up with this
solution:

WITH RECURSIVE clusters AS (
        select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
        from arrays a1 cross join arrays a2
        where a1.arr && a2.arr AND
        least(a1.id,a2.id) != greatest(a1.id, a2.id)

        UNION

        select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
        from arrays a1 cross join clusters a2
        where a1.arr && a2.arr AND
        a1.arr != a2.arr
)

SELECT arr FROM (
        SELECT * FROM (
                SELECT DISTINCT ON (arr[1]) arr
                FROM clusters
                ORDER BY arr[1], array_length(arr, 1) DESC
        ) AS c

        UNION

        SELECT arr FROM arrays WHERE id NOT IN (
                SELECT DISTINCT a1.id
                FROM arrays a1 CROSS JOIN arrays a2
                WHERE a1.arr && a2.arr AND
                least(a1.id,a2.id) != greatest(a1.id, a2.id)
        )
) AS clustertable
ORDER BY arr;


Which gives me the result:

           arr
--------------------------
 {1,3,6,9,10,11,18,40,44}
 {2,4,7}
 {3,10,18,40,44}
 {8,12,19}
 {42,63,102,140,420}
(5 rows)


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave



While I've never actually done this personally...

Convert the data into nodes and edges and import it into a graph database.

David J.

Re: Recursive merging of overlapping arrays in a column

From
dave
Date:
Hey David,

funnny that you give me that advice because that's exactly where I came
from.

I tried analyzing a adjacence table with several disjunct graphs without
real success, so I tried concatenating the values in the Table to arrays by
grouping the ids and then aggregating the overlaping arrays to finally get
the whole set of points for each graph as an array.
I have never worked with graph functions before, so I might be missing some
elegant solution here. My feeling was simply, that it would be easier to
merge the arrays like I did in my current approach.

Regards

Dave



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866585.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: Recursive merging of overlapping arrays in a column

From
hari.fuchs@gmail.com
Date:
dave <audiotecture@web.de> writes:

> Sorry, here is the post again in plain text...
>
> i have the following Table:
>
> CREATE TABLE arrays (id SERIAL, arr INT[]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
> INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
> INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);
>
>
> I want to merge the arrays which have overlapping elements, so that I get
> the result which doesn't contain overlapping arrays anymore:
>
>            arr            
> --------------------------
>  {1,3,6,9,10,11,18,40,44}
>  {2,4,7}
>  {8,12,19}
>  {42,63,102,140,420}

The "intarray" extension (see Appendix F of the fine manual) provides an
"overlaps" operator "&&".




Re: Recursive merging of overlapping arrays in a column

From
dave
Date:
Thanks for your reply. I am already using this operator to match the
overlapping arrays when I cross join the tables. As far as I know intarray
doesn't provide a way to merge all overlapping arrays in a whole column out
of the box.



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866646.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.