Thread: Recursive merging of overlapping arrays in a column

Recursive merging of overlapping arrays in a column

From

dave

Date:

20 September 2015, 11:12:39

Hey mailing list,

i have the following Table:



I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:


I am not an expert in SQL and it took me a long time to come up with this
solution:


Which gives me the result:


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

Re: Recursive merging of overlapping arrays in a column

From

Rob Sargent

Date:

20 September 2015, 14:01:20

<div class="moz-cite-prefix">On 09/20/2015 05:12 AM, dave wrote:<br /></div><blockquote
cite="mid:1442747556700-5866560.post@n5.nabble.com"type="cite"><pre wrap="">Hey mailing list,
 

i have the following Table:



I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:


I am not an expert in SQL and it took me a long time to come up with this
solution:


Which gives me the result:


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave



--
View this message in context: <a class="moz-txt-link-freetext"
href="http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html">http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560.html</a>
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.


</pre></blockquote><font face="Courier New, Courier, monospace">I don't see your table def, sql nor result output.
Suggestyou re-post with all those as plain text</font><br />

Re: Recursive merging of overlapping arrays in a column

From

dave

Date:

20 September 2015, 15:57:28

Sorry, here is the post again in plain text...

i have the following Table:

CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);


I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{8,12,19}{42,63,102,140,420}


I am not an expert in SQL and it took me a long time to come up with this
solution:

WITH RECURSIVE clusters AS (select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arrfrom arrays a1 cross join
arraysa2 where a1.arr && a2.arr ANDleast(a1.id,a2.id) != greatest(a1.id, a2.id)UNIONselect DISTINCT
uniq(sort_asc(array_cat(a1.arr,a2.arr))) AS arrfrom arrays a1 cross join clusters a2 where a1.arr && a2.arr AND a1.arr
!=a2.arr
 
)

SELECT arr FROM (SELECT * FROM (    SELECT DISTINCT ON (arr[1]) arr     FROM clusters    ORDER BY arr[1],
array_length(arr,1) DESC) AS c
 
UNION
SELECT arr FROM arrays WHERE id NOT IN (    SELECT DISTINCT a1.id    FROM arrays a1 CROSS JOIN arrays a2     WHERE
a1.arr&& a2.arr AND    least(a1.id,a2.id) != greatest(a1.id, a2.id))
 
) AS clustertable
ORDER BY arr;


Which gives me the result:
          arr            
--------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{3,10,18,40,44}{8,12,19}{42,63,102,140,420}
(5 rows)


Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.


Thanks in advance,

Dave 



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

Re: Recursive merging of overlapping arrays in a column

From

"David G. Johnston"

Date:

20 September 2015, 17:43:29

On Sunday, September 20, 2015, dave <audiotecture@web.de> wrote:

Sorry, here is the post again in plain text...

i have the following Table:

CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);

I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:

arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{8,12,19}
{42,63,102,140,420}

I am not an expert in SQL and it took me a long time to come up with this
solution:

WITH RECURSIVE clusters AS (
select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join arrays a2
where a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)

UNION

select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join clusters a2
where a1.arr && a2.arr AND
a1.arr != a2.arr
)

SELECT arr FROM (
SELECT * FROM (
SELECT DISTINCT ON (arr[1]) arr
FROM clusters
ORDER BY arr[1], array_length(arr, 1) DESC
) AS c

UNION

SELECT arr FROM arrays WHERE id NOT IN (
SELECT DISTINCT a1.id
FROM arrays a1 CROSS JOIN arrays a2
WHERE a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)
)
) AS clustertable
ORDER BY arr;

Which gives me the result:

arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{3,10,18,40,44}
{8,12,19}
{42,63,102,140,420}
(5 rows)

Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.

Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.

I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.

Thanks in advance,

Dave

While I've never actually done this personally...

Convert the data into nodes and edges and import it into a graph database.

David J.

Re: Recursive merging of overlapping arrays in a column

From

dave

Date:

20 September 2015, 19:40:23

Hey David,

funnny that you give me that advice because that's exactly where I came
from.

I tried analyzing a adjacence table with several disjunct graphs without
real success, so I tried concatenating the values in the Table to arrays by
grouping the ids and then aggregating the overlaping arrays to finally get
the whole set of points for each graph as an array.
I have never worked with graph functions before, so I might be missing some
elegant solution here. My feeling was simply, that it would be easier to
merge the arrays like I did in my current approach.

Regards

Dave



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866585.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

Re: Recursive merging of overlapping arrays in a column

From

hari.fuchs@gmail.com

Date:

21 September 2015, 09:46:11

dave <audiotecture@web.de> writes:

> Sorry, here is the post again in plain text...
>
> i have the following Table:
>
> CREATE TABLE arrays (id SERIAL, arr INT[]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
> INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
> INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);
>
>
> I want to merge the arrays which have overlapping elements, so that I get
> the result which doesn't contain overlapping arrays anymore:
>
>            arr            
> --------------------------
>  {1,3,6,9,10,11,18,40,44}
>  {2,4,7}
>  {8,12,19}
>  {42,63,102,140,420}

The "intarray" extension (see Appendix F of the fine manual) provides an
"overlaps" operator "&&".

Re: Recursive merging of overlapping arrays in a column

From

dave

Date:

21 September 2015, 12:19:21

Thanks for your reply. I am already using this operator to match the
overlapping arrays when I cross join the tables. As far as I know intarray
doesn't provide a way to merge all overlapping arrays in a whole column out
of the box.



--
View this message in context:
http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866646.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.