Thread: Array intersection

Array intersection

From
Josh Trutwin
Date:
Hi,

Is it possible to find the intersection of two array values?

a = '{1,2,3}'
b = '{2,3,4}'

a intersect b = '{2,3}'

Assume I need to write a pl/pgsql function to do this.

Josh

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 10:19:43 -0500
Josh Trutwin <josh@trutwins.homeip.net> wrote:

> Hi,
>
> Is it possible to find the intersection of two array values?
>
> a = '{1,2,3}'
> b = '{2,3,4}'
>
> a intersect b = '{2,3}'
>
> Assume I need to write a pl/pgsql function to do this.

nm - I just wrote a function - though curious if this is the most
effecient way:


CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[],
array2 INTEGER[]) RETURNS BOOLEAN
AS $$
   BEGIN
      IF array1 IS NULL OR array2 IS NULL THEN
         RETURN FALSE;
      END IF;

      FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
         FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP
            IF (array1[i] = array2[j]) THEN
               RETURN TRUE;
            END IF;
         END LOOP;
      END LOOP;

      RETURN FALSE;
   END;
$$ LANGUAGE PLPGSQL;



psql=> select array_has_intersect('{1,2,3}', '{1,3,4}');
 array_has_intersect
---------------------
 t

psql=> select array_has_intersect('{1,2,3}', '{21,23,24}');
 array_has_intersect
---------------------
 f


It doesn't return the actual intersection, but could easily be
modified to do so.

Josh

Re: Array intersection

From
Sam Mason
Date:
On Wed, Oct 17, 2007 at 10:37:23AM -0500, Josh Trutwin wrote:
> On Wed, 17 Oct 2007 10:19:43 -0500
> Josh Trutwin <josh@trutwins.homeip.net> wrote:
>
> > Hi,
> >
> > Is it possible to find the intersection of two array values?
> >
> > a = '{1,2,3}'
> > b = '{2,3,4}'
> >
> > a intersect b = '{2,3}'
> >
> > Assume I need to write a pl/pgsql function to do this.
>
> nm - I just wrote a function - though curious if this is the most
> effecient way:
>
>
> CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[],
> array2 INTEGER[]) RETURNS BOOLEAN
> AS $$
>    BEGIN
>       IF array1 IS NULL OR array2 IS NULL THEN
>          RETURN FALSE;
>       END IF;
>
>       FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
>          FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP
>             IF (array1[i] = array2[j]) THEN
>                RETURN TRUE;
>             END IF;
>          END LOOP;
>       END LOOP;
>
>       RETURN FALSE;
>    END;
> $$ LANGUAGE PLPGSQL;

This is only going to work for one-dimensional arrays (I'm not sure how
you would ever fix that with the support postgres has for arrays) but
the (computational) complexity of having an embedded FOR loops looks bad
for performance.  As you can already use '=ANY' syntax to search inside
an array, you may as well use that---it's probably a bit more faster
than the plpgsql work-alike.  Leading to the following implementation of
intersect:

  CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[]
  AS $$
     DECLARE
       out INTEGER[];
     BEGIN
        IF array1 IS NULL OR array2 IS NULL THEN
           RETURN '[]';
        END IF;
        FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
           IF (array1[i] =ANY (array2)) THEN
              out := array_append(out,array1[i]);
           END IF;
        END LOOP;
        RETURN out;
     END;
  $$ LANGUAGE PLPGSQL;

It seems to work for me, but as a side effect will leave the array sorted
in the same order as the first parameter and with any duplicates it has.
Even more annoyingly if there is no intersection it will return NULL
instead of an empty array, how do I fix this?


  Sam

p.s. plpgsql is badly in need of parametric polymorphism, the silly
anyarray/anyelement it's got at the moment means it's impossible to
declare an array of the same type as the type of its parameters.

Re: Array intersection

From
"Rodrigo De León"
Date:
On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> nm - I just wrote a function - though curious if this is the most
> effecient way:

If you only want TRUE or FALSE, you can use '&&':

t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
 ?column?
----------
 t
(1 row)

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 11:12:27 -0500
"Rodrigo De León" <rdeleonp@gmail.com> wrote:

> On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> > nm - I just wrote a function - though curious if this is the most
> > effecient way:
>
> If you only want TRUE or FALSE, you can use '&&':
>
> t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
>  ?column?
> ----------
>  t
> (1 row)

Is that 8.2 or above?  I'm on 8.1.10:

psql=> SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
ERROR:  operator does not exist: integer[] && integer[]
HINT:  No operator matches the given name and argument type(s). You
may need to add explicit type casts.

Josh

Re: Array intersection

From
Josh Trutwin
Date:
> This is only going to work for one-dimensional arrays (I'm not sure
> how you would ever fix that with the support postgres has for
> arrays) but the (computational) complexity of having an embedded
> FOR loops looks bad for performance.  As you can already use '=ANY'
> syntax to search inside an array, you may as well use that---it's
> probably a bit more faster than the plpgsql work-alike.  Leading to
> the following implementation of intersect:

Thanks for the pointers.

> It seems to work for me, but as a side effect will leave the array
> sorted in the same order as the first parameter and with any
> duplicates it has. Even more annoyingly if there is no intersection
> it will return NULL instead of an empty array, how do I fix this?

It's inelegant, but I just did this:

CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2
INTEGER[]) RETURNS INTEGER[]
AS $$
   DECLARE
     out INTEGER[];
     return_empty BOOLEAN := TRUE;
   BEGIN
      IF array1 IS NULL OR array2 IS NULL THEN
         RETURN '[]';
      END IF;
      FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
         IF (array1[i] =ANY (array2)) THEN
            out := array_append(out,array1[i]);
            return_empty := FALSE;
         END IF;
      END LOOP;

      IF return_empty THEN
         RETURN '{}';
      END IF;

      RETURN out;
   END;
$$ LANGUAGE PLPGSQL;

psql=> select array_intersect('{1,2,3}', '{6,7,8}');
 array_intersect
-----------------
 {}
(1 row)

Josh


Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 17:08:06 +0100
Sam Mason <sam@samason.me.uk> wrote:

>   CREATE OR REPLACE FUNCTION array_intersect (array1
> INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$
>      DECLARE
>        out INTEGER[];
>      BEGIN
>         IF array1 IS NULL OR array2 IS NULL THEN
>            RETURN '[]';
>         END IF;
>         FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
>            IF (array1[i] =ANY (array2)) THEN
>               out := array_append(out,array1[i]);
>            END IF;
>         END LOOP;
>         RETURN out;
>      END;
>   $$ LANGUAGE PLPGSQL;

Is the =ANY specific to PG 8.2 or higher?  On 8.1.10:

psql=> select array_intersect('{1,2,3}', '{1,2,6,7,8}');
 array_intersect
-----------------

(1 row)

Also, I think the first return needs to be:

RETURN '{}';

instead of:

RETURN '[]';

Josh

Re: Array intersection

From
"Merlin Moncure"
Date:
On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> On Wed, 17 Oct 2007 11:12:27 -0500
> "Rodrigo De León" <rdeleonp@gmail.com> wrote:
>
> > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> > > nm - I just wrote a function - though curious if this is the most
> > > effecient way:
> >
> > If you only want TRUE or FALSE, you can use '&&':
> >
> > t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
> >  ?column?
> > ----------
> >  t
> > (1 row)
>
> Is that 8.2 or above?  I'm on 8.1.10:

introduced in 8.2

merlin

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 12:33:13 -0400
"Merlin Moncure" <mmoncure@gmail.com> wrote:

> On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> > On Wed, 17 Oct 2007 11:12:27 -0500
> > "Rodrigo De León" <rdeleonp@gmail.com> wrote:
> >
> > > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:
> > > > nm - I just wrote a function - though curious if this is the
> > > > most effecient way:
> > >
> > > If you only want TRUE or FALSE, you can use '&&':
> > >
> > > t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
> > >  ?column?
> > > ----------
> > >  t
> > > (1 row)
> >
> > Is that 8.2 or above?  I'm on 8.1.10:
>
> introduced in 8.2

http://www.postgresql.org/docs/8.2/static/functions-array.html

Thanks - yet another reason to upgrade - waiting till 8.3 tho.

Josh

Re: Array intersection

From
Sam Mason
Date:
On Wed, Oct 17, 2007 at 11:28:31AM -0500, Josh Trutwin wrote:
> It's inelegant, but I just did this:

>       IF return_empty THEN
>          RETURN '{}';
>       END IF;

humm, why didn't that seem to work for me... ah well.  Next version
fixes a problem that I didn't test of the inputs being NULL. '[]' isn't
semantically correct, let alone the correct syntax.  I've also fixed the
problem with items in the first array appearing twice.  Try:

  CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[])  RETURNS INTEGER[]
  AS $$
     DECLARE
       out INTEGER[];
     BEGIN
        out := '{}'::INTEGER[];
        IF array1 IS NULL OR array2 IS NULL THEN
          RETURN NULL;
        END IF;
        FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP
          IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY (out) THEN
            out := array_append(out,array1[i]);
          END IF;
        END LOOP;
        RETURN out;
     END;
  $$ LANGUAGE PLPGSQL;


  Sam

Re: Array intersection

From
Sam Mason
Date:
On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote:
> Is the =ANY specific to PG 8.2 or higher?  On 8.1.10:

It appears (according to [1] and [2]) that you may be able to just
remove the '=' to get it working with 8.1.x.  i.e.

  v ANY ('{1,2}')

is correct in 8.1.x but in 8.2.x it's

  v = ANY ('{1,2}')

I don't have an 8.1 box so I can't test unfortunatly.


  Sam

 [1] http://www.postgresql.org/docs/8.1/static/functions-comparisons.html#AEN13394
 [2] http://www.postgresql.org/docs/8.2/static/functions-comparisons.html#AEN14115

Re: Array intersection

From
David Fetter
Date:
On Wed, Oct 17, 2007 at 10:19:43AM -0500, Josh Trutwin wrote:
> Hi,
>
> Is it possible to find the intersection of two array values?
>
> a = '{1,2,3}'
> b = '{2,3,4}'
>
> a intersect b = '{2,3}'
>
> Assume I need to write a pl/pgsql function to do this.

You can use an SQL function, which has the advantage of always being
available :)

It's up to you to ensure that the input arrays have the same type.

Cheers,
David.

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
SELECT ARRAY(
    SELECT $1[i] AS "the_intersection"
    FROM generate_series(
        array_lower($1,1),
        array_upper($1,1)
    ) AS i
    INTERSECT
    SELECT $2[j] AS "the_intersection"
    FROM generate_series(
        array_lower($2,1),
        array_upper($2,1)
    ) AS j
);
$$;
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 17:42:21 +0100
Sam Mason <sam@samason.me.uk> wrote:

<snip>

>   CREATE OR REPLACE FUNCTION array_intersect (array1
> INTEGER[],array2 INTEGER[])  RETURNS INTEGER[] AS $$
>      DECLARE
>        out INTEGER[];
>      BEGIN
>         out := '{}'::INTEGER[];
>         IF array1 IS NULL OR array2 IS NULL THEN
>           RETURN NULL;
>         END IF;
>         FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP
>           IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY
> (out) THEN out := array_append(out,array1[i]);
>           END IF;
>         END LOOP;
>         RETURN out;
>      END;
>   $$ LANGUAGE PLPGSQL;

Works like a champ on 8.1.

Thanks!

Did you see David Fetter's reply to the original post?  He has
an interesting alternative.

Josh

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 17:49:01 +0100
Sam Mason <sam@samason.me.uk> wrote:

> On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote:
> > Is the =ANY specific to PG 8.2 or higher?  On 8.1.10:
>
> It appears (according to [1] and [2]) that you may be able to just
> remove the '=' to get it working with 8.1.x.  i.e.
>
>   v ANY ('{1,2}')
>
> is correct in 8.1.x but in 8.2.x it's
>
>   v = ANY ('{1,2}')
>
> I don't have an 8.1 box so I can't test unfortunatly.

= ANY appears to work in 8.1 - at least your function from the other
reply worked just fine on 8.1 and I have used = ANY in some other
queries.

Thanks,

Josh

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 10:04:21 -0700
David Fetter <david@fetter.org> wrote:

<snip>

> CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
> RETURNS ANYARRAY
> LANGUAGE SQL
> AS $$
> SELECT ARRAY(
>     SELECT $1[i] AS "the_intersection"
>     FROM generate_series(
>         array_lower($1,1),
>         array_upper($1,1)
>     ) AS i
>     INTERSECT
>     SELECT $2[j] AS "the_intersection"
>     FROM generate_series(
>         array_lower($2,1),
>         array_upper($2,1)
>     ) AS j
> );
> $$;

Doesn't appear to work on 8.1:

psql=> select array_intersect('{1,2,3}', '{2,3,4}');
ERROR:  could not determine anyarray/anyelement type because input
has type "unknown"

Hopefully upgrading to 8.3 soon so I'll have another look then.

Thanks,

Josh

Re: Array intersection

From
David Fetter
Date:
On Wed, Oct 17, 2007 at 01:06:35PM -0500, Josh Trutwin wrote:
> On Wed, 17 Oct 2007 10:04:21 -0700
> David Fetter <david@fetter.org> wrote:
>
> <snip>
>
> > CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
> > RETURNS ANYARRAY
> > LANGUAGE SQL
> > AS $$
> > SELECT ARRAY(
> >     SELECT $1[i] AS "the_intersection"
> >     FROM generate_series(
> >         array_lower($1,1),
> >         array_upper($1,1)
> >     ) AS i
> >     INTERSECT
> >     SELECT $2[j] AS "the_intersection"
> >     FROM generate_series(
> >         array_lower($2,1),
> >         array_upper($2,1)
> >     ) AS j
> > );
> > $$;
>
> Doesn't appear to work on 8.1:
>
> psql=> select array_intersect('{1,2,3}', '{2,3,4}');
> ERROR:  could not determine anyarray/anyelement type because input
> has type "unknown"

As mentioned in the "release notes" ;), it's up to you to ensure that
the arrays have the right types, so you'd write the above as:

SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]);

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Array intersection

From
Josh Trutwin
Date:
On Wed, 17 Oct 2007 11:26:05 -0700
David Fetter <david@fetter.org> wrote:

> > Doesn't appear to work on 8.1:
> >
> > psql=> select array_intersect('{1,2,3}', '{2,3,4}');
> > ERROR:  could not determine anyarray/anyelement type because input
> > has type "unknown"
>
> As mentioned in the "release notes" ;), it's up to you to ensure
> that the arrays have the right types, so you'd write the above as:
>
> SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]);

Ah - I saw that but I figured that just meant you couldn't do
something like this:

SELECT array_intersect('{1,2,3}', '{"a","b","c"}');

So basically with this array intersect function you always need to
cast the inputs.

Josh

Re: Array intersection

From
"Pavel Stehule"
Date:
> > <snip>
> >
> > > CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
> > > RETURNS ANYARRAY
> > > LANGUAGE SQL
> > > AS $$
> > > SELECT ARRAY(
> > >     SELECT $1[i] AS "the_intersection"
> > >     FROM generate_series(
> > >         array_lower($1,1),
> > >         array_upper($1,1)
> > >     ) AS i
> > >     INTERSECT
> > >     SELECT $2[j] AS "the_intersection"
> > >     FROM generate_series(
> > >         array_lower($2,1),
> > >         array_upper($2,1)
> > >     ) AS j
> > > );
> > > $$;
> >

nice :)

Maybe we can add function "generate_iterator"

CREATE OR REPLACE FUNCTION generate_iterator(ANYARRAY)
RETURNS SETOF integer AS
$$
   SELECT i
      FROM generate_series(array_lower($1, 1), array_upper($1,1)) AS i
$$ LANGUAGE SQL;

then
CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL AS
$$
  SELECT ARRAY(
                           SELECT $1[i]
                               FROM genarate_iterator($1) i
                           INTERSECT
                           SELECT $2[j]
                               FROM generate_iterator($2) j
                           )
$$ ;

Regars
Pavel

Re: Array intersection

From
Sam Mason
Date:
On Wed, Oct 17, 2007 at 01:00:48PM -0500, Josh Trutwin wrote:
> Works like a champ on 8.1.
>
> Thanks!

That's OK, it was a good learning experience.

> Did you see David Fetter's reply to the original post?  He has
> an interesting alternative.

I did, it's much more elegant.  I've never seen generate_series used
like that, it's a very nice way of doing things.  If your arrays are
ever more than a few elements long his is probably going to be much
faster.  Generally for less than 6 or 7 elements it's better doing the
naive thing, not sure if that applies here though.


  Sam