Thread: Array intersection
Hi, Is it possible to find the intersection of two array values? a = '{1,2,3}' b = '{2,3,4}' a intersect b = '{2,3}' Assume I need to write a pl/pgsql function to do this. Josh
On Wed, 17 Oct 2007 10:19:43 -0500 Josh Trutwin <josh@trutwins.homeip.net> wrote: > Hi, > > Is it possible to find the intersection of two array values? > > a = '{1,2,3}' > b = '{2,3,4}' > > a intersect b = '{2,3}' > > Assume I need to write a pl/pgsql function to do this. nm - I just wrote a function - though curious if this is the most effecient way: CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[], array2 INTEGER[]) RETURNS BOOLEAN AS $$ BEGIN IF array1 IS NULL OR array2 IS NULL THEN RETURN FALSE; END IF; FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP IF (array1[i] = array2[j]) THEN RETURN TRUE; END IF; END LOOP; END LOOP; RETURN FALSE; END; $$ LANGUAGE PLPGSQL; psql=> select array_has_intersect('{1,2,3}', '{1,3,4}'); array_has_intersect --------------------- t psql=> select array_has_intersect('{1,2,3}', '{21,23,24}'); array_has_intersect --------------------- f It doesn't return the actual intersection, but could easily be modified to do so. Josh
On Wed, Oct 17, 2007 at 10:37:23AM -0500, Josh Trutwin wrote: > On Wed, 17 Oct 2007 10:19:43 -0500 > Josh Trutwin <josh@trutwins.homeip.net> wrote: > > > Hi, > > > > Is it possible to find the intersection of two array values? > > > > a = '{1,2,3}' > > b = '{2,3,4}' > > > > a intersect b = '{2,3}' > > > > Assume I need to write a pl/pgsql function to do this. > > nm - I just wrote a function - though curious if this is the most > effecient way: > > > CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[], > array2 INTEGER[]) RETURNS BOOLEAN > AS $$ > BEGIN > IF array1 IS NULL OR array2 IS NULL THEN > RETURN FALSE; > END IF; > > FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP > FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP > IF (array1[i] = array2[j]) THEN > RETURN TRUE; > END IF; > END LOOP; > END LOOP; > > RETURN FALSE; > END; > $$ LANGUAGE PLPGSQL; This is only going to work for one-dimensional arrays (I'm not sure how you would ever fix that with the support postgres has for arrays) but the (computational) complexity of having an embedded FOR loops looks bad for performance. As you can already use '=ANY' syntax to search inside an array, you may as well use that---it's probably a bit more faster than the plpgsql work-alike. Leading to the following implementation of intersect: CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$ DECLARE out INTEGER[]; BEGIN IF array1 IS NULL OR array2 IS NULL THEN RETURN '[]'; END IF; FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP IF (array1[i] =ANY (array2)) THEN out := array_append(out,array1[i]); END IF; END LOOP; RETURN out; END; $$ LANGUAGE PLPGSQL; It seems to work for me, but as a side effect will leave the array sorted in the same order as the first parameter and with any duplicates it has. Even more annoyingly if there is no intersection it will return NULL instead of an empty array, how do I fix this? Sam p.s. plpgsql is badly in need of parametric polymorphism, the silly anyarray/anyelement it's got at the moment means it's impossible to declare an array of the same type as the type of its parameters.
On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > nm - I just wrote a function - though curious if this is the most > effecient way: If you only want TRUE or FALSE, you can use '&&': t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[]; ?column? ---------- t (1 row)
On Wed, 17 Oct 2007 11:12:27 -0500 "Rodrigo De León" <rdeleonp@gmail.com> wrote: > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > > nm - I just wrote a function - though curious if this is the most > > effecient way: > > If you only want TRUE or FALSE, you can use '&&': > > t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[]; > ?column? > ---------- > t > (1 row) Is that 8.2 or above? I'm on 8.1.10: psql=> SELECT '{1,2}'::INT[] && '{2,3}'::INT[]; ERROR: operator does not exist: integer[] && integer[] HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts. Josh
> This is only going to work for one-dimensional arrays (I'm not sure > how you would ever fix that with the support postgres has for > arrays) but the (computational) complexity of having an embedded > FOR loops looks bad for performance. As you can already use '=ANY' > syntax to search inside an array, you may as well use that---it's > probably a bit more faster than the plpgsql work-alike. Leading to > the following implementation of intersect: Thanks for the pointers. > It seems to work for me, but as a side effect will leave the array > sorted in the same order as the first parameter and with any > duplicates it has. Even more annoyingly if there is no intersection > it will return NULL instead of an empty array, how do I fix this? It's inelegant, but I just did this: CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$ DECLARE out INTEGER[]; return_empty BOOLEAN := TRUE; BEGIN IF array1 IS NULL OR array2 IS NULL THEN RETURN '[]'; END IF; FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP IF (array1[i] =ANY (array2)) THEN out := array_append(out,array1[i]); return_empty := FALSE; END IF; END LOOP; IF return_empty THEN RETURN '{}'; END IF; RETURN out; END; $$ LANGUAGE PLPGSQL; psql=> select array_intersect('{1,2,3}', '{6,7,8}'); array_intersect ----------------- {} (1 row) Josh
On Wed, 17 Oct 2007 17:08:06 +0100 Sam Mason <sam@samason.me.uk> wrote: > CREATE OR REPLACE FUNCTION array_intersect (array1 > INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$ > DECLARE > out INTEGER[]; > BEGIN > IF array1 IS NULL OR array2 IS NULL THEN > RETURN '[]'; > END IF; > FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP > IF (array1[i] =ANY (array2)) THEN > out := array_append(out,array1[i]); > END IF; > END LOOP; > RETURN out; > END; > $$ LANGUAGE PLPGSQL; Is the =ANY specific to PG 8.2 or higher? On 8.1.10: psql=> select array_intersect('{1,2,3}', '{1,2,6,7,8}'); array_intersect ----------------- (1 row) Also, I think the first return needs to be: RETURN '{}'; instead of: RETURN '[]'; Josh
On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > On Wed, 17 Oct 2007 11:12:27 -0500 > "Rodrigo De León" <rdeleonp@gmail.com> wrote: > > > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > > > nm - I just wrote a function - though curious if this is the most > > > effecient way: > > > > If you only want TRUE or FALSE, you can use '&&': > > > > t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[]; > > ?column? > > ---------- > > t > > (1 row) > > Is that 8.2 or above? I'm on 8.1.10: introduced in 8.2 merlin
On Wed, 17 Oct 2007 12:33:13 -0400 "Merlin Moncure" <mmoncure@gmail.com> wrote: > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > > On Wed, 17 Oct 2007 11:12:27 -0500 > > "Rodrigo De León" <rdeleonp@gmail.com> wrote: > > > > > On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > > > > nm - I just wrote a function - though curious if this is the > > > > most effecient way: > > > > > > If you only want TRUE or FALSE, you can use '&&': > > > > > > t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[]; > > > ?column? > > > ---------- > > > t > > > (1 row) > > > > Is that 8.2 or above? I'm on 8.1.10: > > introduced in 8.2 http://www.postgresql.org/docs/8.2/static/functions-array.html Thanks - yet another reason to upgrade - waiting till 8.3 tho. Josh
On Wed, Oct 17, 2007 at 11:28:31AM -0500, Josh Trutwin wrote: > It's inelegant, but I just did this: > IF return_empty THEN > RETURN '{}'; > END IF; humm, why didn't that seem to work for me... ah well. Next version fixes a problem that I didn't test of the inputs being NULL. '[]' isn't semantically correct, let alone the correct syntax. I've also fixed the problem with items in the first array appearing twice. Try: CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$ DECLARE out INTEGER[]; BEGIN out := '{}'::INTEGER[]; IF array1 IS NULL OR array2 IS NULL THEN RETURN NULL; END IF; FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY (out) THEN out := array_append(out,array1[i]); END IF; END LOOP; RETURN out; END; $$ LANGUAGE PLPGSQL; Sam
On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote: > Is the =ANY specific to PG 8.2 or higher? On 8.1.10: It appears (according to [1] and [2]) that you may be able to just remove the '=' to get it working with 8.1.x. i.e. v ANY ('{1,2}') is correct in 8.1.x but in 8.2.x it's v = ANY ('{1,2}') I don't have an 8.1 box so I can't test unfortunatly. Sam [1] http://www.postgresql.org/docs/8.1/static/functions-comparisons.html#AEN13394 [2] http://www.postgresql.org/docs/8.2/static/functions-comparisons.html#AEN14115
On Wed, Oct 17, 2007 at 10:19:43AM -0500, Josh Trutwin wrote: > Hi, > > Is it possible to find the intersection of two array values? > > a = '{1,2,3}' > b = '{2,3,4}' > > a intersect b = '{2,3}' > > Assume I need to write a pl/pgsql function to do this. You can use an SQL function, which has the advantage of always being available :) It's up to you to ensure that the input arrays have the same type. Cheers, David. CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY) RETURNS ANYARRAY LANGUAGE SQL AS $$ SELECT ARRAY( SELECT $1[i] AS "the_intersection" FROM generate_series( array_lower($1,1), array_upper($1,1) ) AS i INTERSECT SELECT $2[j] AS "the_intersection" FROM generate_series( array_lower($2,1), array_upper($2,1) ) AS j ); $$; -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, 17 Oct 2007 17:42:21 +0100 Sam Mason <sam@samason.me.uk> wrote: <snip> > CREATE OR REPLACE FUNCTION array_intersect (array1 > INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$ > DECLARE > out INTEGER[]; > BEGIN > out := '{}'::INTEGER[]; > IF array1 IS NULL OR array2 IS NULL THEN > RETURN NULL; > END IF; > FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP > IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY > (out) THEN out := array_append(out,array1[i]); > END IF; > END LOOP; > RETURN out; > END; > $$ LANGUAGE PLPGSQL; Works like a champ on 8.1. Thanks! Did you see David Fetter's reply to the original post? He has an interesting alternative. Josh
On Wed, 17 Oct 2007 17:49:01 +0100 Sam Mason <sam@samason.me.uk> wrote: > On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote: > > Is the =ANY specific to PG 8.2 or higher? On 8.1.10: > > It appears (according to [1] and [2]) that you may be able to just > remove the '=' to get it working with 8.1.x. i.e. > > v ANY ('{1,2}') > > is correct in 8.1.x but in 8.2.x it's > > v = ANY ('{1,2}') > > I don't have an 8.1 box so I can't test unfortunatly. = ANY appears to work in 8.1 - at least your function from the other reply worked just fine on 8.1 and I have used = ANY in some other queries. Thanks, Josh
On Wed, 17 Oct 2007 10:04:21 -0700 David Fetter <david@fetter.org> wrote: <snip> > CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY) > RETURNS ANYARRAY > LANGUAGE SQL > AS $$ > SELECT ARRAY( > SELECT $1[i] AS "the_intersection" > FROM generate_series( > array_lower($1,1), > array_upper($1,1) > ) AS i > INTERSECT > SELECT $2[j] AS "the_intersection" > FROM generate_series( > array_lower($2,1), > array_upper($2,1) > ) AS j > ); > $$; Doesn't appear to work on 8.1: psql=> select array_intersect('{1,2,3}', '{2,3,4}'); ERROR: could not determine anyarray/anyelement type because input has type "unknown" Hopefully upgrading to 8.3 soon so I'll have another look then. Thanks, Josh
On Wed, Oct 17, 2007 at 01:06:35PM -0500, Josh Trutwin wrote: > On Wed, 17 Oct 2007 10:04:21 -0700 > David Fetter <david@fetter.org> wrote: > > <snip> > > > CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY) > > RETURNS ANYARRAY > > LANGUAGE SQL > > AS $$ > > SELECT ARRAY( > > SELECT $1[i] AS "the_intersection" > > FROM generate_series( > > array_lower($1,1), > > array_upper($1,1) > > ) AS i > > INTERSECT > > SELECT $2[j] AS "the_intersection" > > FROM generate_series( > > array_lower($2,1), > > array_upper($2,1) > > ) AS j > > ); > > $$; > > Doesn't appear to work on 8.1: > > psql=> select array_intersect('{1,2,3}', '{2,3,4}'); > ERROR: could not determine anyarray/anyelement type because input > has type "unknown" As mentioned in the "release notes" ;), it's up to you to ensure that the arrays have the right types, so you'd write the above as: SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]); Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, 17 Oct 2007 11:26:05 -0700 David Fetter <david@fetter.org> wrote: > > Doesn't appear to work on 8.1: > > > > psql=> select array_intersect('{1,2,3}', '{2,3,4}'); > > ERROR: could not determine anyarray/anyelement type because input > > has type "unknown" > > As mentioned in the "release notes" ;), it's up to you to ensure > that the arrays have the right types, so you'd write the above as: > > SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]); Ah - I saw that but I figured that just meant you couldn't do something like this: SELECT array_intersect('{1,2,3}', '{"a","b","c"}'); So basically with this array intersect function you always need to cast the inputs. Josh
> > <snip> > > > > > CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY) > > > RETURNS ANYARRAY > > > LANGUAGE SQL > > > AS $$ > > > SELECT ARRAY( > > > SELECT $1[i] AS "the_intersection" > > > FROM generate_series( > > > array_lower($1,1), > > > array_upper($1,1) > > > ) AS i > > > INTERSECT > > > SELECT $2[j] AS "the_intersection" > > > FROM generate_series( > > > array_lower($2,1), > > > array_upper($2,1) > > > ) AS j > > > ); > > > $$; > > nice :) Maybe we can add function "generate_iterator" CREATE OR REPLACE FUNCTION generate_iterator(ANYARRAY) RETURNS SETOF integer AS $$ SELECT i FROM generate_series(array_lower($1, 1), array_upper($1,1)) AS i $$ LANGUAGE SQL; then CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY) RETURNS ANYARRAY LANGUAGE SQL AS $$ SELECT ARRAY( SELECT $1[i] FROM genarate_iterator($1) i INTERSECT SELECT $2[j] FROM generate_iterator($2) j ) $$ ; Regars Pavel
On Wed, Oct 17, 2007 at 01:00:48PM -0500, Josh Trutwin wrote: > Works like a champ on 8.1. > > Thanks! That's OK, it was a good learning experience. > Did you see David Fetter's reply to the original post? He has > an interesting alternative. I did, it's much more elegant. I've never seen generate_series used like that, it's a very nice way of doing things. If your arrays are ever more than a few elements long his is probably going to be much faster. Generally for less than 6 or 7 elements it's better doing the naive thing, not sure if that applies here though. Sam