Thread: New array functions

New array functions

From
Greg Stark
Date:
So where are the new array functions and syntaces documented?

Specifically I want to know how to replace my int_array_aggregate(int) and
int_array_enum(_int) calls. And how to replace my "arr *= n" calls too. 

I think these are supposed be "ALL my_array" and "n = ANY myarray" or
something like that?

-- 
greg



Re: New array functions

From
Joe Conway
Date:
Greg Stark wrote:
> So where are the new array functions and syntaces documented?

Mainly here:
http://developer.postgresql.org/docs/postgres/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
http://developer.postgresql.org/docs/postgres/arrays.html
http://developer.postgresql.org/docs/postgres/functions-array.html
http://developer.postgresql.org/docs/postgres/functions-comparisons.html#AEN12154

> Specifically I want to know how to replace my int_array_aggregate(int) and
> int_array_enum(_int) calls.

I have no idea what those are -- are they from contrib?

You can create an aggregate to turn arbitrary datatype elements into 
arrays like this:

CREATE AGGREGATE array_aggregate
(  BASETYPE = anyelement,  SFUNC = array_append,  STYPE = anyarray,  INITCOND = '{}'
);

-- silly example, but what the heck ;-)
regression=# select attrelid, array_aggregate(attnum) from pg_attribute 
where attnum > 0 and attnum < 5 group by attrelid limit 3; attrelid | array_aggregate
----------+-----------------    16639 | {1}    16638 | {1}    17022 | {1,2,3,4}
(3 rows)


If int_array_enum() is supposed to take '{1,2,3}' and produce three 
rows, that function was proposed but rejected. Subsequently Peter 
Eisentraut pointed out a SQL99 syntax that does this, but I did not get 
it done for 7.4. Perhaps for 7.5.

> And how to replace my "arr *= n" calls too. 

See:
http://developer.postgresql.org/docs/postgres/functions-comparisons.html#AEN12154

regression=# SELECT g.grosysid, g.groname, s.usesysid, s.usename FROM 
pg_shadow s, pg_group g WHERE s.usesysid = any (g.grolist); grosysid | groname | usesysid | usename
----------+---------+----------+----------      102 | admins  |        1 | postgres      100 | grp1    |      100 |
user1     101 | grp2    |      100 | user1      100 | grp1    |      101 | user2      100 | grp1    |      102 | user3
   101 | grp2    |      102 | user3      102 | admins  |      103 | john
 
(7 rows)


HTH,

Joe



Re: New array functions

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> Greg Stark wrote:
> > So where are the new array functions and syntaces documented?
> 
> Mainly here:
> http://developer.postgresql.org/docs/postgres/arrays.html

excellent. thank you.

> > Specifically I want to know how to replace my int_array_aggregate(int) and
> > int_array_enum(_int) calls.
> 
> I have no idea what those are -- are they from contrib?

indeed in contrib/intagg

> You can create an aggregate to turn arbitrary datatype elements into arrays
> like this:
> 
> CREATE AGGREGATE array_aggregate
> (
>    BASETYPE = anyelement,
>    SFUNC = array_append,
>    STYPE = anyarray,
>    INITCOND = '{}'
> );

Hm, perhaps there should be a standard name for this, rather than have
everyone's code do their own thing.

> If int_array_enum() is supposed to take '{1,2,3}' and produce three rows, that
> function was proposed but rejected. Subsequently Peter Eisentraut pointed out a
> SQL99 syntax that does this, but I did not get it done for 7.4. Perhaps for 7.5.

That's exactly what it does. Hm, I guess I misinterpreted that post. Hm I have
some work to do.

Thanks. And thanks a LOT for doing the work, it'll makes a big difference
and make arrays much more practical to use.

-- 
greg



Re: New array functions

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Greg Stark wrote:
>> And how to replace my "arr *= n" calls too. 

> See:
> http://developer.postgresql.org/docs/postgres/functions-comparisons.html#AEN12154

That reminds me --- contrib/array is definitely obsolete now, and there
may be parts of the other contrib array- and aggregate-related modules
that are obsoleted by Joe's recent work.

I would like to kill contrib/array for 7.4, because it's one of the
GPL'd contrib modules that I was tasked to get rid of some time ago.

What I'm thinking of doing is removing the code, and replacing the
README with a note explaining how to convert contrib/array queries to
use the new mainstream syntaxes.  That will give contrib/array users
a clue what they're supposed to do.  In a release or three the README
could go away too.

Comments, objections?

Also, does anyone want to look for possible dead code in intagg and
so on?
        regards, tom lane


Re: New array functions

From
Joe Conway
Date:
Tom Lane wrote:
> What I'm thinking of doing is removing the code, and replacing the
> README with a note explaining how to convert contrib/array queries to
> use the new mainstream syntaxes.  That will give contrib/array users
> a clue what they're supposed to do.  In a release or three the README
> could go away too.

I have no objection to removing it now, but previously I think you 
agreed with Bruce's comment that we should leave it intact (but 
deprecated) for 7.4, and remove in 7.5.

> Also, does anyone want to look for possible dead code in intagg and
> so on?

I did a quick review back in July.

IIRC, intagg could be functionally replaced with the aggregate 
definition that I posted, except that intagg is probably a fair bit 
better performance (I didn't actually test), in that it accumulates the 
array in backend memory and just pushes pointers around as int4's. I've 
thought that a safer implementation would be needed to fold it into the 
backend (maybe using hashes keyed with the pointer?), but in any case 
that's a 7.5 thing.

Also IIRC there were some functions in intarray that overlap the new 
backend functionality, but much of it (i.e. using GIST to index into 
large arrays) is not.

I'll try to review them again and make a recommendation in the next 
couple of days, but it might be a stretch because I'm trying to tie up 
lots of loose ends in preparation for a trip next week.

Joe



Re: New array functions

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> I have no objection to removing it now, but previously I think you 
> agreed with Bruce's comment that we should leave it intact (but 
> deprecated) for 7.4, and remove in 7.5.

Did we discuss this already?  I'd forgotten.

In any case, the module isn't visibly deprecated at the moment.
If the idea is to avoid blindsiding its users, then we definitely
must mark it as slated for removal, and provide some docs about
how to replace it.
        regards, tom lane


Re: New array functions

From
Joe Conway
Date:
Hannu Krosing wrote:
> Any idea of performance - is this array_aggregate(anyelement) faster,
> slower or about same than int_array_aggregate(int) ?

I haven't tested, but I'd guess for an array of any significant length 
int_array_aggregate() is faster (see my other post). That's one of the 
reasons I haven't advocated deprecating intagg yet.

Joe



Re: New array functions

From
Hannu Krosing
Date:
Joe Conway kirjutas N, 28.08.2003 kell 21:15:
> Greg Stark wrote:
> > Specifically I want to know how to replace my int_array_aggregate(int) and
> > int_array_enum(_int) calls.
> 
> I have no idea what those are -- are they from contrib?
> 
> You can create an aggregate to turn arbitrary datatype elements into 
> arrays like this:
> 
> CREATE AGGREGATE array_aggregate
> (
>    BASETYPE = anyelement,
>    SFUNC = array_append,
>    STYPE = anyarray,
>    INITCOND = '{}'
> );

Any idea of performance - is this array_aggregate(anyelement) faster,
slower or about same than int_array_aggregate(int) ?

> If int_array_enum() is supposed to take '{1,2,3}' and produce three 
> rows, that function was proposed but rejected. Subsequently Peter 
> Eisentraut pointed out a SQL99 syntax that does this, but I did not get 
> it done for 7.4. Perhaps for 7.5.

So we got to keep intagg at least until 7.5 ...

-----------
Hannu



Re: New array functions

From
Greg Stark
Date:

> See:
> http://developer.postgresql.org/docs/postgres/functions-comparisons.html#AEN12154
> 
> regression=# SELECT g.grosysid, g.groname, s.usesysid, s.usename FROM pg_shadow
> s, pg_group g WHERE s.usesysid = any (g.grolist);


These forms below are all equivalent, right? 

If so ideally they would all be converted to an equivalent form and therefore
produce the same plan. I guess I'm wishing for a pony though. But I think
currently I'm stuck with the worst of these and I don't see any way of
escaping to the better plans.

Incidentally, "HashAggregate"?! Based on the earlier discussion on this I
would have expected that line to read "Materialize"


slo=> explain select * from store_location where store_location_id in (1,2,3);
                  QUERY PLAN                                                               
 

---------------------------------------------------------------------------------------------------------------------------------------Index
Scanusing store_location_pkey, store_location_pkey, store_location_pkey on store_location  (cost=0.00..17.74 rows=3
width=523) Index Cond: ((store_location_id = 1) OR (store_location_id = 2) OR (store_location_id = 3))
 
(2 rows)



slo=> explain select * from store_location where store_location_id in (select 1 union all select 2 union all select 3);
                                         QUERY PLAN                                            
 
--------------------------------------------------------------------------------------------------Nested Loop
(cost=0.10..17.86rows=3 width=523)  ->  HashAggregate  (cost=0.10..0.10 rows=3 width=4)        ->  Subquery Scan
"IN_subquery" (cost=0.00..0.09 rows=3 width=4)              ->  Append  (cost=0.00..0.06 rows=3 width=0)
   ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=0)                          ->  Result
(cost=0.00..0.01rows=1 width=0)                    ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=0)
                    ->  Result  (cost=0.00..0.01 rows=1 width=0)                    ->  Subquery Scan "*SELECT* 3"
(cost=0.00..0.02rows=1 width=0)                          ->  Result  (cost=0.00..0.01 rows=1 width=0)  ->  Index Scan
usingstore_location_pkey on store_location  (cost=0.00..5.91 rows=1 width=523)        Index Cond:
(store_location.store_location_id= "outer"."?column?")
 
(12 rows)



slo=> explain select * from store_location where store_location_id = any (array[1,2,3]);
QUERYPLAN                              
 
---------------------------------------------------------------------Seq Scan on store_location  (cost=0.00..825.75
rows=5954width=523)  Filter: (store_location_id = ANY ('{1,2,3}'::integer[]))
 
(2 rows)




-- 
greg



Re: New array functions

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> 
>>I have no objection to removing it now, but previously I think you 
>>agreed with Bruce's comment that we should leave it intact (but 
>>deprecated) for 7.4, and remove in 7.5.
> 
> Did we discuss this already?  I'd forgotten.
> 
> In any case, the module isn't visibly deprecated at the moment.
> If the idea is to avoid blindsiding its users, then we definitely
> must mark it as slated for removal, and provide some docs about
> how to replace it.
> 

I can't find it in the archives for some reason, but here was the exchange:

Tom Lane wrote:> Bruce Momjian <pgman@candle.pha.pa.us> writes:>>>Joe Conway wrote:>>>>>I do agree that it makes
contrib/arrayunnecessary. I was going to>>>suggest we remove that if this was committed.>>>Good idea.>> We could do
that,but it might be more friendly to just mark it as> deprecated for one release cycle before zapping it.  That'd
give>people who use it some time to convert over.
 

So I guess since it was actually you who objected, you have the right to 
change your mind ;-)

Joe



Re: New array functions

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Incidentally, "HashAggregate"?! Based on the earlier discussion on this I
> would have expected that line to read "Materialize"

It's using a grouped aggregation node to implement a UNIQUE filter, so
that it can replace the "WHERE foo IN (subselect)" by a straight join.
Of course in this case the uniqueness filter is a waste of time, but
in general the planner can't be expected to know that.
        regards, tom lane