Thread: array_agg and array_accum (patch)

array_agg and array_accum (patch)

From
Jeff Davis
Date:
Here is a patch to support two aggregate functions:

1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input,
and skips NULL inputs.

2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
inputs.

These accumulate the result in a memory context that lives across calls
to the state function, so it's reasonably efficient. On my old laptop it
takes about 5s to generate an array of 1M elements -- not great, but at
least it's linear.

Although array_agg is the standard behavior, array_accum is important
because otherwise you always lose the NULLs, and that's difficult to
work around even with COALESCE.

I added them as new native functions because ARRAY_AGG is in the
standard, but if others think they should live elsewhere that's fine. I
think that they are generally pretty useful functions for people using
arrays.

This patch is contributed by Truviso.

Regards,
    Jeff Davis


Attachment

Re: array_agg and array_accum (patch)

From
"Ian Caulfield"
Date:
I think array_agg also keeps nulls - although the draft standard I
have seems to contradict itself about this...

Ian

2008/10/26 Jeff Davis <pgsql@j-davis.com>:
> Here is a patch to support two aggregate functions:
>
> 1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input,
> and skips NULL inputs.
>
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.
>
> These accumulate the result in a memory context that lives across calls
> to the state function, so it's reasonably efficient. On my old laptop it
> takes about 5s to generate an array of 1M elements -- not great, but at
> least it's linear.
>
> Although array_agg is the standard behavior, array_accum is important
> because otherwise you always lose the NULLs, and that's difficult to
> work around even with COALESCE.
>
> I added them as new native functions because ARRAY_AGG is in the
> standard, but if others think they should live elsewhere that's fine. I
> think that they are generally pretty useful functions for people using
> arrays.
>
> This patch is contributed by Truviso.
>
> Regards,
>        Jeff Davis
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: array_agg and array_accum (patch)

From
Stephen Frost
Date:
Jeff,

* Jeff Davis (pgsql@j-davis.com) wrote:
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.

Excellent..  I added it the easy way (from the online docs), but that's
clearly not at all efficient and was going to try and fix it, for psql
to use with the column-level privs patch.  It'd be great to use a more
efficient mechanism like this, and to remove adding it from my patch
(erm, it's only one line currently, but it would have been alot more
eventually :).

I havn't actually reviewed the code at all, but +1 in general to adding
this to core.
Thanks,
    Stephen

Re: array_agg and array_accum (patch)

From
Tom Lane
Date:
"Ian Caulfield" <ian.caulfield@gmail.com> writes:
> I think array_agg also keeps nulls - although the draft standard I
> have seems to contradict itself about this...

The SQL:2008 draft I have says, in 10.9 <aggregate function> general
rule 8g
NOTE 267 - Null values are not eliminated when computing <arrayaggregate function>. This, plus the optional <sort
specificationlist>,sets <array aggregate function> apart from <general setfunction>s.
 

So that seems to make it perfectly clear that nulls aren't eliminated,
and furthermore to be an intentional override of any other part of the
spec that you might think says nulls should be eliminated.  If you have
an argument to read it otherwise, please say exactly what.

A larger objection to Jeff's draft patch is that it doesn't implement
the <sort specification list>.  I'm entirely happy about not doing that
--- the current SQL committee's willingness to invent random new syntax
and nonorthogonal behavior for every function they can think of will be
the death of SQL yet --- but it's something that we at least need to
document the workaround for.
        regards, tom lane


Re: array_agg and array_accum (patch)

From
Jeff Davis
Date:
On Sun, 2008-10-26 at 23:02 -0400, Tom Lane wrote:
> "Ian Caulfield" <ian.caulfield@gmail.com> writes:
> > I think array_agg also keeps nulls - although the draft standard I
> > have seems to contradict itself about this...
> 
> The SQL:2008 draft I have says, in 10.9 <aggregate function> general
> rule 8g
> 

I apologize, clearly I skimmed the standard too fast.

I'll review the standard, allow array_agg() to collect NULLs, perhaps
drop array_accum (if the only difference is the return value on no
input), and resubmit with docs.

Regards,Jeff Davis



Re: array_agg and array_accum (patch)

From
Peter Eisentraut
Date:
Tom Lane wrote:
> A larger objection to Jeff's draft patch is that it doesn't implement
> the <sort specification list>.  I'm entirely happy about not doing that
> --- the current SQL committee's willingness to invent random new syntax
> and nonorthogonal behavior for every function they can think of will be
> the death of SQL yet --- but it's something that we at least need to
> document the workaround for.

How else will you tell an aggregate function whose result depends on the 
input order which order you want?  The only aggregates defined in the 
standard where this matters are array_agg, array_accum, and xmlagg, but 
it would also be useful in other cases such as a text concatenation 
aggregate function or an aggregate function to calculate the correlation 
(or whatever alternative metric we come up with).  Given that the 
standard does not provide for user-defined aggregates, I think the way 
it's specified is perfectly OK.

Without a way to control the order, how useful are these array 
aggregates really?


Re: array_agg and array_accum (patch)

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> How else will you tell an aggregate function whose result depends on the 
> input order which order you want?

You feed it from a subquery that has ORDER BY.  The only reason the spec
needs this kluge is their insistence that ORDER BY not be used in
subqueries.  Now I grant that there's some basis in relational theory
for that stand, but they certainly feel free to ignore academic notions
of cleanliness everywhere else in the spec.
        regards, tom lane


Re: array_agg and array_accum (patch)

From
Jeff Davis
Date:
On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote:
> How else will you tell an aggregate function whose result depends on the 
> input order which order you want?  The only aggregates defined in the 
> standard where this matters are array_agg, array_accum, and xmlagg, but 

I don't see array_accum() in the standard, I wrote it just as an
alternative to array_agg() because I thought array_agg() ignored NULLs.

Regards,Jeff Davis



Re: array_agg and array_accum (patch)

From
"Robert Haas"
Date:
It's worth noting that this is the third version of this idea that has
been submitted.  Ian Caulfield submitted a patch to add this, and so
did I.  Someone should probably look at all three of them and compare.

...Robert

On Mon, Oct 27, 2008 at 1:41 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote:
>> How else will you tell an aggregate function whose result depends on the
>> input order which order you want?  The only aggregates defined in the
>> standard where this matters are array_agg, array_accum, and xmlagg, but
>
> I don't see array_accum() in the standard, I wrote it just as an
> alternative to array_agg() because I thought array_agg() ignored NULLs.
>
> Regards,
>        Jeff Davis
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: array_agg and array_accum (patch)

From
Jeff Davis
Date:
On Wed, 2008-10-29 at 00:08 -0400, Robert Haas wrote:
> It's worth noting that this is the third version of this idea that has
> been submitted.  Ian Caulfield submitted a patch to add this, and so
> did I.  Someone should probably look at all three of them and compare.
> 

If we include a function named array_accum(), it should return an empty
array on no input to match the function in the docs:

http://www.postgresql.org/docs/8.3/static/xaggr.html

Your function returns NULL on no input, which seems more like
array_agg().

Aside from that, I'm pretty open to anything, as long as one of our
patches makes it. If there are potential problems with the standard
(where we don't want to implement a violation), we should just do
array_accum(). If not, we might as well do the standard array_agg(),
perhaps without the ORDER BY clause.

We could also do both, because it is a little annoying to coalesce the
result or array_agg().

Regards,Jeff Davis




Re: array_agg and array_accum (patch)

From
Sam Mason
Date:
On Thu, Oct 30, 2008 at 11:19:15PM -0700, Jeff Davis wrote:
> If there are potential problems with the standard
> (where we don't want to implement a violation), we should just do
> array_accum(). If not, we might as well do the standard array_agg(),
> perhaps without the ORDER BY clause.

I've wanted an array_sort() function before; having this functionality
as a separate function also seems considerably prettier than some ad
hoc grammar, it also generalizes nicely to cases where the array isn't
coming from an aggregate.

 Sam


Re: array_agg and array_accum (patch)

From
Jeff Davis
Date:
Here's an updated patch for just array_accum() with some simple docs. If
I should document this in more places, let me know.

I decided not to include array_agg() in this patch because it doesn't
support the standard's ORDER BY clause.

My reasoning is that, if someone is using the standard array_agg() and
porting to PostgreSQL, there's a fairly high chance they would be using
the ORDER BY clause as well, due to the nature of the function. If not,
and they really want a function called array_agg that returns NULL on no
input, it would be trivial to just create an extra final function that
behaved that way and create a new aggregate.

However, if people want me to put array_agg() back in I will.

Regards,
    Jeff Davis

Attachment

Re: array_agg and array_accum (patch)

From
Peter Eisentraut
Date:
Jeff Davis wrote:
> Here's an updated patch for just array_accum() with some simple docs.

I have committed a "best of Robert Haas and Jeff Davis" array_agg() 
function with standard SQL semantics.  I believe this gives the best 
consistency with other aggregate functions for the no-input-rows case. 
If some other behavior is wanted, it is a coalesce() away, as the 
documentation states.


Re: array_agg and array_accum (patch)

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Jeff Davis wrote:
>> Here's an updated patch for just array_accum() with some simple docs.

> I have committed a "best of Robert Haas and Jeff Davis" array_agg() 
> function with standard SQL semantics.  I believe this gives the best 
> consistency with other aggregate functions for the no-input-rows case. 
> If some other behavior is wanted, it is a coalesce() away, as the 
> documentation states.

The original reason for doing this work, I think, was to let us
deprecate contrib/intagg, or at least turn it into a thin wrapper
around core-provided functionality.  We now have the means to do that
for int_array_aggregate, but what about int_array_enum?

It seems that it would be an easy evening's work to implement unnest(),
at least in the simple formfunction unnest(anyarray) returns setof anyelement

without the WITH ORDINALITY syntax proposed by the SQL spec.  Then
we could eliminate intagg's C code altogether, and just write it
as a couple of wrapper functions.

Does anyone have an objection to doing that?
        regards, tom lane


Re: array_agg and array_accum (patch)

From
Alvaro Herrera
Date:
Tom Lane wrote:

> The original reason for doing this work, I think, was to let us
> deprecate contrib/intagg, or at least turn it into a thin wrapper
> around core-provided functionality.  We now have the means to do that
> for int_array_aggregate, but what about int_array_enum?

And what about the patch to add sorted-array versions of some routines?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: array_agg and array_accum (patch)

From
"Robert Haas"
Date:
> It seems that it would be an easy evening's work to implement unnest(),
> at least in the simple form
>        function unnest(anyarray) returns setof anyelement
>
> without the WITH ORDINALITY syntax proposed by the SQL spec.  Then
> we could eliminate intagg's C code altogether, and just write it
> as a couple of wrapper functions.
>
> Does anyone have an objection to doing that?

I think it would be great.

...Robert


Re: array_agg and array_accum (patch)

From
"Robert Haas"
Date:
It looks to me like section 34.10 of the docs might benefit from some
sort of update in light of this patch, since the builtin array_agg now
does the same thing as the proposed user-defined array_accum, only
better.  Presumably we should either pick a different example, or add
a note that a builtin is available that does the same thing more
efficiently.

...Robert

On Thu, Nov 13, 2008 at 11:07 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Jeff Davis wrote:
>>
>> Here's an updated patch for just array_accum() with some simple docs.
>
> I have committed a "best of Robert Haas and Jeff Davis" array_agg() function
> with standard SQL semantics.  I believe this gives the best consistency with
> other aggregate functions for the no-input-rows case. If some other behavior
> is wanted, it is a coalesce() away, as the documentation states.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: array_agg and array_accum (patch)

From
Tom Lane
Date:
"Robert Haas" <robertmhaas@gmail.com> writes:
> It looks to me like section 34.10 of the docs might benefit from some
> sort of update in light of this patch, since the builtin array_agg now
> does the same thing as the proposed user-defined array_accum, only
> better.  Presumably we should either pick a different example, or add
> a note that a builtin is available that does the same thing more
> efficiently.

I did the latter.  If you can think of an equally plausible and short
example of a polymorphic aggregate, we could certainly replace the
example instead ...
        regards, tom lane


Re: array_agg and array_accum (patch)

From
"Merlin Moncure"
Date:
On Thu, Nov 20, 2008 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Robert Haas" <robertmhaas@gmail.com> writes:
>> It looks to me like section 34.10 of the docs might benefit from some
>> sort of update in light of this patch, since the builtin array_agg now
>> does the same thing as the proposed user-defined array_accum, only
>> better.  Presumably we should either pick a different example, or add
>> a note that a builtin is available that does the same thing more
>> efficiently.
>
> I did the latter.  If you can think of an equally plausible and short
> example of a polymorphic aggregate, we could certainly replace the
> example instead ...

maybe show how to stack arrays?
see: http://www.nabble.com/text-array-accumulate-to-multidimensional-text-array-td20098591.html

IMO a good example of how you can write aggregates in a language other
than C, which is IMO an underutilized technique.

CREATE OR REPLACE FUNCTION array_cat1(p1 anyarray, p2 anyarray)
RETURNS anyarray AS
$$ SELECT CASE WHEN $1 =  '{}'::text[] THEN ARRAY[p2] ELSE ARRAY_CAT(p1, p2) END;
$$ LANGUAGE sql;

CREATE AGGREGATE array_stack(anyarray)
(  sfunc = array_cat1,  stype = anyarray,  initcond = '{}'
);

merlin