Thread: array_agg and array_accum (patch)
Here is a patch to support two aggregate functions: 1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input, and skips NULL inputs. 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL inputs. These accumulate the result in a memory context that lives across calls to the state function, so it's reasonably efficient. On my old laptop it takes about 5s to generate an array of 1M elements -- not great, but at least it's linear. Although array_agg is the standard behavior, array_accum is important because otherwise you always lose the NULLs, and that's difficult to work around even with COALESCE. I added them as new native functions because ARRAY_AGG is in the standard, but if others think they should live elsewhere that's fine. I think that they are generally pretty useful functions for people using arrays. This patch is contributed by Truviso. Regards, Jeff Davis
Attachment
I think array_agg also keeps nulls - although the draft standard I have seems to contradict itself about this... Ian 2008/10/26 Jeff Davis <pgsql@j-davis.com>: > Here is a patch to support two aggregate functions: > > 1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input, > and skips NULL inputs. > > 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL > inputs. > > These accumulate the result in a memory context that lives across calls > to the state function, so it's reasonably efficient. On my old laptop it > takes about 5s to generate an array of 1M elements -- not great, but at > least it's linear. > > Although array_agg is the standard behavior, array_accum is important > because otherwise you always lose the NULLs, and that's difficult to > work around even with COALESCE. > > I added them as new native functions because ARRAY_AGG is in the > standard, but if others think they should live elsewhere that's fine. I > think that they are generally pretty useful functions for people using > arrays. > > This patch is contributed by Truviso. > > Regards, > Jeff Davis > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Jeff, * Jeff Davis (pgsql@j-davis.com) wrote: > 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL > inputs. Excellent.. I added it the easy way (from the online docs), but that's clearly not at all efficient and was going to try and fix it, for psql to use with the column-level privs patch. It'd be great to use a more efficient mechanism like this, and to remove adding it from my patch (erm, it's only one line currently, but it would have been alot more eventually :). I havn't actually reviewed the code at all, but +1 in general to adding this to core. Thanks, Stephen
"Ian Caulfield" <ian.caulfield@gmail.com> writes: > I think array_agg also keeps nulls - although the draft standard I > have seems to contradict itself about this... The SQL:2008 draft I have says, in 10.9 <aggregate function> general rule 8g NOTE 267 - Null values are not eliminated when computing <arrayaggregate function>. This, plus the optional <sort specificationlist>,sets <array aggregate function> apart from <general setfunction>s. So that seems to make it perfectly clear that nulls aren't eliminated, and furthermore to be an intentional override of any other part of the spec that you might think says nulls should be eliminated. If you have an argument to read it otherwise, please say exactly what. A larger objection to Jeff's draft patch is that it doesn't implement the <sort specification list>. I'm entirely happy about not doing that --- the current SQL committee's willingness to invent random new syntax and nonorthogonal behavior for every function they can think of will be the death of SQL yet --- but it's something that we at least need to document the workaround for. regards, tom lane
On Sun, 2008-10-26 at 23:02 -0400, Tom Lane wrote: > "Ian Caulfield" <ian.caulfield@gmail.com> writes: > > I think array_agg also keeps nulls - although the draft standard I > > have seems to contradict itself about this... > > The SQL:2008 draft I have says, in 10.9 <aggregate function> general > rule 8g > I apologize, clearly I skimmed the standard too fast. I'll review the standard, allow array_agg() to collect NULLs, perhaps drop array_accum (if the only difference is the return value on no input), and resubmit with docs. Regards,Jeff Davis
Tom Lane wrote: > A larger objection to Jeff's draft patch is that it doesn't implement > the <sort specification list>. I'm entirely happy about not doing that > --- the current SQL committee's willingness to invent random new syntax > and nonorthogonal behavior for every function they can think of will be > the death of SQL yet --- but it's something that we at least need to > document the workaround for. How else will you tell an aggregate function whose result depends on the input order which order you want? The only aggregates defined in the standard where this matters are array_agg, array_accum, and xmlagg, but it would also be useful in other cases such as a text concatenation aggregate function or an aggregate function to calculate the correlation (or whatever alternative metric we come up with). Given that the standard does not provide for user-defined aggregates, I think the way it's specified is perfectly OK. Without a way to control the order, how useful are these array aggregates really?
Peter Eisentraut <peter_e@gmx.net> writes: > How else will you tell an aggregate function whose result depends on the > input order which order you want? You feed it from a subquery that has ORDER BY. The only reason the spec needs this kluge is their insistence that ORDER BY not be used in subqueries. Now I grant that there's some basis in relational theory for that stand, but they certainly feel free to ignore academic notions of cleanliness everywhere else in the spec. regards, tom lane
On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote: > How else will you tell an aggregate function whose result depends on the > input order which order you want? The only aggregates defined in the > standard where this matters are array_agg, array_accum, and xmlagg, but I don't see array_accum() in the standard, I wrote it just as an alternative to array_agg() because I thought array_agg() ignored NULLs. Regards,Jeff Davis
It's worth noting that this is the third version of this idea that has been submitted. Ian Caulfield submitted a patch to add this, and so did I. Someone should probably look at all three of them and compare. ...Robert On Mon, Oct 27, 2008 at 1:41 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote: >> How else will you tell an aggregate function whose result depends on the >> input order which order you want? The only aggregates defined in the >> standard where this matters are array_agg, array_accum, and xmlagg, but > > I don't see array_accum() in the standard, I wrote it just as an > alternative to array_agg() because I thought array_agg() ignored NULLs. > > Regards, > Jeff Davis > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Wed, 2008-10-29 at 00:08 -0400, Robert Haas wrote: > It's worth noting that this is the third version of this idea that has > been submitted. Ian Caulfield submitted a patch to add this, and so > did I. Someone should probably look at all three of them and compare. > If we include a function named array_accum(), it should return an empty array on no input to match the function in the docs: http://www.postgresql.org/docs/8.3/static/xaggr.html Your function returns NULL on no input, which seems more like array_agg(). Aside from that, I'm pretty open to anything, as long as one of our patches makes it. If there are potential problems with the standard (where we don't want to implement a violation), we should just do array_accum(). If not, we might as well do the standard array_agg(), perhaps without the ORDER BY clause. We could also do both, because it is a little annoying to coalesce the result or array_agg(). Regards,Jeff Davis
On Thu, Oct 30, 2008 at 11:19:15PM -0700, Jeff Davis wrote: > If there are potential problems with the standard > (where we don't want to implement a violation), we should just do > array_accum(). If not, we might as well do the standard array_agg(), > perhaps without the ORDER BY clause. I've wanted an array_sort() function before; having this functionality as a separate function also seems considerably prettier than some ad hoc grammar, it also generalizes nicely to cases where the array isn't coming from an aggregate. Sam
Here's an updated patch for just array_accum() with some simple docs. If I should document this in more places, let me know. I decided not to include array_agg() in this patch because it doesn't support the standard's ORDER BY clause. My reasoning is that, if someone is using the standard array_agg() and porting to PostgreSQL, there's a fairly high chance they would be using the ORDER BY clause as well, due to the nature of the function. If not, and they really want a function called array_agg that returns NULL on no input, it would be trivial to just create an extra final function that behaved that way and create a new aggregate. However, if people want me to put array_agg() back in I will. Regards, Jeff Davis
Attachment
Jeff Davis wrote: > Here's an updated patch for just array_accum() with some simple docs. I have committed a "best of Robert Haas and Jeff Davis" array_agg() function with standard SQL semantics. I believe this gives the best consistency with other aggregate functions for the no-input-rows case. If some other behavior is wanted, it is a coalesce() away, as the documentation states.
Peter Eisentraut <peter_e@gmx.net> writes: > Jeff Davis wrote: >> Here's an updated patch for just array_accum() with some simple docs. > I have committed a "best of Robert Haas and Jeff Davis" array_agg() > function with standard SQL semantics. I believe this gives the best > consistency with other aggregate functions for the no-input-rows case. > If some other behavior is wanted, it is a coalesce() away, as the > documentation states. The original reason for doing this work, I think, was to let us deprecate contrib/intagg, or at least turn it into a thin wrapper around core-provided functionality. We now have the means to do that for int_array_aggregate, but what about int_array_enum? It seems that it would be an easy evening's work to implement unnest(), at least in the simple formfunction unnest(anyarray) returns setof anyelement without the WITH ORDINALITY syntax proposed by the SQL spec. Then we could eliminate intagg's C code altogether, and just write it as a couple of wrapper functions. Does anyone have an objection to doing that? regards, tom lane
Tom Lane wrote: > The original reason for doing this work, I think, was to let us > deprecate contrib/intagg, or at least turn it into a thin wrapper > around core-provided functionality. We now have the means to do that > for int_array_aggregate, but what about int_array_enum? And what about the patch to add sorted-array versions of some routines? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> It seems that it would be an easy evening's work to implement unnest(), > at least in the simple form > function unnest(anyarray) returns setof anyelement > > without the WITH ORDINALITY syntax proposed by the SQL spec. Then > we could eliminate intagg's C code altogether, and just write it > as a couple of wrapper functions. > > Does anyone have an objection to doing that? I think it would be great. ...Robert
It looks to me like section 34.10 of the docs might benefit from some sort of update in light of this patch, since the builtin array_agg now does the same thing as the proposed user-defined array_accum, only better. Presumably we should either pick a different example, or add a note that a builtin is available that does the same thing more efficiently. ...Robert On Thu, Nov 13, 2008 at 11:07 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > Jeff Davis wrote: >> >> Here's an updated patch for just array_accum() with some simple docs. > > I have committed a "best of Robert Haas and Jeff Davis" array_agg() function > with standard SQL semantics. I believe this gives the best consistency with > other aggregate functions for the no-input-rows case. If some other behavior > is wanted, it is a coalesce() away, as the documentation states. > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
"Robert Haas" <robertmhaas@gmail.com> writes: > It looks to me like section 34.10 of the docs might benefit from some > sort of update in light of this patch, since the builtin array_agg now > does the same thing as the proposed user-defined array_accum, only > better. Presumably we should either pick a different example, or add > a note that a builtin is available that does the same thing more > efficiently. I did the latter. If you can think of an equally plausible and short example of a polymorphic aggregate, we could certainly replace the example instead ... regards, tom lane
On Thu, Nov 20, 2008 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Robert Haas" <robertmhaas@gmail.com> writes: >> It looks to me like section 34.10 of the docs might benefit from some >> sort of update in light of this patch, since the builtin array_agg now >> does the same thing as the proposed user-defined array_accum, only >> better. Presumably we should either pick a different example, or add >> a note that a builtin is available that does the same thing more >> efficiently. > > I did the latter. If you can think of an equally plausible and short > example of a polymorphic aggregate, we could certainly replace the > example instead ... maybe show how to stack arrays? see: http://www.nabble.com/text-array-accumulate-to-multidimensional-text-array-td20098591.html IMO a good example of how you can write aggregates in a language other than C, which is IMO an underutilized technique. CREATE OR REPLACE FUNCTION array_cat1(p1 anyarray, p2 anyarray) RETURNS anyarray AS $$ SELECT CASE WHEN $1 = '{}'::text[] THEN ARRAY[p2] ELSE ARRAY_CAT(p1, p2) END; $$ LANGUAGE sql; CREATE AGGREGATE array_stack(anyarray) ( sfunc = array_cat1, stype = anyarray, initcond = '{}' ); merlin