Re: RFC: array_agg() per SQL:200n - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: RFC: array_agg() per SQL:200n
Date
Msg-id 200803250258.m2P2wF909601@momjian.us
Whole thread Raw
In response to RFC: array_agg() per SQL:200n  (Neil Conway <neilc@samurai.com>)
List pgsql-hackers
Add to TODO:

* Add SQL-standard array_agg() and unnest() array functions
 http://archives.postgresql.org/pgsql-hackers/2008-01/msg01017.php


---------------------------------------------------------------------------

Neil Conway wrote:
> I recently noticed that SQL:200n[1] defines a new aggregate function,
> array_agg(). The relevant portions of the spec are:
> 
> p. 66: "If ARRAY_AGG is specified, then an array value with one element
> formed from the <value expression> evaluated for each row that
> qualifies."
> 
> p. 556: <array aggregate function> ::=
>   ARRAY_AGG <left paren>
>       <value expression> [ ORDER BY <sort specification list> ]
>   <right paren>
> 
> p. 564 discusses the required behavior. The result of array_agg() is an
> array with one element per input value, sorted according to the optional
> ORDER BY clause. NULL input values are included in the array, and the
> result for an empty group is NULL, not an empty array. Note that per
> page 66, I'd expect array values in the input to array_agg() not to be
> flattened.
> 
> I'd like to implement array_agg() for 8.4. In the past, we've talked
> about moving the array_accum() example aggregate into the backend[2].
> Now that there's SQL-standard syntax, that's another reason to do it --
> I think this is clearly useful functionality.
> 
> The previous discussion got tied up in how to expose the aggregate's
> transition value to the type system. The problem is that the aggregate
> wants to use a transition value that is not a SQL-level type, to allow
> efficient array append operations. Various solutions were mooted about,
> typically involving a pass-by-val pseudotype used to hold a pointer to
> the C struct holding the transition state.
> 
> AFAIK the conclusion reached by the previous thread was that to be type
> safe, you'd need one distinct pseudotype per aggregate function, along
> with some way to let the planner distinguish this class of pseudotypes
> from other types (in order to apply the heuristic that functions like
> these are likely to consume more memory). You could identify this class
> by an additional column in pg_type, but I think we'd need a lot of
> machinery to do this properly (e.g. to allow these types to be created
> via SQL). I wonder if this isn't over-engineering: the simple approach
> originally followed by Stephen Frost was to declare the transition value
> as, say, int8, and just disallow the transition and final functions from
> being called outside an aggregate context. AFAIK this would be safe,
> although of course it is ugly.
> 
> To parse the ORDER BY clause, we'd need to special-case array_agg() in
> the grammar, which is a bit unfortunate. Implementation-wise, because
> there is no way to lazily evaluate an array expression, I don't think
> there's much to be gained by using the tuplesort infrastructure -- we'll
> need to materialize the entire array into memory when the final function
> is called anyway. Therefore, a simpler approach might be to just
> accumulate inputs in the transition function as usual, and then qsort()
> them in the final function. We could also have the planner arrange for
> the sort to be skipped if it knows that the input to the aggregate will
> be delivered in a compatible ordering.
> 
> Comments welcome.
> 
> -Neil
> 
> [1] http://www.wiscorp.com/SQLStandards.html ; apparently SQL:200n is
> likely to become SQL:2008 without further changes.
> 
> [2] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00362.php
>     http://archives.postgresql.org/pgsql-patches/2006-10/msg00059.php
>     http://archives.postgresql.org/pgsql-hackers/2006-10/msg00683.php
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pl/pgsql Plan Invalidation and search_path
Next
From: Greg Smith
Date:
Subject: Re: [GSoC] (Is it OK to choose items without % mark in theToDoList) && (is it an acceptable idea to build index on Flash Disk)