Thread: Aggregates, group, and order by

Aggregates, group, and order by

From

Michael Glaesemann

Date:

07 November 2005, 04:12:13

I'm trying to concatenate strings in variable orders using a custom
aggregate. However, I'm having a difficult time figuring out the SQL
I need to use to accomplish this. Here's a test case that shows the
error I'm getting.

select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
build 5026)
(1 row)


create table ordered_foo (
     foo_id integer not null
     , bar_id integer not null
     , foo_value text not null
     , foo_pos integer not null
     , unique (foo_id, bar_id)
) without oids;

copy ordered_foo (foo_id, bar_id, foo_value, foo_pos) from stdin;
1    1    delta    4
2    1    alpha    1
3    1    charlie    3
4    1    bravo    2
5    2    C    3
6    2    B    2
7    2    A    1
8    2    D    4
\.

CREATE AGGREGATE array_accum (
     sfunc = array_append,
     basetype = anyelement,
     stype = anyarray,
     initcond = '{}'
);

select bar_id, array_accum(foo_value)
from ordered_foo
group by bar_id
order by bar_id;
bar_id |         array_accum
--------+-----------------------------
       1 | {delta,alpha,charlie,bravo}
       2 | {C,B,A,D}


The result I'd like to see is
bar_id |         array_accum
--------+-----------------------------
       1 | {alpha,bravo,charlie,delta}
       2 | {A,B,C,D}


select bar_id, array_accum(foo_value)
from ordered_foo
group by bar_id
order by bar_id, foo_pos;
ERROR:  column "ordered_foo.foo_pos" must appear in the GROUP BY
clause or be used in an aggregate function

So, I thought I'd just use a meaningless aggregate and throw away the
result.

select bar_id, array_accum(foo_value), sum(foo_pos)
from ordered_foo
group by bar_id
order by bar_id, foo_pos;
ERROR:  column "ordered_foo.foo_pos" must appear in the GROUP BY
clause or be used in an aggregate function

select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
build 5026)
(1 row)


I get the same error if I use built-in aggregates, so I know it's not
the custom aggregate:

select bar_id, sum(foo_id), sum(foo_pos)
from ordered_foo
group by bar_id
order by bar_id, foo_pos;
ERROR:  column "ordered_foo.foo_pos" must appear in the GROUP BY
clause or be used in an aggregate function

Taking foo_pos out of the ORDER BY clause gets rid of the error, but
of course I'm no longer guaranteed the ordering of the result.

test=# select bar_id, sum(foo_id), sum(foo_pos)
test-# from ordered_foo
test-# group by bar_id
test-# order by bar_id;
bar_id | sum | sum
--------+-----+-----
       1 |  10 |  10
       2 |  26 |  10

I'm guessing this is happening because the ORDER BY clause is
evaluated relatively late, and on the result set (rather than on an
intermediate stage) and the result set doesn't include the foo_ps
column.

I can't think of a way to push the aggregate with the order by into a
subquery that still produces the proper grouping (by bar_id). Any
hints or suggestions?

Michael Glaesemann
grzm myrealbox com

Re: Aggregates, group, and order by

From

David Fetter

Date:

07 November 2005, 04:48:06

On Mon, Nov 07, 2005 at 05:12:05PM +0900, Michael Glaesmann wrote:
> I'm trying to concatenate strings in variable orders using a custom
> aggregate.  However, I'm having a difficult time figuring out the
> SQL I need to use to accomplish this.

How about using the ARRAY() constructor as below?

> Here's a test case that
> shows the  error I'm getting.
>
> select version();
>
> version
> ------------------------------------------------------------------------
> ----------------------------------------------------------------------
> PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC
> powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
> build 5026)
> (1 row)
>
>
> create table ordered_foo (
>     foo_id integer not null
>     , bar_id integer not null
>     , foo_value text not null
>     , foo_pos integer not null
>     , unique (foo_id, bar_id)
> ) without oids;
>
> copy ordered_foo (foo_id, bar_id, foo_value, foo_pos) from stdin;
> 1    1    delta    4
> 2    1    alpha    1
> 3    1    charlie    3
> 4    1    bravo    2
> 5    2    C    3
> 6    2    B    2
> 7    2    A    1
> 8    2    D    4
> \.
>
> CREATE AGGREGATE array_accum (
>     sfunc = array_append,
>     basetype = anyelement,
>     stype = anyarray,
>     initcond = '{}'
> );
>
> select bar_id, array_accum(foo_value)
> from ordered_foo
> group by bar_id
> order by bar_id;
> bar_id |         array_accum
> --------+-----------------------------
>       1 | {delta,alpha,charlie,bravo}
>       2 | {C,B,A,D}

SELECT DISTINCT
    o1.bar_id,
    ARRAY(
        SELECT o2.foo_value
        FROM ordered_foo o2
        WHERE o1.bar_id = o2.bar_id
        ORDER BY o2.foo_value
    )
FROM ordered_foo o1
ORDER BY o1.bar_id;

 bar_id |          ?column?
--------+-----------------------------
      1 | {alpha,bravo,charlie,delta}
      2 | {A,B,C,D}
(2 rows)

Another way to do this could be with another function like this.

CREATE OR REPLACE FUNCTION array_sort(in_array ANYARRAY)
RETURNS ANYARRAY
LANGUAGE plpgsql
AS $$
BEGIN /* Yes, I know I'm not checking array dimensions.  This is a prototype. */
    RETURN ARRAY (
        SELECT in_array[s.i]
        FROM generate_series(
            array_lower(in_array,1),
            array_upper(in_array,1)
        ) AS s(i)
        ORDER BY 1
    );
END;
$$;

> [snip]
>
> I can't think of a way to push the aggregate with the order by into a
> subquery that still produces the proper grouping (by bar_id). Any
> hints or suggestions?

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

Re: Aggregates, group, and order by

From

Joe Conway

Date:

07 November 2005, 05:09:26

Michael Glaesemann wrote:
> I'm trying to concatenate strings in variable orders using a custom
> aggregate. However, I'm having a difficult time figuring out the SQL  I
> need to use to accomplish this. Here's a test case that shows the  error
> I'm getting.

> select bar_id, array_accum(foo_value)
> from ordered_foo
> group by bar_id
> order by bar_id;
> bar_id |         array_accum
> --------+-----------------------------
>       1 | {delta,alpha,charlie,bravo}
>       2 | {C,B,A,D}
>
>
> The result I'd like to see is
> bar_id |         array_accum
> --------+-----------------------------
>       1 | {alpha,bravo,charlie,delta}
>       2 | {A,B,C,D}

Just use a subselect --  you're looking for this, correct?

regression=# select bar_id, array_accum(foo_value) from (select * from
ordered_foo order by foo_pos) as ss group by bar_id order by bar_id;
  bar_id |         array_accum
--------+-----------------------------
       1 | {alpha,bravo,charlie,delta}
       2 | {A,B,C,D}
(2 rows)

HTH,

Joe

Re: Aggregates, group, and order by

From

Michael Glaesemann

Date:

07 November 2005, 06:15:24

On Nov 7, 2005, at 17:40 , Roger Hand wrote:

> On Monday, November 07, 2005 12:12 AM
> Michael Glaesemann wrote:
>>
>> select bar_id, array_accum(foo_value)
>> from ordered_foo
>> group by bar_id
>> order by bar_id;
>> bar_id |         array_accum
>> --------+-----------------------------
>>        1 | {delta,alpha,charlie,bravo}
>>        2 | {C,B,A,D}
>>
>>
>> The result I'd like to see is
>> bar_id |         array_accum
>> --------+-----------------------------
>>        1 | {alpha,bravo,charlie,delta}
>>        2 | {A,B,C,D}
>
> select bar_id, array_accum(foo_value)
> from
> (SELECT * FROM ordered_foo ORDER BY foo_pos) foo
> group by bar_id
> order by bar_id;
>
> bar_id,array_accum
> 1,{alpha,bravo,charlie,delta}
> 2,{A,B,C,D}

On Nov 7, 2005, at 18:09 , Joe Conway wrote:
>
> Just use a subselect --  you're looking for this, correct?
>
> regression=# select bar_id, array_accum(foo_value) from (select *
> from ordered_foo order by foo_pos) as ss group by bar_id order by
> bar_id;
>  bar_id |         array_accum
> --------+-----------------------------
>       1 | {alpha,bravo,charlie,delta}
>       2 | {A,B,C,D}
> (2 rows)


That is very sweet. Is the subquery guaranteed to retain the order?
My understanding is that a table is not necessarily ordered, so the
result of a subquery isn't necessarily going to be ordered either.

I'm having a bit of trouble getting this to work with the non-reduced
case: the array_accum results aren't always ordered as I want. Going
to mess around with it a little more to see if I can find out why.

Thanks again!

Michael Glaesemann
grzm myrealbox com

Re: Aggregates, group, and order by

From

"Roger Hand"

Date:

07 November 2005, 07:55:07

On Monday, November 07, 2005 12:12 AM
Michael Glaesemann wrote:
>
> select bar_id, array_accum(foo_value)
> from ordered_foo
> group by bar_id
> order by bar_id;
> bar_id |         array_accum
> --------+-----------------------------
>        1 | {delta,alpha,charlie,bravo}
>        2 | {C,B,A,D}
>
>
> The result I'd like to see is
> bar_id |         array_accum
> --------+-----------------------------
>        1 | {alpha,bravo,charlie,delta}
>        2 | {A,B,C,D}

select bar_id, array_accum(foo_value)
from
(SELECT * FROM ordered_foo ORDER BY foo_pos) foo
group by bar_id
order by bar_id;

bar_id,array_accum
1,{alpha,bravo,charlie,delta}
2,{A,B,C,D}

Re: Aggregates, group, and order by

From

Tom Lane

Date:

07 November 2005, 10:24:12

Joe Conway <mail@joeconway.com> writes:
> Michael Glaesemann wrote:
>> I'm trying to concatenate strings in variable orders using a custom
>> aggregate.

> Just use a subselect --  you're looking for this, correct?

> regression=# select bar_id, array_accum(foo_value) from (select * from
> ordered_foo order by foo_pos) as ss group by bar_id order by bar_id;

Strictly speaking, you need this:

    select bar_id, array_accum(foo_value) from
    (select * from ordered_foo order by bar_id, foo_pos) as ss
    group by bar_id order by bar_id;

ie, sort the subselect by the grouping key of the outer query, then
by the thing that should control the aggregation order within groups.

The way Joe shows will work only if the planner chooses to use a hash
aggregate plan.  If it chooses a sort/uniq aggregation plan, the re-sort
will destroy the sort order of the sub-select's output.

            regards, tom lane

Re: Aggregates, group, and order by

From

Michael Glaesemann

Date:

07 November 2005, 10:38:57

On Nov 7, 2005, at 23:24 , Tom Lane wrote:

> Strictly speaking, you need this:
>
>     select bar_id, array_accum(foo_value) from
>     (select * from ordered_foo order by bar_id, foo_pos) as ss
>     group by bar_id order by bar_id;
>
> ie, sort the subselect by the grouping key of the outer query, then
> by the thing that should control the aggregation order within groups.
>
> The way Joe shows will work only if the planner chooses to use a hash
> aggregate plan.  If it chooses a sort/uniq aggregation plan, the re-
> sort
> will destroy the sort order of the sub-select's output.

Thanks, Tom. That fixed the problem in the production version. Is the
ability to do this an artifact of the sorting or defined by the SQL?
I.e., can I expect this to work even if the plan changes? I'm
guessing not, but that's just a guess.

Michael Glaesemann
grzm myrealbox com

Re: Aggregates, group, and order by

From

Tom Lane

Date:

07 November 2005, 11:10:10

Michael Glaesemann <grzm@myrealbox.com> writes:
> On Nov 7, 2005, at 23:24 , Tom Lane wrote:
>> Strictly speaking, you need this:
>>     select bar_id, array_accum(foo_value) from
>>     (select * from ordered_foo order by bar_id, foo_pos) as ss
>>     group by bar_id order by bar_id;
>> ie, sort the subselect by the grouping key of the outer query, then
>> by the thing that should control the aggregation order within groups.

> Thanks, Tom. That fixed the problem in the production version. Is the
> ability to do this an artifact of the sorting or defined by the SQL?

ORDER BY in a sub-select isn't legal at all per the SQL spec, so all of
this is well outside the bounds of the spec.  The planner does take
pains to make this work, though, and I can't see a reason why we'd want
to break it in future versions.

            regards, tom lane

Re: Aggregates, group, and order by

From

Michael Glaesemann

Date:

08 November 2005, 07:59:00

On Nov 7, 2005, at 17:47 , David Fetter wrote:

> On Mon, Nov 07, 2005 at 05:12:05PM +0900, Michael Glaesmann wrote:
>> I'm trying to concatenate strings in variable orders using a custom
>> aggregate.  However, I'm having a difficult time figuring out the
>> SQL I need to use to accomplish this.
>
> How about using the ARRAY() constructor as below?

Thanks for the idea, David. I'll give it a look!

Michael Glaesemann
grzm myrealbox com