Thread: aggregates, distinct, order by, and case - why won't this work

aggregates, distinct, order by, and case - why won't this work

From
"David G. Johnston"
Date:
This...on 9.3

SELECT array_agg(
distinct case when v % 2 = 0 then 'odd' else 'even' end 
order by case when v % 2 = 0 then 1 else 2 end
)
FROM (VALUES (1), (2), (3)) val (v)

I'm not particularly irked at this though I was hoping to fix a somewhat complex query of mine by simply adding a "DISTINCT" to the array_agg that I am building from derived (using CASE) data.

I am curious to the reason for the limitation, particularly as it would relate to this specific instance.

Any givers?

Thank!

David J.

Re: aggregates, distinct, order by, and case - why won't this work

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> This...on 9.3
> SELECT array_agg(
> distinct case when v % 2 = 0 then 'odd' else 'even' end
> order by case when v % 2 = 0 then 1 else 2 end
> )
> FROM (VALUES (1), (2), (3)) val (v)

The error message seems pretty clear to me:

ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list

This is exactly the same as the complaint you'd get with a SELECT-level
DISTINCT, eg

regression=# create table ttt(a int, b int);
CREATE TABLE
regression=# select distinct a from ttt order by b;
ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: select distinct a from ttt order by b;
                                            ^

and the reason is the same too: the value of b is not necessarily unique
within any one group of rows with the same value of a, so it's not
well-defined what output order this is asking for.

In the example you give, it's possible for a human to see that the two
case expressions give values that must correlate perfectly.  But PG
doesn't try to do that kind of analysis.  It just insists that an ORDER
BY expression be one of the ones being DISTINCT'd on.

            regards, tom lane


Re: aggregates, distinct, order by, and case - why won't this work

From
"David G. Johnston"
Date:
On Fri, Oct 2, 2015 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> This...on 9.3
> SELECT array_agg(
> distinct case when v % 2 = 0 then 'odd' else 'even' end
> order by case when v % 2 = 0 then 1 else 2 end
> )
> FROM (VALUES (1), (2), (3)) val (v)

The error message seems pretty clear to me:

ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list

This is exactly the same as the complaint you'd get with a SELECT-level
DISTINCT, eg

regression=# create table ttt(a int, b int);
CREATE TABLE
regression=# select distinct a from ttt order by b;
ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: select distinct a from ttt order by b;
                                            ^

and the reason is the same too: the value of b is not necessarily unique
within any one group of rows with the same value of a, so it's not
well-defined what output order this is asking for.

In the example you give, it's possible for a human to see that the two
case expressions give values that must correlate perfectly.  But PG
doesn't try to do that kind of analysis.  It just insists that an ORDER
BY expression be one of the ones being DISTINCT'd on.


​Thanks.

It definitely makes simple situations a bit more complicated but I can see how it needs to be that way to handle the generalized case.

I guess I'm looking for something that basically performs a sort, a map, and then unique but one that simply leaves the first instance of any values while removing subsequent ones even if non-adjacent.

imagine sorted input with a map function classifying each number - indeed this is not a great example...

EVEN, ODD, ODD, IMAGINARY, ODD, EVEN, INFINITY => EVEN, ODD, IMAGINARY, INFINITY

Put differently I'm trying to perform set-operations while using an array...I should explore this more and see if I can make a set (sub-query) work...
DISTINCT ON may be useful
​.

David J.