Re: Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement - Mailing list pgsql-sql

From	Tom Lane
Subject	Re: Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement
Date	May 11, 2001 13:53:42
Msg-id	5443.989598832@sss.pgh.pa.us
In response to	Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement ("David D. Kilzer" <ddkilzer@lubricants-oil.com>)
Responses	Re: Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement
List	pgsql-sql

"David D. Kilzer" <ddkilzer@lubricants-oil.com> writes:
> [ wants to write an aggregate function that returns its last input ]

The SQL model of query processing has a very definite view of the stages
of processing: first group by, then aggregate, and last order by.  Tuple
ordering is irrelevant according to the basic semantics of the language.
Probably the SQL authors would have left out ORDER BY entirely if they
could have got away with it, but instead they made it a vestigial
appendage that is only allowed at the very last instant before query
outputs are forwarded to a client application.
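
A minimal sketch of that staging, run against SQLite through Python's sqlite3 module rather than PostgreSQL. The table layout and sample rows are hypothetical; only the column names are taken from the query quoted further down. The ORDER BY here operates on the already grouped and aggregated output, one row per personid.

```python
import sqlite3

# Hypothetical sample data; SQLite is used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE race (id INTEGER, personid INTEGER, carid INTEGER,"
    " laps INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO race VALUES (?, ?, ?, ?, ?)",
    [
        (1, 14, 7, 10, "2001-05-01"),
        (2, 14, 9, 12, "2001-05-08"),
        (3, 15, 3, 30, "2001-05-01"),
    ],
)

# Rows are grouped, each group is aggregated, and only then is the
# result sorted. The sort key is an aggregate, so it can only exist
# after the first two stages have run.
rows = conn.execute(
    """
    SELECT personid, SUM(laps) AS total_laps
      FROM race
     GROUP BY personid
     ORDER BY total_laps DESC
    """
).fetchall()
print(rows)
```

Nothing in this query lets an aggregate observe the order of the rows inside a group; the only ordering is the final sort of the grouped output.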

Thus, it is very bad form to write an aggregate that depends on the
order it sees its inputs in.  This won't be changed, because it's part
of the nature of the language.

In PG 7.1 it's possible to hack around this by ordering the result of
a subselect-in-FROM:

    SELECT orderedagg(ss.x) FROM (SELECT x FROM tab ORDER BY y) ss;

which is a gross violation of the letter and spirit of the spec, and
should not be expected to be portable to other DBMSes; but it gets the
job done if you are intent on writing an ordering-dependent aggregate.

However, I don't see any good way to combine this with grouping, since
if you apply GROUP BY to the output of the subselect you'll lose the
ordering again.

>   SELECT r.personid               AS personid
>         ,SUM(r.laps)              AS laps
>         ,COUNT(DISTINCT r.id)     AS nightsraced
>         ,(SELECT r.carid
>             FROM race r
>            WHERE r.personid = 14 
>         ORDER BY r.date DESC
>            LIMIT 1)               AS carid
>     FROM race r
>    WHERE r.personid = 14
> GROUP BY r.personid
> ORDER BY r.date;

This is likely to be reasonably efficient, actually, since the subselect
will be evaluated only once per output group --- in fact, as you've
written it it'll only be evaluated once, period, since it has no
dependencies on the outer query.  More usually you'd probably do

        ,(SELECT r2.carid
            FROM race r2
           WHERE r2.personid = r.personid
        ORDER BY r2.date DESC
           LIMIT 1)               AS carid


so that the result tracks the outer query, and in this form it'd be
redone once per output row.
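
For illustration, here is the correlated form placed in a full grouped query, as a sketch only. It runs against SQLite through Python's sqlite3 module rather than PostgreSQL, the table definition and sample rows are hypothetical, and the WHERE clause and the outer ORDER BY r.date from the quoted query are replaced by ORDER BY r.personid so that more than one group is shown in a stable order.

```python
import sqlite3

# Hypothetical sample data; SQLite is used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE race (id INTEGER, personid INTEGER, carid INTEGER,"
    " laps INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO race VALUES (?, ?, ?, ?, ?)",
    [
        (1, 14, 7, 10, "2001-05-01"),
        (2, 14, 9, 12, "2001-05-08"),
        (3, 15, 3, 30, "2001-05-01"),
        (4, 15, 4, 5, "2001-04-24"),
    ],
)

# The subselect refers to r.personid from the outer query, so its result
# follows each output group: it picks the carid of that person's most
# recent race, with no ordering-dependent aggregate involved.
rows = conn.execute(
    """
    SELECT r.personid            AS personid
          ,SUM(r.laps)           AS laps
          ,COUNT(DISTINCT r.id)  AS nightsraced
          ,(SELECT r2.carid
              FROM race r2
             WHERE r2.personid = r.personid
          ORDER BY r2.date DESC
             LIMIT 1)            AS carid
      FROM race r
  GROUP BY r.personid
  ORDER BY r.personid
    """
).fetchall()
for row in rows:
    print(row)
```

With this data, person 14 gets carid 9 (raced on 2001-05-08) and person 15 gets carid 3 (raced on 2001-05-01), alongside their summed laps.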

			regards, tom lane
