Home > mailing lists

Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement - Mailing list pgsql-sql

From	David D. Kilzer
Subject	Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement
Date	May 11, 2001 11:26:25
Msg-id	20010507122759.B12656@elbonia.lubricants-oil.com Whole thread Raw
Responses	Re: Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement Re: Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement
List	pgsql-sql

Tree view

NOTE: I did a moderate search through the PgSQL mail list archives, 
but couldn't find an example of a question where both ORDER BY and
aggregation were used.  In all examples it was possible to use either
one or the other with a possible subselect.  [Keep reading...I
discovered how to do this with subselects, but I'm wondering if (1) what
I'm doing with ORDER BY and aggregation will ever be allowed and (2)
whether the subselect solution is much more inefficient than being able
to order a pre-aggregate set.]

I need to query a table and return aggregated results.  The aggregation
occurs many ways (I'm joining 7 tables and returning 19 columns
currently):
 o Columns in a GROUP BY clause o SUM() functions o COUNT() functions

However, I wrote a new FUNCTION and a new AGGREGATE to return the *last*
value of a field during the aggregation process (see below).  This means
that I would like to use an ORDER BY clause with the SELECT ... GROUP BY
statement before the results are aggregated.

DROP FUNCTION "lastint4" (int4, int4);
CREATE FUNCTION "lastint4" (int4, int4)       RETURNS int4            AS 'BEGIN RETURN $2; END;'      LANGUAGE
'PLPGSQL';

DROP AGGREGATE lastitem int4;
CREATE AGGREGATE lastitem (         SFUNC1 = lastint4,       BASETYPE = int4,         STYPE1 = int4);

A simple (made-up) example:
 SELECT r.personid           AS personid       ,SUM(r.laps)          AS laps       ,COUNT(DISTINCT r.id) AS nightsraced
     ,lastitem(r.carid)    AS carid   FROM race r  WHERE r.personid = 14
 
GROUP BY r.personid
ORDER BY r.date DESC;

Does PostgreSQL (or ANSI SQL) allow this?  In the simple tests I ran, I
would get errors similar to the following:

ERROR:  Attribute r.carid must be GROUPed or used in an aggregate function

This seems a bit...obvious because r.carid is already being used in an
aggregate function.  I'm guessing that I'm running into the internals
of how PgSQL processes the query.

Hmm...well I just figured out how I could do this as a subselect, but it
seems hugely inefficient and would require PostgreSQL 7.1 or later
(ORDER BY, LIMIT used in a subselect):
 SELECT r.personid               AS personid       ,SUM(r.laps)              AS laps       ,COUNT(DISTINCT r.id)     AS
nightsraced      ,(SELECT r.carid           FROM race r          WHERE r.personid = 14        ORDER BY r.date DESC
   LIMIT 1)               AS carid   FROM race r  WHERE r.personid = 14
 
GROUP BY r.personid
ORDER BY r.date;

Is a subselect like this really that inefficient (assuming appropriate 
indexes on r.date and r.personid)?  I would think doing this during
aggregation would be much more efficient.

I'm using PostgreSQL 7.0.3 on Debian GNU/Linux 2.2r3
(postgresql-7.0.3-4) on a Linux 2.2.1x kernel.

Thanks for any insight you can provide!

Dave

pgsql-sql by date:

From: "Sylte"
Date: 11 May 2001, 11:23:15
Subject: Re: Informix->PostgreSQL database convertion

From: "Sylte"
Date: 11 May 2001, 11:53:58
Subject: Informix->PostgreSQL database convertion

Using ORDER BY with AGGREGATE/GROUP BY in a SELECT statement - Mailing list pgsql-sql

Previous

Next