Thread: Picking the first of an order in an aggregate query

Picking the first of an order in an aggregate query

From
Robert James
Date:
I have a query

  SELECT grouping_field, MIN(field_a), MIN(field_b)
  FROM ...
  GROUP BY grouping_field

But, instead of picking the MIN field_a and MIN field_b, I'd like to
pick field_a and field_b from the first record, according to an order
I'll specify.  In pseudo-SQL, it would be something like this:

  SELECT grouping_field, FIRST(field_a), FIRST(field_b)
  FROM ...
  ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
  GROUP BY grouping_field

How can I do that with Postgres?


Re: Picking the first of an order in an aggregate query

From
Jack Christensen
Date:
On 12/31/2012 8:33 AM, Robert James wrote:
> I have a query
>
>    SELECT grouping_field, MIN(field_a), MIN(field_b)
>    FROM ...
>    GROUP BY grouping_field
>
> But, instead of picking the MIN field_a and MIN field_b, I'd like to
> pick field_a and field_b from the first record, according to an order
> I'll specify.  In pseudo-SQL, it would be something like this:
>
>    SELECT grouping_field, FIRST(field_a), FIRST(field_b)
>    FROM ...
>    ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
>    GROUP BY grouping_field
>
> How can I do that with Postgres?
>
>
select distinct on (grouping_field), field_a, field_b
from ...
order by grouping_field, field_a asc, field_b asc


http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT

--
Jack Christensen
http://jackchristensen.com/



Re: Picking the first of an order in an aggregate query

From
Tom Lane
Date:
Jack Christensen <jack@jackchristensen.com> writes:
> On 12/31/2012 8:33 AM, Robert James wrote:
>> SELECT grouping_field, FIRST(field_a), FIRST(field_b)
>> FROM ...
>> ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
>> GROUP BY grouping_field
>>
>> How can I do that with Postgres?

> select distinct on (grouping_field), field_a, field_b
> from ...
> order by grouping_field, field_a asc, field_b asc

Another possibility, if you're using PG 8.4 or newer, is to use window
functions.  It'd go something like

select grouping_field, first_value(field_a) over (partition by grouping_field order by field_a), ...

if memory serves (I'm not quite sure whether you need the PARTITION BY
bit if there's a global GROUP BY in the query).

The DISTINCT ON syntax is a Postgres-ism, while window functions are
SQL-standard so have at least some chance of being portable, if that
matters to you.

            regards, tom lane


Re: Picking the first of an order in an aggregate query

From
Robert James
Date:
DISTINCT is a very simple solution!
But I have one problem: In addition to the FIRST fields, I also do
want some aggregate functions.  More accurately, it would be:

SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
MAX(field_y)
...

How should I do that? Should I do two queries with a join on the
grouping field? Or is there a more direct way?



On 12/31/12, Jack Christensen <jack@jackchristensen.com> wrote:
> On 12/31/2012 8:33 AM, Robert James wrote:
>> I have a query
>>
>>    SELECT grouping_field, MIN(field_a), MIN(field_b)
>>    FROM ...
>>    GROUP BY grouping_field
>>
>> But, instead of picking the MIN field_a and MIN field_b, I'd like to
>> pick field_a and field_b from the first record, according to an order
>> I'll specify.  In pseudo-SQL, it would be something like this:
>>
>>    SELECT grouping_field, FIRST(field_a), FIRST(field_b)
>>    FROM ...
>>    ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
>>    GROUP BY grouping_field
>>
>> How can I do that with Postgres?
>>
>>
> select distinct on (grouping_field), field_a, field_b
> from ...
> order by grouping_field, field_a asc, field_b asc
>
>
> http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT
>
> --
> Jack Christensen
> http://jackchristensen.com/
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: Picking the first of an order in an aggregate query

From
François Beausoleil
Date:
Le 2012-12-31 à 15:38, Robert James a écrit :

> DISTINCT is a very simple solution!
> But I have one problem: In addition to the FIRST fields, I also do
> want some aggregate functions.  More accurately, it would be:
>
> SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
> MAX(field_y)
> ...
>
> How should I do that? Should I do two queries with a join on the
> grouping field? Or is there a more direct way?

WINDOW functions can help you:

SELECT
    grouping_field
  , first_value(field_a) OVER (ORDER BY ...)
  , first_value(field_b) OVER (ORDER BY ...)
  , sum(field_x) OVER ()
  , max(field_y) OVER ()
FROM ...

The empty OVER clauses will make the sum / max work over the full result set, and not a subset. I really recommend
readingthe window functions section on the site. 

Bye!
François

NOTE: Please do not top-post. This list is bottom post.

Re: Picking the first of an order in an aggregate query

From
Robert James
Date:
On 12/31/12, François Beausoleil <francois@teksol.info> wrote:
>
> Le 2012-12-31 à 15:38, Robert James a écrit :
>
>> DISTINCT is a very simple solution!
>> But I have one problem: In addition to the FIRST fields, I also do
>> want some aggregate functions.  More accurately, it would be:
>>
>> SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
>> MAX(field_y)
>> ...
>>
>> How should I do that? Should I do two queries with a join on the
>> grouping field? Or is there a more direct way?
>
> WINDOW functions can help you:
>
> SELECT
>     grouping_field
>   , first_value(field_a) OVER (ORDER BY ...)
>   , first_value(field_b) OVER (ORDER BY ...)
>   , sum(field_x) OVER ()
>   , max(field_y) OVER ()
> FROM ...
>
> The empty OVER clauses will make the sum / max work over the full result
> set, and not a subset. I really recommend reading the window functions
> section on the site.
>

I see.  Will the optimizer know enough to not repeat the work for each
first_value I do? Or am I better off using a JOIN of some sort?


Re: Picking the first of an order in an aggregate query

From
Vincent Veyron
Date:
Le lundi 31 décembre 2012 à 20:55 -0500, Robert James a écrit :
> On 12/31/12, François Beausoleil <francois@teksol.info> wrote:
> >
> > Le 2012-12-31 à 15:38, Robert James a écrit :
> >
> >> DISTINCT is a very simple solution!
> >> But I have one problem: In addition to the FIRST fields, I also do
> >> want some aggregate functions.  More accurately, it would be:
> >>
> >> SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
> >> MAX(field_y)
> >> ...
> >>
> >> How should I do that? Should I do two queries with a join on the
> >> grouping field? Or is there a more direct way?
> >
> > WINDOW functions can help you:
> >
> > SELECT
> >     grouping_field
> >   , first_value(field_a) OVER (ORDER BY ...)
> >   , first_value(field_b) OVER (ORDER BY ...)
> >   , sum(field_x) OVER ()
> >   , max(field_y) OVER ()
> > FROM ...
> >
> > The empty OVER clauses will make the sum / max work over the full result
> > set, and not a subset. I really recommend reading the window functions
> > section on the site.
> >
>
> I see.  Will the optimizer know enough to not repeat the work for each
> first_value I do? Or am I better off using a JOIN of some sort?
>
>

You probably can check with explain analyze; if not, a CTE (common table
expression) might help; something like :

with t1 as (
select grouping_field, sum(field_x) as sum_x, max(field_y) as max_y
FROM ...
group by grouping_field
)
SELECT grouping_field,
first_value(field_a) OVER (Partition by grouping_field ORDER BY ...),
first_value(field_b) OVER (Partition by grouping_field ORDER BY ...),
t1.sum_x,
t1.max_y
FROM ... INNER JOIN t1 using (grouping_field)

'Partition by grouping_field' may or may not be necessary in your case,
depending on what you want; see :

http://www.postgresql.org/docs/current/static/tutorial-window.html

--
Vincent Veyron
http://marica.fr
Logiciel pour département juridique