Thread: Picking the first of an order in an aggregate query
I have a query SELECT grouping_field, MIN(field_a), MIN(field_b) FROM ... GROUP BY grouping_field But, instead of picking the MIN field_a and MIN field_b, I'd like to pick field_a and field_b from the first record, according to an order I'll specify. In pseudo-SQL, it would be something like this: SELECT grouping_field, FIRST(field_a), FIRST(field_b) FROM ... ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC GROUP BY grouping_field How can I do that with Postgres?
On 12/31/2012 8:33 AM, Robert James wrote: > I have a query > > SELECT grouping_field, MIN(field_a), MIN(field_b) > FROM ... > GROUP BY grouping_field > > But, instead of picking the MIN field_a and MIN field_b, I'd like to > pick field_a and field_b from the first record, according to an order > I'll specify. In pseudo-SQL, it would be something like this: > > SELECT grouping_field, FIRST(field_a), FIRST(field_b) > FROM ... > ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC > GROUP BY grouping_field > > How can I do that with Postgres? > > select distinct on (grouping_field), field_a, field_b from ... order by grouping_field, field_a asc, field_b asc http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT -- Jack Christensen http://jackchristensen.com/
Jack Christensen <jack@jackchristensen.com> writes: > On 12/31/2012 8:33 AM, Robert James wrote: >> SELECT grouping_field, FIRST(field_a), FIRST(field_b) >> FROM ... >> ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC >> GROUP BY grouping_field >> >> How can I do that with Postgres? > select distinct on (grouping_field), field_a, field_b > from ... > order by grouping_field, field_a asc, field_b asc Another possibility, if you're using PG 8.4 or newer, is to use window functions. It'd go something like select grouping_field, first_value(field_a) over (partition by grouping_field order by field_a), ... if memory serves (I'm not quite sure whether you need the PARTITION BY bit if there's a global GROUP BY in the query). The DISTINCT ON syntax is a Postgres-ism, while window functions are SQL-standard so have at least some chance of being portable, if that matters to you. regards, tom lane
DISTINCT is a very simple solution! But I have one problem: In addition to the FIRST fields, I also do want some aggregate functions. More accurately, it would be: SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x), MAX(field_y) ... How should I do that? Should I do two queries with a join on the grouping field? Or is there a more direct way? On 12/31/12, Jack Christensen <jack@jackchristensen.com> wrote: > On 12/31/2012 8:33 AM, Robert James wrote: >> I have a query >> >> SELECT grouping_field, MIN(field_a), MIN(field_b) >> FROM ... >> GROUP BY grouping_field >> >> But, instead of picking the MIN field_a and MIN field_b, I'd like to >> pick field_a and field_b from the first record, according to an order >> I'll specify. In pseudo-SQL, it would be something like this: >> >> SELECT grouping_field, FIRST(field_a), FIRST(field_b) >> FROM ... >> ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC >> GROUP BY grouping_field >> >> How can I do that with Postgres? >> >> > select distinct on (grouping_field), field_a, field_b > from ... > order by grouping_field, field_a asc, field_b asc > > > http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT > > -- > Jack Christensen > http://jackchristensen.com/ > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Le 2012-12-31 à 15:38, Robert James a écrit : > DISTINCT is a very simple solution! > But I have one problem: In addition to the FIRST fields, I also do > want some aggregate functions. More accurately, it would be: > > SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x), > MAX(field_y) > ... > > How should I do that? Should I do two queries with a join on the > grouping field? Or is there a more direct way? WINDOW functions can help you: SELECT grouping_field , first_value(field_a) OVER (ORDER BY ...) , first_value(field_b) OVER (ORDER BY ...) , sum(field_x) OVER () , max(field_y) OVER () FROM ... The empty OVER clauses will make the sum / max work over the full result set, and not a subset. I really recommend readingthe window functions section on the site. Bye! François NOTE: Please do not top-post. This list is bottom post.
On 12/31/12, François Beausoleil <francois@teksol.info> wrote: > > Le 2012-12-31 à 15:38, Robert James a écrit : > >> DISTINCT is a very simple solution! >> But I have one problem: In addition to the FIRST fields, I also do >> want some aggregate functions. More accurately, it would be: >> >> SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x), >> MAX(field_y) >> ... >> >> How should I do that? Should I do two queries with a join on the >> grouping field? Or is there a more direct way? > > WINDOW functions can help you: > > SELECT > grouping_field > , first_value(field_a) OVER (ORDER BY ...) > , first_value(field_b) OVER (ORDER BY ...) > , sum(field_x) OVER () > , max(field_y) OVER () > FROM ... > > The empty OVER clauses will make the sum / max work over the full result > set, and not a subset. I really recommend reading the window functions > section on the site. > I see. Will the optimizer know enough to not repeat the work for each first_value I do? Or am I better off using a JOIN of some sort?
Le lundi 31 décembre 2012 à 20:55 -0500, Robert James a écrit : > On 12/31/12, François Beausoleil <francois@teksol.info> wrote: > > > > Le 2012-12-31 à 15:38, Robert James a écrit : > > > >> DISTINCT is a very simple solution! > >> But I have one problem: In addition to the FIRST fields, I also do > >> want some aggregate functions. More accurately, it would be: > >> > >> SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x), > >> MAX(field_y) > >> ... > >> > >> How should I do that? Should I do two queries with a join on the > >> grouping field? Or is there a more direct way? > > > > WINDOW functions can help you: > > > > SELECT > > grouping_field > > , first_value(field_a) OVER (ORDER BY ...) > > , first_value(field_b) OVER (ORDER BY ...) > > , sum(field_x) OVER () > > , max(field_y) OVER () > > FROM ... > > > > The empty OVER clauses will make the sum / max work over the full result > > set, and not a subset. I really recommend reading the window functions > > section on the site. > > > > I see. Will the optimizer know enough to not repeat the work for each > first_value I do? Or am I better off using a JOIN of some sort? > > You probably can check with explain analyze; if not, a CTE (common table expression) might help; something like : with t1 as ( select grouping_field, sum(field_x) as sum_x, max(field_y) as max_y FROM ... group by grouping_field ) SELECT grouping_field, first_value(field_a) OVER (Partition by grouping_field ORDER BY ...), first_value(field_b) OVER (Partition by grouping_field ORDER BY ...), t1.sum_x, t1.max_y FROM ... INNER JOIN t1 using (grouping_field) 'Partition by grouping_field' may or may not be necessary in your case, depending on what you want; see : http://www.postgresql.org/docs/current/static/tutorial-window.html -- Vincent Veyron http://marica.fr Logiciel pour département juridique