Re: Extracting superlatives - SQL design philosophy - Mailing list pgsql-performance
From | George Sexton |
---|---|
Subject | Re: Extracting superlatives - SQL design philosophy |
Date | |
Msg-id | 005d01cab59c$bbdeb1e0$339c15a0$@com Whole thread Raw |
In response to | Re: Extracting superlatives - SQL design philosophy ("George Sexton" <georges@mhsoftware.com>) |
List | pgsql-performance |
I missed something: select B.City, MaxCityTemp.Temp, min(B.Date) as FirstMaxDate from bar b INNER JOIN (select city,max(temp) as Temp from Bar group by City) as MaxCityTemp ON B.City=MaxCityTemp.City AND B.Temp=MaxCityTemp.Temp Group by B.City, MaxCityTemp.Temp George Sexton MH Software, Inc. http://www.mhsoftware.com/ Voice: 303 438 9585 > -----Original Message----- > From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance- > owner@postgresql.org] On Behalf Of George Sexton > Sent: Wednesday, February 24, 2010 2:58 PM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Extracting superlatives - SQL design philosophy > > You could do: > > select > B.City, > MaxCityTemp.Temp, > min(B.Date) as FirstMaxDate > from bar b > INNER JOIN (select city,max(temp) as Temp from Bar group by City) > as > MaxCityTemp > ON B.City=MaxCityTemp.City > Group by > B.City, > MaxCityTemp.Temp > > George Sexton > MH Software, Inc. > http://www.mhsoftware.com/ > Voice: 303 438 9585 > > > > -----Original Message----- > > From: pgsql-performance-owner@postgresql.org [mailto:pgsql- > performance- > > owner@postgresql.org] On Behalf Of Dave Crooke > > Sent: Wednesday, February 24, 2010 2:31 PM > > To: pgsql-performance > > Subject: [PERFORM] Extracting superlatives - SQL design philosophy > > > > This is a generic SQL issue and not PG specific, but I'd like to get > > an opinion from this list. > > > > Consider the following data: > > > > # \d bar > > Table "public.bar" > > Column | Type | Modifiers > > --------+-----------------------------+----------- > > city | character varying(255) | > > temp | integer | > > date | timestamp without time zone | > > > > # select * from bar order by city, date; > > city | temp | date > > -----------+------+--------------------- > > Austin | 75 | 2010-02-21 15:00:00 > > Austin | 35 | 2010-02-23 15:00:00 > > Edinburgh | 42 | 2010-02-23 15:00:00 > > New York | 56 | 2010-02-23 15:00:00 > > New York | 78 | 2010-06-23 15:00:00 > > (5 rows) > > > > If you want the highest recorded temperature for a city, that's easy > > to do, since the selection criteria works on the same column that we > > are extracing: > > > > # select city, max(temp) from bar group by city order by 1; > > city | max > > -----------+----- > > Austin | 75 > > Edinburgh | 42 > > New York | 78 > > (3 rows) > > > > > > However there is (AFAIK) no simple way in plain SQL to write a query > > that performs such an aggregation where the aggregation criteria is > on > > one column and you want to return another, e.g. adding the the *date > > of* that highest temperature to the output above, or doing a query to > > get the most recent temperature reading for each city. > > > > What I'd like to do is something like the below (and I'm inventing > > mock syntax here, the following is not valid SQL): > > > > -- Ugly implicit syntax but no worse than an Oracle outer join ;-) > > select city, temp, date from bar where date=max(date) group by city, > > temp order by city; > > > > or perhaps > > > > -- More explicit > > select aggregate_using(max(date), city, temp, date) from bar group by > > city, temp order by city; > > > > Both of the above, if they existed, would be a single data access > > followed by and sort-merge. > > > > The only way I know how to do it involves doing two accesses to the > > data, e.g. > > > > # select city, temp, date from bar a where date=(select max(b.date) > > from bar b where a.city=b.city) order by 1; > > city | temp | date > > -----------+------+--------------------- > > Austin | 35 | 2010-02-23 15:00:00 > > Edinburgh | 42 | 2010-02-23 15:00:00 > > New York | 78 | 2010-06-23 15:00:00 > > (3 rows) > > > > > > # explain select * from bar a where date=(select max(b.date) from bar > > b where a.city=b.city) order by 1; > > QUERY PLAN > > --------------------------------------------------------------------- > -- > > --- > > Sort (cost=1658.86..1658.87 rows=1 width=528) > > Sort Key: a.city > > -> Seq Scan on bar a (cost=0.00..1658.85 rows=1 width=528) > > Filter: (date = (subplan)) > > SubPlan > > -> Aggregate (cost=11.76..11.77 rows=1 width=8) > > -> Seq Scan on bar b (cost=0.00..11.75 rows=1 > > width=8) -- would be an index lookup in a real scenario > > Filter: (($0)::text = (city)::text) > > (8 rows) > > > > -- > > Sent via pgsql-performance mailing list (pgsql- > > performance@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-performance > > > > -- > Sent via pgsql-performance mailing list (pgsql- > performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
pgsql-performance by date: