Re: Extracting superlatives - SQL design philosophy

From: Garrett Murphy
Subject: Re: Extracting superlatives - SQL design philosophy
Date: ,
Msg-id: 076DC33A3D38CE4BBC64D35DDD9DE70C0A17259D@mse4be2.mse4.exchange.ms
(view: Whole thread, Raw)
In response to: Extracting superlatives - SQL design philosophy  (Dave Crooke)
Responses: Re: Extracting superlatives - SQL design philosophy  (Dave Crooke)
List: pgsql-performance

Tree view

Extracting superlatives - SQL design philosophy  (Dave Crooke, )
 Re: Extracting superlatives - SQL design philosophy  (Mose, )
  Re: Extracting superlatives - SQL design philosophy  ("Julien Theulier", )
   Re: Extracting superlatives - SQL design philosophy  (Tom Lane, )
 Re: Extracting superlatives - SQL design philosophy  ("Garrett Murphy", )
  Re: Extracting superlatives - SQL design philosophy  (Dave Crooke, )
   Re: Extracting superlatives - SQL design philosophy  (Richard Huxton, )
    Re: Extracting superlatives - SQL design philosophy  (Dave Crooke, )
     Re: Extracting superlatives - SQL design philosophy  (Richard Huxton, )
 Re: Extracting superlatives - SQL design philosophy  ("George Sexton", )
  Re: Extracting superlatives - SQL design philosophy  ("George Sexton", )
 Re: Extracting superlatives - SQL design philosophy  (Craig James, )
 Re: Extracting superlatives - SQL design philosophy  ("Pierre C", )
 Re: Extracting superlatives - SQL design philosophy  (Merlin Moncure, )
  Re: Extracting superlatives - SQL design philosophy  (Dave Crooke, )

This looks to be a perfect use for SELECT DISTINCT ON:

SELECT DISTINCT ON (city)
* FROM bar
ORDER BY city, temp desc

Or am I misunderstanding the issue?

Garrett Murphy

-----Original Message-----
From:  [mailto:] On Behalf Of Dave Crooke
Sent: Wednesday, February 24, 2010 2:31 PM
To: pgsql-performance
Subject: [PERFORM] Extracting superlatives - SQL design philosophy

This is a generic SQL issue and not PG specific, but I'd like to get
an opinion from this list.

Consider the following data:

# \d bar
                Table "public.bar"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 city   | character varying(255)      |
 temp   | integer                     |
 date   | timestamp without time zone |

# select * from bar order by city, date;
   city    | temp |        date
-----------+------+---------------------
 Austin    |   75 | 2010-02-21 15:00:00
 Austin    |   35 | 2010-02-23 15:00:00
 Edinburgh |   42 | 2010-02-23 15:00:00
 New York  |   56 | 2010-02-23 15:00:00
 New York  |   78 | 2010-06-23 15:00:00
(5 rows)

If you want the highest recorded temperature for a city, that's easy
to do, since the selection criteria works on the same column that we
are extracing:

# select city, max(temp) from bar group by city order by 1;
   city    | max
-----------+-----
 Austin    |  75
 Edinburgh |  42
 New York  |  78
(3 rows)


However there is (AFAIK) no simple way in plain SQL to write a query
that performs such an aggregation where the aggregation criteria is on
one column and you want to return another, e.g. adding the the *date
of* that highest temperature to the output above, or doing a query to
get the most recent temperature reading for each city.

What I'd like to do is something like the below (and I'm inventing
mock syntax here, the following is not valid SQL):

-- Ugly implicit syntax but no worse than an Oracle outer join ;-)
select city, temp, date from bar where date=max(date) group by city,
temp order by city;

or perhaps

-- More explicit
select aggregate_using(max(date), city, temp, date) from bar group by
city, temp order by city;

Both of the above, if they existed, would be a single data access
followed by and sort-merge.

The only way I know how to do it involves doing two accesses to the data, e.g.

# select city, temp, date from bar a where date=(select max(b.date)
from bar b where a.city=b.city) order by 1;
   city    | temp |        date
-----------+------+---------------------
 Austin    |   35 | 2010-02-23 15:00:00
 Edinburgh |   42 | 2010-02-23 15:00:00
 New York  |   78 | 2010-06-23 15:00:00
(3 rows)


# explain select * from bar a where date=(select max(b.date) from bar
b where a.city=b.city) order by 1;
                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=1658.86..1658.87 rows=1 width=528)
   Sort Key: a.city
   ->  Seq Scan on bar a  (cost=0.00..1658.85 rows=1 width=528)
         Filter: (date = (subplan))
         SubPlan
           ->  Aggregate  (cost=11.76..11.77 rows=1 width=8)
                 ->  Seq Scan on bar b  (cost=0.00..11.75 rows=1
width=8)     -- would be an index lookup in a real scenario
                       Filter: (($0)::text = (city)::text)
(8 rows)

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

From: Dave Crooke
Date:
Subject: Re: Extracting superlatives - SQL design philosophy
From: Richard Huxton
Date:
Subject: Re: Extracting superlatives - SQL design philosophy