Re: Extracting superlatives - SQL design philosophy - Mailing list pgsql-performance
From | Pierre C |
---|---|
Subject | Re: Extracting superlatives - SQL design philosophy |
Date | |
Msg-id | op.u8octdbxeorkce@localhost Whole thread Raw |
In response to | Extracting superlatives - SQL design philosophy (Dave Crooke <dcrooke@gmail.com>) |
List | pgsql-performance |
> -- More explicit > select aggregate_using(max(date), city, temp, date) from bar group by > city, temp order by city; select city, max(ROW(temp, date)) from bar group by city; Does not work (alas) for lack of a default comparison for record type. Another solution, which works wonders if you've got the list of cities in a separate table, and an index on (city, temp) is this : SELECT c.city, (SELECT ROW( t.date, t.temp ) FROM cities_temp t WHERE t.city=c.city ORDER BY temp DESC LIMIT 1) FROM cities; This will do a nested loop index scan and it is the fastest way, except if you have very few rows per city. The syntax is ugly and you have to extract the stuff from the ROW() afterwards, though. Unfortunately, this does not work : SELECT c.city, (SELECT t.date, t.temp FROM cities_temp t WHERE t.city=c.city ORDER BY temp DESC LIMIT 1) AS m FROM cities; because the subselect isn't allowed to return more than 1 column. Note that you can also get the usually annoying top-N by category to use the index by doing something like : SELECT c.city, (SELECT array_agg(date) FROM (SELECT t.date FROM cities_temp t WHERE t.city=c.city ORDER BY temp DESC LIMIT 5)) AS m FROM cities; The results aren't in a very usable form either, but : CREATE INDEX ti ON annonces( type_id, price ) WHERE price IS NOT NULL; EXPLAIN ANALYZE SELECT t.id, (SELECT ROW(a.id, a.price, a.date_annonce) FROM annonces a WHERE a.type_id = t.id AND price IS NOT NULL ORDER BY price DESC LIMIT 1) FROM types_bien t; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on types_bien t (cost=0.00..196.09 rows=57 width=4) (actual time=0.025..0.511 rows=57 loops=1) SubPlan 1 -> Limit (cost=0.00..3.41 rows=1 width=16) (actual time=0.008..0.008 rows=1 loops=57) -> Index Scan Backward using ti on annonces a (cost=0.00..8845.65 rows=2592 width=16) (actual time=0.007..0.007 rows=1 loops=57) Index Cond: (type_id = $0) Total runtime: 0.551 ms explain analyze select distinct type_id, first_value(price) over w as max_price from annonces where price is not null window w as (partition by type_id order by price desc); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=30515.41..30626.87 rows=11146 width=10) (actual time=320.927..320.971 rows=46 loops=1) -> WindowAgg (cost=27729.14..29958.16 rows=111451 width=10) (actual time=195.289..282.150 rows=111289 loops=1) -> Sort (cost=27729.14..28007.76 rows=111451 width=10) (actual time=195.278..210.762 rows=111289 loops=1) Sort Key: type_id, price Sort Method: quicksort Memory: 8289kB -> Seq Scan on annonces (cost=0.00..18386.17 rows=111451 width=10) (actual time=0.009..72.589 rows=111289 loops=1) Filter: (price IS NOT NULL) Total runtime: 322.382 ms Here using the index is 600x faster... worth a bit of ugly SQL, you decide. By disabling seq_scan and bitmapscan, you can corecr this plan : EXPLAIN ANALYZE SELECT DISTINCT ON (type_id) type_id, date_annonce, price FROM annonces WHERE price IS NOT NULL ORDER BY type_id, price LIMIT 40; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.509 rows=40 loops=1) -> Unique (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.498 rows=40 loops=1) -> Index Scan using ti on annonces (cost=0.00..78478.99 rows=111451 width=14) (actual time=0.018..132.671 rows=110796 loops=1) Total runtime: 145.549 ms This plan would be very bad (unless the whole table is in RAM) because I guess the index scan isn't aware of the DISTINCT ON, so it scans all rows in the index and in the table.
pgsql-performance by date: