Re: PostgreSQL runs a query much slower than BDE and MySQL - Mailing list pgsql-performance
From | Scott Lamb |
---|---|
Subject | Re: PostgreSQL runs a query much slower than BDE and MySQL |
Date | |
Msg-id | EB1CB9B3-3853-401C-A947-FB9B2648902C@slamb.org Whole thread Raw |
In response to | Re: PostgreSQL runs a query much slower than BDE and MySQL (Scott Lamb <slamb@slamb.org>) |
List | pgsql-performance |
Peter, I compared these using the data you supplied on my PostgreSQL 8.1.4 system: On Aug 17, 2006, at 12:09 PM, Scott Lamb wrote: > On Aug 16, 2006, at 3:51 PM, Tom Lane wrote: >>> /* Select all sheep who's most recent transfer was into the >>> subject flock */ >>> SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in >>> FROM SHEEP_FLOCK f1 JOIN >>> /* The last transfer date for each sheep */ >>> (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date >>> FROM SHEEP_FLOCK f >>> GROUP BY f.regn_no) f2 >>> ON f1.regn_no = f2.regn_no >>> WHERE f1.flock_no = '1359' >>> AND f1.transfer_date = f2.last_xfer_date QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------ Unique (cost=2575.07..2575.08 rows=1 width=36) (actual time=1083.579..1083.696 rows=32 loops=1) -> Sort (cost=2575.07..2575.07 rows=1 width=36) (actual time=1083.576..1083.613 rows=32 loops=1) Sort Key: f1.regn_no, f1.transfer_date -> Nested Loop (cost=1364.00..2575.06 rows=1 width=36) (actual time=287.895..1083.297 rows=32 loops=1) -> HashAggregate (cost=1364.00..1366.50 rows=200 width=36) (actual time=262.345..337.940 rows=38815 loops=1) -> Seq Scan on sheep_flock f (cost=0.00..1116.00 rows=49600 width=36) (actual time=0.005..119.282 rows=81802 loops=1) -> Index Scan using sheep_flock_pkey on sheep_flock f1 (cost=0.00..6.02 rows=1 width=36) (actual time=0.016..0.016 rows=0 loops=38815) Index Cond: (((f1.regn_no)::text = ("outer".regn_no)::text) AND ((f1.flock_no)::text = '1359'::text) AND (f1.transfer_date = "outer"."?column2?")) Total runtime: 1085.115 ms (9 rows) >> >> This seems pretty closely related to this recent thread: >> http://archives.postgresql.org/pgsql-performance/2006-08/msg00220.php >> in which the OP is doing a very similar kind of query in almost >> exactly >> the same way. >> >> I can't help thinking that there's probably a better way to phrase >> this >> type of query in SQL, though it's not jumping out at me what that is. > > I don't know about better, but I tend to phrase these in a quite > different way that's (hopefully) equivalent: > > select latest.regn_no, > latest.transfer_date as date_in > from sheep_flock latest > where not exists ( > select 'x' > from sheep_flock even_later > where latest.regn_no = even_later.regn_no > and latest.transfer_date < even_later.transfer_date) > and latest.flock_no = '1359' > > There's no MAX() or DISTINCT here, so maybe this is easier to > optimize? Q UERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------ Bitmap Heap Scan on sheep_flock latest (cost=764.60..2185.05 rows=124 width=36) (actual time=11.915..13.800 rows=32 loops=1) Recheck Cond: ((flock_no)::text = '1359'::text) Filter: (NOT (subplan)) -> Bitmap Index Scan on sheep_flock_pkey (cost=0.00..764.60 rows=248 width=0) (actual time=10.950..10.950 rows=127 loops=1) Index Cond: ((flock_no)::text = '1359'::text) SubPlan -> Index Scan using sheep_flock_pkey on sheep_flock even_later (cost=0.00..317.49 rows=83 width=0) (actual time=0.016..0.016 rows=1 loops=127) Index Cond: ((($0)::text = (regn_no)::text) AND ($1 < transfer_date)) Total runtime: 13.902 ms (9 rows) seems to return the same data in two orders of magnitude less time. -- Scott Lamb <http://www.slamb.org/>
pgsql-performance by date: