Re: SELECT DISTINCT performance issue - Mailing list pgsql-general
From | K C Lau |
---|---|
Subject | Re: SELECT DISTINCT performance issue |
Date | |
Msg-id | 6.2.1.2.0.20050606121632.02c346d0@localhost Whole thread Raw |
In response to | Re: SELECT DISTINCT performance issue (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
At 13:06 05/06/05, Tom Lane wrote: >K C Lau <kclau60@netvigator.com> writes: > > esdt=> explain analyze select DISTINCT ON (PlayerID) PlayerID,AtDate from > > Player > > where PlayerID='22220' order by PlayerID desc, AtDate desc; > > Unique (cost=0.00..2507.66 rows=1 width=23) (actual time=0.000..187.000 > > rows=1 loops=1) > > -> Index Scan Backward using pk_player on player (cost=0.00..2505.55 > > rows=8 > > 43 width=23) (actual time=0.000..187.000 rows=1227 loops=1) > > Index Cond: ((playerid)::text = '22220'::text) > > Total runtime: 187.000 ms > > > It appears that all the 1227 data records for that player were searched, > > even when doing a backward index scan. I would presume that, after > locating > > the index for the highest AtDate, only the first data record needs to be > > retrieved. > >If you'd said LIMIT 1, it indeed would have stopped sooner. Since you >did not, it had to scan for more outputs. > > regards, tom lane I added LIMIT 1 to the query and it worked fine for a direct query on the table. However, our system uses views and joined views extensively and I still got the performance problem after many tries and finally modifying the view to: create or replace view VCurPlayer (...) as select a.... from Player a where a.AtDate = (select b.AtDate from Player b where b.PlayerID = a.PlayerID and b.AtDate = (select DISTINCT ON (c.PlayerID) c.AtDate from Player c where c.PlayerID = b.PlayerID and c.PlayerID = a.PlayerID order by c.PlayerID desc, c.AtDate desc LIMIT 1) order by b.PlayerID desc, b.AtDate desc LIMIT 1); esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer where PlayerID='22220'; Index Scan using pk_player on player a (cost=0.00..3969606927.13 rows=59 width =27) (actual time=328.000..328.000 rows=1 loops=1) Index Cond: ((playerid)::text = '22220'::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan -> Limit (cost=0.00..337263.45 rows=1 width=23) (actual time=0.180..0.180 rows=1 loops=1743) -> Index Scan Backward using pk_player on player b (cost=0.00..1652 5909.12 rows=49 width=23) (actual time=0.180..0.180 rows=1 loops=1743) Index Cond: ((playerid)::text = ($1)::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan -> Limit (cost=0.00..1697.53 rows=1 width=23) (actual time= 0.072..0.072 rows=1 loops=1743) -> Unique (cost=0.00..1697.53 rows=1 width=23) (actua l time=0.063..0.063 rows=1 loops=1743) -> Index Scan Backward using pk_player on player c (cost=0.00..1695.32 rows=885 width=23) (actual time=0.063..0.063 rows=1 loop s=1743) Index Cond: (((playerid)::text = ($0)::text ) AND ((playerid)::text = ($1)::text)) Total runtime: 328.000 ms Two subqueries are needed, whereas a single subquery would return no rows. Please note that when we first ported our system to PostgreSQL 7.1.1 a few years ago, we used LIMIT 1 (without DISTINCT) and we did not encounter this performance problem. Our client is currently using MS Windows (using SELECT TOP 1 construct) with no such problem either. Any suggestions? Regards, KC.
pgsql-general by date: