Performance woes relating to DISTINCT (I think) - Mailing list pgsql-general
From | boinger |
---|---|
Subject | Performance woes relating to DISTINCT (I think) |
Date | |
Msg-id | 9e6d8b530509261348fe7d1de@mail.gmail.com Whole thread Raw |
Responses |
Re: Performance woes relating to DISTINCT (I think)
|
List | pgsql-general |
Hello. I'm not sure if this is a question suited for here, the -sql list, or the -performance list, so if I'm mis-posting, please direct me to the right list. I was unable to come up with anything relevant from the archives (though...I am not too sure where to start so I may just have been looking for the wrong things). I am trying to convert a bunch of code from MySQL to Postgresql. In MySQL I was running this particular query with more complexity (with joins and such) and it takes far less than a second. The Pg stripped-down version takes over 45 seconds (it was taking over 80 seconds with the joins). The table-in-question (tasks_applied) contains 12 columns: cid, modcode, yearcode, seid, tid, actid, pkgid, optional, corrected, labor, lmid, parts_price I have the following indexes: CREATE INDEX actid ON tasks_applied USING btree (actid) CREATE INDEX pkgid ON tasks_applied USING btree (pkgid) CREATE INDEX strafe_group ON tasks_applied USING btree (modcode, yearcode, seid, tid) WHERE cid = 0 CREATE UNIQUE INDEX tasks_applied_pkey ON tasks_applied USING btree (cid, modcode, yearcode, seid, tid, actid, optional) Here is my Pg query: ----------------- SELECT DISTINCT modcode, yearcode, seid, COUNT(DISTINCT(tid)) AS task_count FROM tasks_applied WHERE cid=0 AND seid=100001 GROUP BY modcode, yearcode, seid HAVING COUNT(tid)>=0 ORDER BY modcode ASC, yearcode ASC, seid ASC --------------- Here's my EXPLAIN ANAYZE output: ----------------- QUERY PLAN Unique (cost=85168.84..85168.98 rows=11 width=22) (actual time=45602.908..45607.399 rows=515 loops=1) -> Sort (cost=85168.84..85168.87 rows=11 width=22) (actual time=45602.897..45604.286 rows=515 loops=1) Sort Key: modcode, yearcode, seid, count(DISTINCT tid) -> GroupAggregate (cost=0.00..85168.65 rows=11 width=22) (actual time=3149.916..45578.292 rows=515 loops=1) Filter: (count(tid) >= 0) -> Index Scan using strafe_group on tasks_applied (cost=0.00..85167.23 rows=107 width=22) (actual time=3144.908..45366.147 rows=29893 loops=1) Filter: ((cid = 0) AND (seid = 100001)) Total runtime: 45609.207 ms ------------- Finally, here's my MySQL query: SELECT tasks_applied.modcode AS modcode, vin_models.shortname AS shortname, vin_years.year AS year, vin_years.yearcode AS yearcode, service_events.details AS se, service_events.intl_details AS i_se, service_events.seid AS seid, COUNT(DISTINCT(tid)) AS task_count FROM tasks_applied, service_events, vin_models, vin_years WHERE cid=0 AND tasks_applied.yearcode=vin_years.yearcode AND tasks_applied.modcode=vin_models.modcode AND tasks_applied.seid=service_events.seid AND tasks_applied.seid=100001 GROUP BY se, modcode, year HAVING COUNT(tid)>=0 ORDER BY tasks_applied.modcode ASC, vin_years.year ASC, service_events.seid ASC --------- Any help would be greatly appreciated (even if it's just "RTFM on xxx"). Thanks --jeff
pgsql-general by date: