Performance woes relating to DISTINCT (I think) - Mailing list pgsql-general

From boinger
Subject Performance woes relating to DISTINCT (I think)
Date
Msg-id 9e6d8b530509261348fe7d1de@mail.gmail.com
Whole thread Raw
Responses Re: Performance woes relating to DISTINCT (I think)
List pgsql-general
Hello.

I'm not sure if this is a question suited for here, the -sql list, or
the -performance list, so if I'm mis-posting, please direct me to the
right list.

I was unable to come up with anything relevant from the archives
(though...I am not too sure where to start so I may just have been
looking for the wrong things).

I am trying to convert a bunch of code from MySQL to Postgresql.  In
MySQL I was running this particular query with more complexity (with
joins and such) and it takes far less than a second.  The Pg
stripped-down version takes over 45 seconds (it was taking over 80
seconds with the joins).

The table-in-question (tasks_applied) contains 12 columns:
cid, modcode, yearcode, seid, tid, actid, pkgid, optional, corrected,
labor, lmid, parts_price

I have the following indexes:
CREATE INDEX actid ON tasks_applied USING btree (actid)
CREATE INDEX pkgid ON tasks_applied USING btree (pkgid)
CREATE INDEX strafe_group ON tasks_applied USING btree (modcode,
yearcode, seid, tid) WHERE cid = 0
CREATE UNIQUE INDEX tasks_applied_pkey ON tasks_applied USING btree
(cid, modcode, yearcode, seid, tid, actid, optional)

Here is my Pg query:
-----------------
        SELECT DISTINCT
            modcode,
            yearcode,
            seid,
            COUNT(DISTINCT(tid))    AS task_count
        FROM
            tasks_applied
        WHERE
            cid=0 AND
            seid=100001
        GROUP BY
            modcode,
            yearcode,
            seid
        HAVING
            COUNT(tid)>=0
        ORDER BY
            modcode ASC,
            yearcode ASC,
            seid ASC
---------------


Here's my EXPLAIN ANAYZE output:
-----------------
QUERY PLAN

Unique  (cost=85168.84..85168.98 rows=11 width=22) (actual
time=45602.908..45607.399 rows=515 loops=1)

  ->  Sort  (cost=85168.84..85168.87 rows=11 width=22) (actual
time=45602.897..45604.286 rows=515 loops=1)

        Sort Key: modcode, yearcode, seid, count(DISTINCT tid)

        ->  GroupAggregate  (cost=0.00..85168.65 rows=11 width=22)
(actual time=3149.916..45578.292 rows=515 loops=1)

              Filter: (count(tid) >= 0)

              ->  Index Scan using strafe_group on tasks_applied
(cost=0.00..85167.23 rows=107 width=22) (actual
time=3144.908..45366.147 rows=29893 loops=1)

                    Filter: ((cid = 0) AND (seid = 100001))

Total runtime: 45609.207 ms

-------------

Finally, here's my MySQL query:
               SELECT
                       tasks_applied.modcode           AS modcode,
                       vin_models.shortname            AS shortname,
                       vin_years.year                  AS year,
                       vin_years.yearcode              AS yearcode,
                       service_events.details          AS se,
                       service_events.intl_details     AS i_se,
                       service_events.seid             AS seid,
                       COUNT(DISTINCT(tid))            AS task_count
               FROM
                       tasks_applied,
                       service_events,
                       vin_models,
                       vin_years
               WHERE
                       cid=0
                   AND tasks_applied.yearcode=vin_years.yearcode
                   AND tasks_applied.modcode=vin_models.modcode
                   AND tasks_applied.seid=service_events.seid
                   AND tasks_applied.seid=100001
               GROUP BY
                       se, modcode, year
               HAVING
                       COUNT(tid)>=0
               ORDER BY
                       tasks_applied.modcode ASC,
                       vin_years.year ASC,
                       service_events.seid ASC
---------

Any help would be greatly appreciated (even if it's just "RTFM on xxx").

Thanks

--jeff

pgsql-general by date:

Previous
From: Frodo Larik
Date:
Subject: Query Question
Next
From: Yonatan Ben-Nes
Date:
Subject: Re: How many insert + update should one transaction handle?