Re: Performance woes relating to DISTINCT (I think) - Mailing list pgsql-general

From boinger
Subject Re: Performance woes relating to DISTINCT (I think)
Date
Msg-id 9e6d8b53050927070755640862@mail.gmail.com
Whole thread Raw
In response to Re: Performance woes relating to DISTINCT (I think)  (Dawid Kuroczko <qnex42@gmail.com>)
Responses Re: Performance woes relating to DISTINCT (I think)
List pgsql-general
On 9/27/05, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > QUERY PLAN
> >         ->  GroupAggregate  (cost=0.00..85168.65 rows=11
> width=22)
> > (actual time=3149.916..45578.292 rows=515 loops=1)
>
>  Hmm, planner expected 11 rows, got 515
>
>
> > (cost=0.00..85167.23 rows=107 width=22) (actual
> > time=3144.908..45366.147 rows=29893 loops=1)
>
>
>  planner expected 107 rows, got 29893...
>   I guess the problem here is that planner has wrong idea how your
>  data looks.  Try doing two things:
>
>  VACUUM ANALYZE;
>  (of tables in question or whole database)
>
>  If that doesn't help, do increase the statistics target.  By default
> PostgreSQL
>  keeps 10 samples, but you might want to increase it to 50 or even 100.
>  And then rerun VACUUM ANALYZE.
>
>  If it doesn't help -- please repost the new query plan once again.

I actually kind of inadvertently "fixed" it.

I threw my hands up and thought to myself "FINE! If it's going to take
that long, at least it can do all the joins and whatnot instead of
having to loop back and do separate queries"

So, I piled in everything I needed it to do, and now it's inexplicably
(to me) fast (!?).

I'm still running a full VACUUM ANALYZE on your recommendation...maybe
shave a few more ms off.

Here's what I have, now (pre-vacuum):

SQL:
SELECT
            tasks_applied.modcode               AS modcode,
            tasks_applied.seid                  AS seid,
            tasks_applied.yearcode              AS yearcode,
            vin_years.year                      AS year,
            COUNT(DISTINCT(tid))                AS task_count
        FROM
            "SS_valid_modelyears",
            tasks_applied,
            vin_years
        WHERE
            cid=0
            AND tasks_applied.seid='500001'

            AND "SS_valid_modelyears".modcode=tasks_applied.modcode

            AND "SS_valid_modelyears".year=vin_years.year
            AND tasks_applied.yearcode=vin_years.yearcode

            AND "SS_valid_modelyears".valid=1
        GROUP BY
            tasks_applied.seid,
            vin_years.year,
            tasks_applied.modcode,
            "SS_valid_modelyears".shortname,
            tasks_applied.yearcode
        ORDER BY
            tasks_applied.seid ASC,
            vin_years.year ASC


QUERY PLAN:
GroupAggregate  (cost=201.39..201.42 rows=1 width=69) (actual
time=80.383..80.386 rows=1 loops=1)

  ->  Sort  (cost=201.39..201.40 rows=1 width=69) (actual
time=79.737..79.898 rows=59 loops=1)

        Sort Key: tasks_applied.seid, vin_years."year",
tasks_applied.modcode, "SS_valid_modelyears".shortname,
tasks_applied.yearcode

        ->  Nested Loop  (cost=1.38..201.38 rows=1 width=69) (actual
time=72.599..78.765 rows=59 loops=1)

              ->  Hash Join  (cost=1.38..165.15 rows=6 width=61)
(actual time=0.530..18.881 rows=1188 loops=1)

                    Hash Cond: ("outer"."year" = "inner"."year")

                    ->  Seq Scan on "SS_valid_modelyears"
(cost=0.00..163.54 rows=36 width=56) (actual time=0.183..9.202
rows=1188 loops=1)

                          Filter: ("valid" = 1)

                    ->  Hash  (cost=1.30..1.30 rows=30 width=9)
(actual time=0.230..0.230 rows=0 loops=1)

                          ->  Seq Scan on vin_years  (cost=0.00..1.30
rows=30 width=9) (actual time=0.019..0.116 rows=30 loops=1)

              ->  Index Scan using strafe_group on tasks_applied
(cost=0.00..6.02 rows=1 width=22) (actual time=0.042..0.043 rows=0
loops=1188)

                    Index Cond: ((("outer".modcode)::text =
(tasks_applied.modcode)::text) AND (tasks_applied.yearcode =
"outer".yearcode) AND (tasks_applied.seid = 500001))

                    Filter: (cid = 0)

Total runtime: 80.764 ms

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Mysterious query plan
Next
From: Poul Møller Hansen
Date:
Subject: Re: Slow query using LIMIT [SOLVED]