
Query optimization

From: "Ryan Riehle"
I have this query that is taking way too long for what I want to do:
 
explain analyze select cs.contractcode from contractservices cs left join serviceevents se
    on se.businessserviceid = cs.businessserviceid and se.contractcode = cs.contractcode
where cs.businessserviceid = 167
group by cs.contractcode, cs.businessserviceid having count(se.businessserviceid) = 0;
 
the results of the explain analyze are:
 
"HashAggregate  (cost=24094.79..24095.29 rows=101 width=22) (actual time=14501.476..14502.124 rows=143 loops=1)"
"  Filter: (count(businessserviceid) = 0)"
"  ->  Merge Left Join  (cost=22983.55..24093.64 rows=153 width=22) (actual time=14454.046..14481.790 rows=1482 loops=1)"
"        Merge Cond: (("outer".businessserviceid = "inner".businessserviceid) AND ("outer"."?column3?" = "inner"."?column3?"))"
"        ->  Sort  (cost=432.97..433.23 rows=101 width=18) (actual time=57.763..57.879 rows=160 loops=1)"
"              Sort Key: cs.businessserviceid, (cs.contractcode)::text"
"              ->  Seq Scan on contractservices cs  (cost=0.00..429.61 rows=101 width=18) (actual time=46.401..54.840 rows=160 loops=1)"
"                    Filter: (businessserviceid = 167)"
"        ->  Sort  (cost=22550.58..22919.85 rows=147708 width=19) (actual time=13875.680..14140.073 rows=146205 loops=1)"
"              Sort Key: se.businessserviceid, (se.contractcode)::text"
"              ->  Seq Scan on serviceevents se  (cost=0.00..6098.08 rows=147708 width=19) (actual time=0.108..1056.808 rows=147708 loops=1)"
"Total runtime: 14512.805 ms"
I want to optimize this query, but I'm new to the process and could use some input.  Do you see any red flags here?  If so, what might I try to improve the query?
 
  -RYAN

Re: Query optimization

From: Tom Lane
"Ryan Riehle" <rkr@buildways.com> writes:
> I have this query that is taking way too long for what I want to do:

The sort step seems to be taking the bulk of the time, so the
micro-optimization answer would be to boost sort_mem, and maybe also
take a second look at your datatypes (perhaps se.businessserviceid
is a low-performance type such as numeric?).
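For instance, sort_mem can be raised for just the current session before
re-running the query; the value below is illustrative, not a tuned
recommendation (in later PostgreSQL releases this setting is called
work_mem):

```sql
-- Raise per-session sort memory (value is in kB; 32768 = 32 MB here
-- is only an illustrative figure) so the large sort of serviceevents
-- has a better chance of running in memory instead of spilling to disk.
SET sort_mem = 32768;

-- Then re-run the EXPLAIN ANALYZE to compare the sort timings.
```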

Given the disparity of the rowcounts in the tables, another possibility
is to write something like

select distinct cs.contractcode from contractservices cs
where cs.businessserviceid = 167
and not exists
(select 1 from serviceevents se where
 se.businessserviceid = cs.businessserviceid and se.contractcode = cs.contractcode )

If you have an index on (se.businessserviceid, se.contractcode) then
the EXISTS should result in one index probe into se for each cs
row, which'll probably be faster than the mergejoin approach.
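A sketch of such an index (the index name is arbitrary):

```sql
-- Composite index matching both columns of the EXISTS predicate,
-- so each probe can be satisfied by a single index lookup.
CREATE INDEX serviceevents_bsid_contract_idx
    ON serviceevents (businessserviceid, contractcode);
```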

Also, do you actually need the "distinct" (formerly "group by")?
If there are quite a few duplicates then it might be better to
factor the query so that the distinct elimination happens before
the EXISTS test.
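For instance, the factored form might look like the following sketch,
which collapses duplicates in contractservices first and then applies
the EXISTS test once per distinct row:

```sql
-- Eliminate duplicate (contractcode, businessserviceid) pairs first,
-- then probe serviceevents only once per distinct pair.
SELECT x.contractcode
FROM (SELECT DISTINCT contractcode, businessserviceid
        FROM contractservices
       WHERE businessserviceid = 167) AS x
WHERE NOT EXISTS
      (SELECT 1 FROM serviceevents se
        WHERE se.businessserviceid = x.businessserviceid
          AND se.contractcode = x.contractcode);
```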

            regards, tom lane