Thread: Query optimization
I have this query that is taking way too long for what I want to do:
explain analyze select cs.contractcode from contractservices cs left join serviceevents se
on se.businessserviceid = cs.businessserviceid and se.contractcode = cs.contractcode
where cs.businessserviceid = 167
group by cs.contractcode, cs.businessserviceid having count(se.businessserviceid) = 0;
the results of the explain analyze are:
"HashAggregate (cost=24094.79..24095.29 rows=101 width=22) (actual time=14501.476..14502.124 rows=143 loops=1)"
" Filter: (count(businessserviceid) = 0)"
" -> Merge Left Join (cost=22983.55..24093.64 rows=153 width=22) (actual time=14454.046..14481.790 rows=1482 loops=1)"
" Merge Cond: (("outer".businessserviceid = "inner".businessserviceid) AND ("outer"."?column3?" = "inner"."?column3?"))"
" -> Sort (cost=432.97..433.23 rows=101 width=18) (actual time=57.763..57.879 rows=160 loops=1)"
" Sort Key: cs.businessserviceid, (cs.contractcode)::text"
" -> Seq Scan on contractservices cs (cost=0.00..429.61 rows=101 width=18) (actual time=46.401..54.840 rows=160 loops=1)"
" Filter: (businessserviceid = 167)"
" -> Sort (cost=22550.58..22919.85 rows=147708 width=19) (actual time=13875.680..14140.073 rows=146205 loops=1)"
" Sort Key: se.businessserviceid, (se.contractcode)::text"
" -> Seq Scan on serviceevents se (cost=0.00..6098.08 rows=147708 width=19) (actual time=0.108..1056.808 rows=147708 loops=1)"
"Total runtime: 14512.805 ms"
" Filter: (count(businessserviceid) = 0)"
" -> Merge Left Join (cost=22983.55..24093.64 rows=153 width=22) (actual time=14454.046..14481.790 rows=1482 loops=1)"
" Merge Cond: (("outer".businessserviceid = "inner".businessserviceid) AND ("outer"."?column3?" = "inner"."?column3?"))"
" -> Sort (cost=432.97..433.23 rows=101 width=18) (actual time=57.763..57.879 rows=160 loops=1)"
" Sort Key: cs.businessserviceid, (cs.contractcode)::text"
" -> Seq Scan on contractservices cs (cost=0.00..429.61 rows=101 width=18) (actual time=46.401..54.840 rows=160 loops=1)"
" Filter: (businessserviceid = 167)"
" -> Sort (cost=22550.58..22919.85 rows=147708 width=19) (actual time=13875.680..14140.073 rows=146205 loops=1)"
" Sort Key: se.businessserviceid, (se.contractcode)::text"
" -> Seq Scan on serviceevents se (cost=0.00..6098.08 rows=147708 width=19) (actual time=0.108..1056.808 rows=147708 loops=1)"
"Total runtime: 14512.805 ms"
I want to optimize this query, but I'm new to the process and could use some input. Do you see any red flags here? If so, what might I try to improve the query?
-RYAN
"Ryan Riehle" <rkr@buildways.com> writes: > I have this query that is taking way too long for what I want to do: The sort step seems to be taking the bulk of the time, so the micro-optimization answer would be to boost sort_mem, and maybe also take a second look at your datatypes (perhaps se.businessserviceid is a low-performance type such as numeric?) Given the disparity of the rowcounts in the tables, another possibility is to write something like select distinct cs.contractcode from contractservices cs where cs.businessserviceid = 167 and not exists (select 1 from serviceevents se where se.businessserviceid = cs.businessserviceid and se.contractcode = cs.contractcode ) If you have an index on (se.businessserviceid, se.contractcode) then the EXISTS should result in one index probe into se for each cs row, which'll probably be faster than the mergejoin approach. Also, do you actually need the "distinct" (formerly "group by")? If there are quite a few duplicates then it might be better to factor the query so that the distinct elimination happens before the EXISTS test. regards, tom lane