Thread: Help on explain analyze
Hi guys, I have a rather complex view that sometimes takes an awful long time to execute. I have tried to do an 'explain analyze'on it. My intention was to try to optimize the tables involved by creating some indexes to help the lookup. I lookedfor the "Seq Scan's and created appropriate indexes, I thought. However, in most cases the search got even slower.I have "expanded" the view as follows: cims=# explain analyze select * from (SELECT t.id AS oid, d.id AS devid, d.description AS devname, cd.value AS period, upper(dt.typename::text)AS devtype, (date_part('epoch'::text, timezone('GMT'::text, t.firstrun))::bigint - (z.timezone::integer- CASE WHEN z.daylightsaving <> 'Y'::bpchar THEN 0 ELSE CASE WHEN cy.dl_start < now() AND now() < cy.dl_finish THEN 1 ELSE 0 END END) * 3600) % 86400::bigint AS firstrun, t."interval", t.id AS tid, ti.id AS tiid, t.runstatus, t.last, tt.handler,td.value AS ctrlid, td.string AS alarm, z.timezone AS real_timezone, cy.dl_start < now() AND now() < cy.dl_finishAS daylight, z.timezone::integer - CASE WHEN z.daylightsaving <> 'Y'::bpchar THEN 0 ELSE CASE WHEN cy.dl_start < now() AND now() < cy.dl_finish THEN 1 ELSE 0 END END AS timezone FROM device d LEFT JOIN task_info ti ON ti.ctrlid = d.id LEFT JOIN task t ON t.id = ti.taskid LEFT JOIN ctrl_definitions cd ON d.id = cd.ctrlid AND cd.name::text = 'IrrPeriodStart'::text, task_type tt, task_definitionstd, devtype dt, ctrl_definitions cd2, zip z, county cy WHERE td.name = 'UseWSData'::text AND ti.id = td.taskinfoid AND d.devtypeid = dt.id AND tt.id = t.tasktypeid AND (tt.handler= 'modthcswi.so'::text OR tt.handler = 'modthcswb.so'::text) AND btrim(cd2.string) = z.zip::text AND cd2.ctrlid= td.value AND cd2.name::text = 'ZIP'::text AND z.countyfips = cy.countyfips AND z.state = cy.state AND date_part('year'::text,now()) = date_part('year'::text, cy.dl_start)) AS wstaskdist WHERE wstaskdist.ctrlid = 401 AND CAST( alarm AS boolean ) = 't'; The view is actually the sub-SELECT which I have name 'wstaskdist', and my search criteria is the bottom WHERE. The resultof the ANALYZE is: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=284.88..9767.82 rows=1 width=109) (actual time=2515.318..40073.432 rows=10 loops=1) -> Nested Loop (cost=284.88..9745.05 rows=70 width=102) (actual time=2515.184..40071.697 rows=10 loops=1) -> Nested Loop (cost=229.56..5692.38 rows=1 width=88) (actual time=2512.044..39401.729 rows=10 loops=1) -> Nested Loop (cost=229.56..5692.07 rows=1 width=80) (actual time=2511.999..39401.291 rows=10 loops=1) -> Nested Loop (cost=229.56..5691.76 rows=1 width=77) (actual time=2511.943..39400.680 rows=10 loops=1) Join Filter: (ti.id = td.taskinfoid) -> Seq Scan on task_definitions td (cost=0.00..13.68 rows=1 width=22) (actual time=0.204..0.322rows=10 loops=1) Filter: ((name = 'UseWSData'::text) AND (value = 401) AND (string)::boolean) -> Hash Left Join (cost=229.56..5672.72 rows=429 width=59) (actual time=7.159..3939.536 rows=429loops=10) Hash Cond: (d.id = cd.ctrlid) -> Nested Loop (cost=24.66..5442.80 rows=429 width=55) (actual time=6.797..3937.349 rows=429loops=10) -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587 rows=429loops=10) Hash Cond: (t.id = ti.taskid) -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) -> Bitmap Heap Scan on device d (cost=8.01..12.02 rows=1 width=21) (actual time=9.145..9.146rows=1 loops=4290) Recheck Cond: (d.id = ti.ctrlid) -> Bitmap Index Scan on pk_device (cost=0.00..8.01 rows=1 width=0) (actualtime=0.463..0.463 rows=1569 loops=4290) Index Cond: (d.id = ti.ctrlid) -> Hash (cost=202.61..202.61 rows=183 width=8) (actual time=3.534..3.534 rows=343 loops=1) -> Seq Scan on ctrl_definitions cd (cost=0.00..202.61 rows=183 width=8) (actualtime=0.034..3.298 rows=343 loops=1) Filter: ((name)::text = 'IrrPeriodStart'::text) -> Index Scan using devtype_pkey on devtype dt (cost=0.00..0.30 rows=1 width=11) (actual time=0.053..0.055rows=1 loops=10) Index Cond: (dt.id = d.devtypeid) -> Index Scan using pk_task_type on task_type tt (cost=0.00..0.30 rows=1 width=16) (actual time=0.036..0.039rows=1 loops=10) Index Cond: (tt.id = t.tasktypeid) Filter: ((tt.handler = 'modthcswi.so'::text) OR (tt.handler = 'modthcswb.so'::text)) -> Hash Join (cost=55.33..4050.56 rows=211 width=18) (actual time=3.000..66.988 rows=1 loops=10) Hash Cond: ((z.zip)::text = btrim(cd2.string)) -> Seq Scan on zip z (cost=0.00..3729.23 rows=42223 width=20) (actual time=0.017..43.637 rows=42108 loops=10) -> Hash (cost=55.31..55.31 rows=1 width=10) (actual time=0.138..0.138 rows=1 loops=1) -> Bitmap Heap Scan on ctrl_definitions cd2 (cost=4.39..55.31 rows=1 width=10) (actual time=0.089..0.128rows=1 loops=1) Recheck Cond: (ctrlid = 401) Filter: ((name)::text = 'ZIP'::text) -> Bitmap Index Scan on ctrl_def_ctrlid (cost=0.00..4.39 rows=19 width=0) (actual time=0.055..0.055rows=25 loops=1) Index Cond: (ctrlid = 401) -> Index Scan using county_state_fips_inx on county cy (cost=0.00..0.31 rows=1 width=25) (actual time=0.083..0.146 rows=1loops=10) Index Cond: ((cy.state = z.state) AND (cy.countyfips = z.countyfips)) Filter: (date_part('year'::text, now()) = date_part('year'::text, cy.dl_start)) Total runtime: 40073.738 ms (41 rows) I have concentrate my effort on the (double) 'Seq Scan': -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587 rows=429loops=10) Hash Cond: (t.id = ti.taskid) -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) and tried various indexes on the two table 'task' and 'task_info' to help avoid the sequential scans, but at best it givesthe same result, at worst it was 25% slower :-(. The tables aren't that big. I think the largest are the 'zip' and the'county' tables with ~42000 and ~22000 rows, respectively. What is it that I'm missing, Leif
Hello, did you also try defininig partial indexes? e.g. CREATE INDEX xx on task_definitions (ctrlid) WHERE (name::text = 'UseWSData') CREATE INDEX yy on ctrl_definitions (ctrlid) WHERE (name::text = 'IrrPeriodStart') HTH, Marc Mamin -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Leif Jensen Sent: Freitag, 26. November 2010 06:04 To: pgsql-general Subject: [GENERAL] Help on explain analyze Hi guys, I have a rather complex view that sometimes takes an awful long time to execute. I have tried to do an 'explain analyze'on it. My intention was to try to optimize the tables involved by creating some indexes to help the lookup. I lookedfor the "Seq Scan's and created appropriate indexes, I thought. However, in most cases the search got even slower.I have "expanded" the view as follows: cims=# explain analyze select * from (SELECT t.id AS oid, d.id AS devid, d.description AS devname, cd.value AS period, upper(dt.typename::text)AS devtype, (date_part('epoch'::text, timezone('GMT'::text, t.firstrun))::bigint - (z.timezone::integer- CASE WHEN z.daylightsaving <> 'Y'::bpchar THEN 0 ELSE CASE WHEN cy.dl_start < now() AND now() < cy.dl_finish THEN 1 ELSE 0 END END) * 3600) % 86400::bigint AS firstrun, t."interval", t.id AS tid, ti.id AS tiid, t.runstatus, t.last, tt.handler,td.value AS ctrlid, td.string AS alarm, z.timezone AS real_timezone, cy.dl_start < now() AND now() < cy.dl_finishAS daylight, z.timezone::integer - CASE WHEN z.daylightsaving <> 'Y'::bpchar THEN 0 ELSE CASE WHEN cy.dl_start < now() AND now() < cy.dl_finish THEN 1 ELSE 0 END END AS timezone FROM device d LEFT JOIN task_info ti ON ti.ctrlid = d.id LEFT JOIN task t ON t.id = ti.taskid LEFT JOIN ctrl_definitions cd ON d.id = cd.ctrlid AND cd.name::text = 'IrrPeriodStart'::text, task_type tt, task_definitionstd, devtype dt, ctrl_definitions cd2, zip z, county cy WHERE td.name = 'UseWSData'::text AND ti.id = td.taskinfoid AND d.devtypeid = dt.id AND tt.id = t.tasktypeid AND (tt.handler= 'modthcswi.so'::text OR tt.handler = 'modthcswb.so'::text) AND btrim(cd2.string) = z.zip::text AND cd2.ctrlid= td.value AND cd2.name::text = 'ZIP'::text AND z.countyfips = cy.countyfips AND z.state = cy.state AND date_part('year'::text,now()) = date_part('year'::text, cy.dl_start)) AS wstaskdist WHERE wstaskdist.ctrlid = 401 AND CAST( alarm AS boolean ) = 't'; The view is actually the sub-SELECT which I have name 'wstaskdist', and my search criteria is the bottom WHERE. The resultof the ANALYZE is: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=284.88..9767.82 rows=1 width=109) (actual time=2515.318..40073.432 rows=10 loops=1) -> Nested Loop (cost=284.88..9745.05 rows=70 width=102) (actual time=2515.184..40071.697 rows=10 loops=1) -> Nested Loop (cost=229.56..5692.38 rows=1 width=88) (actual time=2512.044..39401.729 rows=10 loops=1) -> Nested Loop (cost=229.56..5692.07 rows=1 width=80) (actual time=2511.999..39401.291 rows=10 loops=1) -> Nested Loop (cost=229.56..5691.76 rows=1 width=77) (actual time=2511.943..39400.680 rows=10 loops=1) Join Filter: (ti.id = td.taskinfoid) -> Seq Scan on task_definitions td (cost=0.00..13.68 rows=1 width=22) (actual time=0.204..0.322rows=10 loops=1) Filter: ((name = 'UseWSData'::text) AND (value = 401) AND (string)::boolean) -> Hash Left Join (cost=229.56..5672.72 rows=429 width=59) (actual time=7.159..3939.536 rows=429loops=10) Hash Cond: (d.id = cd.ctrlid) -> Nested Loop (cost=24.66..5442.80 rows=429 width=55) (actual time=6.797..3937.349 rows=429loops=10) -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587 rows=429loops=10) Hash Cond: (t.id = ti.taskid) -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) -> Bitmap Heap Scan on device d (cost=8.01..12.02 rows=1 width=21) (actual time=9.145..9.146rows=1 loops=4290) Recheck Cond: (d.id = ti.ctrlid) -> Bitmap Index Scan on pk_device (cost=0.00..8.01 rows=1 width=0) (actualtime=0.463..0.463 rows=1569 loops=4290) Index Cond: (d.id = ti.ctrlid) -> Hash (cost=202.61..202.61 rows=183 width=8) (actual time=3.534..3.534 rows=343 loops=1) -> Seq Scan on ctrl_definitions cd (cost=0.00..202.61 rows=183 width=8) (actualtime=0.034..3.298 rows=343 loops=1) Filter: ((name)::text = 'IrrPeriodStart'::text) -> Index Scan using devtype_pkey on devtype dt (cost=0.00..0.30 rows=1 width=11) (actual time=0.053..0.055rows=1 loops=10) Index Cond: (dt.id = d.devtypeid) -> Index Scan using pk_task_type on task_type tt (cost=0.00..0.30 rows=1 width=16) (actual time=0.036..0.039rows=1 loops=10) Index Cond: (tt.id = t.tasktypeid) Filter: ((tt.handler = 'modthcswi.so'::text) OR (tt.handler = 'modthcswb.so'::text)) -> Hash Join (cost=55.33..4050.56 rows=211 width=18) (actual time=3.000..66.988 rows=1 loops=10) Hash Cond: ((z.zip)::text = btrim(cd2.string)) -> Seq Scan on zip z (cost=0.00..3729.23 rows=42223 width=20) (actual time=0.017..43.637 rows=42108 loops=10) -> Hash (cost=55.31..55.31 rows=1 width=10) (actual time=0.138..0.138 rows=1 loops=1) -> Bitmap Heap Scan on ctrl_definitions cd2 (cost=4.39..55.31 rows=1 width=10) (actual time=0.089..0.128rows=1 loops=1) Recheck Cond: (ctrlid = 401) Filter: ((name)::text = 'ZIP'::text) -> Bitmap Index Scan on ctrl_def_ctrlid (cost=0.00..4.39 rows=19 width=0) (actual time=0.055..0.055rows=25 loops=1) Index Cond: (ctrlid = 401) -> Index Scan using county_state_fips_inx on county cy (cost=0.00..0.31 rows=1 width=25) (actual time=0.083..0.146 rows=1loops=10) Index Cond: ((cy.state = z.state) AND (cy.countyfips = z.countyfips)) Filter: (date_part('year'::text, now()) = date_part('year'::text, cy.dl_start)) Total runtime: 40073.738 ms (41 rows) I have concentrate my effort on the (double) 'Seq Scan': -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587 rows=429loops=10) Hash Cond: (t.id = ti.taskid) -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) and tried various indexes on the two table 'task' and 'task_info' to help avoid the sequential scans, but at best it givesthe same result, at worst it was 25% slower :-(. The tables aren't that big. I think the largest are the 'zip' and the'county' tables with ~42000 and ~22000 rows, respectively. What is it that I'm missing, Leif -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 26 Nov 2010, at 6:04, Leif Jensen wrote: Looking at your query, I'm kind of wondering why you're not letting the database do the time-zone calculations? But, that's not what your question was about: > I have concentrate my effort on the (double) 'Seq Scan': > > -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587 rows=429loops=10) > Hash Cond: (t.id = ti.taskid) > -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) > -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) > -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) These aren't your problem, they take up a whole whopping 65 ms. Your problem is caused by the nested loops. See also: http://explain.depesz.com/s/z38 > Nested Loop (cost=284.88..9767.82 rows=1 width=109) (actual time=2515.318..40073.432 rows=10 loops=1) > -> Nested Loop (cost=284.88..9745.05 rows=70 width=102) (actual time=2515.184..40071.697 rows=10 loops=1) > -> Nested Loop (cost=229.56..5692.38 rows=1 width=88) (actual time=2512.044..39401.729 rows=10 loops=1) > -> Nested Loop (cost=229.56..5692.07 rows=1 width=80) (actual time=2511.999..39401.291 rows=10 loops=1) > -> Nested Loop (cost=229.56..5691.76 rows=1 width=77) (actual time=2511.943..39400.680 rows=10 loops=1) > Join Filter: (ti.id = td.taskinfoid) > -> Seq Scan on task_definitions td (cost=0.00..13.68 rows=1 width=22) (actual time=0.204..0.322rows=10 loops=1) > Filter: ((name = 'UseWSData'::text) AND (value = 401) AND (string)::boolean) > -> Hash Left Join (cost=229.56..5672.72 rows=429 width=59) (actual time=7.159..3939.536 rows=429loops=10) > Hash Cond: (d.id = cd.ctrlid) > -> Nested Loop (cost=24.66..5442.80 rows=429 width=55) (actual time=6.797..3937.349 rows=429loops=10) The problem is inside this loop alright, but... This is the patch of lines that you were looking at, it takes around 6.5ms each time it is looped over, which happens only10 times => 65ms total. > -> Hash Join (cost=16.65..282.84 rows=429 width=38) (actual time=0.078..6.587rows=429 loops=10) > Hash Cond: (t.id = ti.taskid) > -> Seq Scan on task t (cost=0.00..260.29 rows=429 width=30) (actual time=0.022..5.089rows=429 loops=10) > -> Hash (cost=11.29..11.29 rows=429 width=12) (actual time=0.514..0.514 rows=429loops=1) > -> Seq Scan on task_info ti (cost=0.00..11.29 rows=429 width=12) (actualtime=0.020..0.302 rows=429 loops=1) This bit here is taking about 9 ms each time it is looped over, which happens over 4000 times => 39s total. That's the major contribution of what makes the outer nested loop jump from 6.5ms to almost 4s! And that loop runs another10 times, putting its total time at 39s. > -> Bitmap Heap Scan on device d (cost=8.01..12.02 rows=1 width=21) (actual time=9.145..9.146rows=1 loops=4290) > Recheck Cond: (d.id = ti.ctrlid) > -> Bitmap Index Scan on pk_device (cost=0.00..8.01 rows=1 width=0) (actualtime=0.463..0.463 rows=1569 loops=4290) > Index Cond: (d.id = ti.ctrlid) In the below the estimates seem to be off. Still, for only 343 rows a seq-scan may be fastest. Is there an index on thatname-column? > -> Hash (cost=202.61..202.61 rows=183 width=8) (actual time=3.534..3.534 rows=343 loops=1) > -> Seq Scan on ctrl_definitions cd (cost=0.00..202.61 rows=183 width=8) (actualtime=0.034..3.298 rows=343 loops=1) > Filter: ((name)::text = 'IrrPeriodStart'::text) > -> Index Scan using devtype_pkey on devtype dt (cost=0.00..0.30 rows=1 width=11) (actual time=0.053..0.055rows=1 loops=10) > Index Cond: (dt.id = d.devtypeid) > -> Index Scan using pk_task_type on task_type tt (cost=0.00..0.30 rows=1 width=16) (actual time=0.036..0.039rows=1 loops=10) > Index Cond: (tt.id = t.tasktypeid) > Filter: ((tt.handler = 'modthcswi.so'::text) OR (tt.handler = 'modthcswb.so'::text)) > -> Hash Join (cost=55.33..4050.56 rows=211 width=18) (actual time=3.000..66.988 rows=1 loops=10) > Hash Cond: ((z.zip)::text = btrim(cd2.string)) > -> Seq Scan on zip z (cost=0.00..3729.23 rows=42223 width=20) (actual time=0.017..43.637 rows=42108 loops=10) > -> Hash (cost=55.31..55.31 rows=1 width=10) (actual time=0.138..0.138 rows=1 loops=1) > -> Bitmap Heap Scan on ctrl_definitions cd2 (cost=4.39..55.31 rows=1 width=10) (actual time=0.089..0.128rows=1 loops=1) > Recheck Cond: (ctrlid = 401) > Filter: ((name)::text = 'ZIP'::text) > -> Bitmap Index Scan on ctrl_def_ctrlid (cost=0.00..4.39 rows=19 width=0) (actual time=0.055..0.055rows=25 loops=1) > Index Cond: (ctrlid = 401) > -> Index Scan using county_state_fips_inx on county cy (cost=0.00..0.31 rows=1 width=25) (actual time=0.083..0.146rows=1 loops=10) > Index Cond: ((cy.state = z.state) AND (cy.countyfips = z.countyfips)) > Filter: (date_part('year'::text, now()) = date_part('year'::text, cy.dl_start)) > Total runtime: 40073.738 ms > (41 rows) > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4cf1168c802655526680284!