Thread: query problem
Hi, I sent this to general earlier but I was redirected to performance. The query have been running ok for quite some time, but after I did a vacuum on the database, it's very very slow. This IN-query is only 2 ids. Before the problem that in was a subselect-query returning around 6-7 ids. The tables included in the query are described in database.txt. status=# select count(id) from data; count --------- 1577621 (1 row) status=# select count(data_id) from data_values; count --------- 9680931 (1 row) I did run a new explain analyze on the query and found the attached result. The obvious problem I see is a full index scan in "idx_dv_data_id". I tried dropping and adding the index again, thats why is't called "idx_data_values_data_id" in the dump. status=# EXPLAIN ANALYZE status-# SELECT status-# data.entered, status-# data.machine_id, status-# datatemplate_intervals.template_id, status-# data_values.value status-# FROM status-# data, data_values, datatemplate_intervals status-# WHERE status-# datatemplate_intervals.id = data_values.template_id AND status-# data_values.data_id = data.id AND status-# data.machine_id IN (2,3) AND status-# current_timestamp::timestamp - interval '60 seconds' < data.entered; Regards, Robin -- Robin Ericsson <robin.ericsson@profecta.se> Profecta HB
Attachment
On Wed, 2004-10-13 at 02:21, Robin Ericsson wrote: > Hi, > > I sent this to general earlier but I was redirected to performance. > > The query have been running ok for quite some time, but after I did a > vacuum on the database, it's very very slow. Did you do a VACUUM FULL ANALYZE on the database or just a VACUUM? It looks like your statistics in your query are all off which ANALYZE should fix. > This IN-query is only 2 > ids. Before the problem that in was a subselect-query returning around > 6-7 ids. The tables included in the query are described in database.txt. > > status=# select count(id) from data; > count > --------- > 1577621 > (1 row) > > status=# select count(data_id) from data_values; > count > --------- > 9680931 > (1 row) > > I did run a new explain analyze on the query and found the attached > result. The obvious problem I see is a full index scan in > "idx_dv_data_id". I tried dropping and adding the index again, thats why > is't called "idx_data_values_data_id" in the dump. > > status=# EXPLAIN ANALYZE > status-# SELECT > status-# data.entered, > status-# data.machine_id, > status-# datatemplate_intervals.template_id, > status-# data_values.value > status-# FROM > status-# data, data_values, datatemplate_intervals > status-# WHERE > status-# datatemplate_intervals.id = data_values.template_id AND > status-# data_values.data_id = data.id AND > status-# data.machine_id IN (2,3) AND > status-# current_timestamp::timestamp - interval '60 seconds' < > data.entered; > > > > Regards, > Robin >
Robin Ericsson <robin.ericsson@profecta.se> writes: > I sent this to general earlier but I was redirected to performance. Actually, I think I suggested that you consult the pgsql-performance archives, where this type of problem has been hashed out before. See for instance this thread: http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php particularly http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php which show three different ways of getting the planner to do something sane with an index range bound like "now() - interval". regards, tom lane
On Wed, 2004-10-13 at 11:03 -0400, Tom Lane wrote: > Robin Ericsson <robin.ericsson@profecta.se> writes: > > I sent this to general earlier but I was redirected to performance. > > Actually, I think I suggested that you consult the pgsql-performance > archives, where this type of problem has been hashed out before. > See for instance this thread: > http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php > particularly > http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php > http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php > http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php > which show three different ways of getting the planner to do something > sane with an index range bound like "now() - interval". Using exact timestamp makes the query go back as it should in speed (see explain below). However I still have the problem using a stored procedure or even using the "ago"-example from above. regards, Robin status=# explain analyse status-# SELECT status-# data.entered, status-# data.machine_id, status-# datatemplate_intervals.template_id, status-# data_values.value status-# FROM status-# data, data_values, datatemplate_intervals status-# WHERE status-# datatemplate_intervals.id = data_values.template_id AND status-# data_values.data_id = data.id AND status-# data.machine_id IN (SELECT machine_id FROM machine_group_xref WHERE group_id = 1) AND status-# '2004-10-13 17:47:36.902062' < data.entered status-# ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=3.09..481.28 rows=777 width=24) (actual time=0.637..1.804 rows=57 loops=1) Hash Cond: ("outer".template_id = "inner".id) -> Nested Loop (cost=1.17..467.71 rows=776 width=24) (actual time=0.212..1.012 rows=57 loops=1) -> Hash IN Join (cost=1.17..9.56 rows=146 width=16) (actual time=0.165..0.265 rows=9 loops=1) Hash Cond: ("outer".machine_id = "inner".machine_id) -> Index Scan using idx_d_entered on data (cost=0.00..6.14 rows=159 width=16) (actual time=0.051..0.097 rows=10 loops=1) Index Cond: ('2004-10-13 17:47:36.902062'::timestamp without time zone < entered) -> Hash (cost=1.14..1.14 rows=11 width=4) (actual time=0.076..0.076 rows=0 loops=1) -> Seq Scan on machine_group_xref (cost=0.00..1.14 rows=11 width=4) (actual time=0.017..0.054 rows=11 loops=1) Filter: (group_id = 1) -> Index Scan using idx_data_values_data_id on data_values (cost=0.00..3.07 rows=5 width=16) (actual time=0.018..0.047 rows=6 loops=9) Index Cond: (data_values.data_id = "outer".id) -> Hash (cost=1.74..1.74 rows=74 width=8) (actual time=0.382..0.382 rows=0 loops=1) -> Seq Scan on datatemplate_intervals (cost=0.00..1.74 rows=74 width=8) (actual time=0.024..0.248 rows=74 loops=1) Total runtime: 2.145 ms (15 rows)