Thread: query gone haywire :)
The query have been running ok for some time now, but this morning I decided to run vacuum analyze (vacuumdb -a -z) on the cluster, and suddenly the query isn't running very well at all. This query has only one value in the "IN", if I add another id the query becomes really really slow. Query: SELECT data.entered, data.machine_id, datatemplate_intervals.template_id, data_values.value FROM data, data_values, datatemplate_intervals WHERE datatemplate_intervals.id = data_values.template_id AND data_values.data_id = data.id AND data.machine_id IN (2) AND current_timestamp::timestamp - interval '60 seconds' < data.entered Indexes exists on data_values.template_id, data.entered, data.machine_id, datatemplate_intervals.machine_id, datatemplate_intervals.template_id. Data contains almost 1.5milj entries, and data_values around 9.1milj. As I write this letter I check the tables in pgAdmin, and it tells me this for table data """ Rows (estimated) 1 Rows (counted) 1491401 """ even though I run vacuum analyze on the table itself from pgadmin. Explain analyze result attached as explain-analyze.txt Explain without analyze when using IN(2,3) attached as explain.txt Regards, Robin
Attachment
Robin Ericsson <robin.ericsson@profecta.se> writes: > The query have been running ok for some time now, but this morning I > decided to run vacuum analyze (vacuumdb -a -z) on the cluster, and > suddenly the query isn't running very well at all. > -> Index Scan using idx_d_entered on data (cost=0.00..18024.04 rows=50360 width=16) (actual time=0.210..0.247rows=1 loops=1) > Index Cond: (((('now'::text)::timestamp(6) with time zone)::timestamp without time zone - '00:01:00'::interval)< entered) You're running into the well-known problem that the planner can't make good estimates for index conditions that involve non-constant terms (such as CURRENT_TIMESTAMP). Lacking a decent estimate, it guesses that this scan will produce many more rows than it really will, and so it tends to favor plans that would be good in that scenario, but are not optimal for retrieving just a couple of rows. One workaround is to do the date arithmetic on the client side; another is to cheat by hiding the arithmetic in a function like "ago(interval)" that you lyingly claim is IMMUTABLE. See the pgsql-performance archives. regards, tom lane
On Fri, 2004-10-08 at 10:11 -0400, Tom Lane wrote: > Robin Ericsson <robin.ericsson@profecta.se> writes: > > -> Index Scan using idx_d_entered on data (cost=0.00..18024.04 rows=50360 width=16) (actual time=0.210..0.247rows=1 loops=1) > > Index Cond: (((('now'::text)::timestamp(6) with time zone)::timestamp without time zone - '00:01:00'::interval)< entered) > > You're running into the well-known problem that the planner can't make > good estimates for index conditions that involve non-constant terms > (such as CURRENT_TIMESTAMP). Lacking a decent estimate, it guesses that > this scan will produce many more rows than it really will, and so it > tends to favor plans that would be good in that scenario, but are not > optimal for retrieving just a couple of rows. > > One workaround is to do the date arithmetic on the client side; another > is to cheat by hiding the arithmetic in a function like "ago(interval)" > that you lyingly claim is IMMUTABLE. See the pgsql-performance > archives. I did run a new explain analyze on the query and found the attached result. status=# EXPLAIN ANALYZE status-# SELECT status-# data.entered, status-# data.machine_id, status-# datatemplate_intervals.template_id, status-# data_values.value status-# FROM status-# data, data_values, datatemplate_intervals status-# WHERE status-# datatemplate_intervals.id = data_values.template_id AND status-# data_values.data_id = data.id AND status-# data.machine_id IN (2,3) AND status-# current_timestamp::timestamp - interval '60 seconds' < data.entered; It seems very strange that it does a full index scan on idx_dv_data_id. Regards, Robin