Thread: query gone haywire :)

query gone haywire :)

From

Robin Ericsson

Date:

08 October 2004, 10:13:30

The query have been running ok for some time now, but this morning I
decided to run vacuum analyze (vacuumdb -a -z) on the cluster, and
suddenly the query isn't running very well at all. This query has only
one value in the "IN", if I add another id the query becomes really
really slow.

Query:
SELECT
    data.entered,
    data.machine_id,
    datatemplate_intervals.template_id,
    data_values.value
FROM
    data, data_values, datatemplate_intervals
WHERE
    datatemplate_intervals.id = data_values.template_id AND
    data_values.data_id = data.id AND
    data.machine_id IN (2) AND
    current_timestamp::timestamp - interval '60 seconds' < data.entered

Indexes exists on data_values.template_id, data.entered,
data.machine_id, datatemplate_intervals.machine_id,
datatemplate_intervals.template_id.

Data contains almost 1.5milj entries, and data_values around 9.1milj. As
I write this letter I check the tables in pgAdmin, and it tells me this
for table data """
Rows (estimated) 1
Rows (counted)   1491401
""" even though I run vacuum analyze on the table itself from pgadmin.

Explain analyze result attached as explain-analyze.txt

Explain without analyze when using IN(2,3) attached as explain.txt


Regards,
Robin

Attachment

Re: query gone haywire :)

From

Tom Lane

Date:

08 October 2004, 15:11:44

Robin Ericsson <robin.ericsson@profecta.se> writes:
> The query have been running ok for some time now, but this morning I
> decided to run vacuum analyze (vacuumdb -a -z) on the cluster, and
> suddenly the query isn't running very well at all.

>          ->  Index Scan using idx_d_entered on data  (cost=0.00..18024.04 rows=50360 width=16) (actual
time=0.210..0.247rows=1 loops=1) 
>                Index Cond: (((('now'::text)::timestamp(6) with time zone)::timestamp without time zone -
'00:01:00'::interval)< entered) 

You're running into the well-known problem that the planner can't make
good estimates for index conditions that involve non-constant terms
(such as CURRENT_TIMESTAMP).  Lacking a decent estimate, it guesses that
this scan will produce many more rows than it really will, and so it
tends to favor plans that would be good in that scenario, but are not
optimal for retrieving just a couple of rows.

One workaround is to do the date arithmetic on the client side; another
is to cheat by hiding the arithmetic in a function like "ago(interval)"
that you lyingly claim is IMMUTABLE.  See the pgsql-performance
archives.

            regards, tom lane

Re: query gone haywire :)

From

Robin Ericsson

Date:

12 October 2004, 15:31:20

On Fri, 2004-10-08 at 10:11 -0400, Tom Lane wrote:
> Robin Ericsson <robin.ericsson@profecta.se> writes:
> >          ->  Index Scan using idx_d_entered on data  (cost=0.00..18024.04 rows=50360 width=16) (actual
time=0.210..0.247rows=1 loops=1) 
> >                Index Cond: (((('now'::text)::timestamp(6) with time zone)::timestamp without time zone -
'00:01:00'::interval)< entered) 
>
> You're running into the well-known problem that the planner can't make
> good estimates for index conditions that involve non-constant terms
> (such as CURRENT_TIMESTAMP).  Lacking a decent estimate, it guesses that
> this scan will produce many more rows than it really will, and so it
> tends to favor plans that would be good in that scenario, but are not
> optimal for retrieving just a couple of rows.
>
> One workaround is to do the date arithmetic on the client side; another
> is to cheat by hiding the arithmetic in a function like "ago(interval)"
> that you lyingly claim is IMMUTABLE.  See the pgsql-performance
> archives.

I did run a new explain analyze on the query and found the attached
result.

status=# EXPLAIN ANALYZE
status-# SELECT
status-#         data.entered,
status-#         data.machine_id,
status-#         datatemplate_intervals.template_id,
status-#         data_values.value
status-# FROM
status-#         data, data_values, datatemplate_intervals
status-# WHERE
status-#         datatemplate_intervals.id = data_values.template_id AND
status-#         data_values.data_id = data.id AND
status-#         data.machine_id IN (2,3) AND
status-#         current_timestamp::timestamp - interval '60 seconds' <
data.entered;

It seems very strange that it does a full index scan on idx_dv_data_id.



Regards,
Robin

Attachment

explain2.txt