Thread: query problem

query problem

From
Robin Ericsson
Date:
Hi,

I sent this to general earlier but I was redirected to performance.

The query have been running ok for quite some time, but after I did a
vacuum on the database, it's very very slow. This IN-query is only 2
ids. Before the problem that in was a subselect-query returning around
6-7 ids. The tables included in the query are described in database.txt.

status=# select count(id) from data;
  count
---------
 1577621
(1 row)

status=# select count(data_id) from data_values;
  count
---------
 9680931
(1 row)

I did run a new explain analyze on the query and found the attached
result. The obvious problem I see is a full index scan in
"idx_dv_data_id". I tried dropping and adding the index again, thats why
is't called "idx_data_values_data_id" in the dump.

status=# EXPLAIN ANALYZE
status-# SELECT
status-#         data.entered,
status-#         data.machine_id,
status-#         datatemplate_intervals.template_id,
status-#         data_values.value
status-# FROM
status-#         data, data_values, datatemplate_intervals
status-# WHERE
status-#         datatemplate_intervals.id = data_values.template_id AND
status-#         data_values.data_id = data.id AND
status-#         data.machine_id IN (2,3) AND
status-#         current_timestamp::timestamp - interval '60 seconds' <
data.entered;



Regards,
Robin


--
Robin Ericsson <robin.ericsson@profecta.se>
Profecta HB

Attachment

Re: query problem

From
ken
Date:
On Wed, 2004-10-13 at 02:21, Robin Ericsson wrote:
> Hi,
>
> I sent this to general earlier but I was redirected to performance.
>
> The query have been running ok for quite some time, but after I did a
> vacuum on the database, it's very very slow.

Did you do a VACUUM FULL ANALYZE on the database or just a VACUUM?  It
looks like your statistics in your query are all off which ANALYZE
should fix.



> This IN-query is only 2
> ids. Before the problem that in was a subselect-query returning around
> 6-7 ids. The tables included in the query are described in database.txt.
>
> status=# select count(id) from data;
>   count
> ---------
>  1577621
> (1 row)
>
> status=# select count(data_id) from data_values;
>   count
> ---------
>  9680931
> (1 row)
>
> I did run a new explain analyze on the query and found the attached
> result. The obvious problem I see is a full index scan in
> "idx_dv_data_id". I tried dropping and adding the index again, thats why
> is't called "idx_data_values_data_id" in the dump.
>
> status=# EXPLAIN ANALYZE
> status-# SELECT
> status-#         data.entered,
> status-#         data.machine_id,
> status-#         datatemplate_intervals.template_id,
> status-#         data_values.value
> status-# FROM
> status-#         data, data_values, datatemplate_intervals
> status-# WHERE
> status-#         datatemplate_intervals.id = data_values.template_id AND
> status-#         data_values.data_id = data.id AND
> status-#         data.machine_id IN (2,3) AND
> status-#         current_timestamp::timestamp - interval '60 seconds' <
> data.entered;
>
>
>
> Regards,
> Robin
>


Re: query problem

From
Tom Lane
Date:
Robin Ericsson <robin.ericsson@profecta.se> writes:
> I sent this to general earlier but I was redirected to performance.

Actually, I think I suggested that you consult the pgsql-performance
archives, where this type of problem has been hashed out before.
See for instance this thread:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
particularly
http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
which show three different ways of getting the planner to do something
sane with an index range bound like "now() - interval".

            regards, tom lane

Re: query problem

From
Robin Ericsson
Date:
On Wed, 2004-10-13 at 11:03 -0400, Tom Lane wrote:
> Robin Ericsson <robin.ericsson@profecta.se> writes:
> > I sent this to general earlier but I was redirected to performance.
>
> Actually, I think I suggested that you consult the pgsql-performance
> archives, where this type of problem has been hashed out before.
> See for instance this thread:
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
> particularly
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
> which show three different ways of getting the planner to do something
> sane with an index range bound like "now() - interval".

Using exact timestamp makes the query go back as it should in speed (see
explain below). However I still have the problem using a stored
procedure or even using the "ago"-example from above.




regards,
Robin

status=# explain analyse
status-# SELECT
status-#                 data.entered,
status-#                 data.machine_id,
status-#                 datatemplate_intervals.template_id,
status-#                 data_values.value
status-#         FROM
status-#                 data, data_values, datatemplate_intervals
status-#         WHERE
status-#                 datatemplate_intervals.id =
data_values.template_id AND
status-#                 data_values.data_id = data.id AND
status-#                 data.machine_id IN (SELECT machine_id FROM
machine_group_xref WHERE group_id = 1) AND
status-#                 '2004-10-13 17:47:36.902062' < data.entered
status-# ;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3.09..481.28 rows=777 width=24) (actual
time=0.637..1.804 rows=57 loops=1)
   Hash Cond: ("outer".template_id = "inner".id)
   ->  Nested Loop  (cost=1.17..467.71 rows=776 width=24) (actual
time=0.212..1.012 rows=57 loops=1)
         ->  Hash IN Join  (cost=1.17..9.56 rows=146 width=16) (actual
time=0.165..0.265 rows=9 loops=1)
               Hash Cond: ("outer".machine_id = "inner".machine_id)
               ->  Index Scan using idx_d_entered on data
(cost=0.00..6.14 rows=159 width=16) (actual time=0.051..0.097 rows=10
loops=1)
                     Index Cond: ('2004-10-13
17:47:36.902062'::timestamp without time zone < entered)
               ->  Hash  (cost=1.14..1.14 rows=11 width=4) (actual
time=0.076..0.076 rows=0 loops=1)
                     ->  Seq Scan on machine_group_xref
(cost=0.00..1.14 rows=11 width=4) (actual time=0.017..0.054 rows=11
loops=1)
                           Filter: (group_id = 1)
         ->  Index Scan using idx_data_values_data_id on data_values
(cost=0.00..3.07 rows=5 width=16) (actual time=0.018..0.047 rows=6
loops=9)
               Index Cond: (data_values.data_id = "outer".id)
   ->  Hash  (cost=1.74..1.74 rows=74 width=8) (actual time=0.382..0.382
rows=0 loops=1)
         ->  Seq Scan on datatemplate_intervals  (cost=0.00..1.74
rows=74 width=8) (actual time=0.024..0.248 rows=74 loops=1)
 Total runtime: 2.145 ms
(15 rows)