Re: Query running slow - Mailing list pgsql-performance

From PFC
Subject Re: Query running slow
Date
Msg-id op.t9p8e0oocigqcu@apollo13.peufeu.com
Whole thread Raw
In response to Query running slow  ("samantha mahindrakar" <sam.mahindrakar@gmail.com>)
List pgsql-performance
On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar
<sam.mahindrakar@gmail.com> wrote:

> Hi....
> Iam finding the following query is working a bit slow:
> EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'',
> measurement_start)
> FROM ' || gettablestring(dates)|| '
> WHERE lane_id IN (' || lanesidarr || ')))'
> INTO temparr;
>
> This function is trying to find all the days in a prticular month
> whihc has data for the particular lane and put it in an array...which
> can be used later.
> gettablestring(dates) returns the partition name from which the data
> needs to be extracted. These partitions have index on the
> measurement_start field.
> lanesidarr is a lane number. The partition has an index on this field to.
> Could anyone give me some hints???/

    OK so I guess you have one partition per month since there is no month in
your WHERE.
    If this is a table which hasn't got much write activity (probably the
case for last month's partition, for instance), CLUSTER it on something
appropriate that you use often in queries, like lane_id here.
    And you can use SELECT foo GROUP BY foo, this will use a hash, it is
faster than a sort.
    Example :

CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1
DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n;
ALTER TABLE blop ADD d DATE NULL;
UPDATE blop SET d=t;
VACUUM FULL ANALYZE blop;

-- Now blop contains 100K timestamps and 100K dates from the month 2008-01

EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t )  FROM blop;
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=10051.82..10551.82 rows=30 width=8) (actual
time=221.740..289.801 rows=30 loops=1)
    ->  Sort  (cost=10051.82..10301.82 rows=100000 width=8) (actual
time=221.737..250.911 rows=100000 loops=1)
          Sort Key: (date_part('day'::text, t))
          Sort Method:  quicksort  Memory: 5955kB
          ->  Seq Scan on blop  (cost=0.00..1747.00 rows=100000 width=8)
(actual time=0.021..115.254 rows=100000 loops=1)
  Total runtime: 290.237 ms
(6 lignes)

Temps : 290,768 ms

EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY day;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1997.00..1997.38 rows=30 width=8) (actual
time=198.375..198.390 rows=30 loops=1)
    ->  Seq Scan on blop  (cost=0.00..1747.00 rows=100000 width=8) (actual
time=0.021..129.779 rows=100000 loops=1)
  Total runtime: 198.437 ms
(3 lignes)

Temps : 198,894 ms

==> Hash is faster than Sort

EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1747.00..1747.30 rows=30 width=4) (actual
time=101.829..101.842 rows=30 loops=1)
    ->  Seq Scan on blop  (cost=0.00..1497.00 rows=100000 width=4) (actual
time=0.012..33.428 rows=100000 loops=1)
  Total runtime: 101.905 ms
(3 lignes)

Temps : 102,516 ms

==> Not computing the EXTRACT is faster obviously

(actually EXPLAIN ANALYZE adds some overhead, the query really takes 60 ms)


    If you have an index lane_id, measurement_date, you can always do :

for day in 1..31:
    find 1 row with which has this day
reutrn the days you found











pgsql-performance by date:

Previous
From: Thomas Spreng
Date:
Subject: Re: Oddly slow queries
Next
From: Craig James
Date:
Subject: Re: Anybody using the Dell Powervault MD3000 array?