Re: Query running slow - Mailing list pgsql-performance
From | PFC |
---|---|
Subject | Re: Query running slow |
Date | |
Msg-id | op.t9p8e0oocigqcu@apollo13.peufeu.com Whole thread Raw |
In response to | Query running slow ("samantha mahindrakar" <sam.mahindrakar@gmail.com>) |
List | pgsql-performance |
On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar <sam.mahindrakar@gmail.com> wrote: > Hi.... > Iam finding the following query is working a bit slow: > EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'', > measurement_start) > FROM ' || gettablestring(dates)|| ' > WHERE lane_id IN (' || lanesidarr || ')))' > INTO temparr; > > This function is trying to find all the days in a prticular month > whihc has data for the particular lane and put it in an array...which > can be used later. > gettablestring(dates) returns the partition name from which the data > needs to be extracted. These partitions have index on the > measurement_start field. > lanesidarr is a lane number. The partition has an index on this field to. > Could anyone give me some hints???/ OK so I guess you have one partition per month since there is no month in your WHERE. If this is a table which hasn't got much write activity (probably the case for last month's partition, for instance), CLUSTER it on something appropriate that you use often in queries, like lane_id here. And you can use SELECT foo GROUP BY foo, this will use a hash, it is faster than a sort. Example : CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1 DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n; ALTER TABLE blop ADD d DATE NULL; UPDATE blop SET d=t; VACUUM FULL ANALYZE blop; -- Now blop contains 100K timestamps and 100K dates from the month 2008-01 EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t ) FROM blop; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Unique (cost=10051.82..10551.82 rows=30 width=8) (actual time=221.740..289.801 rows=30 loops=1) -> Sort (cost=10051.82..10301.82 rows=100000 width=8) (actual time=221.737..250.911 rows=100000 loops=1) Sort Key: (date_part('day'::text, t)) Sort Method: quicksort Memory: 5955kB -> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8) (actual time=0.021..115.254 rows=100000 loops=1) Total runtime: 290.237 ms (6 lignes) Temps : 290,768 ms EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY day; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=1997.00..1997.38 rows=30 width=8) (actual time=198.375..198.390 rows=30 loops=1) -> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8) (actual time=0.021..129.779 rows=100000 loops=1) Total runtime: 198.437 ms (3 lignes) Temps : 198,894 ms ==> Hash is faster than Sort EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=1747.00..1747.30 rows=30 width=4) (actual time=101.829..101.842 rows=30 loops=1) -> Seq Scan on blop (cost=0.00..1497.00 rows=100000 width=4) (actual time=0.012..33.428 rows=100000 loops=1) Total runtime: 101.905 ms (3 lignes) Temps : 102,516 ms ==> Not computing the EXTRACT is faster obviously (actually EXPLAIN ANALYZE adds some overhead, the query really takes 60 ms) If you have an index lane_id, measurement_date, you can always do : for day in 1..31: find 1 row with which has this day reutrn the days you found
pgsql-performance by date: