Re: more problems with count(*) on large table - Mailing list pgsql-general

From Mike Charnoky
Subject Re: more problems with count(*) on large table
Date
Msg-id 470127A5.1060505@nextbus.com
Whole thread Raw
In response to Re: more problems with count(*) on large table  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: more problems with count(*) on large table  (Alan Hodgson <ahodgson@simkin.ca>)
Re: more problems with count(*) on large table  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: more problems with count(*) on large table  (Bill Moran <wmoran@potentialtech.com>)
Re: more problems with count(*) on large table  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-general
This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
first time the data is queried it takes about 40 minutes.  If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others.  I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE.  This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

  QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms


Mike

Gregory Stark wrote:
> "Mike Charnoky" <noky@nextbus.com> writes:
>
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
>> queries don't seem to hang, but it still takes a long time to do the count:
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>    took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>    took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
>
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently problems
> and others are consistently fast? Or could it be something unrelated.
>

pgsql-general by date:

Previous
From: Nico Sabbi
Date:
Subject: Re: row->ARRAY or row->table casting?
Next
From: Gowrishankar
Date:
Subject: Data cube in PostgreSQL