Home > mailing lists

Re: more problems with count(*) on large table - Mailing list pgsql-general

From	Mike Charnoky
Subject	Re: more problems with count(*) on large table
Date	October 1, 2007 14:00:31
Msg-id	470127A5.1060505@nextbus.com Whole thread Raw
In response to	Re: more problems with count(*) on large table (Gregory Stark <stark@enterprisedb.com>)
Responses	Re: more problems with count() on large table Re: more problems with count() on large table Re: more problems with count() on large table Re: more problems with count() on large table
List	pgsql-general

Tree view

This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
first time the data is queried it takes about 40 minutes.  If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others.  I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE.  This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

  QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms


Mike

Gregory Stark wrote:
> "Mike Charnoky" <noky@nextbus.com> writes:
>
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
>> queries don't seem to hang, but it still takes a long time to do the count:
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>    took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>    took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
>
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently problems
> and others are consistently fast? Or could it be something unrelated.
>

pgsql-general by date:

From: Nico Sabbi
Date: 01 October 2007, 13:35:14
Subject: Re: row->ARRAY or row->table casting?

From: Gowrishankar
Date: 01 October 2007, 14:15:04
Subject: Data cube in PostgreSQL

Re: more problems with count(*) on large table - Mailing list pgsql-general

Previous

Next