Re: more problems with count(*) on large table - Mailing list pgsql-general

From Mike Charnoky
Subject Re: more problems with count(*) on large table
Date
Msg-id 47010B4F.70208@nextbus.com
Whole thread Raw
In response to Re: more problems with count(*) on large table  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: more problems with count(*) on large table
Re: more problems with count(*) on large table
List pgsql-general
I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
queries don't seem to hang, but it still takes a long time to do the count:
 * "where evtime between '2007-09-26' and '2007-09-27'"
   took 37 minutes to run (result was ~12 million)
 * "where evtime between '2007-09-25' and '2007-09-26'"
   took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

Nothing should be locking the table, the only things happening with this
database are: periodic autovacuums and a process which is constantly
inserting data into the large table in question (and a few other smaller
logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!).  This was run immediately after I ran the
query a first time (without using the prepare and explain)... This
second run took a little over a minute!  Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms
mydb=# explain execute stmt;

QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=642338.27..642338.28 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..600219.37 rows=16847557 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

Time: 131.559 ms
mydb# execute stmt;
  count
----------
 14150928
(1 row)

Time: 101721.346 ms


Mike

Gregory Stark wrote:
> "Alban Hertroys" <a.hertroys@magproductions.nl> writes:
>
>> Mike Charnoky wrote:
>>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>>> good value to use?  This wasn't really clear in the pg docs.  Also, do I
>>> need to run ANALYZE on the table after I change the statistics?
>>>
>>> Here are the EXPLAINs from the queries:
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-25' and '2007-09-26';
>>>
>>> QUERY PLAN
>>>
>>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..444451.44 rows=12490383 width=0)
>>>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-26' and '2007-09-27';
>>>
>>> QUERY PLAN
>>>
>>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..454671.07 rows=12777586 width=0)
>>>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>> Interesting, same plans and no sequential scans... Yet totally different
>> run times. Almost as if something prevents you to read some records
>> between 26 and 27 september...
>
> Just to be sure we're looking at the right plan do this:
>
> \timing
> PREPARE stmt AS
>  SELECT count(*)
>    FROM prediction_accuracy
>   WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';
>
> EXPLAIN EXECUTE stmt;
> EXECUTE stmt;
>
>> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
>> from reading records are rather rare and probably only issued from
>> userland code.
>
> Pages can be locked but never for very long.
>
> What other work is going on in this server? Is there anything which might be
> locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
> etc?)
>
> Perhaps there's a hardware problem, is there anything in your system logs from
> dmesg?
>

pgsql-general by date:

Previous
From: Alban Hertroys
Date:
Subject: Re: Normalized Tables & SELECT [was: Find "smallest common year"]
Next
From: Stefan Schwarzer
Date:
Subject: Re: Normalized Tables & SELECT [was: Find "smallest common year"]