Thread: more problems with count(*) on large table

From: Mike Charnoky
Hi,

I am still having problems performing a count(*) on a large table.  This
is a followup from a recent thread:

http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php

Since the last time these problems happened, we have tweaked some
postgresql config parameters (fsm, etc).  I also recreated the large
table, with the assumption it was somehow corrupted.

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours).  The table is indexed
on a timestamp field.  Here is one query that hangs:

select count(*) from mytable where evtime between '2007-09-26' and
'2007-09-27';

However, this query runs successfully and takes 2 minutes:

select count(*) from mytable where evtime between '2007-09-25' and
'2007-09-26';

  count
----------
 14150928
(1 row)

What is going on?  I analyzed the table before running the query and
have no reason to believe that the amount of data added to the table
varies much from day to day.  No data has been deleted from the table
yet, just added.

Here is some config info:

PostgreSQL 8.1.8 on Fedora 3

shared_buffers = 8000
temp_buffers = 1000
work_mem = 16384
maintenance_work_mem = 262144
max_fsm_pages = 500000
max_fsm_relations = 30000


Mike

Re: more problems with count(*) on large table

From: "A. Kretschmer"
On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table.  This
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours).  The table is indexed
> on a timestamp field.  Here is one query that hangs:

Again: an index can't help! Because of MVCC, a 'select count(*)' without
a WHERE condition forces a sequential table scan.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: more problems with count(*) on large table

From: Alvaro Herrera
A. Kretschmer wrote:
> On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> > Hi,
> >
> > I am still having problems performing a count(*) on a large table.  This
> >
> > Now, certain count(*) queries are failing to complete for certain time
> > ranges (I killed the query after about 24 hours).  The table is indexed
> > on a timestamp field.  Here is one query that hangs:
>
> Again: an index can't help! Because of MVCC, a 'select count(*)' without
> a WHERE condition forces a sequential table scan.

But he does have a WHERE condition.  The problem is, probably, that the
condition is not selective enough, so the planner chooses to do a
seqscan.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: more problems with count(*) on large table

From: Andrew Sullivan
On Fri, Sep 28, 2007 at 12:50:34PM -0400, Alvaro Herrera wrote:
>
> But he does have a WHERE condition.  The problem is, probably, that the
> condition is not selective enough, so the planner chooses to do a
> seqscan.

Or else the planner has a bad idea of how selective the condition is.
I've found that this sort of inconsistent selectivity on an indexed
column can be because the column statistics aren't good enough.  You
can try ALTER TABLE...SET STATISTICS to see if it helps.
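A minimal sketch of what that could look like here (table and column
names are taken from the query earlier in the thread; the value 100 is
just a common starting point, not a documented recommendation -- in 8.1
the default per-column target is 10 and the maximum is 1000):

```sql
-- Raise the statistics target for the indexed timestamp column.
ALTER TABLE mytable ALTER COLUMN evtime SET STATISTICS 100;

-- The new target only takes effect once the table is re-analyzed.
ANALYZE mytable;

-- Then check whether the planner's row estimate looks sane:
EXPLAIN SELECT count(*) FROM mytable
 WHERE evtime BETWEEN '2007-09-26' AND '2007-09-27';
```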

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
        --Dennis Ritchie

Re: more problems with count(*) on large table

From: "A. Kretschmer"
On Fri, 28.09.2007 at 12:50:34 -0400, Alvaro Herrera wrote:
> A. Kretschmer wrote:
> > On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> > > Hi,
> > >
> > > I am still having problems performing a count(*) on a large table.  This
> > >
> > > Now, certain count(*) queries are failing to complete for certain time
> > > ranges (I killed the query after about 24 hours).  The table is indexed
> > > on a timestamp field.  Here is one query that hangs:
> >
> > Again: an index can't help! Because of MVCC, a 'select count(*)' without
> > a WHERE condition forces a sequential table scan.
>
> But he does have a WHERE condition.  The problem is, probably, that the
> condition is not selective enough, so the planner chooses to do a
> seqscan.

Yes, I'm sorry, I hadn't noticed that.  Maybe wrong statistics.  As
Andrew suggested, an 'ALTER TABLE...SET STATISTICS' can help.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: more problems with count(*) on large table

From: Gregory Stark
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> A. Kretschmer wrote:
>> On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
>> > Hi,
>> >
>> > I am still having problems performing a count(*) on a large table.  This
>> >
>> > Now, certain count(*) queries are failing to complete for certain time
>> > ranges (I killed the query after about 24 hours).  The table is indexed
>> > on a timestamp field.  Here is one query that hangs:
>>
>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>> a WHERE condition forces a sequential table scan.
>
> But he does have a WHERE condition.  The problem is, probably, that the
> condition is not selective enough, so the planner chooses to do a
> seqscan.

What does EXPLAIN SELECT ... say?



--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: more problems with count(*) on large table

From: Sean Davis
Mike Charnoky wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table.  This
> is a followup from a recent thread:
>
> http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php
>
> Since the last time these problems happened, we have tweaked some
> postgresql config parameters (fsm, etc).  I also recreated the large
> table, with the assumption it was somehow corrupted.
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours).  The table is indexed
> on a timestamp field.  Here is one query that hangs:
>
> select count(*) from mytable where evtime between '2007-09-26' and
> '2007-09-27';
>
> However, this query runs successfully and takes 2 minutes:
>
> select count(*) from mytable where evtime between '2007-09-25' and
> '2007-09-26';
>
>   count
> ----------
>  14150928
> (1 row)
>
> What is going on?  I analyzed the table before running the query and
> have no reason to believe that the amount of data added to the table
> varies much from day to day.  No data has been deleted from the table
> yet, just added.
>
> Here is some config info:
>
> PostgreSQL 8.1.8 on Fedora 3

Have you vacuumed recently?

Sean


Re: more problems with count(*) on large table

From: Mike Charnoky
The autovacuum is turned on.  Since this is pg 8.1, I don't know when the
table was actually last vacuumed.  I *did* run analyze on the table,
though.  Also, nothing has been deleted from this table... so vacuum
should have no effect, right?


Mike

Sean Davis wrote:
> Mike Charnoky wrote:
>> Hi,
>>
>> I am still having problems performing a count(*) on a large table.
> Have you vacuumed recently?
>
> Sean

Re: more problems with count(*) on large table

From: Bill Moran
In response to Mike Charnoky <noky@nextbus.com>:

> The autovacuum is turned on.  Since this is pg8.1, I don't know when the
> table was actually last vacuumed.  I *did* run analyze on the table,
> though.  Also, nothing has been deleted from this table... so vacuum
> should have no effect, right?

Updated rows also produce dead tuples that require vacuuming.  If the
table is insert only, you don't need vacuum.

--
Bill Moran
http://www.potentialtech.com

Re: more problems with count(*) on large table

From: Mike Charnoky
With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use?  This wasn't really clear in the pg docs.  Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';

QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..444451.44 rows=12490383 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';

QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..454671.07 rows=12777586 width=0)
         Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
time zone))
(3 rows)


Mike

Gregory Stark wrote:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
>
>> A. Kretschmer wrote:
>>> On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
>>>> Hi,
>>>>
>>>> I am still having problems performing a count(*) on a large table.  This
>>>>
>>>> Now, certain count(*) queries are failing to complete for certain time
>>>> ranges (I killed the query after about 24 hours).  The table is indexed
>>>> on a timestamp field.  Here is one query that hangs:
>>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>>> a WHERE condition forces a sequential table scan.
>> But he does have a WHERE condition.  The problem is, probably, that the
>> condition is not selective enough, so the planner chooses to do a
>> seqscan.
>
> What does EXPLAIN SELECT ... say?
>
>
>

Re: more problems with count(*) on large table

From: "Scott Marlowe"
On 9/28/07, Mike Charnoky <noky@nextbus.com> wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table.  This
> is a followup from a recent thread:
>
> http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php
>
> Since the last time these problems happened, we have tweaked some
> postgresql config parameters (fsm, etc).  I also recreated the large
> table, with the assumption it was somehow corrupted.
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours).  The table is indexed
> on a timestamp field.  Here is one query that hangs:
>
> select count(*) from mytable where evtime between '2007-09-26' and
> '2007-09-27';
>
> However, this query runs successfully and takes 2 minutes:
>
> select count(*) from mytable where evtime between '2007-09-25' and
> '2007-09-26';

I had a similar problem pop up recently in my medium large reporting
database.  I found the problem solved by two things, one was upping
the stats target on the time column to 100, and the other was running
regularly scheduled analyzes on the large reporting table, about once
every 8 hours.

Autovacuum works fine otherwise, but it was not running analyze often enough.
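For the regularly scheduled analyzes, one possible sketch is a crontab
entry (the database and table names below are placeholders, and this
assumes `psql` is on the cron user's PATH):

```shell
# Run ANALYZE on the big reporting table every 8 hours.
# 'mydb' and 'mytable' stand in for the real names.
0 */8 * * *  psql -d mydb -c 'ANALYZE mytable;' >/dev/null 2>&1
```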

Re: more problems with count(*) on large table

From: Alban Hertroys
A. Kretschmer wrote:
> Again: an index can't help! Because of MVCC, a 'select count(*)' without
> a WHERE condition forces a sequential table scan.

That has very little to do with MVCC.

If I understand correctly, MVCC is about the availability of records in
the current transaction. For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is already looking
at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq
scan and an index scan tips in favour of the seq scan somewhat sooner
than on some other databases, because the index scan still needs to look
at the actual record to check visibility.

I plead not guilty ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: more problems with count(*) on large table

From: Alban Hertroys
Mike Charnoky wrote:
> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
> good value to use?  This wasn't really clear in the pg docs.  Also, do I
> need to run ANALYZE on the table after I change the statistics?
>
> Here are the EXPLAINs from the queries:
>
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-25' and '2007-09-26';
>
> QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..444451.44 rows=12490383 width=0)
>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
> time zone))
> (3 rows)
>
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-26' and '2007-09-27';
>
> QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..454671.07 rows=12777586 width=0)
>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
> time zone))
> (3 rows)

Interesting, same plans and no sequential scans... yet totally different
run times.  Almost as if something prevents you from reading some records
between September 26 and 27...

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: more problems with count(*) on large table

From: Gregory Stark
"Alban Hertroys" <a.hertroys@magproductions.nl> writes:

> Mike Charnoky wrote:
>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>> good value to use?  This wasn't really clear in the pg docs.  Also, do I
>> need to run ANALYZE on the table after I change the statistics?
>>
>> Here are the EXPLAINs from the queries:
>>
>> db=# explain select count(*) from prediction_accuracy where evtime
>> between '2007-09-25' and '2007-09-26';
>>
>> QUERY PLAN
>>
>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>> (cost=0.00..444451.44 rows=12490383 width=0)
>>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>> time zone))
>> (3 rows)
>>
>> db=# explain select count(*) from prediction_accuracy where evtime
>> between '2007-09-26' and '2007-09-27';
>>
>> QUERY PLAN
>>
>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>> (cost=0.00..454671.07 rows=12777586 width=0)
>>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>> time zone))
>> (3 rows)
>
> Interesting, same plans and no sequential scans... yet totally different
> run times.  Almost as if something prevents you from reading some records
> between September 26 and 27...

Just to be sure we're looking at the right plan do this:

\timing
PREPARE stmt AS
 SELECT count(*)
   FROM prediction_accuracy
  WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';

EXPLAIN EXECUTE stmt;
EXECUTE stmt;

> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
> from reading records are rather rare and probably only issued from
> userland code.

Pages can be locked but never for very long.

What other work is going on in this server? Is there anything which might be
locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
etc?)

Perhaps there's a hardware problem, is there anything in your system logs from
dmesg?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: more problems with count(*) on large table

From: "Albe Laurenz"
Alban Hertroys wrote:
> A. Kretschmer wrote:
>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>> a WHERE condition forces a sequential table scan.
>
> That has very little to do with MVCC.
>
> [...] For that it makes no difference whether a seq
> scan or an index scan is performed - both cases need to check at the
> record level whether it's visible (where the seq scan is
> already looking at the actual record, of course).

If you do not use MVCC (say, you use DB2), you need not check
the record itself because if it is there (which it is if there
is an index entry), it will be 'visible'.

> I plead not guilty ;)

Declined, sorry.

Yours,
Laurenz Albe

Re: more problems with count(*) on large table

From: Alban Hertroys
Albe Laurenz wrote:
> Alban Hertroys wrote:
>> A. Kretschmer wrote:
>>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>>> a WHERE condition forces a sequential table scan.
>> That has very little to do with MVCC.
>>
>> [...] For that it makes no difference whether a seq
>> scan or an index scan is performed - both cases need to check at the
>> record level whether it's visible (where the seq scan is
>> already looking at the actual record, of course).
>
> If you do not use MVCC (say, you use DB2), you need not check
> the record itself because if it is there (which it is if there
> is an index entry), it will be 'visible'.

Still, that's not because of MVCC, but because of the way it is
implemented in PostgreSQL.

There has been talk in the past (regularly) about why the MVCC
information is not in the index and whether it should be, see the ML
archives.

Besides, there are still many situations where a sequential scan
(whether for count(*) or not) is faster than an index scan, no matter
whether you have MVCC or not.

As I said, MVCC has little to do with it.
The real problem is that in Postgres you cannot tell from an index
whether a record is visible or not, while you can in DB2 (where a record
is visible exactly when it has an index entry).

>> I plead not guilty ;)
>
> Declined, sorry.

Overruled, sorry.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: more problems with count(*) on large table

From: Mike Charnoky
I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
queries don't seem to hang, but it still takes a long time to do the count:
 * "where evtime between '2007-09-26' and '2007-09-27'"
   took 37 minutes to run (result was ~12 million)
 * "where evtime between '2007-09-25' and '2007-09-26'"
   took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

Nothing should be locking the table, the only things happening with this
database are: periodic autovacuums and a process which is constantly
inserting data into the large table in question (and a few other smaller
logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!).  This was run immediately after I ran the
query a first time (without using the prepare and explain)... This
second run took a little over a minute!  Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms
mydb=# explain execute stmt;

QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=642338.27..642338.28 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..600219.37 rows=16847557 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

Time: 131.559 ms
mydb# execute stmt;
  count
----------
 14150928
(1 row)

Time: 101721.346 ms


Mike

Gregory Stark wrote:
> "Alban Hertroys" <a.hertroys@magproductions.nl> writes:
>
>> Mike Charnoky wrote:
>>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>>> good value to use?  This wasn't really clear in the pg docs.  Also, do I
>>> need to run ANALYZE on the table after I change the statistics?
>>>
>>> Here are the EXPLAINs from the queries:
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-25' and '2007-09-26';
>>>
>>> QUERY PLAN
>>>
>>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..444451.44 rows=12490383 width=0)
>>>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-26' and '2007-09-27';
>>>
>>> QUERY PLAN
>>>
>>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..454671.07 rows=12777586 width=0)
>>>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>> Interesting, same plans and no sequential scans... Yet totally different
>> run times. Almost as if something prevents you to read some records
>> between 26 and 27 september...
>
> Just to be sure we're looking at the right plan do this:
>
> \timing
> PREPARE stmt AS
>  SELECT count(*)
>    FROM prediction_accuracy
>   WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';
>
> EXPLAIN EXECUTE stmt;
> EXECUTE stmt;
>
>> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
>> from reading records are rather rare and probably only issued from
>> userland code.
>
> Pages can be locked but never for very long.
>
> What other work is going on in this server? Is there anything which might be
> locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
> etc?)
>
> Perhaps there's a hardware problem, is there anything in your system logs from
> dmesg?
>

Re: more problems with count(*) on large table

From: Gregory Stark
"Mike Charnoky" <noky@nextbus.com> writes:

> I altered the table in question, with "set statistics 100" on the
> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> queries don't seem to hang, but it still takes a long time to do the count:
>  * "where evtime between '2007-09-26' and '2007-09-27'"
>    took 37 minutes to run (result was ~12 million)
>  * "where evtime between '2007-09-25' and '2007-09-26'"
>    took 40 minutes to run (result was ~14 million)
>
> Still stymied about the seemingly random performance, especially since I
> have seen this query execute in 2 minutes.


And the "explain analyze" for these?

Are you still sure it's certain date ranges which are consistently problems
and others are consistently fast? Or could it be something unrelated.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: more problems with count(*) on large table

From: Mike Charnoky
This is strange... count(*) operations over one day's worth of data now
take either ~1-2 minutes or ~40 minutes to run.  It seems that the first
time the data is queried it takes about 40 minutes.  If I try the query
again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others.  I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE.  This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

  QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms


Mike

Gregory Stark wrote:
> "Mike Charnoky" <noky@nextbus.com> writes:
>
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
>> queries don't seem to hang, but it still takes a long time to do the count:
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>    took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>    took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
>
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently problems
> and others are consistently fast? Or could it be something unrelated.
>

Re: more problems with count(*) on large table

From: Alan Hodgson
On Monday 01 October 2007, Mike Charnoky <noky@nextbus.com> wrote:
> This is strange... count(*) operations over one day's worth of data now
> take either ~1-2 minutes or ~40 minutes to run.  It seems that the first
> time the data is queried it takes about 40 minutes.  If I try the query
> again, it finishes in 1-2 minutes!
>
> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others.  I have done "set statistics
> 100" for the evtime field in this table.

The first time, you're reading from disk. The second time, you're reading
from cache. Tens of millions of disk seeks don't come cheap.

--
We're Microsoft. Everything else is just patent infringement.


Re: more problems with count(*) on large table

From: Alvaro Herrera
Mike Charnoky wrote:
> This is strange... count(*) operations over one day's worth of data now
> take either ~1-2 minutes or ~40 minutes to run.  It seems that the first
> time the data is queried it takes about 40 minutes.  If I try the query
> again, it finishes in 1-2 minutes!

This is just cache effect.



--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: more problems with count(*) on large table

From: Bill Moran
In response to Mike Charnoky <noky@nextbus.com>:

> This is strange... count(*) operations over one day's worth of data now
> take either ~1-2 minutes or ~40 minutes to run.  It seems that the first
> time the data is queried it takes about 40 minutes.  If I try the query
> again, it finishes in 1-2 minutes!

This sounds like a caching issue.  My guess at what's happening is that
other operations are pushing this data out of the shared_buffers, so
when you run it, the system has to pull a bunch of tuples off the disk
to check them.  If you run it again immediately, the tuples are still in
memory, and it runs very fast.

If this is the case, you can speed up things by adding RAM/shared_buffers,
or by moving to faster disks.  The RAM solution is going to give you the
biggest performance improvement.

However, if there's enough other data on this system, you may have
difficulty getting enough RAM to mitigate the problem, in which case,
faster disks are going to be your best bet.

How much RAM do you have, and how much of it is allocated to shared_buffers?
What's your IO subsystem look like?
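One way to check the cache theory directly is the statistics collector's
block I/O view (a sketch, assuming stats collection is enabled; the
`pg_statio_user_tables` view is available in 8.1, and the table name is
the poster's):

```sql
-- Compare block reads served from shared_buffers (hits) vs. reads
-- that went to the OS/disk, before and after each count(*) run.
SELECT heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
  FROM pg_statio_user_tables
 WHERE relname = 'prediction_accuracy';
```

If the slow first run drives `heap_blks_read` up sharply while the fast
second run mostly increments `heap_blks_hit`, that confirms the
disk-vs-cache explanation above.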

> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others.  I have done "set statistics
> 100" for the evtime field in this table.
>
> Here is the output from EXPLAIN ANALYZE.  This is the same query run
> back to back, first time takes 42 minutes, second time takes less than 2
> minutes!
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>   QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=2549854.351..2549854.352 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
> rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 2549854.411 ms
> (4 rows)
>
> Time: 2549943.506 ms
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>  QUERY PLAN
>
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=111200.943..111200.944 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
> rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 111201.000 ms
> (4 rows)
>
> Time: 111298.695 ms
>
>
> Mike
>
> Gregory Stark wrote:
> > "Mike Charnoky" <noky@nextbus.com> writes:
> >
> >> I altered the table in question, with "set statistics 100" on the
> >> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> >> queries don't seem to hang, but it still takes a long time to do the count:
> >>  * "where evtime between '2007-09-26' and '2007-09-27'"
> >>    took 37 minutes to run (result was ~12 million)
> >>  * "where evtime between '2007-09-25' and '2007-09-26'"
> >>    took 40 minutes to run (result was ~14 million)
> >>
> >> Still stymied about the seemingly random performance, especially since I
> >> have seen this query execute in 2 minutes.
> >
> >
> > And the "explain analyze" for these?
> >
> > Are you still sure it's certain date ranges which are consistently problems
> > and others are consistently fast? Or could it be something unrelated.
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq


--
Bill Moran
http://www.potentialtech.com

Re: more problems with count(*) on large table

From
Gregory Stark
Date:
"Mike Charnoky" <noky@nextbus.com> writes:

> Here is the output from EXPLAIN ANALYZE.  This is the same query run
> back to back, first time takes 42 minutes, second time takes less than 2
> minutes!

That doesn't really sound strange at all. It sounds like you have a very slow
disk and a very large amount of memory. 40 minutes to scan 11.4M records
sounds kind of high to me, though.

How wide are these records, anyway? That is, what is the table definition for
this table? If this is a single consumer drive, 42 minutes sounds about right
for 2k-wide records being randomly accessed.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: more problems with count(*) on large table

From
Martijn van Oosterhout
Date:
On Mon, Oct 01, 2007 at 01:34:32PM -0400, Bill Moran wrote:
> This sounds like a caching issue.  My guess at what's happening is that
> other operations are pushing this data out of the shared_buffers, so
> when you run it, the system has to pull a bunch of tuples off the disk
> to check them.  If you run it again immediately, the tuples are still in
> memory, and it runs very fast.

You should check your ratio of system cache to shared_buffers. Pushing
things out of shared buffers isn't bad if you've got a much bigger OS
cache behind it. The cost of pulling something out of the OS cache is
negligible compared to really going to disk.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: more problems with count(*) on large table

From
Tomasz Ostrowski
Date:
On Mon, 01 Oct 2007, Mike Charnoky wrote:

> I altered the table in question, with "set statistics 100" on the
> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> queries don't seem to hang, but it still takes a long time to do the count:
>  * "where evtime between '2007-09-26' and '2007-09-27'"
>    took 37 minutes to run (result was ~12 million)
>  * "where evtime between '2007-09-25' and '2007-09-26'"
>    took 40 minutes to run (result was ~14 million)

Maybe it needs several million scattered seeks, which basically defeat
the disk cache.

If you can afford a period of database inactivity, you could for example
run this periodically (at night or on weekends):
    cluster pred_acc_evtime_index on prediction_accuracy;
    (see http://www.postgresql.org/docs/8.2/static/sql-cluster.html)
This will take rather a long time the first time, but will be much
faster on subsequent runs.

This should make the seeks much more local and help the operating
system cache the results.
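Sketched as a maintenance step (the table and index names are taken from the EXPLAIN output earlier in the thread; the follow-up ANALYZE is my suggestion, not something CLUSTER does for you on 8.1):

```sql
-- Rewrite the table in evtime order so one day's rows are physically
-- contiguous.  Takes an exclusive lock on the table, so run it only
-- during an inactivity window.
CLUSTER pred_acc_evtime_index ON prediction_accuracy;

-- Refresh planner statistics after the physical rewrite.
ANALYZE prediction_accuracy;
```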

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh

Re: more problems with count(*) on large table

From
Mike Charnoky
Date:
The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM.  The io
subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
9500S-8 controller (Seagate Nearline ST3400632NS drives).  Currently,
shared_buffers is set to 50000 (nearly 400M).

As for the data stored in this large table, there are 15 columns.  Each
row takes roughly 134 bytes to store, not counting the index.  So, for
one day's worth of data we are talking about 1.5G/day (1.8G with the
index).  That's about 11.5M rows/day.  Although the data isn't stored
exactly sequentially by the indexed time field, it is pretty close.
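Those figures can be cross-checked against the actual on-disk size; a sketch using the dbsize functions that were added in 8.1:

```sql
-- Heap size alone, and heap plus indexes (and TOAST), human-readable.
SELECT pg_size_pretty(pg_relation_size('prediction_accuracy'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('prediction_accuracy')) AS with_indexes;
```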

If it takes PG ~40 minutes to count(*) one day's worth of records, the
avg throughput is 786k/s.  Watching iostat during the count(*)
operation, I see average read speeds in the range of 1100-1500k/s.

I guess I would expect postgres to perform a count(*) faster.  When I
run benchmarks on the machine with hdparm (with the db insert process
running), I see the disk averages > 80MB/sec for reads:

# hdparm -tT /dev/sdb1
/dev/sdb1:
 Timing cached reads:   3884 MB in  2.00 seconds = 1942.85 MB/sec
 Timing buffered disk reads:  248 MB in  3.01 seconds =  82.49 MB/sec

Maybe PG has to do a lot of random disk access?  I'm running bonnie++
now to get more detailed disk performance info.  As Tomasz pointed out
maybe using CLUSTER would help, but this probably takes a long time to
perform.

Again, the only other thing happening with the db: a separate process is
inserting data into this table.  I have checkpoint_segments set to 64 so
that pg is not constantly thrashing the disk with writes.  The
transaction log is on a separate disk.


Mike

Bill Moran wrote:
> In response to Mike Charnoky <noky@nextbus.com>:
>
>> This is strange... count(*) operations over a period of one day's worth
>> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
>> first time the data is queried it takes about 40 minutes.  If I try the
>> query again, it finishes in 1-2 minutes!
>
> This sounds like a caching issue.  My guess at what's happening is that
> other operations are pushing this data out of the shared_buffers, so
> when you run it, the system has to pull a bunch of tuples off the disk
> to check them.  If you run it again immediately, the tuples are still in
> memory, and it runs very fast.
>
> If this is the case, you can speed up things by adding RAM/shared_buffers,
> or by moving to faster disks.  The RAM solution is going to give you the
> biggest performance improvement.
>
> However, if there's enough other data on this system, you may have
> difficulty getting enough RAM to mitigate the problem, in which case,
> faster disks are going to be your best bet.
>
> How much RAM do you have, and how much of it is allocated to shared_buffers?
> What's your IO subsystem look like?


Re: more problems with count(*) on large table

From
Bill Moran
Date:
In response to Mike Charnoky <noky@nextbus.com>:

> The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM.  The io
> subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
> 9500S-8 controller (Seagate Nearline ST3400632NS drives).  Currently,
> shared_buffers is set to 50000 (nearly 400M)

The advice on 8.x systems has been to start with 1/4-1/3 of the available
RAM on the system, and fine-tune from there.  Unless there are other
(non-postgresql) functions this machine serves, you should probably up
shared_buffers to about 2G.  From there, you may find that your workload
benefits from even more, or possibly less, but 400M seems pretty small
for a 6G system.
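On a dedicated 6G machine that advice works out to roughly the following postgresql.conf fragment (the exact figure is illustrative; tune from there):

```
# shared_buffers is measured in 8 kB pages on 8.1
shared_buffers = 250000        # 250000 * 8 kB = ~2 GB
```

Note that raising shared_buffers this far may also require increasing the kernel's SHMMAX setting before the server will start.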

> As for the data stored in this large table, there are 15 columns.  Each
> row takes roughly 134 bytes to store, not counting the index.  So, for
> one day's worth of data we are talking about 1.5G/day (1.8G with the
> index).  That's about 11.5M rows/day.  Although the data isn't stored
> exactly sequentially by the indexed time field, it is pretty close.

How much other data is this server pushing around?  If there's only that
one table in that one database, then something is wrong, as that whole
thing should be in the filesystem cache all the time.  Otherwise, you
have to consider what other operations may need memory and push this
table out of the cache.

> If it takes PG ~40 minutes to count(*) one day's worth of records, the
> avg throughput is 786k/s.  Watching iostat during the count(*)
> operation, I see average read speeds in the range of 1100-1500k/s.

Could be a lot of fragmentation of that table.  Keep in mind that if
you're deleting records occasionally, that free space will get reused,
which means an insert might not go in sequentially; it might go all
over the table.
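One way to check for that kind of fragmentation is the contrib pgstattuple module, a sketch assuming it has been installed from the 8.1 contrib directory:

```sql
-- Reports tuple counts plus dead-tuple and free-space percentages.
-- On an insert-only table, a high free_percent or dead_tuple_percent
-- would point at space being reused out of physical order.
SELECT * FROM pgstattuple('prediction_accuracy');
```

Note this function scans the whole table, so on a table this size it will itself take a while and generate significant I/O.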

> I guess I would expect postgres to perform a count(*) faster.  When I
> run benchmarks on the machine with hdparm (with the db insert process
> running), I see the disk averages > 80MB/sec for reads
>
> # hdparm -tT /dev/sdb1
> /dev/sdb1:
>  Timing cached reads:   3884 MB in  2.00 seconds = 1942.85 MB/sec
>  Timing buffered disk reads:  248 MB in  3.01 seconds =  82.49 MB/sec
>
> Maybe PG has to do a lot of random disk access?  I'm running bonnie++
> now to get more detailed disk performance info.  As Tomasz pointed out
> maybe using CLUSTER would help, but this probably takes a long time to
> perform.

If you can spare the time, give it a try to see if it helps.

> Again, the only other thing happening with the db: a separate process is
> inserting data into this table.  I have checkpoint_segments set to 64 so
> that pg is not constantly thrashing the disk with writes.  The
> transaction log is on a separate disk.
>
>
> Mike
>
> Bill Moran wrote:
> > In response to Mike Charnoky <noky@nextbus.com>:
> >
> >> This is strange... count(*) operations over a period of one day's worth
> >> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
> >> first time the data is queried it takes about 40 minutes.  If I try the
> >> query again, it finishes in 1-2 minutes!
> >
> > This sounds like a caching issue.  My guess at what's happening is that
> > other operations are pushing this data out of the shared_buffers, so
> > when you run it, the system has to pull a bunch of tuples off the disk
> > to check them.  If you run it again immediately, the tuples are still in
> > memory, and it runs very fast.
> >
> > If this is the case, you can speed up things by adding RAM/shared_buffers,
> > or by moving to faster disks.  The RAM solution is going to give you the
> > biggest performance improvement.
> >
> > However, if there's enough other data on this system, you may have
> > difficulty getting enough RAM to mitigate the problem, in which case,
> > faster disks are going to be your best bet.
> >
> > How much RAM do you have, and how much of it is allocated to shared_buffers?
> > What's your IO subsystem look like?
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


--
Bill Moran
http://www.potentialtech.com