Thread: more problems with count(*) on large table
Hi,

I am still having problems performing a count(*) on a large table. This
is a followup from a recent thread:

http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php

Since the last time these problems happened, we have tweaked some
postgresql config parameters (fsm, etc). I also recreated the large
table, with the assumption it was somehow corrupted.

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

select count(*) from mytable where evtime between '2007-09-26' and '2007-09-27';

However, this query runs successfully and takes 2 minutes:

select count(*) from mytable where evtime between '2007-09-25' and '2007-09-26';

  count
----------
 14150928
(1 row)

What is going on? I analyzed the table before running the query and
have no reason to believe that the amount of data added to the table
varies much from day to day. No data has been deleted from the table
yet, just added.

Here is some config info:

PostgreSQL 8.1.8 on Fedora 3
shared_buffers = 8000
temp_buffers = 1000
work_mem = 16384
maintenance_work_mem = 262144
max_fsm_pages = 500000
max_fsm_relations = 30000

Mike
On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table. This
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours). The table is indexed
> on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC, a 'select count(*)' without
a WHERE condition forces a sequential table scan.

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
A. Kretschmer wrote:
> On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> > Hi,
> >
> > I am still having problems performing a count(*) on a large table. This
> >
> > Now, certain count(*) queries are failing to complete for certain time
> > ranges (I killed the query after about 24 hours). The table is indexed
> > on a timestamp field. Here is one query that hangs:
>
> Again: an index can't help! Because of MVCC, a 'select count(*)' without
> a WHERE condition forces a sequential table scan.

But he does have a WHERE condition. The problem is probably that the
condition is not selective enough, so the planner chooses to do a
seqscan.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, Sep 28, 2007 at 12:50:34PM -0400, Alvaro Herrera wrote:
>
> But he does have a WHERE condition. The problem is probably that the
> condition is not selective enough, so the planner chooses to do a
> seqscan.

Or else the planner has a bad idea of how selective the condition is.
I've found that this sort of inconsistent selectivity on an indexed
column can be because the column statistics aren't good enough. You can
try ALTER TABLE...SET STATISTICS to see if it helps.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
                --Dennis Ritchie
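For example, the statistics-target change Andrew suggests would look
something like this (a sketch using the table and column names from the
original post; 100 is just a common starting point, the 8.1 default is
10, and the new target only takes effect after the ANALYZE):

ALTER TABLE mytable ALTER COLUMN evtime SET STATISTICS 100;
ANALYZE mytable;   -- rebuilds the histogram for evtime with more buckets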
On Fri, 28.09.2007 at 12:50:34 -0400, Alvaro Herrera wrote:
> A. Kretschmer wrote:
> > On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
> > > Hi,
> > >
> > > I am still having problems performing a count(*) on a large table. This
> > >
> > > Now, certain count(*) queries are failing to complete for certain time
> > > ranges (I killed the query after about 24 hours). The table is indexed
> > > on a timestamp field. Here is one query that hangs:
> >
> > Again: an index can't help! Because of MVCC, a 'select count(*)' without
> > a WHERE condition forces a sequential table scan.
>
> But he does have a WHERE condition. The problem is probably that the
> condition is not selective enough, so the planner chooses to do a
> seqscan.

Yes, I'm sorry, I hadn't noticed that. Maybe the statistics are wrong.
As Andrew suggested, an 'ALTER TABLE...SET STATISTICS' can help.

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > A. Kretschmer wrote: >> am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes: >> > Hi, >> > >> > I am still having problems performing a count(*) on a large table. This >> > >> > Now, certain count(*) queries are failing to complete for certain time >> > ranges (I killed the query after about 24 hours). The table is indexed >> > on a timestamp field. Here is one query that hangs: >> >> Again: an index can't help! Because of MVCC: 'select count(*)' without >> WHERE-condition forces an seq. table-scan. > > But he does have a WHERE condition. THe problem is, probably, that the > condition is not selective enough so the planner chooses to do a > seqscan. What does EXPLAIN SELECT ... say? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Mike Charnoky wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table. This
> is a followup from a recent thread:
>
> http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php
>
> Since the last time these problems happened, we have tweaked some
> postgresql config parameters (fsm, etc). I also recreated the large
> table, with the assumption it was somehow corrupted.
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours). The table is indexed
> on a timestamp field. Here is one query that hangs:
>
> select count(*) from mytable where evtime between '2007-09-26' and
> '2007-09-27';
>
> However, this query runs successfully and takes 2 minutes:
>
> select count(*) from mytable where evtime between '2007-09-25' and
> '2007-09-26';
>
>   count
> ----------
>  14150928
> (1 row)
>
> What is going on? I analyzed the table before running the query and
> have no reason to believe that the amount of data added to the table
> varies much from day to day. No data has been deleted from the table
> yet, just added.
>
> Here is some config info:
>
> PostgreSQL 8.1.8 on Fedora 3

Have you vacuumed recently?

Sean
Autovacuum is turned on. Since this is pg 8.1, I don't know when the
table was actually last vacuumed. I *did* run analyze on the table,
though. Also, nothing has been deleted from this table... so vacuum
should have no effect, right?

Mike

Sean Davis wrote:
> Mike Charnoky wrote:
>> Hi,
>>
>> I am still having problems performing a count(*) on a large table.
>
> Have you vacuumed recently?
>
> Sean
In response to Mike Charnoky <noky@nextbus.com>:

> Autovacuum is turned on. Since this is pg 8.1, I don't know when the
> table was actually last vacuumed. I *did* run analyze on the table,
> though. Also, nothing has been deleted from this table... so vacuum
> should have no effect, right?

Updated rows also produce dead tuples that require vacuuming. If the
table is insert-only, you don't need vacuum.

--
Bill Moran
http://www.potentialtech.com
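If you want to double-check that there really is nothing to clean up, a
manual run reports what it finds and is harmless on an insert-only table.
A minimal sketch, using the table name from the original post:

VACUUM VERBOSE ANALYZE mytable;
-- the VERBOSE output reports "removable" vs. "nonremovable" row versions,
-- so it shows directly whether any dead tuples have accumulated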
With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use? This wasn't really clear in the pg docs. Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..444451.44 rows=12490383 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp with
         time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp
         with time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..454671.07 rows=12777586 width=0)
         Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp with
         time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp
         with time zone))
(3 rows)

Mike

Gregory Stark wrote:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
>
>> A. Kretschmer wrote:
>>> On Fri, 28.09.2007 at 11:56:46 -0400, Mike Charnoky wrote:
>>>> Hi,
>>>>
>>>> I am still having problems performing a count(*) on a large table. This
>>>>
>>>> Now, certain count(*) queries are failing to complete for certain time
>>>> ranges (I killed the query after about 24 hours). The table is indexed
>>>> on a timestamp field. Here is one query that hangs:
>>>
>>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>>> a WHERE condition forces a sequential table scan.
>>
>> But he does have a WHERE condition. The problem is probably that the
>> condition is not selective enough, so the planner chooses to do a
>> seqscan.
>
> What does EXPLAIN SELECT ... say?
On 9/28/07, Mike Charnoky <noky@nextbus.com> wrote:
> Hi,
>
> I am still having problems performing a count(*) on a large table. This
> is a followup from a recent thread:
>
> http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php
>
> Since the last time these problems happened, we have tweaked some
> postgresql config parameters (fsm, etc). I also recreated the large
> table, with the assumption it was somehow corrupted.
>
> Now, certain count(*) queries are failing to complete for certain time
> ranges (I killed the query after about 24 hours). The table is indexed
> on a timestamp field. Here is one query that hangs:
>
> select count(*) from mytable where evtime between '2007-09-26' and
> '2007-09-27';
>
> However, this query runs successfully and takes 2 minutes:
>
> select count(*) from mytable where evtime between '2007-09-25' and
> '2007-09-26';

I had a similar problem pop up recently in my medium-large reporting
database. I found it was solved by two things: one was upping the stats
target on the time column to 100, and the other was running regularly
scheduled analyzes on the large reporting table, about once every 8
hours. Autovacuum works fine otherwise, but was not running analyze
often enough.
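A sketch of what that kind of schedule could look like (the database
name, table name, and psql invocation here are placeholders, not details
from this thread):

# crontab entry: re-ANALYZE the big table every 8 hours
0 */8 * * *   psql -d mydb -c 'ANALYZE mytable;'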
A. Kretschmer wrote:
> Again: an index can't help! Because of MVCC, a 'select count(*)' without
> a WHERE condition forces a sequential table scan.

That has very little to do with MVCC. If I understand correctly, MVCC is
about the visibility of records to the current transaction. For that it
makes no difference whether a seq scan or an index scan is performed -
both cases need to check at the record level whether it's visible (where
the seq scan is already looking at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq
scan and an index scan being more efficient tips somewhat sooner in
favour of the seq scan than on some other databases, because the index
scan needs to look at the actual record for visibility.

I plead not guilty ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
Mike Charnoky wrote:
> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
> good value to use? This wasn't really clear in the pg docs. Also, do I
> need to run ANALYZE on the table after I change the statistics?
>
> Here are the EXPLAINs from the queries:
>
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-25' and '2007-09-26';
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..444451.44 rows=12490383 width=0)
>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
> time zone))
> (3 rows)
>
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-26' and '2007-09-27';
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..454671.07 rows=12777586 width=0)
>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
> time zone))
> (3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you from reading some records
between 26 and 27 September...

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
"Alban Hertroys" <a.hertroys@magproductions.nl> writes: > Mike Charnoky wrote: >> With respect to the ALTER TABLE SET STATISTICS... how do I determine a >> good value to use? This wasn't really clear in the pg docs. Also, do I >> need to run ANALYZE on the table after I change the statistics? >> >> Here are the EXPLAINs from the queries: >> >> db=# explain select count(*) from prediction_accuracy where evtime >> between '2007-09-25' and '2007-09-26'; >> >> QUERY PLAN >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=475677.40..475677.41 rows=1 width=0) >> -> Index Scan using pred_acc_evtime_index on prediction_accuracy >> (cost=0.00..444451.44 rows=12490383 width=0) >> Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp >> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with >> time zone)) >> (3 rows) >> >> db=# explain select count(*) from prediction_accuracy where evtime >> between '2007-09-26' and '2007-09-27'; >> >> QUERY PLAN >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=486615.04..486615.05 rows=1 width=0) >> -> Index Scan using pred_acc_evtime_index on prediction_accuracy >> (cost=0.00..454671.07 rows=12777586 width=0) >> Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp >> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with >> time zone)) >> (3 rows) > > Interesting, same plans and no sequential scans... Yet totally different > run times. Almost as if something prevents you to read some records > between 26 and 27 september... Just to be sure we're looking at the right plan do this: \timing PREPARE stmt AS SELECT count(*) FROM prediction_accuracy WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26'; EXPLAIN EXECUTE stmt; EXECUTE stmt; > I'm no expert on locking in Postgres, but AFAIK locks that prevent you > from reading records are rather rare and probably only issued from > userland code. Pages can be locked but never for very long. What other work is going on in this server? Is there anything which might be locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE, etc?) Perhaps there's a hardware problem, is there anything in your system logs from dmesg? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Alban Hertroys wrote:
> A. Kretschmer wrote:
>> Again: an index can't help! Because of MVCC, a 'select count(*)' without
>> a WHERE condition forces a sequential table scan.
>
> That has very little to do with MVCC.
>
> [...] For that it makes no difference whether a seq
> scan or an index scan is performed - both cases need to check at the
> record level whether it's visible (where the seq scan is
> already looking at the actual record, of course).

If you do not use MVCC (say, you use DB2), you need not check the record
itself, because if it is there (which it is if there is an index entry),
it will be 'visible'.

> I plead not guilty ;)

Declined, sorry.

Yours,
Laurenz Albe
Albe Laurenz wrote:
> Alban Hertroys wrote:
>> A. Kretschmer wrote:
>>> Again: an index can't help! Because of MVCC, a 'select count(*)'
>>> without a WHERE condition forces a sequential table scan.
>>
>> That has very little to do with MVCC.
>>
>> [...] For that it makes no difference whether a seq
>> scan or an index scan is performed - both cases need to check at the
>> record level whether it's visible (where the seq scan is
>> already looking at the actual record, of course).
>
> If you do not use MVCC (say, you use DB2), you need not check the record
> itself, because if it is there (which it is if there is an index entry),
> it will be 'visible'.

Still, that's not because of MVCC, but because of the way it is
implemented in PostgreSQL. There has been talk in the past (regularly)
about why the MVCC information is not in the index and whether it should
be; see the ML archives.

Besides, there are still many situations where a sequential scan (whether
for count(*) or not) is faster than an index scan, no matter whether you
have MVCC or not.

As I said, MVCC has little to do with it. The real problem is that in
Postgres you cannot tell from an index whether a record is visible or
not, while you can in DB2 (because it either has an index entry or it
doesn't).

>> I plead not guilty ;)
>
> Declined, sorry.

Overruled, sorry.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze. This seemed to help somewhat. Now,
queries don't seem to hang, but it still takes a long time to do the
count:

 * "where evtime between '2007-09-26' and '2007-09-27'"
   took 37 minutes to run (result was ~12 million)
 * "where evtime between '2007-09-25' and '2007-09-26'"
   took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes. Nothing should be locking the
table; the only things happening with this database are periodic
autovacuums and a process which is constantly inserting data into the
large table in question (and a few other smaller logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!). This was run immediately after I ran the
query a first time (without using the prepare and explain)... This
second run took a little over a minute! Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms
mydb=# explain execute stmt;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=642338.27..642338.28 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..600219.37 rows=16847557 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp with
         time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp
         with time zone))
(3 rows)

Time: 131.559 ms
mydb=# execute stmt;
  count
----------
 14150928
(1 row)

Time: 101721.346 ms

Mike

Gregory Stark wrote:
> "Alban Hertroys" <a.hertroys@magproductions.nl> writes:
>
>> Mike Charnoky wrote:
>>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>>> good value to use? This wasn't really clear in the pg docs. Also, do I
>>> need to run ANALYZE on the table after I change the statistics?
>>>
>>> Here are the EXPLAINs from the queries:
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-25' and '2007-09-26';
>>>
>>>                              QUERY PLAN
>>> -------------------------------------------------------------------------
>>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>>        (cost=0.00..444451.44 rows=12490383 width=0)
>>>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-26' and '2007-09-27';
>>>
>>>                              QUERY PLAN
>>> -------------------------------------------------------------------------
>>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>>        (cost=0.00..454671.07 rows=12777586 width=0)
>>>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>>
>> Interesting, same plans and no sequential scans... Yet totally different
>> run times. Almost as if something prevents you from reading some records
>> between 26 and 27 September...
>
> Just to be sure we're looking at the right plan, do this:
>
> \timing
> PREPARE stmt AS
>   SELECT count(*)
>     FROM prediction_accuracy
>    WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';
>
> EXPLAIN EXECUTE stmt;
> EXECUTE stmt;
>
>> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
>> from reading records are rather rare and probably only issued from
>> userland code.
>
> Pages can be locked, but never for very long.
>
> What other work is going on in this server? Is there anything which might
> be locking the table periodically (a VACUUM FULL, CLUSTER, ALTER TABLE,
> etc.)?
>
> Perhaps there's a hardware problem; is there anything in your system logs
> or from dmesg?
"Mike Charnoky" <noky@nextbus.com> writes: > I altered the table in question, with "set statistics 100" on the > timestamp column, then ran analyze. This seemed to help somewhat. Now, > queries don't seem to hang, but it still takes a long time to do the count: > * "where evtime between '2007-09-26' and '2007-09-27'" > took 37 minutes to run (result was ~12 million) > * "where evtime between '2007-09-25' and '2007-09-26'" > took 40 minutes to run (result was ~14 million) > > Still stymied about the seemingly random performance, especially since I > have seen this query execute in 2 minutes. And the "explain analyze" for these? Are you still sure it's certain date ranges which are consistently problems and others are consistently fast? Or could it be something unrelated. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
This is strange... count(*) operations over a period of one day's worth
of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
the first time the data is queried it takes about 40 minutes. If I try
the query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others. I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE. This is the same query run back
to back: the first time takes 42 minutes, the second time takes less than
2 minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0)
   (actual time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..3.02 rows=1 width=0)
       (actual time=97.676..2532824.892 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with
         time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp
         with time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=3.02..3.03 rows=1 width=0)
   (actual time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..3.02 rows=1 width=0)
       (actual time=36.396..96347.483 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with
         time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp
         with time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote:
> "Mike Charnoky" <noky@nextbus.com> writes:
>
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze. This seemed to help somewhat. Now,
>> queries don't seem to hang, but it still takes a long time to do the
>> count:
>>
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>    took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>    took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently
> problems and others are consistently fast? Or could it be something
> unrelated?
On Monday 01 October 2007, Mike Charnoky <noky@nextbus.com> wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
> the first time the data is queried it takes about 40 minutes. If I try
> the query again, it finishes in 1-2 minutes!
>
> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others. I have done "set statistics
> 100" for the evtime field in this table.

The first time, you're reading from disk. The second time, you're reading
from cache. Tens of millions of disk seeks don't come cheap.

--
We're Microsoft. Everything else is just patent infringement.
Mike Charnoky wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
> the first time the data is queried it takes about 40 minutes. If I try
> the query again, it finishes in 1-2 minutes!

This is just a cache effect.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
In response to Mike Charnoky <noky@nextbus.com>:

> This is strange... count(*) operations over a period of one day's worth
> of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
> the first time the data is queried it takes about 40 minutes. If I try
> the query again, it finishes in 1-2 minutes!

This sounds like a caching issue. My guess at what's happening is that
other operations are pushing this data out of the shared_buffers, so
when you run it, the system has to pull a bunch of tuples off the disk
to check them. If you run it again immediately, the tuples are still in
memory, and it runs very fast.

If this is the case, you can speed things up by adding RAM/shared_buffers,
or by moving to faster disks. The RAM solution is going to give you the
biggest performance improvement.

However, if there's enough other data on this system, you may have
difficulty getting enough RAM to mitigate the problem, in which case
faster disks are going to be your best bet.

How much RAM do you have, and how much of it is allocated to
shared_buffers? What's your IO subsystem look like?

> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others. I have done "set statistics
> 100" for the evtime field in this table.
>
> Here is the output from EXPLAIN ANALYZE. This is the same query run back
> to back: the first time takes 42 minutes, the second time takes less than
> 2 minutes!
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0)
>    (actual time=2549854.351..2549854.352 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..3.02 rows=1 width=0)
>        (actual time=97.676..2532824.892 rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with
>          time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp
>          with time zone))
>  Total runtime: 2549854.411 ms
> (4 rows)
>
> Time: 2549943.506 ms
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0)
>    (actual time=111200.943..111200.944 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..3.02 rows=1 width=0)
>        (actual time=36.396..96347.483 rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with
>          time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp
>          with time zone))
>  Total runtime: 111201.000 ms
> (4 rows)
>
> Time: 111298.695 ms
>
> Mike
>
> Gregory Stark wrote:
> > "Mike Charnoky" <noky@nextbus.com> writes:
> >
> >> I altered the table in question, with "set statistics 100" on the
> >> timestamp column, then ran analyze. This seemed to help somewhat. Now,
> >> queries don't seem to hang, but it still takes a long time to do the
> >> count:
> >>
> >>  * "where evtime between '2007-09-26' and '2007-09-27'"
> >>    took 37 minutes to run (result was ~12 million)
> >>  * "where evtime between '2007-09-25' and '2007-09-26'"
> >>    took 40 minutes to run (result was ~14 million)
> >>
> >> Still stymied about the seemingly random performance, especially since I
> >> have seen this query execute in 2 minutes.
> >
> > And the "explain analyze" for these?
> >
> > Are you still sure it's certain date ranges which are consistently
> > problems and others are consistently fast? Or could it be something
> > unrelated?

--
Bill Moran
http://www.potentialtech.com
"Mike Charnoky" <noky@nextbus.com> writes: > Here is the output from EXPLAIN ANALYZE. This is the same query run > back to back, first time takes 42 minutes, second time takes less than 2 > minutes! That doesn't really sound strange at all. It sounds like you have a very slow disk and very large amount of memory. 40 minutes to scan 11.4M records sounds kind of high to me though. How wide are these records anyways? That is, what is the table definition for this table? If this is a single consumer drive 42 minutes sounds about right for 2k wide records being randomly accessed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Mon, Oct 01, 2007 at 01:34:32PM -0400, Bill Moran wrote:
> This sounds like a caching issue. My guess at what's happening is that
> other operations are pushing this data out of the shared_buffers, so
> when you run it, the system has to pull a bunch of tuples off the disk
> to check them. If you run it again immediately, the tuples are still in
> memory, and it runs very fast.

You should check your ratio of system cache to shared_buffers. Pushing
things out of shared buffers isn't bad if you've got a much bigger OS
cache behind it. The cost of pulling something out of the OS cache is
negligible compared to really going to disk.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
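A quick way to see both sides of that ratio from psql (a sketch; in 8.1
these settings are expressed as a number of 8kB buffers/pages, and
effective_cache_size is only the planner's assumption about available OS
cache, it doesn't allocate anything):

SHOW shared_buffers;        -- how much PostgreSQL caches itself
SHOW effective_cache_size;  -- what the planner assumes the OS cache adds on top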
On Mon, 01 Oct 2007, Mike Charnoky wrote:
> I altered the table in question, with "set statistics 100" on the
> timestamp column, then ran analyze. This seemed to help somewhat. Now,
> queries don't seem to hang, but it still takes a long time to do the
> count:
>
>  * "where evtime between '2007-09-26' and '2007-09-27'"
>    took 37 minutes to run (result was ~12 million)
>  * "where evtime between '2007-09-25' and '2007-09-26'"
>    took 40 minutes to run (result was ~14 million)

Maybe it needs several million scattered seeks, which basically defeat
the disk cache. If you can afford a period of database inactivity, you
can for example do this periodically (say on nights or weekends):

cluster pred_acc_evtime_index on prediction_accuracy;

(see http://www.postgresql.org/docs/8.2/static/sql-cluster.html)

This would run for a rather long time the first time, but will be much
faster later. It should make the seeks much more local and help the
operating system cache the results.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh
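Before scheduling that downtime it may be worth checking how out-of-order
the table really is: the planner's correlation statistic for evtime is a
rough indicator (values near 1.0 mean the rows are already stored in
roughly evtime order). A minimal sketch, assuming a recent ANALYZE:

SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename = 'prediction_accuracy'
   AND attname = 'evtime';

If the correlation is already close to 1.0, CLUSTER is unlikely to buy
much. If it is run, keep in mind that it takes an exclusive lock on the
table for the duration (the constant insert process will block), and the
CLUSTER documentation recommends running ANALYZE on the table afterwards.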
The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM. The io
subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
9500S-8 controller (Seagate Nearline ST3400632NS drives). Currently,
shared_buffers is set to 50000 (nearly 400M).

As for the data stored in this large table, there are 15 columns. Each
row takes roughly 134 bytes to store, not counting the index. So, for
one day's worth of data we are talking about 1.5G/day (1.8G with the
index). That's about 11.5M rows/day. Although the data isn't stored
exactly sequentially by the indexed time field, it is pretty close.

If it takes PG ~40 minutes to count(*) one day's worth of records, the
avg throughput is 786k/s. Watching iostat during the count(*) operation,
I see average read speeds in the range of 1100-1500k/s. I guess I would
expect postgres to perform a count(*) faster. When I run benchmarks on
the machine with hdparm (with the db insert process running), I see the
disk averages > 80MB/sec for reads:

# hdparm -tT /dev/sdb1
/dev/sdb1:
 Timing cached reads:   3884 MB in  2.00 seconds = 1942.85 MB/sec
 Timing buffered disk reads:  248 MB in  3.01 seconds =  82.49 MB/sec

Maybe PG has to do a lot of random disk access? I'm running bonnie++ now
to get more detailed disk performance info. As Tomasz pointed out, maybe
using CLUSTER would help, but this probably takes a long time to perform.

Again, the only other thing happening with the db: a separate process is
inserting data into this table. I have checkpoint_segments set to 64 so
that pg is not constantly thrashing the disk with writes. The transaction
log is on a separate disk.

Mike

Bill Moran wrote:
> In response to Mike Charnoky <noky@nextbus.com>:
>
>> This is strange... count(*) operations over a period of one day's worth
>> of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
>> the first time the data is queried it takes about 40 minutes. If I try
>> the query again, it finishes in 1-2 minutes!
>
> This sounds like a caching issue. My guess at what's happening is that
> other operations are pushing this data out of the shared_buffers, so
> when you run it, the system has to pull a bunch of tuples off the disk
> to check them. If you run it again immediately, the tuples are still in
> memory, and it runs very fast.
>
> If this is the case, you can speed things up by adding RAM/shared_buffers,
> or by moving to faster disks. The RAM solution is going to give you the
> biggest performance improvement.
>
> However, if there's enough other data on this system, you may have
> difficulty getting enough RAM to mitigate the problem, in which case
> faster disks are going to be your best bet.
>
> How much RAM do you have, and how much of it is allocated to
> shared_buffers? What's your IO subsystem look like?
In response to Mike Charnoky <noky@nextbus.com>:

> The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM. The io
> subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
> 9500S-8 controller (Seagate Nearline ST3400632NS drives). Currently,
> shared_buffers is set to 50000 (nearly 400M).

The advice on 8.x systems has been to start with 1/4-1/3 of the available
RAM on the system and fine-tune from there. Unless there are other
(non-postgresql) functions this machine serves, you should probably up
shared_buffers to about 2G. From there, you may find that your workload
benefits from even more, or possibly less, but 400M seems pretty small
for a 6G system.

> As for the data stored in this large table, there are 15 columns. Each
> row takes roughly 134 bytes to store, not counting the index. So, for
> one day's worth of data we are talking about 1.5G/day (1.8G with the
> index). That's about 11.5M rows/day. Although the data isn't stored
> exactly sequentially by the indexed time field, it is pretty close.

How much other data is this server pushing around? If there's only that
one table in that one database, then something is wrong, as that whole
thing should be in the filesystem cache all the time. Otherwise, you have
to consider what other operations may be needing memory and moving those
tables out of the way.

> If it takes PG ~40 minutes to count(*) one day's worth of records, the
> avg throughput is 786k/s. Watching iostat during the count(*) operation,
> I see average read speeds in the range of 1100-1500k/s.

Could be a lot of fragmentation of that table. Keep in mind that if
you're deleting records occasionally, that free space will get reused,
which means an insert might not insert sequentially; it might go all
over the table.

> I guess I would expect postgres to perform a count(*) faster. When I
> run benchmarks on the machine with hdparm (with the db insert process
> running), I see the disk averages > 80MB/sec for reads:
>
> # hdparm -tT /dev/sdb1
> /dev/sdb1:
>  Timing cached reads:   3884 MB in  2.00 seconds = 1942.85 MB/sec
>  Timing buffered disk reads:  248 MB in  3.01 seconds =  82.49 MB/sec
>
> Maybe PG has to do a lot of random disk access? I'm running bonnie++ now
> to get more detailed disk performance info. As Tomasz pointed out, maybe
> using CLUSTER would help, but this probably takes a long time to perform.

If you can spare the time, give it a try to see if it helps.

> Again, the only other thing happening with the db: a separate process is
> inserting data into this table. I have checkpoint_segments set to 64 so
> that pg is not constantly thrashing the disk with writes. The transaction
> log is on a separate disk.
>
> Mike
>
> Bill Moran wrote:
> > In response to Mike Charnoky <noky@nextbus.com>:
> >
> >> This is strange... count(*) operations over a period of one day's worth
> >> of data now take either ~1-2 minutes or ~40 minutes to run. It seems that
> >> the first time the data is queried it takes about 40 minutes. If I try
> >> the query again, it finishes in 1-2 minutes!
> >
> > This sounds like a caching issue. My guess at what's happening is that
> > other operations are pushing this data out of the shared_buffers, so
> > when you run it, the system has to pull a bunch of tuples off the disk
> > to check them. If you run it again immediately, the tuples are still in
> > memory, and it runs very fast.
> >
> > If this is the case, you can speed things up by adding RAM/shared_buffers,
> > or by moving to faster disks. The RAM solution is going to give you the
> > biggest performance improvement.
> >
> > However, if there's enough other data on this system, you may have
> > difficulty getting enough RAM to mitigate the problem, in which case
> > faster disks are going to be your best bet.
> >
> > How much RAM do you have, and how much of it is allocated to
> > shared_buffers? What's your IO subsystem look like?

--
Bill Moran
http://www.potentialtech.com
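For reference, a sketch of what that might look like in postgresql.conf
on 8.1, where these settings are counted in 8kB buffers rather than
accepting "2GB"-style units (the exact numbers are assumptions to adjust,
and the kernel shared memory limit, e.g. kernel.shmmax, will likely need
raising before a ~2GB segment can be allocated):

shared_buffers = 262144        # 262144 * 8kB = ~2GB
effective_cache_size = 393216  # ~3GB; a rough guess at what the OS can cache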