Thread: count(*) performance
Hi, I guess this is an age-old 100times answered question, but I didn't find the answer to it yet (neither in the FAQ nor in the mailing list archives). Question: I have a table with 2.5M rows. count(*) on this table is running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10 array (sata, not scsi)) Is this normal? How could I make it run faster? Maybe make it run faster for the 2nd time? Which parameters should I change in postgresql.conf and how? -- Üdvözlettel, Gábriel Ákos -=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=- -=Tel/fax:+3612367353 |Mobil:+36209278894 =-
On Mon, Mar 27, 2006 at 03:34:32PM +0200, G?briel ?kos wrote: > Hi, > > I guess this is an age-old 100times answered question, but I didn't find > the answer to it yet (neither in the FAQ nor in the mailing list archives). > > Question: I have a table with 2.5M rows. count(*) on this table is > running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10 > array (sata, not scsi)) Is this normal? How could I make it run faster? > Maybe make it run faster for the 2nd time? Which parameters should I > change in postgresql.conf and how? First, count(*) on PostgreSQL tends to be slow because you can't do index covering[1]. But in this case, I'd bet money that if it's taking 4 minutes something else is wrong. Have you been vacuuming that table frequently enough? What's SELECT relpages FROM pg_class WHERE relname='tablename' show? [1] http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Feb.asp#5 -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Gabriel, On 3/27/06 5:34 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote: > Question: I have a table with 2.5M rows. count(*) on this table is > running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10 > array (sata, not scsi)) Is this normal? How could I make it run faster? > Maybe make it run faster for the 2nd time? Which parameters should I > change in postgresql.conf and how? Before changing anything with your Postgres configuration, you should check your hard drive array performance. All select count(*) does is a sequential scan of your data, and if the table is larger than memory, or if it's the first time you've scanned it, it is limited by your disk speed. To test your disk speed, use the following commands and report the times here: time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync" time dd if=bigfile of=/dev/null bs=8k If these are taking a long time, from another session watch the I/O rate with "vmstat 1" for a while and report that here. - Luke
Luke Lonergan wrote: > To test your disk speed, use the following commands and report the times > here: > > time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync" root@panther:/fast # time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync" 500000+0 records in 500000+0 records out 4096000000 bytes transferred in 45.469404 seconds (90082553 bytes/sec) real 0m56.880s user 0m0.112s sys 0m18.937s > time dd if=bigfile of=/dev/null bs=8k root@panther:/fast # time dd if=bigfile of=/dev/null bs=8k 500000+0 records in 500000+0 records out 4096000000 bytes transferred in 53.542147 seconds (76500481 bytes/sec) real 0m53.544s user 0m0.048s sys 0m10.637s I guess these values aren't that bad :) -- Üdvözlettel, Gábriel Ákos -=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=- -=Tel/fax:+3612367353 |Mobil:+36209278894 =-
Jim C. Nasby wrote: > But in this case, I'd bet money that if it's taking 4 minutes something > else is wrong. Have you been vacuuming that table frequently enough? That gave me an idea. I thought that autovacuum is doing it right, but I issued a vacuum full analyze verbose , and it worked all the day. After that I've tweaked memory settings a bit too (more fsm_pages) Now: staging=# SELECT count(*) from infx.infx_product; count --------- 3284997 (1 row) Time: 1301.049 ms As I saw the output, the database was compressed to 10% of its size :) This table has quite big changes every 4 hour, let's see how it works. Maybe I'll have to issue full vacuums from cron regularly. > What's SELECT relpages FROM pg_class WHERE relname='tablename' show? This went to 10% as well, now it's around 156000 pages. Regards, Akos -- Üdvözlettel, Gábriel Ákos -=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=- -=Tel/fax:+3612367353 |Mobil:+36209278894 =-
Gabriel, On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote: > That gave me an idea. I thought that autovacuum is doing it right, but I > issued a vacuum full analyze verbose , and it worked all the day. > After that I've tweaked memory settings a bit too (more fsm_pages) Oops! I replied to your disk speed before I saw this. The only thing is - you probably don't want to do a "vacuum full", but rather a simple "vacuum" should be enough. - Luke
Luke Lonergan wrote: > Gabriel, > > On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote: > >> That gave me an idea. I thought that autovacuum is doing it right, but I >> issued a vacuum full analyze verbose , and it worked all the day. >> After that I've tweaked memory settings a bit too (more fsm_pages) > > Oops! I replied to your disk speed before I saw this. > > The only thing is - you probably don't want to do a "vacuum full", but > rather a simple "vacuum" should be enough. I thought that too. Autovacuum is running on our system but it didn't do the trick. Anyway the issue is solved, thank you all for helping. :) -- Üdvözlettel, Gábriel Ákos -=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=- -=Tel/fax:+3612367353 |Mobil:+36209278894 =-
Does that mean that even though autovacuum is turned on, you still should do a regular vacuum analyze periodically? Thanks, ____________________________________________________________________ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 27, 2006, at 11:14 AM, Luke Lonergan wrote: > Gabriel, > > On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote: > >> That gave me an idea. I thought that autovacuum is doing it right, >> but I >> issued a vacuum full analyze verbose , and it worked all the day. >> After that I've tweaked memory settings a bit too (more fsm_pages) > > Oops! I replied to your disk speed before I saw this. > > The only thing is - you probably don't want to do a "vacuum full", but > rather a simple "vacuum" should be enough. > > - Luke > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
Attachment
Gábriel Ákos wrote: > Luke Lonergan wrote: >> Gabriel, >> >> On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote: >> >>> That gave me an idea. I thought that autovacuum is doing it right, but I >>> issued a vacuum full analyze verbose , and it worked all the day. >>> After that I've tweaked memory settings a bit too (more fsm_pages) >> >> Oops! I replied to your disk speed before I saw this. >> >> The only thing is - you probably don't want to do a "vacuum full", but >> rather a simple "vacuum" should be enough. > > I thought that too. Autovacuum is running on our system but it didn't do > the trick. Anyway the issue is solved, thank you all for helping. :) Yeah, it would be nice of autovacuum had some way of raising a flag to the admin that given current settings (thresholds, FSM etc...), it's not keeping up with the activity. I don't know how to do this, but I hope someone else has some good ideas. Matt
Brendan Duddridge wrote: > Does that mean that even though autovacuum is turned on, you still > should do a regular vacuum analyze periodically? No, it probably means you have set FSM settings too low, or not tuned the autovacuum parameters to your specific situation. A bug in the autovacuum daemon is not unexpected however, so if it doesn't work after tuning, let us know. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote: > Does that mean that even though autovacuum is turned on, you still > should do a regular vacuum analyze periodically? It seems that there are situations where autovacuum does not a really good job. However, in our application I have made stupid design decision which I want to change as soon as possible. I have a "visit count" column in one of the very large tables, so updates are VERY regular. I've just checked and saw that autovacuum does a great job with that. Nevertheless I have set up a cron job to do a standard vacuum every month. I've used vacuum full only once after I did a bulk update of about 200.000 rows ... cug -- PharmaLine, Essen, GERMANY Software and Database Development
Attachment
This is where a "last_vacuumed" (and "last_analyzed") column in pg_statistic(?) would come in handy. Each time vacuum or analyze has finished, update the row for the specific table that was vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed column. No more guessing "maybe I haven't vacuumed/analyzed in a while", and each time a user complains about bad performance, one could request the user to do a "select s.last_vacuumed, s.last_analyzed from pg_statistic s, pg_attribute a, pg_class c where ..." It SOUNDS easy to implement, but that has fooled me before... :-) - Mikael -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Guido Neitzer Sent: den 27 mars 2006 21:44 To: Brendan Duddridge Cc: Postgresql Performance Subject: Re: [PERFORM] count(*) performance On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote: > Does that mean that even though autovacuum is turned on, you still > should do a regular vacuum analyze periodically? It seems that there are situations where autovacuum does not a really good job. However, in our application I have made stupid design decision which I want to change as soon as possible. I have a "visit count" column in one of the very large tables, so updates are VERY regular. I've just checked and saw that autovacuum does a great job with that. Nevertheless I have set up a cron job to do a standard vacuum every month. I've used vacuum full only once after I did a bulk update of about 200.000 rows ... cug -- PharmaLine, Essen, GERMANY Software and Database Development
Mikael Carneholm wrote: > This is where a "last_vacuumed" (and "last_analyzed") column in > pg_statistic(?) would come in handy. Each time vacuum or analyze has > finished, update the row for the specific table that was > vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed > column. No more guessing "maybe I haven't vacuumed/analyzed in a while", > and each time a user complains about bad performance, one could request > the user to do a "select s.last_vacuumed, s.last_analyzed from > pg_statistic s, pg_attribute a, pg_class c where ..." > > It SOUNDS easy to implement, but that has fooled me before... :-) It is fairly easy to implement, however it has been discussed before and decided that it wasn't necessary. What the system cares about is how long it's been since the last vacuum in terms of XIDs not time. Storing a timestamp would make it more human readable, but I'm not sure the powers that be want to add two new columns to some system table to accommodate this. Matt
"Matthew T. O'Connor" <matthew@zeut.net> writes: > It is fairly easy to implement, however it has been discussed before and > decided that it wasn't necessary. What the system cares about is how > long it's been since the last vacuum in terms of XIDs not time. I think Alvaro is intending to do the latter (store per-table vacuum xid info) for 8.2. regards, tom lane
I think it is definitely necessary from an administration point of view - as an administrator, I want to know: 1) Are there any stats (at all) in a schema 2) Are there any stats on the table that slow_query_foo is targeting 3) If I have stats, how recent are they 4) Could it be that there are a lot of dead tuples lying around (given the amount of traffic I know I have) These would be (are always!) the first questions I ask myself when I'm about to identify performance problems in an app,don't know how other people do though :) Maybe something I'll try to look into this weekend, if I can spare some time. - Mikael -----Original Message----- From: Matthew T. O'Connor [mailto:matthew@zeut.net] Sent: den 28 mars 2006 00:43 To: Mikael Carneholm Cc: Postgresql Performance Subject: Re: [PERFORM] count(*) performance Mikael Carneholm wrote: > This is where a "last_vacuumed" (and "last_analyzed") column in > pg_statistic(?) would come in handy. Each time vacuum or analyze has > finished, update the row for the specific table that was > vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed > column. No more guessing "maybe I haven't vacuumed/analyzed in a while", > and each time a user complains about bad performance, one could request > the user to do a "select s.last_vacuumed, s.last_analyzed from > pg_statistic s, pg_attribute a, pg_class c where ..." > > It SOUNDS easy to implement, but that has fooled me before... :-) It is fairly easy to implement, however it has been discussed before and decided that it wasn't necessary. What the system cares about is how long it's been since the last vacuum in terms of XIDs not time. Storing a timestamp would make it more human readable, but I'm not sure the powers that be want to add two new columns to some system table to accommodate this. Matt
Gábriel Ákos wrote: > I thought that too. Autovacuum is running on our system but it didn't do > the trick. Anyway the issue is solved, thank you all for helping. :) Hi, Gabriel, it may be that your Free Space Map (FSM) setting is way to low. Try increasing it. Btw, VACUUM outputs a Warning if FSM is not high enough, maybe you can find useful hints in the log file. HTH Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
On Mon, Mar 27, 2006 at 12:20:54PM -0700, Brendan Duddridge wrote: > Does that mean that even though autovacuum is turned on, you still > should do a regular vacuum analyze periodically? Doing a periodic vacuumdb -avz and keeping an eye on the last few lines isn't a bad idea. It would also be helpful if there was a log parser that could take a look at the output of a vacuumdb -av and look for any problem areas, such as relations that have a lot of free space in them. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461