Thread: Vacuum Problems (locking)
I have a nightly vacuum scheduled as follows:

    su - postgres -c "/usr/bin/vacuumdb --analyze --dbname=mydb"

Last night, it appears that the vacuum blocked db access from my application server (JBoss). Here is the logfile snippet:

    [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]NOTICE: number of page slots needed (559472) exceeds max_fsm_pages (200000)
    [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT: Consider increasing the configuration parameter "max_fsm_pages" to a value over 559472.
    [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]LOG: number of page slots needed (559472) exceeds max_fsm_pages (200000)
    [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT: Consider increasing the configuration parameter "max_fsm_pages" to a value over 559472.
    [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG: transaction ID wrap limit is 1073813592, limited by database "benchpg"
    [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG: transaction ID wrap limit is 1073813592, limited by database "benchpg"
    [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG: duration: 10224738.050 ms statement: VACUUM ANALYZE;

First of all, my max_fsm_pages is obviously way off. However, every time I increase max_fsm_pages the next vacuum says that it requires more. Will there ever be a plateau in the requested pages? What is the impact of exceeding the page slots needed?

Secondly, it almost looks like there were two vacuum processes running, based on the duplicated log messages. There is only one cron job, so there shouldn't have been a second vacuum running. Should I just ignore this?

Finally, and most important, is the blocking. The vacuum duration reported in the log converts to about 170 minutes. I can track backwards in the log, and the only messages prior to the 6:52am completion of the vacuum end at 3:57am (almost exactly 170 minutes prior). So, all indications point to postgres blocking all access during the entire vacuum.
The only semi-unusual characteristic of my database is that I have one table in particular that grows and shrinks dramatically. My entire database after a vacuum --full is about 8GB. However, after a day or two, the db will grow as large as 75GB. Truncating that one table (since it empties regularly) will shrink the db right back to the 8GB. Is it possible that the extreme table size (maybe requiring the high number of page slots) is causing postgres to block more than it should in order to process that table? Thanks, Mike
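A quick way to confirm which relation is doing the ballooning between vacuums is to check relation sizes directly. This is only a sketch: pg_relation_size() exists from PostgreSQL 8.1 onward; on earlier releases the contrib dbsize module provides similar functions.

```sql
-- List the five largest heap relations in the current database.
-- relkind = 'r' restricts the check to ordinary tables.
SELECT relname, pg_relation_size(oid) AS bytes
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_relation_size(oid) DESC
LIMIT 5;
```

Running this before and after the nightly vacuum makes it easy to see whether the churny table alone accounts for the 8GB-to-75GB swing.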
Mike Goldner wrote:
> I have a nightly vacuum scheduled as follows:
>
>     su - postgres -c "/usr/bin/vacuumdb --analyze --dbname=mydb"
>
> Last night, it appears that the vacuum blocked db access from my
> application server (JBoss). Here is the logfile snippet:
>
> [...]
>
> First of all, my max_fsm_pages is obviously way off. However, every
> time I increase max_fsm_pages the next vacuum says that it requires
> more. Will there ever be a plateau in the requested pages? What is
> the impact of exceeding the page slots needed?

You will need to vacuum full, and if your need continues to grow, you need to vacuum more often.

> The only semi-unusual characteristic of my database is that I have one
> table in particular that grows and shrinks dramatically. My entire
> database after a vacuum --full is about 8GB. However, after a day or
> two, the db will grow as large as 75GB. Truncating that one table
> (since it empties regularly) will shrink the db right back to the 8GB.

Vacuum that single table, a lot.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
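"Vacuum that single table, a lot" can be done with a second, much more frequent cron entry that targets only the churny table. A sketch, with two assumptions: the volatile table is called jms_messages (inferred from the index name mentioned later in the thread), and the database is mydb as in the original command; substitute your own names.

```shell
# Illustrative crontab entry: plain (non-FULL) vacuum of only the
# high-churn table, every 15 minutes. A plain vacuum does not take
# an exclusive lock, so readers and writers keep working.
*/15 * * * * su - postgres -c "/usr/bin/vacuumdb --table=jms_messages --dbname=mydb"
```

The nightly database-wide `vacuumdb --analyze` stays as-is; this just keeps the one table's dead space from accumulating between runs.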
On Wed, 2006-10-25 at 15:54 -0400, Mike Goldner wrote:
> Last night, it appears that the vacuum blocked db access from my
> application server (JBoss). [...]
>
> Finally, and most important, is the blocking. The vacuum duration
> reported in the log converts to about 170 minutes. I can track
> backwards in the log, and the only messages prior to the 6:52am
> completion of the vacuum end at 3:57am (almost exactly 170 minutes
> prior).
>
> So, all indications point to postgres blocking all access during the
> entire vacuum.

I just noticed that my nightly vacuum failed the previous three nights with the following error:

    [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]ERROR: failed to re-find parent key in "jms_messages_pkey"
    [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]STATEMENT: VACUUM ANALYZE;

I'm not sure if it is related, but it seems pertinent.

Thanks,

Mike
Mike Goldner <mgoldner@agmednet.com> writes:
> First of all, my max_fsm_pages is obviously way off. However, every
> time I increase max_fsm_pages the next vacuum says that it requires
> more. Will there ever be a plateau in the requested pages?

We realized recently that this can happen if you have a single relation that all by itself needs more FSM space than you've got; any per-table space beyond max_fsm_pages is disregarded in the calculation of the total system-wide space needed, and so you get this inflationary effect where the reported requirement keeps going up when you increase the setting. There's a fix in for 8.2.

> So, all indications point to postgres blocking all access during the
> entire vacuum.

You haven't said anything I consider credible evidence of that. But try looking in pg_locks next time.

regards, tom lane
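A sketch of the pg_locks check Tom suggests, to be run while both the vacuum and the stalled application are active. The join to pg_class for a readable relation name is an embellishment, not part of his advice; pg_locks itself only reports the relation's OID.

```sql
-- Show lock requests that have NOT been granted, i.e. sessions that
-- are actually waiting, together with the relation they wait on.
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;
```

If the application were truly blocked by the vacuum, its backends would show up here with granted = false; an empty result during the vacuum window would support Tom's skepticism.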
On Wed, Oct 25, 2006 at 03:54:17PM -0400, Mike Goldner wrote:
> Finally, and most important is the blocking. The vacuum duration
> reported in the log converts to about 170 minutes. I can track
> backwards in the log and the only messages prior to the 6:52am
> completion of the vacuum end at 3:57am (almost exactly 170 minutes
> prior).
>
> So, all indications point to postgres blocking all access during the
> entire vacuum.

The vacuum command you posted doesn't block. I can think of two other explanations:

1. You just didn't have any activity then. Don't throw away this possibility without evidence: I can't count the number of blind alleys I've been down because someone insisted that "never happens".

2. You're completely pegged on I/O. Vacuum will make this worse, and maybe therefore no transactions get through. You can fiddle with the vacuum settings to get them to back off a little and let some other transactions through.

That said, your essential problem is that one table. Vacuum it more often -- from the look of the churn on it, I'd just put a job on it that runs all the time and sleeps for a few seconds in between -- and you shouldn't have this problem. But you'll need to VACUUM FULL or dump and reload first.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Windows is a platform without soap, where rats run around in open sewers.
    --Daniel Eran
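"Fiddle with the vacuum settings" here presumably refers to the cost-based vacuum delay introduced in PostgreSQL 8.0. A hedged postgresql.conf sketch; the numbers are illustrative starting points, not tuned recommendations:

```
# Cost-based vacuum delay: make vacuum pause periodically so it
# stops saturating the I/O subsystem. Default vacuum_cost_delay is 0,
# which disables the throttling entirely.
vacuum_cost_delay = 20      # milliseconds to sleep when the cost limit is reached
vacuum_cost_limit = 200     # accumulated page-access cost that triggers the sleep
```

The trade-off is straightforward: the vacuum takes longer in wall-clock time, but concurrent transactions get a share of the disk bandwidth instead of queueing behind it.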
On Wed, Oct 25, 2006 at 04:00:29PM -0400, Mike Goldner wrote:
> I just noticed that my nightly vacuum failed the previous three nights
> with the following error:
>
>   [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]ERROR: failed to re-find parent key in "jms_messages_pkey"
>   [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]STATEMENT: VACUUM ANALYZE;
>
> I'm not sure if it is related, but it seems pertinent.

Well, vacuuming a lot of databases once a day is bad enough. Not vacuuming for 3 days...

IIRC you can fix that issue by reindexing jms_messages_pkey.

--
Jim Nasby                    jim@nasby.net
EnterpriseDB                 http://enterprisedb.com
512.569.9461 (cell)
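A sketch of that fix. Note that REINDEX INDEX takes an exclusive lock on the index's parent table while it rebuilds, so schedule it for a quiet window; the table name jms_messages is inferred from the index name and may differ in your schema.

```sql
-- Rebuild the index that VACUUM choked on, then retry the
-- failed vacuum on just that table.
REINDEX INDEX jms_messages_pkey;
VACUUM ANALYZE jms_messages;
```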