Thread: Database-wide VACUUM ANALYZE
We recently upgraded a very large database (~550 GB) from 8.1.4 to 8.2.4 via pg_dump and pg_restore. (Note that the restore took several days.) We had accepted the default settings:
vacuum_freeze_min_age = 100 million
autovacuum_freeze_max_age = 200 million
Due to our very high transaction rate, it appears that a database-wide vacuum kicked off approximately 2 weeks after the restore. (Aside: after reading the docs and considering our system characteristics, I know now that our autovacuum_freeze_max_age should be more like 2 billion. However on this machine I haven't changed the config settings yet.) Also, I believe, that due to the bulk of our data having the same "age" after the restore, the db-wide vacuum had *a lot* of rows to mark with the FrozenXID.
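To see how close each database is to a forced wraparound vacuum, the transaction age can be compared against these settings directly. A minimal diagnostic sketch (standard catalog columns on 8.2; run via psql against a live server):

```sql
-- Per-database transaction age: when age(datfrozenxid) approaches
-- autovacuum_freeze_max_age, a wraparound-prevention vacuum is forced.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```

After a dump-and-restore, most of the data shares the same age, which is why the first forced vacuum had so many rows to freeze at once.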
The good thing is that the db-wide vacuum, which ran for a long time, was reasonably non-intrusive to other database activity: noticeable, but acceptable in the short term. The other good thing was that concurrent autovacuum processes were still vacuuming and analyzing tables as necessary.
The bad thing, which I don't totally understand from reading the docs, is that another db-wide vacuum kicked off exactly 24 hours after the first db-wide vacuum kicked off, before the first one had finished. (Note that these vacuums seem to go through the tables alphabetically.) I managed to explain this to myself in that there were still rows in tables not yet touched by the first db-wide vacuum that could have XIDs older than autovacuum_freeze_max_age. Fine, so two db-wide vacuums were now taking place, one behind the other.
The first db-wide vacuum finished approximately 36 hours after it started. At this point I was convinced that the second db-wide vacuum would run to completion with little or no work to do and all would be good. The thing I can't explain is why a third db-wide vacuum kicked off exactly 24 hours (again) after the second db-wide vacuum kicked off (and the second vacuum still running).
Wouldn't the first db-wide vacuum have marked any rows that needed it with the FrozenXID? Why would a third db-wide vacuum kick off so soon after the first db-wide vacuum had completed? Surely there haven't been 100 million more transactions in the last two days?
Can someone explain what is going on here? I can't quite figure it out based on the docs.
Thanks,
Steve
Steven Flatt wrote:
> The bad thing, which I don't totally understand from reading the docs, is
> that another db-wide vacuum kicked off exactly 24 hours after the first
> db-wide vacuum kicked off, before the first one had finished. (Note that
> these vacuums seem to go through the tables alphabetically.) I managed to
> explain this to myself in that there were still rows in tables not yet
> touched by the first db-wide vacuum that could have XIDs older than
> autovacuum_freeze_max_age. Fine, so two db-wide vacuums were now taking
> place, one behind the other.

Are you sure there's no cron job starting the vacuums? 24h sounds too good to be a coincidence, and there's no magic constant of 24h in the autovacuum code. Besides, autovacuum can only be running one VACUUM at a time, so there must be something else launching them.

What's your vacuuming strategy in general, before and after upgrade?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Steven Flatt writes:
> Can someone explain what is going on here? I can't quite figure it out
> based on the docs.

Are you on FreeBSD by any chance?

I think the FreeBSD port by default installs a script that does a daily vacuum.

If using another OS, perhaps you want to see if you used some sort of package system and if that package added a nightly vacuum.
On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
> Are you on FreeBSD by any chance?
>
> I think the FreeBSD port by default installs a script that does a daily
> vacuum.
Yes, FreeBSD. Do you know what script that is? And it does a db-wide VACUUM ANALYZE every day?! That is certainly not necessary, and in fact, costly for us.
Hmmm... I wonder why this would just start now, three days ago. Everything seemed to be normal for the last two weeks.
Steve
On Thu, 21 Jun 2007, Steven Flatt wrote:
> On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
>> Are you on FreeBSD by any chance?
>>
>> I think the FreeBSD port by default installs a script that does a daily
>> vacuum.
>
> Yes, FreeBSD. Do you know what script that is? And it does a db-wide
> VACUUM ANALYZE every day?! That is certainly not necessary, and in fact,
> costly for us.
>
> Hmmm... I wonder why this would just start now, three days ago. Everything
> seemed to be normal for the last two weeks.

The current FreeBSD port places the script in:

/usr/local/etc/periodic/daily/502.pgsql

And it can be controlled from /etc/periodic.conf. See the top of that script.

LER

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
In response to "Steven Flatt" <steven.flatt@gmail.com>:
> On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
>> Are you on FreeBSD by any chance?
>>
>> I think the FreeBSD port by default installs a script that does a daily
>> vacuum.
>
> Yes, FreeBSD. Do you know what script that is?

/usr/local/etc/periodic/daily/502.pgsql

> And it does a db-wide
> VACUUM ANALYZE every day?! That is certainly not necessary, and in fact,
> costly for us.

You can control it with knobs in /etc/periodic.conf (just like any other periodic job):

daily_pgsql_vacuum_enable="YES"
daily_pgsql_backup_enable="NO"

are the defaults.

> Hmmm... I wonder why this would just start now, three days ago. Everything
> seemed to be normal for the last two weeks.

Someone alter /etc/periodic.conf? Perhaps it's been running all along but you never noticed it before now?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
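For reference, the supported way to switch the daily vacuum off is to override the knob in /etc/periodic.conf rather than editing the 502.pgsql script itself, so the setting survives port upgrades. A sketch, assuming the variable names shipped with the FreeBSD PostgreSQL port of that era:

```shell
# /etc/periodic.conf -- local overrides for the nightly periodic(8) run
daily_pgsql_vacuum_enable="NO"     # skip the daily db-wide VACUUM ANALYZE
daily_pgsql_backup_enable="NO"     # default; nightly pg_dump backups stay off
```
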
Thanks everyone. It appears that we had hacked the 502.pgsql script for our 8.1 build to disable the daily vacuum. I was not aware of this when building and upgrading to 8.2.
So it looks like for the past two weeks, that 36 hour db-wide vacuum has been running every 24 hours. Good for it for being reasonably non-intrusive and going unnoticed until now. :)
Although apparently not related anymore, I still think it was a good move to change autovacuum_freeze_max_age from 200 million to 2 billion.
Steve
Steven Flatt wrote:
> Thanks everyone. It appears that we had hacked the 502.pgsql script for our
> 8.1 build to disable the daily vacuum. I was not aware of this when
> building and upgrading to 8.2.
>
> So it looks like for the past two weeks, that 36 hour db-wide vacuum has
> been running every 24 hours. Good for it for being reasonably non-intrusive
> and going unnoticed until now. :)

Looks like you have plenty of spare I/O ;-)

> Although apparently not related anymore, I still think it was a good move to
> change autovacuum_freeze_max_age from 200 million to 2 billion.

Absolutely not related. Also note that:

1. autovacuum is not able (in 8.2 or older) to have more than one task running
2. autovacuum in 8.2 doesn't ever launch database-wide vacuums. As of 8.2 it only vacuums tables that are in actual danger of Xid wraparound (according to pg_class.relfrozenxid); tables that had been vacuumed the day before would not need another vacuum for Xid purposes (though if you had modified the table to the point that it needed another vacuum, that would be another matter). Unless you consumed 200 million (or 2 billion) transactions during the day, that is.

--
Alvaro Herrera       http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"The only difference is that Saddam would kill you in private, where the Americans will kill you in public"
(Mohammad Saleh, 39, a building contractor)
On Jun 21, 2007, at 3:37 PM, Steven Flatt wrote:
> Thanks everyone. It appears that we had hacked the 502.pgsql
> script for our 8.1 build to disable the daily vacuum. I was not
> aware of this when building and upgrading to 8.2.

Much better to change stuff in a config file than to hack installed scripts, for this very reason. :)

> So it looks like for the past two weeks, that 36 hour db-wide
> vacuum has been running every 24 hours. Good for it for being
> reasonably non-intrusive and going unnoticed until now. :)
>
> Although apparently not related anymore, I still think it was a
> good move to change autovacuum_freeze_max_age from 200 million to 2
> billion.

If you set that to 2B, that means you're 2^31 - "2 billion" - 1000000 transactions away from a shutdown when autovac finally gets around to trying to run a wraparound vacuum on a table. If you have any number of large tables, that could be a big problem, as autovac could get tied up on a large table for a long enough period that the table needing to be frozen doesn't get frozen in time.

I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.

--
Jim Nasby                                   jim@nasby.net
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
On 6/25/07, Jim Nasby <decibel@decibel.org> wrote:
> If you set that to 2B, that means you're 2^31 - "2 billion" - 1000000
> transactions away from a shutdown when autovac finally gets around to
> trying to run a wraparound vacuum on a table. If you have any number
> of large tables, that could be a big problem, as autovac could get
> tied up on a large table for a long enough period that the table
> needing to be frozen doesn't get frozen in time.
>
> I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.
From my understanding of the docs, for tables that are not otherwise vacuumed, autovac will be invoked on it once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions. In our case that's 2 billion - 100 million = 1.9 billion transactions. So when an autovac finally kicks off on an otherwise non-vacuumed table, we are (2^31 - 1.9 billion) - 1 million =~ 250 million transactions away from shutdown. (I guess that's close to what you were saying.)
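That headroom arithmetic can be checked directly. A sketch in SQL (2^31 is the size of the Xid space; the 1 million is the safety margin PostgreSQL reserves before shutting down):

```sql
-- Shutdown headroom once a forced vacuum finally starts:
-- 2^31 - (autovacuum_freeze_max_age - vacuum_freeze_min_age) - 1 million
SELECT 2147483648 - (2000000000 - 100000000) - 1000000 AS headroom;
-- 246483648, i.e. roughly 246 million transactions
```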
Most of our large (partitioned) tables are insert-only (truncated eventually) so will not be touched by autovacuum until wraparound prevention kicks in. However the tables are partitioned by timestamp so tables will cross the 1.9 billion marker at different times (some not at all, as the data will have been truncated).
Do you still think the 250 million transactions away from shutdown is cutting it too close? Recall that the unintentional db-wide vacuum analyze that was going on last week on our system took less than two days to complete.
Steve
Steven Flatt wrote:
> Most of our large (partitioned) tables are insert-only (truncated
> eventually) so will not be touched by autovacuum until wraparound prevention
> kicks in. However the tables are partitioned by timestamp so tables will
> cross the 1.9 billion marker at different times (some not at all, as the
> data will have been truncated).

Note that as of 8.3, tables that are truncated do not need vacuuming for Xid wraparound purposes, because the counter is updated on TRUNCATE (as it is on CLUSTER and certain forms of ALTER TABLE).

> Do you still think the 250 million transactions away from shutdown is
> cutting it too close? Recall that the unintentional db-wide vacuum analyze
> that was going on last week on our system took less than two days to
> complete.

Is this 8.1 or 8.2? In the latter you don't ever need db-wide vacuums at all, because Xid wraparound is tracked per table, so only tables actually needing vacuum are processed.

To answer your question, the follow-up question is how many transactions normally take place in two days. If they are way less than 250 million then you don't need to worry. Otherwise, the database may shut itself down to protect from Xid wraparound.

--
Alvaro Herrera       http://www.flickr.com/photos/alvherre/
"La soledad es compañía" ("Solitude is company")
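Since 8.2 tracks wraparound per table via pg_class.relfrozenxid, the partitions nearest to a forced freeze can be watched individually. A diagnostic sketch against the 8.2 catalogs (relfrozenxid is only maintained for ordinary tables, hence the relkind filter):

```sql
-- Oldest tables first: these are next in line for a
-- wraparound-prevention autovacuum.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;
```

For insert-only, timestamp-partitioned tables like those described above, this makes it easy to see which partitions will cross the autovacuum_freeze_max_age threshold, and roughly when.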