Thread: Database-wide VACUUM ANALYZE

Database-wide VACUUM ANALYZE

From
"Steven Flatt"
Date:
We recently upgraded a very large database (~550 GB) from 8.1.4 to 8.2.4 via a pg_dump and pg_restore.  (Note that the restore took several days.)  We had accepted the default settings:
 
vacuum_freeze_min_age = 100 million
autovacuum_freeze_max_age = 200 million
 
Due to our very high transaction rate, it appears that a database-wide vacuum kicked off approximately 2 weeks after the restore.  (Aside: after reading the docs and considering our system characteristics, I know now that our autovacuum_freeze_max_age should be more like 2 billion.  However on this machine I haven't changed the config settings yet.)  Also, I believe, that due to the bulk of our data having the same "age" after the restore, the db-wide vacuum had *a lot* of rows to mark with the FrozenXID.
 
The good thing is that the db-wide vacuum, which ran for a long time, was reasonably non-intrusive to other database activity (somewhat, but reasonable for the short term).  The other good thing was that concurrent autovacuum processes were still vacuuming/analyzing tables as necessary.
 
The bad thing, which I don't totally understand from reading the docs, is that another db-wide vacuum kicked off exactly 24 hours after the first db-wide vacuum kicked off, before the first one had finished.  (Note that these vacuums seem to go through the tables alphabetically.)  I managed to explain this to myself in that there were still rows in tables not yet touched by the first db-wide vacuum that could have XIDs older than autovacuum_freeze_max_age.  Fine, so two db-wide vacuums were now taking place, one behind the other.
 
The first db-wide vacuum finished approximately 36 hours after it started.  At this point I was convinced that the second db-wide vacuum would run to completion with little or no work to do and all would be good.  The thing I can't explain is why a third db-wide vacuum kicked off exactly 24 hours (again) after the second db-wide vacuum kicked off (and the second vacuum still running).
 
Wouldn't the first db-wide vacuum have marked any rows that needed it with the FrozenXID?  Why would a third db-wide vacuum kick off so soon after the first db-wide vacuum had completed?  Surely there haven't been 100 million more transactions in the last two days?
 
Can someone explain what is going on here?  I can't quite figure it out based on the docs.
 
Thanks,
Steve
 

Re: Database-wide VACUUM ANALYZE

From
Heikki Linnakangas
Date:
Steven Flatt wrote:
> The bad thing, which I don't totally understand from reading the docs, is
> that another db-wide vacuum kicked off exactly 24 hours after the first
> db-wide vacuum kicked off, before the first one had finished.  (Note that
> these vacuums seem to go through the tables alphabetically.)  I managed to
> explain this to myself in that there were still rows in tables not yet
> touched by the first db-wide vacuum that could have XIDs older than
> autovacuum_freeze_max_age.  Fine, so two db-wide vacuums were now taking
> place, one behind the other.

Are you sure there's no cron job starting the vacuums? 24h sounds too
good to be a coincidence, and there's no magic constant of 24h in the
autovacuum code. Besides, autovacuum can only be running one VACUUM at a
time, so there must be something else launching them.

What's your vacuuming strategy in general, before and after upgrade?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Database-wide VACUUM ANALYZE

From
Francisco Reyes
Date:
Steven Flatt writes:

> Can someone explain what is going on here?  I can't quite figure it out
> based on the docs.

Are you on FreeBSD by any chance?

I think the FreeBSD port by default installs a script that does a daily
vacuum. If using another OS, perhaps you want to see if you used some sort
of package system and if that package added a nightly vacuum.

Re: Database-wide VACUUM ANALYZE

From
"Steven Flatt"
Date:

On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
Are you on FreeBSD by any chance?

I think the FreeBSD port by default installs a script that does a daily
vacuum.

Yes, FreeBSD.  Do you know what script that is?  And it does a db-wide VACUUM ANALYZE every day?!  That is certainly not necessary, and in fact, costly for us.
 
Hmmm... I wonder why this would just start now, three days ago.  Everything seemed to be normal for the last two weeks.
 
Steve

Re: Database-wide VACUUM ANALYZE

From
Larry Rosenman
Date:
On Thu, 21 Jun 2007, Steven Flatt wrote:

> On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
>>
>> Are you on FreeBSD by any chance?
>>
>> I think the FreeBSD port by default installs a script that does a daily
>> vacuum.
>
>
> Yes, FreeBSD.  Do you know what script that is?  And it does a db-wide
> VACUUM ANALYZE every day?!  That is certainly not necessary, and in fact,
> costly for us.
>
> Hmmm... I wonder why this would just start now, three days ago.  Everything
> seemed to be normal for the last two weeks.
>
The current FreeBSD port places the script in:

/usr/local/etc/periodic/daily/502.pgsql

And it can be controlled from /etc/periodic.conf

See the top of that script.

LER

> Steve
>

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

Re: Database-wide VACUUM ANALYZE

From
Bill Moran
Date:
In response to "Steven Flatt" <steven.flatt@gmail.com>:

> On 6/21/07, Francisco Reyes <lists@stringsutils.com> wrote:
> >
> > Are you on FreeBSD by any chance?
> >
> > I think the FreeBSD port by default installs a script that does a daily
> > vacuum.
>
>
> Yes, FreeBSD.  Do you know what script that is?

/usr/local/etc/periodic/daily/502.pgsql

> And it does a db-wide
> VACUUM ANALYZE every day?!  That is certainly not necessary, and in fact,
> costly for us.

You can control it with knobs in /etc/periodic.conf (just like other
periodic job):
daily_pgsql_vacuum_enable="YES"
daily_pgsql_backup_enable="NO"

are the defaults.

> Hmmm... I wonder why this would just start now, three days ago.  Everything
> seemed to be normal for the last two weeks.

Someone alter /etc/periodic.conf?  Perhaps it's been running all along but
you never noticed it before now?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Database-wide VACUUM ANALYZE

From
"Steven Flatt"
Date:
Thanks everyone.  It appears that we had hacked the 502.pgsql script for our 8.1 build to disable the daily vacuum.  I was not aware of this when building and upgrading to 8.2.
 
So it looks like for the past two weeks, that 36 hour db-wide vacuum has been running every 24 hours.  Good for it for being reasonably non-intrusive and going unnoticed until now. :)
 
Although apparently not related anymore, I still think it was a good move to change autovacuum_freeze_max_age from 200 million to 2 billion.
 
Steve
 

Re: Database-wide VACUUM ANALYZE

From
Alvaro Herrera
Date:
Steven Flatt escribió:
> Thanks everyone.  It appears that we had hacked the 502.pgsql script for our
> 8.1 build to disable the daily vacuum.  I was not aware of this when
> building and upgrading to 8.2.
>
> So it looks like for the past two weeks, that 36 hour db-wide vacuum has
> been running every 24 hours.  Good for it for being reasonably non-intrusive
> and going unnoticed until now. :)

Looks like you have plenty of spare I/O ;-)


> Although apparently not related anymore, I still think it was a good move to
> change autovacuum_freeze_max_age from 200 million to 2 billion.

Absolutely not related.  Also note that

1. autovacuum is not able (in 8.2 or older) to have more than one task
   running

2. autovacuum in 8.2 doesn't ever launch database-wide vacuums.  As of
   8.2 it only vacuums tables that are in actual danger of Xid
   wraparound (according to pg_class.relfrozenxid); tables that had been
   vacuumed the day before would not need another vacuum for Xid
   purposes (though if you had modified the table to the point that it
   needed another vacuum, that would be another matter).  Unless you
   consumed 200 million (or 2 billion) transactions during the day, that
   is.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)

Re: Database-wide VACUUM ANALYZE

From
Jim Nasby
Date:
On Jun 21, 2007, at 3:37 PM, Steven Flatt wrote:
> Thanks everyone.  It appears that we had hacked the 502.pgsql
> script for our 8.1 build to disable the daily vacuum.  I was not
> aware of this when building and upgrading to 8.2.

Much better to change stuff in a config file than to hack installed
scripts, for this very reason. :)

> So it looks like for the past two weeks, that 36 hour db-wide
> vacuum has been running every 24 hours.  Good for it for being
> reasonably non-intrusive and going unnoticed until now. :)
>
> Although apparently not related anymore, I still think it was a
> good move to change autovacuum_freeze_max_age from 200 million to 2
> billion.

If you set that to 2B, that means you're 2^31-"2 billion"-1000000
transactions away from a shutdown when autovac finally gets around to
trying to run a wraparound vacuum on a table. If you have any number
of large tables, that could be a big problem, as autovac could get
tied up on a large table for a long enough period that the table
needing to be frozen doesn't get frozen in time.

I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Database-wide VACUUM ANALYZE

From
"Steven Flatt"
Date:
On 6/25/07, Jim Nasby <decibel@decibel.org> wrote:
If you set that to 2B, that means you're 2^31-"2 billion"-1000000
transactions away from a shutdown when autovac finally gets around to
trying to run a wraparound vacuum on a table. If you have any number
of large tables, that could be a big problem, as autovac could get
tied up on a large table for a long enough period that the table
needing to be frozen doesn't get frozen in time.

I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.
 
From my understanding of the docs, for tables that are not otherwise vacuumed, autovac will be invoked on it once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions.  In our case that's 2 billion - 100 million = 1.9 billion transactions.  So when an autovac finally kicks off on an otherwise non-vacuumed table, we are (2^31 - 1.9 billion) - 1 million =~ 250 million transactions away from shutdown.  (I guess that's close to what you were saying.)
 
Most of our large (partitioned) tables are insert-only (truncated eventually) so will not be touched by autovacuum until wraparound prevention kicks in.  However the tables are partitioned by timestamp so tables will cross the 1.9 billion marker at different times (some not at all, as the data will have been truncated).
 
Do you still think the 250 million transactions away from shutdown is cutting it too close?  Recall that the unintentional db-wide vacuum analyze that was going on last week on our system took less than two days to complete.
 
Steve
 

Re: Database-wide VACUUM ANALYZE

From
Alvaro Herrera
Date:
Steven Flatt escribió:

> Most of our large (partitioned) tables are insert-only (truncated
> eventually) so will not be touched by autovacuum until wraparound prevention
> kicks in.  However the tables are partitioned by timestamp so tables will
> cross the 1.9 billion marker at different times (some not at all, as the
> data will have been truncated).

Note that as of 8.3, tables that are truncated do not need vacuuming for
Xid wraparound purposes, because the counter is updated on TRUNCATE (as
it is on CLUSTER and certain forms of ALTER TABLE).

> Do you still think the 250 million transactions away from shutdown is
> cutting it too close?  Recall that the unintentional db-wide vacuum analyze
> that was going on last week on our system took less than two days to
> complete.

Is this 8.1 or 8.2?  In the latter you don't ever need db-wide vacuums
at all, because Xid wraparound is tracked per table, so only tables
actually needing vacuum are processed.  To answer your question, the
followup question is how many transactions normally take place in two
days.  If they are way less than 250 million then you don't need to
worry.  Otherwise, the database may shut itself down to protect from Xid
wraparound.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"La soledad es compañía"