Thread: Running vacuumdb -a taking too long
We are to a point in size and utilization of a set of our Postgres 8.1.17 databases that a vacuumdb -a -z -U postgres is still running after 50 hours and we have to kill it off because the additional i/o is causing queries to stack up. We have archived off as much data as possible (100+ GB) and we are looking at moving to a better hardware configuration to support the increased system activity, but for now I need to know if there is a way to avoid "transaction ID wraparound" if we don't issue a vacuumdb -a to update the datfrozenxid value.
This is a logging type database where data is initially loaded and then "rolled up" into weekly/monthly tables. Many of the tables in the DB don't last for more than seven days, and after a week the majority of the historical tables are static and stay around until they are finally dropped off (based on a retention period), so re-running a vacuum on these older static tables doesn't really gain much since the data never changes.
I have read (and re-read) the documentation and am a bit confused on exactly what needs to happen here:
Since periodic VACUUM runs are needed anyway for the reasons described earlier, it's unlikely that any table would not be vacuumed for as long as a billion transactions. But to help administrators ensure this constraint is met, VACUUM stores transaction ID statistics in the system table pg_database. In particular, the datfrozenxid column of a database's pg_database row is updated at the completion of any database-wide VACUUM operation (i.e., VACUUM that does not name a specific table). The value stored in this field is the freeze cutoff XID that was used by that VACUUM command. All normal XIDs older than this cutoff XID are guaranteed to have been replaced by FrozenXID within that database. A convenient way to examine this information is to execute the query
SELECT datname, age(datfrozenxid) FROM pg_database;
The age column measures the number of transactions from the cutoff XID to the current transaction's XID.
So if, after a table is no longer added to (becomes static), I run a VACUUM against it, the table wouldn't need to be vacuumed again since the tuples and their related transaction IDs never change? Is there a way to set up a vacuum scheme on a table-by-table basis to accomplish the same goal as an all-in-one vacuumdb -a run that I can spread out over time instead of relying on the completion of a single vacuumdb -a command?
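As a rough sketch of such a table-by-table scheme (the system-schema filter below is only an example), the per-table VACUUM statements could be generated once and then run individually, e.g. from cron, instead of in a single long vacuumdb -a pass:

-- Sketch only: emit one VACUUM ANALYZE statement per ordinary table so
-- the statements can be executed one at a time, spreading the I/O load.
SELECT 'VACUUM ANALYZE ' || schemaname || '.' || tablename || ';'
  FROM pg_tables
 WHERE schemaname NOT IN ('pg_catalog', 'information_schema');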
In response to Keaton Adams <kadams@mxlogic.com>:

I'm not quite sure what the best answer is to your problem, but since
nobody else has suggested anything, here are my thoughts.

First off, can you allocate more maintenance_work_mem? Perhaps that can
speed up vacuum enough.

Secondly, if you VACUUM FREEZE those static tables, it will guarantee
that you'll never lose data from them, even if you hit XID wraparound.
If you VACUUM FREEZE tables one at a time, perhaps you can avoid the
huge performance hit.

Third, while your argument about tables not needing to be vacuumed again
seems logical, it's simply not how PG functions. Since the XID is
database-wide, it may affect any table. Of course, the FREEZE process
will protect tables from this. This seems to be improved in newer
versions of Postgres, so an upgrade should improve the issue.

Finally, are you really in danger of hitting the wraparound? If you run
the query "SELECT datname, age(datfrozenxid) FROM pg_database;" (as
suggested in the docs) once a day for a few days, does it seem like
you're using up XIDs fast enough to be a danger? If you've got new
hardware coming soon anyway, perhaps you have enough time to not worry
about it on the current hardware?

Hope this helps.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
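For what it's worth, the per-table FREEZE approach suggested above would look roughly like the following; the memory value and table name are placeholders, not values taken from this thread:

-- Sketch: freeze one static table at a time, giving just this session
-- extra memory for the vacuum (values are examples only).
SET maintenance_work_mem = '512MB';
VACUUM FREEZE VERBOSE some_static_table_20090701;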
We are upgrading to 8.3.7 in September, if that helps the situation at all.
I just want to make sure I thoroughly understand how these two statements work together:
“The price is this maintenance requirement: every table in the database must be vacuumed at least once every billion transactions.”
“For each database in the cluster, PostgreSQL keeps track of the time of the last database-wide VACUUM. When any database approaches the billion-transaction danger level, the system begins to emit warning messages. If nothing is done, it will eventually shut down normal operations until appropriate manual maintenance is done.”
Let’s say I have two sets of tables in the production DB, one set is always around and the other set rolls forward in time using CREATE/DROP table (_YYYYMMDD), with the oldest table being 14 days old.
I never issue the actual “vacuumdb -a” command against the entire cluster, but I do a DB wide “VACUUM;” against databases such as postgres and template1 once a week.
I issue a VACUUM <table> once a day against the tables that always exist in the database (they do receive inserts/updates/deletes daily) because they are small enough to vacuum without a big performance hit. This includes everything that is shown as a real “table” in pg_tables except those that match the filter of ‘%_YYYYMMDD’ to avoid the rolling table set.
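Roughly, that daily pass can pick its targets with a query along these lines (the schema and the date-suffix regex are only illustrations of the ‘%_YYYYMMDD’ filter described above):

-- Sketch: list the permanent tables for the daily VACUUM, skipping the
-- rolling _YYYYMMDD day tables.
SELECT tablename
  FROM pg_tables
 WHERE schemaname = 'public'
   AND tablename !~ '_[0-9]{8}$';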
The large day tables (rolling forward in time) I never vacuum because I just COPY into them, and if there is a failed COPY operation or two I don’t really care about the wasted space, due to the fact that the table has a limited lifespan anyway. So these tables are always rolling forward and being dropped in a 14 day window.
So my questions are:
- Will the Postgres cluster eventually shut down because I never do a true “database-wide VACUUM” using a vacuumdb -a command on all of the databases, even though I vacuum the tables in the production database that have a lifespan of greater than 14 days?
- Would I ever be at risk of losing data in a table that is only around for two weeks (a 14-day period) if I never do this “database-wide VACUUM” on the actual production DB?
Thanks again for your response.
On 7/27/09 1:41 PM, "Bill Moran" <wmoran@potentialtech.com> wrote:
In response to Keaton Adams <kadams@mxlogic.com>:
> We are to a point in size and utilization of a set of our Postgres 8.1.17 databases that a vacuumdb -a -z -U postgres is still running after 50 hours and we have to kill it off because the additional i/o is causing queries to stack up. We have archived off as much data as possible (100+ GB) and we are looking at moving to a better hardware configuration to support the increased system activity, but for now I need to know if there is a way to avoid "transaction ID wraparound" if we don't issue a vacuumdb -a to update the datfrozenxid value.
>
> This is a logging type database where data is initially loaded and then "rolled up" into weekly/monthly tables. Many of the tables in the DB don't last for more than seven days, and after a week the majority of the historical tables are static and stay around until they are finally dropped off (based on a retention period), so re-running a vacuum on these older static tables doesn't really gain much since the data never changes.
>
> I have read (and re-read) the documentation and am a bit confused on exactly what needs to happen here:
>
> Since periodic VACUUM runs are needed anyway for the reasons described earlier, it's unlikely that any table would not be vacuumed for as long as a billion transactions. But to help administrators ensure this constraint is met, VACUUM stores transaction ID statistics in the system table pg_database. In particular, the datfrozenxid column of a database's pg_database row is updated at the completion of any database-wide VACUUM operation (i.e., VACUUM that does not name a specific table). The value stored in this field is the freeze cutoff XID that was used by that VACUUM command. All normal XIDs older than this cutoff XID are guaranteed to have been replaced by FrozenXID within that database. A convenient way to examine this information is to execute the query
>
> SELECT datname, age(datfrozenxid) FROM pg_database;
>
> The age column measures the number of transactions from the cutoff XID to the current transaction's XID.
>
> So if, after a table is no longer added to (becomes static), I run a VACUUM against it, the table wouldn't need to be vacuumed again since the tuples and their related transaction IDs never change? Is there a way to set up a vacuum scheme on a table-by-table basis to accomplish the same goal as an all-in-one vacuumdb -a run that I can spread out over time instead of relying on the completion of a single vacuumdb -a command?
I'm not quite sure what the best answer is to your problem, but since
nobody else has suggested anything, here are my thoughts.
First off, can you allocate more maintenance_work_mem? Perhaps that can
speed up vacuum enough.
Secondly, if you VACUUM FREEZE those static tables, it will guarantee
that you'll never lose data from them, even if you hit XID wraparound.
If you VACUUM FREEZE tables one at a time, perhaps you can avoid the
huge performance hit.
Third, while your argument about tables not needing to be vacuumed again seems
logical, it's simply not how PG functions. Since the XID is database-wide,
it may affect any table. Of course, the FREEZE process will protect tables
from this. This seems to be improved in newer versions of Postgres, so
an upgrade should improve the issue.
Finally, are you really in danger of hitting the wraparound? If you run
the query "SELECT datname, age(datfrozenxid) FROM pg_database;" (as suggested
in the docs) once a day for a few days, does it seem like you're using
up XIDs fast enough to be a danger? If you've got new hardware coming
soon anyway, perhaps you have enough time to not worry about it on the
current hardware?
Hope this helps.
--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Mon, Jul 27, 2009 at 02:21:02PM -0600, Keaton Adams wrote:
> We are upgrading to 8.3.7 in September, if that helps the situation at all.

This is good. Since 8.2 VACUUM age is done per table instead of per
database. This should solve most of your problems.

> So my questions are:
>
> 1. Will the Postgres cluster eventually shut down because I never
> do a true "database-wide VACUUM" using a vacuumdb -a command on all
> of the databases, even though I vacuum the tables in the production
> database that have a lifespan of greater than 14 days?

On older versions you need to do a database-wide vacuum (note this is
not vacuumdb -a) once every billion transactions.

Did you take the advice in the email you responded to with respect to
speeding up vacuum? And using

> SELECT datname, age(datfrozenxid) FROM pg_database;

to determine if it's an actual problem (just post the results if you
can't interpret them).

> 2. Would I ever be at risk of losing data in a table that is only
> around for two weeks (a 14-day period) if I never do this
> "database-wide VACUUM" on the actual production DB?

You won't lose data, but you need to do a DB wide (not cluster-wide)
vacuum to advance the wraparound counter...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
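To make the distinction concrete: the database-wide vacuum meant here is a plain VACUUM issued while connected to the database in question, not vacuumdb -a across the cluster. A sketch, with an example database name:

-- Connect to the production database and run a plain VACUUM naming no
-- table; on 8.1 this is what updates datfrozenxid for that database.
--   psql -U postgres -d production_db -c "VACUUM;"
-- Afterwards, confirm that the age has come back down:
SELECT datname, age(datfrozenxid) FROM pg_database;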
> This is good. Since 8.2 VACUUM age is done per table instead of per
> database. This should solve most of your problems.
>
> On older versions you need to do a database-wide vacuum (note this is
> not vacuumdb -a) once every billion transactions.
>
> You won't lose data, but you need to do a DB wide (not cluster-wide)
> vacuum to advance the wraparound counter...

That answered the question exactly as I needed to hear it.

> Did you take the advice in the email you responded to with respect to
> speeding up vacuum?

Great suggestion and one that I will try out over the next couple of days.

> And using
>
> SELECT datname, age(datfrozenxid) FROM pg_database;
>
> to determine if it's an actual problem (just post the results if you
> can't interpret them).

postgres=# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname   |    age
------------+------------
 postgres   | 1073741878
 listenerdb | 1074114794
 template1  | 1073908727
 template0  |   30121699
(4 rows)

Thanks again.
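For completeness, once the 8.3.7 upgrade is in place the freeze age can also be checked table by table, since relfrozenxid is tracked per relation from 8.2 onward. A sketch:

-- Per-table freeze age (8.2 and later only; not valid on 8.1):
SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY xid_age DESC
 LIMIT 10;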