Re: Template zero xid issue - Mailing list pgsql-general
From | Keaton Adams |
---|---|
Subject | Re: Template zero xid issue |
Date | |
Msg-id | 0B34A6972BF39E4CB465A64DBBAD2BB902261E3A@mxlhq-exch01.corp.mxlogic.com Whole thread Raw |
In response to | Re: Template zero xid issue (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Template zero xid issue
|
List | pgsql-general |
Auto vacuum is not enabled: #----------------------------------------------------------------------- ---- # AUTOVACUUM PARAMETERS #----------------------------------------------------------------------- ---- #autovacuum = off # enable autovacuum subprocess? #autovacuum_naptime = 60 # time between autovacuum runs, in secs #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before # vacuum #autovacuum_analyze_threshold = 500 # min # of tuple updates before # analyze #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before # vacuum #autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before # analyze #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for # autovac, -1 means use # vacuum_cost_delay #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovac, -1 means use # vacuum_cost_limit Instead, the vacuumdb command is run on the DBs to perform a full DB vacuum every Saturday. The DB template0 is skipped by the vacuumdb process, according to the output: /xyz/var/pgsql/bin/vacuumdb -a -z -U postgres vacuumdb: vacuuming database "postgres" VACUUM vacuumdb: vacuuming database "mxl" VACUUM vacuumdb: vacuuming database "template1" VACUUM 1. Ops has a job that monitors datfrozenxid. An event was triggered this morning because the count on template0 was approaching the limit. I don't have the original output, but it looked something like this: mxl=# select datname, age(datfrozenxid) from pg_database; datname | age --------------+------------ postgres | 1478038975 mxl_survey | 1478038263 mxl | 1478036889 template1 | 1478010730 template0 | 2146747335 2. As we were attempting to figure out why this was the case, Postgres stopped accepting commands, as reported in the log: <2007-08-06 09:42:50 MDT>ERROR: database is not accepting commands to avoid wraparound data loss in database "template0" <2007-08-06 09:42:50 MDT>HINT: Stop the postmaster and use a standalone backend to vacuum database "template0". 3. One of the admins initiated a shutdown request to the DB. The shutdown was NOT automatically done by Postgres as I had originally stated, but it became necessary nonetheless: <2007-08-06 09:44:05 MDT>LOG: received fast shutdown request <2007-08-06 09:44:05 MDT>LOG: aborting any active transactions <2007-08-06 09:44:05 MDT>FATAL: terminating connection due to administrator command <2007-08-06 09:44:05 MDT>FATAL: terminating connection due to administrator command <2007-08-06 09:44:05 MDT>FATAL: terminating connection due to administrator command 4. During the shutdown we continued to receive these messages in the log: <2007-08-06 09:44:08 MDT>LOG: transaction ID wrap limit is 2147484146, limited by database "template0" <2007-08-06 09:44:08 MDT>WARNING: database "template0" must be vacuumed within 736811 transactions <2007-08-06 09:44:08 MDT>HINT: To avoid a database shutdown, execute a full-database VACUUM in "template0". 5. Once the DB was down, a standalone backend was used to execute a vacuum against template0: sudo -u postgres /mxl/var/pgsql/bin/postgres -D /mxl/var/pgsql/data -O -P template0 >VACUUM VERBOSE ANALYZE; 6. The DB started successfully and normal operations resumed. We do have a second 8.1.4 DB running the same type of transaction load and the datfrozenxid query shows template0 having a larger 'age' value than any of the other databases in the cluster: postgres=# select datname, age(datfrozenxid) from pg_database; datname | age --------------+------------ postgres | 1108184564 mxl | 1108182478 template1 | 1108156319 template0 | 1497605405 So before it grows into the 2.1 billion value we'll need to do a manual reset of the FrozenXID on this DB as well. Of course, we are looking into an 8.1.9 upgrade to avoid this problem in the future. Thanks, Keaton -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Monday, August 06, 2007 1:11 PM To: Keaton Adams Cc: Joshua D. Drake; Alvaro Herrera; pgsql-general@postgresql.org Subject: Re: [GENERAL] Template zero xid issue "Keaton Adams" <kadams@mxlogic.com> writes: > Do I have this right? It's not clear. You didn't answer the question about whether you were running autovac, but even if you were I'm not sure why you got the complaint about template0. AFAICS the only way for template0 to be installed as oldest_datname is if vac_truncate_clog() is invoked in template0, allowing it to be the initial selection before the loop starts. Which could happen if autovac is on, since it will try to vacuum template0, and in 8.1.4 it neglects to use the FREEZE option. But even then, if we just finished a DB-wide vacuum on template0, it should not have had the oldest datfrozenxid; and certainly not one old enough to provoke complaints. Actually there's a whole lot that isn't clear about this. Your first report showed a warning --- not an error --- about template0, so it's not at all clear that the "database system is shutting down" messages had anything to do with that at all. And you later posted some queries that weren't drawing any such message. What was the exact sequence of events here? Alvaro: you have any thoughts about this? I'm suspicious that we ought to change 8.1's vac_truncate_clog() to not include the current database automatically, since it might be not-datallowconn; but I don't see how that potential bug could actually be exposed as a fault. regards, tom lane
pgsql-general by date: