Thread: Re: Transaction wraparound problem with database postgres
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
----
1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)
----
Then I issue a vacuum:
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
----
1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "-2147321383" (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)
Von: Markus Wollny
Gesendet: Fr 21.03.2008 21:50
An: pgsql-general@postgresql.org
Betreff: Transaction wraparound problem with database postgres
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.
backend> VACUUM
However, this doesn't seem to help - I receive lots and lots of messages like this:
<2008-03-21 21:43:27 CET - 11845: @>WARNUNG: Datenbank »postgres« muss innerhalb von 4294805194 Transaktionen gevacuumt werden
<2008-03-21 21:43:27 CET - 11845: @>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.
i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."
That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.
This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?
Urgent help would be very much appreciated.
Kind regards
Markus
Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.
backend> VACUUM
However, this doesn't seem to help - I receive lots and lots of messages like this:
<2008-03-21 21:43:27 CET - 11845: @>WARNUNG: Datenbank »postgres« muss innerhalb von 4294805194 Transaktionen gevacuumt werden
<2008-03-21 21:43:27 CET - 11845: @>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.
i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."
That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.
This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?
Urgent help would be very much appreciated.
Kind regards
Markus
Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 21 Mar 2008 21:50:57 +0100 "Markus Wollny" <Markus.Wollny@computec.de> wrote: > That's what I just did, but the problem persists. Whenever I issue a > 'vacuum', the number of transactions simply decreases. > > This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) > > I am absolutely lost about what to do now - and it's a puzzle how > this could have happened in the first place. I have configured > autovaccum AND I run a vacuum verbose analyze over all databases > every single night. What do I do now? Is there some alternative to > reinit and going back to the last dump? > > Urgent help would be very much appreciated. > Vacuum every database. (template1,postgres too). This could happen if you have long running transactions that are not allow VACUUM to actually work. Sincerely, Joshua D. Drake > Kind regards > > Markus > > > > Computec Media AG > Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) > Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels > Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard > Umsatzsteuer-Identifikationsnummer: DE 812 575 276 > > - -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH5CcAATb/zqfZUUQRAl7CAJ9QGYJVqNVfHFgjVyCBswp1+d8kJgCfe+io 5d28sM4Gw4OkSBh/+U4jMDI= =iBx9 -----END PGP SIGNATURE-----
Hello, On Fri, 21 Mar 2008 21:50:57 +0100 Markus Wollny wrote: > My database cluster has just stopped working. I get the following message: > psql: FATAL: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlaufzu vermeiden > TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen. First of all, it would help you and most of the readers on this list, if you have the error messages in english. There is a german mailinglist too, if you want to ask in german. > That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases. vacuum all databases, add the VERBOSE option to see, what actually happens. > I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configuredautovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is theresome alternative to reinit and going back to the last dump? Are you using the database 'postgres' at all? And are you sure, that you include all databases? Any error messages in the vacuum output? Oh, and by the way: why do you have autovacuum and a manual vacuum run every night plus the vacuum run with verbose? > Urgent help would be very much appreciated. That's a bit late here ;-) Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors
"Markus Wollny" <Markus.Wollny@computec.de> writes: > Just some more info, hoping that it helps with a diagnosis: > 1: datname (typeid = 19, len = 64, typmod = -1, byval = f) > 2: age (typeid = 23, len = 4, typmod = -1, byval = t) > 3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t) > ---- > 1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f) > 2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t) > 3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t) What are the datfrozenxid's of the other rows in pg_database? Do the other fields of postgres' row look sane? regards, tom lane
datname | age | datfrozenxid
----------+-----------+--------------
postgres | 100291695 | 3882762765
(1 Zeile)
<2008-03-21 17:08:48 CET - 32161: xxx.xxx.xxx.xxx(52833)@magazine>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.
Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276
Hi! Thanks for all the quick replies :) Tom Lane wrote: > "Markus Wollny" <Markus.Wollny@computec.de> writes: >> Just some more info, hoping that it helps with a diagnosis: > >> 1: datname (typeid = 19, len = 64, typmod = -1, byval = f) >> 2: age (typeid = 23, len = 4, typmod = -1, byval = t) >> 3: datfrozenxid (typeid = 28, len = 4, typmod = -1, >> byval = t) ---- 1: datname = "postgres" >> (typeid = 19, len = 64, typmod = -1, byval = f) 2: age = >> "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = >> t) 3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = >> -1, byval = t) > > What are the datfrozenxid's of the other rows in pg_database? > Do the other fields of postgres' row look sane? Yes, there were no issues on any of the databases that are actually in use: # select datname, age(datfrozenxid), datfrozenxid from pg_database; datname | age | datfrozenxid ------------+-----------+-------------- rpfcms | 104213725 | 3881601233 rpfflash | 147289015 | 3838525943 postgres | 103052193 | 3882762765 template1 | 104213787 | 3881601171 template0 | 3052193 | 3982762765 ezpublish | 147419044 | 3838395914 community | 147566532 | 3838248426 abo | 147689637 | 3838125321 bluebox | 147679271 | 3838135687 cbox | 147582662 | 3838232296 mpo | 147309716 | 3838505242 newsletter | 147309110 | 3838505848 pcaction | 147297707 | 3838517251 pcgames | 147291588 | 3838523370 magazine | 147419044 | 3838395914 Only the 'postgres' db was affected - which is puzzling because we don't actually use this database actively for anything. Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
"Markus Wollny" <Markus.Wollny@computec.de> writes: > Sorry for the quick updates to my own messages, but I didn't want to > lean back and wait - so I took to more aggressive measures. All my > other databases in this cluster are fine - and the 'postgres' database > doesn't seem to do anything really useful except being the default > database. I dropped it and recreated it with template1 as template, > afterwards I could start up my cluster with no problems > whatsoever. Yeah, if there were no other problems apparent in pg_database I was going to suggest that as a recovery method. > I'd still like to find out what exactly happened here so I > can prevent the same from happening again in the future. Me too. It would seem that something did a vacuum of postgres with a strange choice of xid cutoff, but I can't think of what would cause that. Do you ever do VACUUM FREEZE on your databases? regards, tom lane
Andreas 'ads' Scherbaum wrote: > Hello, > First of all, it would help you and most of the readers on this list, > if you have the error messages in english. There is a german > mailinglist too, if you want to ask in german. Sorry, I tried to describe the issue as best as I could and included the actual log entries only for completeness, but wasin too much of a hurry to find the correct translations. > vacuum all databases, add the VERBOSE option to see, what actually > happens. Alas, too late, I got rid of the offending 'postgres' database already by dropping and recreating. > Are you using the database 'postgres' at all? No, not at all. Didn't touch it ever after initdb. > And are you sure, that you include all databases? Yes. I run the following every night: su postgres -c '/opt/pgsql/bin/psql -t -c "select datname from pg_database order by datname;" template1 | xargs -n 1 /opt/pgsql/bin/psql-q -c "vacuum verbose analyze;"' > Any error messages in the vacuum output? None. > Oh, and by the way: why do you have autovacuum and a manual vacuum > run every night plus the vacuum run with verbose? Paranoia, mostly, I think. I'm using PostgreSQL since long before autovacuum was introduced and always thought that it couldn'tdo any harm to keep my original vacuum job running once every night, even though autovacuum does a remarkable job,especially for a couple of busy tables where the nightly vacuum was not quite enough. Plus, having the verbose outputfrom the log, I get useful info for setting the 'max_fsm_pages'/'max_fsm_relations'-options to sensible values. Isit a problem to have cron'ed VACUUM-runs in parallel with autovacuum? >> Urgent help would be very much appreciated. > > That's a bit late here ;-) Ah, well obviously it wasn't - it's always an extremely pleasant surprise when one is actually in dire need of help and getsan almost immediate and helpful response. I wish you all happy Easter! Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
Tom Lane wrote: > "Markus Wollny" <Markus.Wollny@computec.de> writes: >> I'd still like to find out what exactly happened here so I can >> prevent the same from happening again in the future. > > Me too. It would seem that something did a vacuum of postgres with a > strange choice of xid cutoff, but I can't think of what would cause > that. > > Do you ever do VACUUM FREEZE on your databases? No, I actually never heard of VACUUM FREEZE, I have to admit. Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
Hi! After going several months without such incidents, we now got bit by the same problem again. We have since upgraded the hardwarewe ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changedmuch though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been configureda couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the autovacuumdaemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic datamight be required to actually find out the root of what's causing it? As I needed to get the cluster back up and runningagain, I used the same remedy as last time and simply dropped the database and recreated it from template1, so there'snot much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up withsome more useful info next time... Kind regards Markus > -----Ursprüngliche Nachricht----- > Von: Markus Wollny > Gesendet: Freitag, 21. März 2008 23:34 > An: 'Tom Lane' > Cc: pgsql-general@postgresql.org > Betreff: AW: [GENERAL] Transaction wraparound problem with > database postgres > > Tom Lane wrote: > > "Markus Wollny" <Markus.Wollny@computec.de> writes: > >> I'd still like to find out what exactly happened here so I can > >> prevent the same from happening again in the future. > > > > Me too. It would seem that something did a vacuum of > postgres with a > > strange choice of xid cutoff, but I can't think of what would cause > > that. > > > > Do you ever do VACUUM FREEZE on your databases? > > No, I actually never heard of VACUUM FREEZE, I have to admit. > Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote: > Hi! > > After going several months without such incidents, we now got bit by the same problem again. We have since upgraded thehardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changedmuch though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been configureda couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the autovacuumdaemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic datamight be required to actually find out the root of what's causing it? As I needed to get the cluster back up and runningagain, I used the same remedy as last time and simply dropped the database and recreated it from template1, so there'snot much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up withsome more useful info next time... Do your logs show any kind of error when vacuuming about "only owner can vacuum" a table or anything?
Hi! > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Do your logs show any kind of error when vacuuming about > "only owner can vacuum" a table or anything? I grepped through the logs from the last four days and, no, there were none such errors whatsoever. Last vacuum analyze run returned the following: INFO: free space map contains 1974573 pages in 9980 relations DETAIL: A total of 2043408 page slots are in use (including overhead). 2043408 page slots are required to track all free space. Current limits are: 2100000 page slots, 10000 relations, using 13376 kB. I have since increased these limits by 50% as we've come quite close to what was configured. But as they hadn't been reached yet anyway, so I don't think we did have any sort of apparent problem with the running of vaccuum as such. Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Furth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jurg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote: > Hi! > > After going several months without such incidents, we now got bit by the same problem again. We have since upgraded thehardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changedmuch though I'm not sure I'm entirely sure exactly what the problem was and now again is. Could you fill me in on that?