Thread: Re: Transaction wraparound problem with database postgres

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 21:08:26

Hi!

Just some more info, hoping that it helps with a diagnosis:

1: datname     (typeid = 19, len = 64, typmod = -1, byval = f)
         2: age (typeid = 23, len = 4, typmod = -1, byval = t)
         3: datfrozenxid        (typeid = 28, len = 4, typmod = -1, byval = t)
        ----
         1: datname = "postgres"        (typeid = 19, len = 64, typmod = -1, byval = f)
         2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t)
         3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)
        ----
Then I issue a vacuum:

1: datname     (typeid = 19, len = 64, typmod = -1, byval = f)
         2: age (typeid = 23, len = 4, typmod = -1, byval = t)
         3: datfrozenxid        (typeid = 28, len = 4, typmod = -1, byval = t)
        ----
         1: datname = "postgres"        (typeid = 19, len = 64, typmod = -1, byval = f)
         2: age = "-2147321383" (typeid = 23, len = 4, typmod = -1, byval = t)
         3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)

It worries me, that 'age' is negative.

Kind regards

Markus

Von: Markus Wollny
Gesendet: Fr 21.03.2008 21:50
An: pgsql-general@postgresql.org
Betreff: Transaction wraparound problem with database postgres

Hi!

My database cluster has just stopped working. I get the following message:

psql: FATAL: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.

I did as suggested, stopped tzhe postmaster and started a single backend on database 'postgres'; I issued a "VACCUM" there.

su postgres -c "/opt/pgsql/bin/postgres --single -D /var/lib/pgsql/data postgres"

backend> VACUUM

However, this doesn't seem to help - I receive lots and lots of messages like this:

<2008-03-21 21:43:27 CET - 11845: @>WARNUNG: Datenbank »postgres« muss innerhalb von 4294805194 Transaktionen gevacuumt werden
<2008-03-21 21:43:27 CET - 11845: @>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.

i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."

That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.

This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?

Urgent help would be very much appreciated.

Kind regards

Markus

Computec Media AG

Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)

Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann

Vorsitzender des Aufsichtsrates: Jürg Marquard

Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 21:08:26

Hi!

My database cluster has just stopped working. I get the following message:

I did as suggested, stopped tzhe postmaster and started a single backend on database 'postgres'; I issued a "VACCUM" there.

su postgres -c "/opt/pgsql/bin/postgres --single -D /var/lib/pgsql/data postgres"

backend> VACUUM

However, this doesn't seem to help - I receive lots and lots of messages like this:

i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."

That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.

This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

Urgent help would be very much appreciated.

Kind regards

Markus

Computec Media AG

Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)

Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann

Vorsitzender des Aufsichtsrates: Jürg Marquard

Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

"Joshua D. Drake"

Date:

21 March 2008, 21:21:20

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 21 Mar 2008 21:50:57 +0100
"Markus Wollny" <Markus.Wollny@computec.de> wrote:

> That's what I just did, but the problem persists. Whenever I issue a
> 'vacuum', the number of transactions simply decreases. 
> 
> This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC
> gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> 
> I am absolutely lost about what to do now - and it's a puzzle how
> this could have happened in the first place. I have configured
> autovaccum AND I run a vacuum verbose analyze over all databases
> every single night. What do I do now? Is there some alternative to
> reinit and going back to the last dump?
> 
> Urgent help would be very much appreciated. 
> 

Vacuum every database. (template1,postgres too). This could happen if
you have long running transactions that are not allow VACUUM to
actually work.

Sincerely,

Joshua D. Drake


> Kind regards
> 
>    Markus
> 
> 
> 
> Computec Media AG
> Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
> Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels
> Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard 
> Umsatzsteuer-Identifikationsnummer: DE 812 575 276
> 
> 


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH5CcAATb/zqfZUUQRAl7CAJ9QGYJVqNVfHFgjVyCBswp1+d8kJgCfe+io
5d28sM4Gw4OkSBh/+U4jMDI=
=iBx9
-----END PGP SIGNATURE-----

Re: Transaction wraparound problem with database postgres

From

Andreas 'ads' Scherbaum

Date:

21 March 2008, 21:32:19

Hello,

On Fri, 21 Mar 2008 21:50:57 +0100 Markus Wollny wrote:

> My database cluster has just stopped working. I get the following message:
> psql: FATAL:  Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen
Transaktionsnummernüberlaufzu vermeiden 
> TIP:  Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres«
auszuführen.

First of all, it would help you and most of the readers on this list, if
you have the error messages in english. There is a german mailinglist
too, if you want to ask in german.

> That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply
decreases. 

vacuum all databases, add the VERBOSE option to see, what actually
happens.

> I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have
configuredautovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is
theresome alternative to reinit and going back to the last dump? 

Are you using the database 'postgres' at all? And are you sure, that
you include all databases? Any error messages in the vacuum output?

Oh, and by the way: why do you have autovacuum and a manual vacuum run
every night plus the vacuum run with verbose?

> Urgent help would be very much appreciated.

That's a bit late here ;-)

Kind regards

--
                Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors

Re: Transaction wraparound problem with database postgres

From

Tom Lane

Date:

21 March 2008, 21:44:52

"Markus Wollny" <Markus.Wollny@computec.de> writes:
> Just some more info, hoping that it helps with a diagnosis:

>  1: datname     (typeid = 19, len = 64, typmod = -1, byval = f)
>          2: age (typeid = 23, len = 4, typmod = -1, byval = t)
>          3: datfrozenxid        (typeid = 28, len = 4, typmod = -1, byval = t)
>         ----
>          1: datname = "postgres"        (typeid = 19, len = 64, typmod = -1, byval = f)
>          2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t)
>          3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)

What are the datfrozenxid's of the other rows in pg_database?
Do the other fields of postgres' row look sane?

            regards, tom lane

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 21:46:56

Hi!

Sorry for the quick updates to my own messages, but I didn't want to lean back and wait - so I took to more aggressive measures. All my other databases in this cluster are fine - and the 'postgres' database doesn't seem to do anything really useful except being the default database. I dropped it and recreated it with template1 as template, afterwards I could start up my cluster with no problems whatsoever. I'd still like to find out what exactly happened here so I can prevent the same from happening again in the future. The age(datfrozenxid) is positive again:

# SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database where datname='postgres';
datname | age | datfrozenxid
----------+-----------+--------------
postgres | 100291695 | 3882762765
(1 Zeile)

As I mentioned earlier, I'm running autovaccuum and use a nightly cron to run vacuum verbose analyze over all my databases. So lack of vacuum cannot be the issue, I think. But what else could have happened here? I regularly scan my logs, and there was no early warning for this issue.

The first event of this type in the server log was from today:

<2008-03-21 17:08:48 CET - 32161: xxx.xxx.xxx.xxx(52833)@magazine>WARNUNG: Datenbank »postgres« muss innerhalb von 11000000 Transaktionen gevacuumt werden
<2008-03-21 17:08:48 CET - 32161: xxx.xxx.xxx.xxx(52833)@magazine>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.

(i.e. database 'postgres' need to be vacuumed within 11000000 transactions...)

A mere three hours later, the server already refused any further requests:

<2008-03-21 20:05:21 CET - 25184: xxx.xxx.xxx.xxx(60837)@magazine>FEHLER: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden

(ie. database no longer accepts any commands in order to prevent data loss in database 'postgres' because of transaction id wraparound)

Now that the adrenaline level has dropped to normal, I'd still like to know what exactly has happened here; The cluster has been initdb'ed on 2007-04-27.

Kind regards

Markus

Computec Media AG

Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)

Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann

Vorsitzender des Aufsichtsrates: Jürg Marquard

Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 21:57:02

Hi!

Thanks for all the quick replies :)

Tom Lane wrote:
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
>> Just some more info, hoping that it helps with a diagnosis:
>
>>  1: datname     (typeid = 19, len = 64, typmod = -1, byval = f)
>>          2: age (typeid = 23, len = 4, typmod = -1, byval = t)
>>          3: datfrozenxid        (typeid = 28, len = 4, typmod = -1,
>>          byval = t)         ---- 1: datname = "postgres"
>>          (typeid = 19, len = 64, typmod = -1, byval = f) 2: age =
>>          "-2147321465" (typeid = 23, len = 4, typmod = -1, byval =
>> t) 3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod =
>> -1, byval = t)
>
> What are the datfrozenxid's of the other rows in pg_database?
> Do the other fields of postgres' row look sane?

Yes, there were no issues on any of the databases that are actually in use:

# select datname, age(datfrozenxid), datfrozenxid from pg_database;
  datname   |    age    | datfrozenxid
------------+-----------+--------------
 rpfcms     | 104213725 |   3881601233
 rpfflash   | 147289015 |   3838525943
 postgres   | 103052193 |   3882762765
 template1  | 104213787 |   3881601171
 template0  |   3052193 |   3982762765
 ezpublish  | 147419044 |   3838395914
 community  | 147566532 |   3838248426
 abo        | 147689637 |   3838125321
 bluebox    | 147679271 |   3838135687
 cbox       | 147582662 |   3838232296
 mpo        | 147309716 |   3838505242
 newsletter | 147309110 |   3838505848
 pcaction   | 147297707 |   3838517251
 pcgames    | 147291588 |   3838523370
 magazine   | 147419044 |   3838395914

Only the 'postgres' db was affected - which is puzzling because we don't actually use this database actively for
anything.

Kind regards

  Markus


Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

Tom Lane

Date:

21 March 2008, 22:22:34

"Markus Wollny" <Markus.Wollny@computec.de> writes:
> Sorry for the quick updates to my own messages, but I didn't want to
> lean back and wait - so I took to more aggressive measures. All my
> other databases in this cluster are fine - and the 'postgres' database
> doesn't seem to do anything really useful except being the default
> database. I dropped it and recreated it with template1 as template,
> afterwards I could start up my cluster with no problems
> whatsoever.

Yeah, if there were no other problems apparent in pg_database I was
going to suggest that as a recovery method.

> I'd still like to find out what exactly happened here so I
> can prevent the same from happening again in the future.

Me too.  It would seem that something did a vacuum of postgres
with a strange choice of xid cutoff, but I can't think of what
would cause that.

Do you ever do VACUUM FREEZE on your databases?

            regards, tom lane

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 22:30:38

Andreas 'ads' Scherbaum wrote:
> Hello,
> First of all, it would help you and most of the readers on this list,
> if you have the error messages in english. There is a german
> mailinglist too, if you want to ask in german.

Sorry, I tried to describe the issue as best as I could and included the actual log entries only for completeness, but
wasin too much of a hurry to find the correct translations. 

> vacuum all databases, add the VERBOSE option to see, what actually
> happens.

Alas, too late, I got rid of the offending 'postgres' database already by dropping and recreating.

> Are you using the database 'postgres' at all?

No, not at all. Didn't touch it ever after initdb.

> And are you sure, that you include all databases?

Yes. I run the following every night:

su postgres -c '/opt/pgsql/bin/psql -t -c "select datname from pg_database order by datname;" template1 | xargs -n 1
/opt/pgsql/bin/psql-q -c "vacuum verbose analyze;"' 

> Any error messages in the vacuum output?

None.

> Oh, and by the way: why do you have autovacuum and a manual vacuum
> run every night plus the vacuum run with verbose?

Paranoia, mostly, I think. I'm using PostgreSQL since long before autovacuum was introduced and always thought that it
couldn'tdo any harm to keep my original vacuum job running once every night, even though autovacuum does a remarkable
job,especially for a couple of busy tables where the nightly vacuum was not quite enough. Plus, having the verbose
outputfrom the log, I get useful info for setting the 'max_fsm_pages'/'max_fsm_relations'-options to sensible values.
Isit a problem to have cron'ed VACUUM-runs in parallel with autovacuum? 

>> Urgent help would be very much appreciated.
>
> That's a bit late here ;-)

Ah, well obviously it wasn't - it's always an extremely pleasant surprise when one is actually in dire need of help and
getsan almost immediate and helpful response. 

I wish you all happy Easter!

Kind regards

   Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

21 March 2008, 22:34:28

Tom Lane wrote:
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
>> I'd still like to find out what exactly happened here so I can
>> prevent the same from happening again in the future.
>
> Me too.  It would seem that something did a vacuum of postgres with a
> strange choice of xid cutoff, but I can't think of what would cause
> that.
>
> Do you ever do VACUUM FREEZE on your databases?

No, I actually never heard of VACUUM FREEZE, I have to admit.

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

08 March 2010, 00:08:45

Hi!

After going several months without such incidents, we now got bit by the same problem again. We have since upgraded the
hardwarewe ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't
changedmuch though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been
configureda couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the
autovacuumdaemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic
datamight be required to actually find out the root of what's causing it? As I needed to get the cluster back up and
runningagain, I used the same remedy as last time and simply dropped the database and recreated it from template1, so
there'snot much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up
withsome more useful info next time... 

Kind regards

   Markus

> -----Ursprüngliche Nachricht-----
> Von: Markus Wollny
> Gesendet: Freitag, 21. März 2008 23:34
> An: 'Tom Lane'
> Cc: pgsql-general@postgresql.org
> Betreff: AW: [GENERAL] Transaction wraparound problem with
> database postgres
>
> Tom Lane wrote:
> > "Markus Wollny" <Markus.Wollny@computec.de> writes:
> >> I'd still like to find out what exactly happened here so I can
> >> prevent the same from happening again in the future.
> >
> > Me too.  It would seem that something did a vacuum of
> postgres with a
> > strange choice of xid cutoff, but I can't think of what would cause
> > that.
> >
> > Do you ever do VACUUM FREEZE on your databases?
>
> No, I actually never heard of VACUUM FREEZE, I have to admit.
>

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

Scott Marlowe

Date:

08 March 2010, 01:26:45

On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote:
> Hi!
>
> After going several months without such incidents, we now got bit by the same problem again. We have since upgraded
thehardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't
changedmuch though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been
configureda couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the
autovacuumdaemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic
datamight be required to actually find out the root of what's causing it? As I needed to get the cluster back up and
runningagain, I used the same remedy as last time and simply dropped the database and recreated it from template1, so
there'snot much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up
withsome more useful info next time... 

Do your logs show any kind of error when vacuuming about "only owner
can vacuum" a table or anything?

Re: Transaction wraparound problem with database postgres

From

"Markus Wollny"

Date:

08 March 2010, 07:27:14

Hi!

> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]

> Do your logs show any kind of error when vacuuming about
> "only owner can vacuum" a table or anything?

I grepped through the logs from the last four days and, no, there were
none such errors whatsoever. Last vacuum analyze run returned the
following:

INFO:  free space map contains 1974573 pages in 9980 relations
DETAIL:  A total of 2043408 page slots are in use (including overhead).
2043408 page slots are required to track all free space.
Current limits are:  2100000 page slots, 10000 relations, using 13376
kB.

I have since increased these limits by 50% as we've come quite close to
what was configured. But as they hadn't been reached yet anyway, so I
don't think we did have any sort of apparent problem with the running of
vaccuum as such.

Kind regards

   Markus


Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

Re: Transaction wraparound problem with database postgres

From

Scott Marlowe

Date:

08 March 2010, 07:36:22

On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote:
> Hi!
>
> After going several months without such incidents, we now got bit by the same problem again. We have since upgraded
thehardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't
changedmuch though 

I'm not sure I'm entirely sure exactly what the problem was and now
again is.  Could you fill me in on that?