Thread: Cannot use a standalone backend to VACUUM in "postgres""

Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
We have a PostgreSQL 8.2.6 installation running for about six-months
now. There was a lot of log entries saying (sometimes 10 or more in
just one second):

WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.

(actually it was in Spanish but I think that's irrelevant).

The funny thing is that there was no open transactions, even after
restarting the cluster the same message was logged. Today, the
database stopped working as expected:

ERROR: database is shut down to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"

So, the postmaster was stopped to follow the hint but even in
stand-alone mode postgres keeps saying:

WARNING:  database "postgres" must be vacuumed within 999805 transactions
HINT:  To avoid a database shutdown, execute a full-database VACUUM in "postgres".
WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.

Every time vacuum is run the number decreases by one but after a few
runs I still cannot access the cluster :-(. (My plan was to take a
pg_dumpall and then re-init the cluster.)

Attached is the output of:

  SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
  SELECT datname, age(datfrozenxid) FROM pg_database;

pg_controldata says:

pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5040396405114363383
Database cluster state:               in production
pg_control last modified:             Mon 07 Apr 2008 09:22:19 PM CDT
Current log file ID:                  33
Next log file segment:                91
Latest checkpoint location:           21/5A8EC824
Prior checkpoint location:            21/5A8C8CDC
Latest checkpoint's REDO location:    21/5A8EC824
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/2280224912
Latest checkpoint's NextOID:          103405
Latest checkpoint's NextMultiXactId:  64513935
Latest checkpoint's NextMultiOffset:  154155767
Time of latest checkpoint:            Mon 07 Apr 2008 09:20:54 PM CDT
Minimum recovery ending location:     0/0
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           es_MX.ISO-8859-1
LC_CTYPE:                             es_MX.ISO-8859-1

Please let me know if there is more information needed.

Regards,
Manuel.


Attachment

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:

Hi Manuel,

> The funny thing is that there was no open transactions, even after
> restarting the cluster the same message was logged. Today, the
> database stopped working as expected:
>
> ERROR: database is shut down to avoid wraparound data loss in database "postgres"
> HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"

I suggest you look for temp tables that have not been reclaimed.  We've
had a couple of reports where leftover temp tables have stopped the
frozen-xid counter from advancing.  (They would have a very old
relfrozenxid.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Hi Manuel,

Hi Alvaro!

> I suggest you look for temp tables that have not been reclaimed.
> We've had a couple of reports where leftover temp tables have
> stopped the frozen-xid counter from advancing.  (They would have a
> very old relfrozenxid.)

Thank you very much for the suggestion. Any pointers on how to do
that? A quick serch in google didn't show anything relevant.

Regards,
Manuel.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>
>> Hi Manuel,
>
> Hi Alvaro!
>
>> I suggest you look for temp tables that have not been reclaimed.
>> We've had a couple of reports where leftover temp tables have
>> stopped the frozen-xid counter from advancing.  (They would have a
>> very old relfrozenxid.)
>
> Thank you very much for the suggestion. Any pointers on how to do
> that? A quick serch in google didn't show anything relevant.

Will look into pg_class, of course. Somehow I was thinking something
else. Thanks again.

Regards,
Manuel.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Manuel Sugawara wrote:
>
> Hi Manuel,
>
>> The funny thing is that there was no open transactions, even after
>> restarting the cluster the same message was logged. Today, the
>> database stopped working as expected:
>>
>> ERROR: database is shut down to avoid wraparound data loss in database "postgres"
>> HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"
>
> I suggest you look for temp tables that have not been reclaimed.  We've
> had a couple of reports where leftover temp tables have stopped the
> frozen-xid counter from advancing.  (They would have a very old
> relfrozenxid.)

In each database executed:

  select relname, age(relfrozenxid) from pg_class where relnamespace = '10406'::oid;

(note that 10406 is the oid of the pg_temp_1 namespace) none of them
showed temp tables, Is this the correct way?, Any other idea?.

Regards,
Manuel.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:

> In each database executed:
>
>   select relname, age(relfrozenxid) from pg_class where relnamespace = '10406'::oid;
>
> (note that 10406 is the oid of the pg_temp_1 namespace) none of them
> showed temp tables, Is this the correct way?, Any other idea?.

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc).  And
as far as I've seen, most leftover temp tables are on "high" temp
schemas (i.e. those belonging to backends that are only used when the
load is high).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, nope -- take away the relnamespace check, because there is a
> different namespace for each backend (pg_temp_2, pg_temp_3, etc).  And
> as far as I've seen, most leftover temp tables are on "high" temp
> schemas (i.e. those belonging to backends that are only used when the
> load is high).

Yeah, because the low-numbered ones get recycled first.  To have a temp
table survive for long enough to create this problem, it's pretty much
got to be in a high-numbered temp schema.

            regards, tom lane

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Hmm, nope -- take away the relnamespace check, because there is a
> different namespace for each backend (pg_temp_2, pg_temp_3, etc).

Still no luck, changed the query to:

  select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~
'^pg_temp'

and no temp tables showed in any database :-(. Any other idea?.

Regards,
Manuel.


Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>
> > Hmm, nope -- take away the relnamespace check, because there is a
> > different namespace for each backend (pg_temp_2, pg_temp_3, etc).
>
> Still no luck, changed the query to:
>
>   select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~
'^pg_temp'
>
> and no temp tables showed in any database :-(. Any other idea?.

None :-(  Is there any table with a large age value, regardless of
temp-ness?  Does the age of the oldest table correspond to the age of
pg_database.datfrozenxid?  The interesting database is the one with the
largest age(pg_database.datfrozenxid).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:

Going back to your first message I see that I missed something
important:

> 1: relname = "pg_tablespace"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

> 1: relname = "pg_pltemplate"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

> 1: relname = "pg_shdepend"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

> 1: relname = "pg_shdescription"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

> 1: relname = "pg_database"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

> 1: relname = "pg_auth_members"    (typeid = 19, len = 64, typmod = -1, byval = f)
> 2: age = "2146484675"    (typeid = 23, len = 4, typmod = -1, byval = t)

These are all shared catalogs AFAIR.  Have you vacuumed those?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Manuel Sugawara wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>
>> > Hmm, nope -- take away the relnamespace check, because there is a
>> > different namespace for each backend (pg_temp_2, pg_temp_3, etc).
>>
>> Still no luck, changed the query to:
>>
>>   select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~
'^pg_temp'
>>
>> and no temp tables showed in any database :-(. Any other idea?.
>
> None :-(  Is there any table with a large age value, regardless of
> temp-ness?  Does the age of the oldest table correspond to the age of
> pg_database.datfrozenxid?  The interesting database is the one with the
> largest age(pg_database.datfrozenxid).

A friend is doing the queries right now but it takes time :-(, mean
time I was wondering if it will be safe to apply the following patch
just to get the database up and be able to run pg_dumpall:

*** postgresql-8.2.6/src/backend/access/transam/varsup.c~       2006-11-05 16:42:07.000000000 -0600
--- postgresql-8.2.6/src/backend/access/transam/varsup.c        2008-04-08 18:34:51.000000000 -0500
***************
*** 225,231 ****
         * vacuuming requires one transaction per table cleaned, we had better be
         * sure there's lots of XIDs left...)
         */
!       xidStopLimit = xidWrapLimit - 1000000;
        if (xidStopLimit < FirstNormalTransactionId)
                xidStopLimit -= FirstNormalTransactionId;

--- 225,231 ----
         * vacuuming requires one transaction per table cleaned, we had better be
         * sure there's lots of XIDs left...)
         */
!       xidStopLimit = xidWrapLimit - 500000;
        if (xidStopLimit < FirstNormalTransactionId)
                xidStopLimit -= FirstNormalTransactionId;


Regards,
Manuel.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:

> A friend is doing the queries right now but it takes time :-(, mean
> time I was wondering if it will be safe to apply the following patch
> just to get the database up and be able to run pg_dumpall:

Yes, it is safe.  Just make sure to get a copy of the database out in
500k transactions ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Manuel Sugawara
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Manuel Sugawara wrote:
>
>> A friend is doing the queries right now but it takes time :-(, mean
>> time I was wondering if it will be safe to apply the following patch
>> just to get the database up and be able to run pg_dumpall:
>
> Yes, it is safe.  Just make sure to get a copy of the database out in
> 500k transactions ...

Ok, will do that. Thanks a lot Alvaro. Want me to send more details to
debug the problem?.

Regards,
Manuel.

Re: Cannot use a standalone backend to VACUUM in "postgres""

From
Alvaro Herrera
Date:
Manuel Sugawara wrote:

> Ok, will do that. Thanks a lot Alvaro. Want me to send more details to
> debug the problem?.

Yes, it would be good to know why the shared catalogs were not being
vacuumed, if you can find that out.  I would have guessed that they
weren't being vacuumed due to the fact that they can only be vacuumed by
a superuser, but in a standalone backend you are always superuser.  So
if you ran a database-wide vacuum, they should have been processed.

What happens if you try "vacuum pg_database", etc?  Does the
age(relfrozenxid) change?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.