Thread: Loss of table structure on 7.3.19

Loss of table structure on 7.3.19

From
Nigel Metheringham
Date:
Hi,

I have a database thats been running in production use since 2006 on a
Centos 4.7 (originally an earlier 4 release, updated incrementally).
The pg version is somewhat ancient as we have stuck with the system
postgres - currently postgresql-7.4.19-1.el4_6.1.

Yesterday it all fell apart with all queries/updates into it having
issues.  A check showed that many of the tables had lost their
definitions - for example the task_log table now consisted on a single
timestamp field rather than the selection of fields that would
normally be there.

Fortunately we have a recent backup and could restore the db with the
loss of only a couple of hours of data.

So now the post-mortum.

Suspiciously, in the pg_clog directory there were 2049 files (0000 to
0800) - the time the failure happened coincided with the modification
time on file 07FF.   I see in the documentation (http://www.postgresql.org/docs/7.3/static/routine-vacuuming.html
) warnings regarding XID rollover at 1 billion - since it appears clog
datastructures are 2 bits per XID then 2048 files of 256KB would be
1,048,576 which is close enough to 1 billion for me, and the effects
are close to those described....

However we do have a regular vacuuming process - every day each table
is VACUUM ANALYZE-ed (as well as an index rebuild).

I've experimented this morning, and nothing I can do with the VACUUM
command (FULL, against a table or the whole db etc) appears to make
the number of CLOG files reduce.  If I understand their function
correctly there is no reason for them to hang around and multiply
unless there are long-running open transactions (which there are not)
- and surely if these were just open transaction related then a
database shutdown/restart ought to scrub them.

So the questions are:-
   - Is this plethora of (active) CLOG files likely to be reason for
the problem?
   - Or am I barking up entirely the wrong tree?
   - How can I avoid this in the future?
   - Is there any other information I should be digging out (I don't
have much
     in the way of postgres logs)?

I've also got a Centos 5.x box with a similar installation but
postgres version 8.1.11 on it.  This also appears to have an
increasing number of pg_clog files, and a full vacuum is not stopping
this...

Cheers
    Nigel.

--
[ Nigel Metheringham             Nigel.Metheringham@InTechnology.com ]
[ - Comments in this message are my own and not ITO opinion/policy - ]


Re: Loss of table structure on 7.3.19

From
Tom Lane
Date:
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes:
> I have a database thats been running in production use since 2006 on a
> Centos 4.7 (originally an earlier 4 release, updated incrementally).
> The pg version is somewhat ancient as we have stuck with the system
> postgres - currently postgresql-7.4.19-1.el4_6.1.

> Yesterday it all fell apart with all queries/updates into it having
> issues.  A check showed that many of the tables had lost their
> definitions - for example the task_log table now consisted on a single
> timestamp field rather than the selection of fields that would
> normally be there.

You hit transaction ID wraparound.  There are automatic defenses against
this in 8.1 and up, but in 7.4 it's all on the DBA's head to vacuum
everything often enough.  See
http://www.postgresql.org/docs/7.4/static/maintenance.html#VACUUM-FOR-WRAPAROUND

> However we do have a regular vacuuming process - every day each table
> is VACUUM ANALYZE-ed (as well as an index rebuild).

The symptoms indicate pretty strongly that you forgot about vacuuming
the system catalogs.  A plain "VACUUM" executed in every database, by
a superuser, is sufficient for this.  Trying to be smart by vacuuming
only what you think needs vacuumed is not sufficient.

            regards, tom lane

Re: Loss of table structure on 7.3.19

From
Nigel Metheringham
Date:
On 17 Sep 2009, at 14:57, Tom Lane wrote:

> The symptoms indicate pretty strongly that you forgot about vacuuming
> the system catalogs.  A plain "VACUUM" executed in every database, by
> a superuser, is sufficient for this.  Trying to be smart by vacuuming
> only what you think needs vacuumed is not sufficient.

By "in every database" do you mean we need to vacuum the template*
databases too?

[and I'm just thankful, that as major wake up calls go, this was only
painful as opposed to fatal or permanently debilitating!]

    Nigel.
--
[ Nigel Metheringham             Nigel.Metheringham@InTechnology.com ]
[ - Comments in this message are my own and not ITO opinion/policy - ]


Re: Loss of table structure on 7.3.19

From
Tom Lane
Date:
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes:
> On 17 Sep 2009, at 14:57, Tom Lane wrote:
>> The symptoms indicate pretty strongly that you forgot about vacuuming
>> the system catalogs.  A plain "VACUUM" executed in every database, by
>> a superuser, is sufficient for this.  Trying to be smart by vacuuming
>> only what you think needs vacuumed is not sufficient.

> By "in every database" do you mean we need to vacuum the template*
> databases too?

All the connectable databases, I believe is the rule (although 7.4
was so long it might have done things differently).

            regards, tom lane

Re: Loss of table structure on 7.3.19

From
"Paul B. Anderson"
Date:
To clarify what a plain "vacuum" is, if you have an initdb data area, <a1> and the database listening on port <p1>,

Is this sufficient,

  export PGDATA=<a1>
  export PGPORT=<p1>
  vacuumdb --all --analyze

running as user postgres (on Linux)?

Thanks.

Paul Anderson


Tom Lane wrote:
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes: 
I have a database thats been running in production use since 2006 on a  
Centos 4.7 (originally an earlier 4 release, updated incrementally).   
The pg version is somewhat ancient as we have stuck with the system  
postgres - currently postgresql-7.4.19-1.el4_6.1.   
 
Yesterday it all fell apart with all queries/updates into it having  
issues.  A check showed that many of the tables had lost their  
definitions - for example the task_log table now consisted on a single  
timestamp field rather than the selection of fields that would  
normally be there.   
You hit transaction ID wraparound.  There are automatic defenses against
this in 8.1 and up, but in 7.4 it's all on the DBA's head to vacuum
everything often enough.  See
http://www.postgresql.org/docs/7.4/static/maintenance.html#VACUUM-FOR-WRAPAROUND
 
However we do have a regular vacuuming process - every day each table  
is VACUUM ANALYZE-ed (as well as an index rebuild).   
The symptoms indicate pretty strongly that you forgot about vacuuming
the system catalogs.  A plain "VACUUM" executed in every database, by
a superuser, is sufficient for this.  Trying to be smart by vacuuming
only what you think needs vacuumed is not sufficient.
		regards, tom lane
 

Re: Loss of table structure on 7.3.19

From
Tom Lane
Date:
"Paul B. Anderson" <paul.a@pnlassociates.com> writes:
> To clarify what a plain "vacuum" is, if you have an initdb data area,
> <a1> and the database listening on port <p1>,

> Is this sufficient,

>   export PGDATA=<a1>
>   export PGPORT=<p1>
>   vacuumdb --all --analyze

> running as user postgres (on Linux)?

The --analyze isn't really necessary, but yeah that should do it.

            regards, tom lane