Thread: Loss of table structure on 7.3.19
Hi, I have a database thats been running in production use since 2006 on a Centos 4.7 (originally an earlier 4 release, updated incrementally). The pg version is somewhat ancient as we have stuck with the system postgres - currently postgresql-7.4.19-1.el4_6.1. Yesterday it all fell apart with all queries/updates into it having issues. A check showed that many of the tables had lost their definitions - for example the task_log table now consisted on a single timestamp field rather than the selection of fields that would normally be there. Fortunately we have a recent backup and could restore the db with the loss of only a couple of hours of data. So now the post-mortum. Suspiciously, in the pg_clog directory there were 2049 files (0000 to 0800) - the time the failure happened coincided with the modification time on file 07FF. I see in the documentation (http://www.postgresql.org/docs/7.3/static/routine-vacuuming.html ) warnings regarding XID rollover at 1 billion - since it appears clog datastructures are 2 bits per XID then 2048 files of 256KB would be 1,048,576 which is close enough to 1 billion for me, and the effects are close to those described.... However we do have a regular vacuuming process - every day each table is VACUUM ANALYZE-ed (as well as an index rebuild). I've experimented this morning, and nothing I can do with the VACUUM command (FULL, against a table or the whole db etc) appears to make the number of CLOG files reduce. If I understand their function correctly there is no reason for them to hang around and multiply unless there are long-running open transactions (which there are not) - and surely if these were just open transaction related then a database shutdown/restart ought to scrub them. So the questions are:- - Is this plethora of (active) CLOG files likely to be reason for the problem? - Or am I barking up entirely the wrong tree? - How can I avoid this in the future? - Is there any other information I should be digging out (I don't have much in the way of postgres logs)? I've also got a Centos 5.x box with a similar installation but postgres version 8.1.11 on it. This also appears to have an increasing number of pg_clog files, and a full vacuum is not stopping this... Cheers Nigel. -- [ Nigel Metheringham Nigel.Metheringham@InTechnology.com ] [ - Comments in this message are my own and not ITO opinion/policy - ]
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes: > I have a database thats been running in production use since 2006 on a > Centos 4.7 (originally an earlier 4 release, updated incrementally). > The pg version is somewhat ancient as we have stuck with the system > postgres - currently postgresql-7.4.19-1.el4_6.1. > Yesterday it all fell apart with all queries/updates into it having > issues. A check showed that many of the tables had lost their > definitions - for example the task_log table now consisted on a single > timestamp field rather than the selection of fields that would > normally be there. You hit transaction ID wraparound. There are automatic defenses against this in 8.1 and up, but in 7.4 it's all on the DBA's head to vacuum everything often enough. See http://www.postgresql.org/docs/7.4/static/maintenance.html#VACUUM-FOR-WRAPAROUND > However we do have a regular vacuuming process - every day each table > is VACUUM ANALYZE-ed (as well as an index rebuild). The symptoms indicate pretty strongly that you forgot about vacuuming the system catalogs. A plain "VACUUM" executed in every database, by a superuser, is sufficient for this. Trying to be smart by vacuuming only what you think needs vacuumed is not sufficient. regards, tom lane
On 17 Sep 2009, at 14:57, Tom Lane wrote: > The symptoms indicate pretty strongly that you forgot about vacuuming > the system catalogs. A plain "VACUUM" executed in every database, by > a superuser, is sufficient for this. Trying to be smart by vacuuming > only what you think needs vacuumed is not sufficient. By "in every database" do you mean we need to vacuum the template* databases too? [and I'm just thankful, that as major wake up calls go, this was only painful as opposed to fatal or permanently debilitating!] Nigel. -- [ Nigel Metheringham Nigel.Metheringham@InTechnology.com ] [ - Comments in this message are my own and not ITO opinion/policy - ]
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes: > On 17 Sep 2009, at 14:57, Tom Lane wrote: >> The symptoms indicate pretty strongly that you forgot about vacuuming >> the system catalogs. A plain "VACUUM" executed in every database, by >> a superuser, is sufficient for this. Trying to be smart by vacuuming >> only what you think needs vacuumed is not sufficient. > By "in every database" do you mean we need to vacuum the template* > databases too? All the connectable databases, I believe is the rule (although 7.4 was so long it might have done things differently). regards, tom lane
To clarify what a plain "vacuum" is, if you have an initdb data area, <a1> and the database listening on port <p1>,
Is this sufficient,
export PGDATA=<a1>
export PGPORT=<p1>
vacuumdb --all --analyze
running as user postgres (on Linux)?
Thanks.
Paul Anderson
Tom Lane wrote:
Is this sufficient,
export PGDATA=<a1>
export PGPORT=<p1>
vacuumdb --all --analyze
running as user postgres (on Linux)?
Thanks.
Paul Anderson
Tom Lane wrote:
Nigel Metheringham <nigel.metheringham@dev.intechnology.co.uk> writes:I have a database thats been running in production use since 2006 on a Centos 4.7 (originally an earlier 4 release, updated incrementally). The pg version is somewhat ancient as we have stuck with the system postgres - currently postgresql-7.4.19-1.el4_6.1.Yesterday it all fell apart with all queries/updates into it having issues. A check showed that many of the tables had lost their definitions - for example the task_log table now consisted on a single timestamp field rather than the selection of fields that would normally be there.You hit transaction ID wraparound. There are automatic defenses against this in 8.1 and up, but in 7.4 it's all on the DBA's head to vacuum everything often enough. See http://www.postgresql.org/docs/7.4/static/maintenance.html#VACUUM-FOR-WRAPAROUNDHowever we do have a regular vacuuming process - every day each table is VACUUM ANALYZE-ed (as well as an index rebuild).The symptoms indicate pretty strongly that you forgot about vacuuming the system catalogs. A plain "VACUUM" executed in every database, by a superuser, is sufficient for this. Trying to be smart by vacuuming only what you think needs vacuumed is not sufficient. regards, tom lane
"Paul B. Anderson" <paul.a@pnlassociates.com> writes: > To clarify what a plain "vacuum" is, if you have an initdb data area, > <a1> and the database listening on port <p1>, > Is this sufficient, > export PGDATA=<a1> > export PGPORT=<p1> > vacuumdb --all --analyze > running as user postgres (on Linux)? The --analyze isn't really necessary, but yeah that should do it. regards, tom lane