Re: files under pg_clog directories are missing - Mailing list pgsql-admin

From Greg Smith
Subject Re: files under pg_clog directories are missing
Msg-id 4C9432DA.4000609@2ndquadrant.com
In response to files under pg_clog directories are missing  (Dinesh Bhandary <dbhandary@iii.com>)
Responses Re: files under pg_clog directories are missing  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: files under pg_clog directories are missing  (Dinesh Bhandary <dbhandary@iii.com>)
List pgsql-admin
Dinesh Bhandary wrote:
> Due to a hardware crash we ran into issues where some blocks were
> corrupted and some files were missing.
> I was able to get past the corrupted blocks (errmsg - "invalid page
> header in block 12345 of relation x") by setting
> zero_damaged_pages=1 and running vacuum afterwards. Now I am running
> into situations where pg_clog files are missing (errmsg - "could not
> open pg_clog/0D0D"). I have a backup, but it is quite old (considering
> it as a last resort). Is there any other way to fix this problem?
>
> I also created empty files to fool postgres, but there are so many of
> these files missing that I was wondering if there is a better/faster
> way to fix this problem.

I hope you made a complete backup of the files before you started trying
to fix the damage.  It's possible to try to fix this using tricks like
zero_damaged_pages and dummy clog files, only to make things worse.  We
do data recovery services here, and I've had to roll back to the
original copy of the data multiple times, trying different approaches,
before getting a good copy of someone's data back again.  If you don't
have a copy of the database yet, make one before you do any more
experimenting with the clog files.

I wrote a summary of links to past work like this you may find useful,
and a little program to create missing pg_clog files that all say "the
transaction you're asking about committed", available at:
http://archives.postgresql.org/pgsql-general/2009-07/msg00985.php

You might further script that to speed up fixing these as they pop up,
which shortens the test-and-correct cycle.  You could even write a script
that loops: start the database, look at the end of the log file, and if
it complains about yet another missing clog file, extract the segment
number, recreate the file, and start again.
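That loop could be sketched along these lines, assuming the server's log
message has the usual 'could not open file "pg_clog/XXXX"' shape; the
data directory path and log location here are assumptions, not anything
from your setup:

```python
import os
import re

DATA_DIR = "/var/lib/pgsql/data"  # hypothetical data directory
MISSING = re.compile(r'could not open file "pg_clog/([0-9A-F]{4})"',
                     re.IGNORECASE)


def missing_segment(log_tail):
    """Pull the clog segment name out of a 'could not open' error, if any."""
    m = MISSING.search(log_tail)
    return m.group(1) if m else None


def create_dummy_segment(name):
    """Create a clog segment in which every transaction reads as committed."""
    with open(os.path.join(DATA_DIR, "pg_clog", name), "wb") as f:
        f.write(b"\x55" * 8192 * 32)  # 0x55 = four committed xacts per byte


# The outer loop, roughly: start the server with pg_ctl, read the tail of
# the log, and recreate whichever segment it complains about, until the
# server either starts cleanly or reports a different error:
#
# while True:
#     tail = open(os.path.join(DATA_DIR, "pg_log", "postgresql.log")).read()
#     seg = missing_segment(tail)
#     if seg is None:
#         break
#     create_dummy_segment(seg)
```

Watch each segment name as it goes by, for the reason discussed next.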

Unfortunately, doing better than that is tricky.  The last time I ran
into one of these that was badly corrupted and missing a whole lot of
real clog files, not just ones that were unlikely to exist, we had to
modify the PostgreSQL source code to create them automatically in order
to handle it safely.  You should be staring at the number of each one of
these as they're requested.  If the range is way outside of the active
clog files you have, that's probably one you can create safely because
it's garbage data anyway.  But if it starts asking for clog files that
are in the middle or near the ends of the set you've got, you may have a
bigger problem on your hands.

P.S. Make sure you dump a whole copy of the database the minute you get
it started again and reload that before you start using it.  You have no
idea what state all of the tables are really in after a crash like this
without such an exercise.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book

