Thread: Corrupted database - how to recover

Corrupted database - how to recover

From
Johann Spies
Date:
Two disks on our RAID-5 system failed, causing a filesystem
failure.

We could get one online again and the filesystem is usable at present,
but I cannot start PostgreSQL at the moment:

$ /usr/lib/postgresql/8.3/bin/postgres --single -D /etc/postgresql/8.3/main/
2010-05-27 09:22:48 SAST FATAL:  could not stat directory "base/16400": Structure needs cleaning
2010-05-27 09:22:48 SAST CONTEXT:  xlog redo insert: rel 1663/16400/16438; tid 10960/41

In 'base' I see:

drwx------ 10 postgres postgres  102 2010-04-22 15:09 .
drwx------ 10 postgres postgres 4096 2010-05-27 09:29 ..
drwx------  2 postgres postgres 4096 2010-05-20 02:11 1
drwx------  2 postgres postgres 4096 2010-04-22 15:07 11510
??????????  ? ?        ?           ?                ? 16388
drwx------  2 postgres postgres 4096 2010-05-20 02:11 16389
drwx------  2 postgres postgres 4096 2010-05-20 02:11 16390
drwx------  2 postgres postgres 4096 2010-05-20 02:11 16391
??????????  ? ?        ?           ?                ? 16400

I can just move the data directory, do an initdb and use an older dump
to restore the database as it was some time ago.  Fortunately the data is
not mission critical.  But what if it were?

How do I recover from a situation like this?

Regards
Johann
--
Johann Spies          Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

     "All that the Father giveth me shall come to me; and
      him that cometh to me I will in no wise cast out."
                                      John 6:37

Re: Corrupted database - how to recover

From
Andreas Schmitz
Date:
Hi Johann,

Maybe you should consider using point-in-time recovery (PITR) if the
database is mission critical.

http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html
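As a rough illustration of the archiving half of PITR, here is a minimal sketch of a helper that an archive_command could call. The function name archive_wal, the wrapper script name and the default archive directory are assumptions for illustration only, not something taken from this thread or from a particular setup. The helper copies one WAL segment into the archive, refuses to overwrite an existing file, and returns zero only on success, which is what the server expects from an archive command.

```shell
# Sketch only. Assumed postgresql.conf settings (8.3 or later):
#   archive_mode = on
#   archive_command = '/path/to/wrapper.sh %p %f'
# where the hypothetical wrapper.sh defines this function and then runs:
#   archive_wal "$1" "$2"

archive_wal() {
    wal_path="$1"                                # %p: path of the segment to archive
    wal_name="$2"                                # %f: file name of the segment
    archive_dir="${3:-/mnt/server/archivedir}"   # assumed archive location

    # The source segment must exist.
    [ -f "$wal_path" ] || return 1

    # Never overwrite a segment that is already archived.
    if [ -e "$archive_dir/$wal_name" ]; then
        return 1
    fi

    # Copy under a temporary name, then rename, so that a half-written
    # file never appears under the final segment name.
    cp "$wal_path" "$archive_dir/$wal_name.tmp" || return 1
    mv "$archive_dir/$wal_name.tmp" "$archive_dir/$wal_name"
}
```

The archive directory should be on different storage from the data directory, otherwise a failure like the one described here would take the archive with it. A base backup is still needed in addition to the archived segments; the page linked above covers that part.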

Regards

Andreas





Re: Corrupted database - how to recover

From
Johann Spies
Date:
On Thu, May 27, 2010 at 09:54:00AM +0200, Andreas Schmitz wrote:
>
> maybe you should consider using point in time recovery (pitr) if the
> database is mission critical.
>
> http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html

Thanks.  This is the first time I have experienced this type of problem.
Fortunately the data was not that critical.  I agree that PITR should be
part of any mission-critical setup.

Regards
Johann