Thread: recover corrupt DB?

recover corrupt DB?

From
Dan Armbrust
Date:
I had a test system (read as not backed up, sigh) which had the disk
go full while PostgreSQL was loaded, consequently, PostgreSQL will no
longer start.

It is logging an error about detecting an invalid shutdown, trying to
replay something, and then an error about not being able to open a
file it is looking for.

Is this DB toast?  Or is there something I could do to get the DB back
into a state where it will start, without losing everything?

Thanks,

Dan

Re: recover corrupt DB?

From
Peter Eisentraut
Date:
On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote:
> I had a test system (read as not backed up, sigh) which had the disk
> go full while PostgreSQL was loaded, consequently, PostgreSQL will no
> longer start.
>
> It is logging an error about detecting an invalid shutdown, trying to
> replay something, and then an error about not being able to open a
> file it is looking for.

Knowing what file would help analyze this.  In general, pg_resetxlog would be
the tool to try here.  Don't panic yet. ;-)


Re: recover corrupt DB?

From
Dan Armbrust
Date:
>   In general, pg_resetxlog would be
> the tool to try here.  Don't panic yet. ;-)
>
>

Yep, that was the command I was looking for.  That at least got the DB
to a point where it would start, and I was able to do a dump.

So, I dumped and reloaded all of the databases.  Things seem fine, but
bits and pieces of documentation I've seen for pg_resetxlog also
recommend initdb, and starting over.  Is that necessary?

Thanks,

Dan

Re: recover corrupt DB?

From
Milos Babic
Date:

Yes.
Some things like duplicate primary key can exist in pg_resetxlog-ed db.
So, dump db and restore it again on clean initialized cluster.

Regards
Milos

On Thu, Apr 23, 2009 at 8:25 PM, Dan Armbrust <daniel.armbrust.list@gmail.com> wrote:
>   In general, pg_resetxlog would be
> the tool to try here.  Don't panic yet. ;-)
>
>

Yep, that was the command I was looking for.  That at least got the DB
to a point where it would start, and I was able to do a dump.

So, I dumped and reloaded all of the databases.  Things seem fine, but
bits and pieces of documentation I've seen for pg_resetxlog also
recommend initdb, and starting over.  Is that necessary?

Thanks,

Dan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: recover corrupt DB?

From
Craig Ringer
Date:
Peter Eisentraut wrote:
> On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote:
>> I had a test system (read as not backed up, sigh) which had the disk
>> go full while PostgreSQL was loaded, consequently, PostgreSQL will no
>> longer start.
>>
>> It is logging an error about detecting an invalid shutdown, trying to
>> replay something, and then an error about not being able to open a
>> file it is looking for.
>
> Knowing what file would help analyze this.  In general, pg_resetxlog would be
> the tool to try here.  Don't panic yet. ;-)

I've been wondering about this for a while. Why does Pg end up with the
database in an unusable, unrecoverable state after a disk-full error? Is
there no way it can efficiently defend against issues writing to the
WAL? Is it, in fact, issues with appending to the current WAL segment
that're the problem anyway?

This may come up even on fairly well managed databases if users have
direct access. To me, with a largely user-and-admin perspective, it
seems like something that really should be handled a bit more cleanly.

--
Craig Ringer

Re: recover corrupt DB?

From
Steve Clark
Date:
Craig Ringer wrote:
> Peter Eisentraut wrote:
>> On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote:
>>> I had a test system (read as not backed up, sigh) which had the disk
>>> go full while PostgreSQL was loaded, consequently, PostgreSQL will no
>>> longer start.
>>>
>>> It is logging an error about detecting an invalid shutdown, trying to
>>> replay something, and then an error about not being able to open a
>>> file it is looking for.
>> Knowing what file would help analyze this.  In general, pg_resetxlog would be
>> the tool to try here.  Don't panic yet. ;-)
>
> I've been wondering about this for a while. Why does Pg end up with the
> database in an unusable, unrecoverable state after a disk-full error? Is
> there no way it can efficiently defend against issues writing to the
> WAL? Is it, in fact, issues with appending to the current WAL segment
> that're the problem anyway?
>
> This may come up even on fairly well managed databases if users have
> direct access. To me, with a largely user-and-admin perspective, it
> seems like something that really should be handled a bit more cleanly.
>
> --
> Craig Ringer
>
Hmm...

On all our servers we have a cron job that runs daily and reports disk usage stats.
Maybe you need something similar.

Re: recover corrupt DB?

From
Craig Ringer
Date:
> On all our servers we have a cron job that runs daily and reports disk
> usage stats.
> Maybe you need something similar.

Of course. I have Cacti running to monitor disk usage on all my servers.

That doesn't help if a user creates several duplicates of a huge table,
or otherwise gobbles disk space. There's always the *potential* to run
out of disk space, and I'm concerned that Pg doesn't handle that
gracefully. I agree it shouldn't happen, but Pg shouldn't mangle the DB
when it does, either.

--
Craig Ringer

Re: recover corrupt DB?

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> I've been wondering about this for a while. Why does Pg end up with the
> database in an unusable, unrecoverable state after a disk-full error?

It doesn't.  There must have been some other filesystem misfeasance
involved in the OP's problem.

            regards, tom lane

Re: recover corrupt DB?

From
Craig Ringer
Date:
Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>> I've been wondering about this for a while. Why does Pg end up with the
>> database in an unusable, unrecoverable state after a disk-full error?
>
> It doesn't.  There must have been some other filesystem misfeasance
> involved in the OP's problem.

Cool - so it's definitely meant to handle disk-full fine. That's what  I
was concerned about.

These reports seem to come up a bit, with disk full issues resulting in
the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too
shocked if they all turned out to be on xfs or something like that.

--
Craig Ringer

Re: recover corrupt DB?

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> These reports seem to come up a bit, with disk full issues resulting in
> the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too
> shocked if they all turned out to be on xfs or something like that.

Well, there are cases where that might actually be the best strategy,
but PG by itself will not lose data on a disk-full condition.  What it
will do is shut down and refuse to play until you find it some more disk
space.  If you have no way to clear some space on the partition ...

Also, over the years we have found various bugs that contributed to bad
behavior in extreme circumstances.  For instance 7.x had a problem that
a very long index build prevented checkpoints from completing, which
would cause pg_xlog to bloat because WAL segments couldn't be recycled:
http://archives.postgresql.org//pgsql-general/2004-05/msg00414.php
which led to cases like this one:
http://archives.postgresql.org//pgsql-general/2005-03/msg01373.php
I think I actually recommended pg_resetxlog in that case because the
alternative was to wait for it to churn through 100GB of uncheckpointed
WAL.

Another interesting example is here:
http://archives.postgresql.org//pgsql-hackers/2004-01/msg00530.php
explanation here:
http://archives.postgresql.org/pgsql-hackers/2004-01/msg00606.php
in which Postgres survived an out-of-space condition for quite some time
without data loss.  It did finally panic because of a rather minor
bug in the pg_clog logic ... but still did not lose any committed
transactions.

            regards, tom lane

Re: recover corrupt DB?

From
Dan Armbrust
Date:
> These reports seem to come up a bit, with disk full issues resulting in
> the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too
> shocked if they all turned out to be on xfs or something like that.
>

My particular disk-full condition was on ext2.  Nothing exotic.  Also,
the process that filled the disk was postgres - if that makes any
difference - I had left a debug level turned up in the postgres config
file, and it was logging every single db query.  Since it wasn't set
up to remove old log files - it filled the disk.

Nothing else unusual occurred that I'm aware of - things went weird
for the lab tester, he cleared some space, rebooted the system, and
postgres didn't come back online.

Re: recover corrupt DB?

From
Alvaro Herrera
Date:
Dan Armbrust escribió:

> My particular disk-full condition was on ext2.  Nothing exotic.  Also,
> the process that filled the disk was postgres - if that makes any
> difference - I had left a debug level turned up in the postgres config
> file, and it was logging every single db query.  Since it wasn't set
> up to remove old log files - it filled the disk.
>
> Nothing else unusual occurred that I'm aware of - things went weird
> for the lab tester, he cleared some space, rebooted the system, and
> postgres didn't come back online.

Did "cleared some space" include the pg_xlog directory or something in
the vicinity?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support