Thread: recover corrupt DB?
I had a test system (read as not backed up, sigh) which had the disk go full while PostgreSQL was loaded, consequently, PostgreSQL will no longer start. It is logging an error about detecting an invalid shutdown, trying to replay something, and then an error about not being able to open a file it is looking for. Is this DB toast? Or is there something I could do to get the DB back into a state where it will start, without losing everything? Thanks, Dan
On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote: > I had a test system (read as not backed up, sigh) which had the disk > go full while PostgreSQL was loaded, consequently, PostgreSQL will no > longer start. > > It is logging an error about detecting an invalid shutdown, trying to > replay something, and then an error about not being able to open a > file it is looking for. Knowing what file would help analyze this. In general, pg_resetxlog would be the tool to try here. Don't panic yet. ;-)
> In general, pg_resetxlog would be > the tool to try here. Don't panic yet. ;-) > > Yep, that was the command I was looking for. That at least got the DB to a point where it would start, and I was able to do a dump. So, I dumped and reloaded all of the databases. Things seem fine, but bits and pieces of documentation I've seen for pg_resetxlog also recommend initdb, and starting over. Is that necessary? Thanks, Dan
Yes.
Some things like duplicate primary key can exist in pg_resetxlog-ed db.
So, dump db and restore it again on clean initialized cluster.
Regards
Milos
On Thu, Apr 23, 2009 at 8:25 PM, Dan Armbrust <daniel.armbrust.list@gmail.com> wrote:
> In general, pg_resetxlog would be
> the tool to try here. Don't panic yet. ;-)
>
>
Yep, that was the command I was looking for. That at least got the DB
to a point where it would start, and I was able to do a dump.
So, I dumped and reloaded all of the databases. Things seem fine, but
bits and pieces of documentation I've seen for pg_resetxlog also
recommend initdb, and starting over. Is that necessary?
Thanks,
Dan
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Peter Eisentraut wrote: > On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote: >> I had a test system (read as not backed up, sigh) which had the disk >> go full while PostgreSQL was loaded, consequently, PostgreSQL will no >> longer start. >> >> It is logging an error about detecting an invalid shutdown, trying to >> replay something, and then an error about not being able to open a >> file it is looking for. > > Knowing what file would help analyze this. In general, pg_resetxlog would be > the tool to try here. Don't panic yet. ;-) I've been wondering about this for a while. Why does Pg end up with the database in an unusable, unrecoverable state after a disk-full error? Is there no way it can efficiently defend against issues writing to the WAL? Is it, in fact, issues with appending to the current WAL segment that're the problem anyway? This may come up even on fairly well managed databases if users have direct access. To me, with a largely user-and-admin perspective, it seems like something that really should be handled a bit more cleanly. -- Craig Ringer
Craig Ringer wrote: > Peter Eisentraut wrote: >> On Thursday 23 April 2009 18:30:27 Dan Armbrust wrote: >>> I had a test system (read as not backed up, sigh) which had the disk >>> go full while PostgreSQL was loaded, consequently, PostgreSQL will no >>> longer start. >>> >>> It is logging an error about detecting an invalid shutdown, trying to >>> replay something, and then an error about not being able to open a >>> file it is looking for. >> Knowing what file would help analyze this. In general, pg_resetxlog would be >> the tool to try here. Don't panic yet. ;-) > > I've been wondering about this for a while. Why does Pg end up with the > database in an unusable, unrecoverable state after a disk-full error? Is > there no way it can efficiently defend against issues writing to the > WAL? Is it, in fact, issues with appending to the current WAL segment > that're the problem anyway? > > This may come up even on fairly well managed databases if users have > direct access. To me, with a largely user-and-admin perspective, it > seems like something that really should be handled a bit more cleanly. > > -- > Craig Ringer > Hmm... On all our servers we have a cron job that runs daily and reports disk usage stats. Maybe you need something similar.
> On all our servers we have a cron job that runs daily and reports disk > usage stats. > Maybe you need something similar. Of course. I have Cacti running to monitor disk usage on all my servers. That doesn't help if a user creates several duplicates of a huge table, or otherwise gobbles disk space. There's always the *potential* to run out of disk space, and I'm concerned that Pg doesn't handle that gracefully. I agree it shouldn't happen, but Pg shouldn't mangle the DB when it does, either. -- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes: > I've been wondering about this for a while. Why does Pg end up with the > database in an unusable, unrecoverable state after a disk-full error? It doesn't. There must have been some other filesystem misfeasance involved in the OP's problem. regards, tom lane
Tom Lane wrote: > Craig Ringer <craig@postnewspapers.com.au> writes: >> I've been wondering about this for a while. Why does Pg end up with the >> database in an unusable, unrecoverable state after a disk-full error? > > It doesn't. There must have been some other filesystem misfeasance > involved in the OP's problem. Cool - so it's definitely meant to handle disk-full fine. That's what I was concerned about. These reports seem to come up a bit, with disk full issues resulting in the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too shocked if they all turned out to be on xfs or something like that. -- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes: > These reports seem to come up a bit, with disk full issues resulting in > the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too > shocked if they all turned out to be on xfs or something like that. Well, there are cases where that might actually be the best strategy, but PG by itself will not lose data on a disk-full condition. What it will do is shut down and refuse to play until you find it some more disk space. If you have no way to clear some space on the partition ... Also, over the years we have found various bugs that contributed to bad behavior in extreme circumstances. For instance 7.x had a problem that a very long index build prevented checkpoints from completing, which would cause pg_xlog to bloat because WAL segments couldn't be recycled: http://archives.postgresql.org//pgsql-general/2004-05/msg00414.php which led to cases like this one: http://archives.postgresql.org//pgsql-general/2005-03/msg01373.php I think I actually recommended pg_resetxlog in that case because the alternative was to wait for it to churn through 100GB of uncheckpointed WAL. Another interesting example is here: http://archives.postgresql.org//pgsql-hackers/2004-01/msg00530.php explanation here: http://archives.postgresql.org/pgsql-hackers/2004-01/msg00606.php in which Postgres survived an out-of-space condition for quite some time without data loss. It did finally panic because of a rather minor bug in the pg_clog logic ... but still did not lose any committed transactions. regards, tom lane
> These reports seem to come up a bit, with disk full issues resulting in > the need to pg_resetxlog, dump, and re-initdb, but I wouldn't be too > shocked if they all turned out to be on xfs or something like that. > My particular disk-full condition was on ext2. Nothing exotic. Also, the process that filled the disk was postgres - if that makes any difference - I had left a debug level turned up in the postgres config file, and it was logging every single db query. Since it wasn't set up to remove old log files - it filled the disk. Nothing else unusual occurred that I'm aware of - things went weird for the lab tester, he cleared some space, rebooted the system, and postgres didn't come back online.
Dan Armbrust escribió: > My particular disk-full condition was on ext2. Nothing exotic. Also, > the process that filled the disk was postgres - if that makes any > difference - I had left a debug level turned up in the postgres config > file, and it was logging every single db query. Since it wasn't set > up to remove old log files - it filled the disk. > > Nothing else unusual occurred that I'm aware of - things went weird > for the lab tester, he cleared some space, rebooted the system, and > postgres didn't come back online. Did "cleared some space" include the pg_xlog directory or something in the vicinity? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support