Thread: how to recover after harddisk error
Hi,

Yesterday at about 8 pm, the hard disk subsystem of our web application crashed because of a SCSI error. The system could be restarted this morning, but the database would not come up again. The following was in the log file:

2003-02-26 09:03:06 [1291] DEBUG: database system was interrupted at 2003-02-25 20:19:22 CET
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C9 (log file 26, segment 201) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid primary checkpoint record
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C8 (log file 26, segment 200) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid secondary checkpoint record
2003-02-26 09:03:06 [1291] FATAL 2: unable to locate a valid checkpoint record
2003-02-26 09:03:06 [1277] DEBUG: startup process (pid 1291) exited with exit code 2
2003-02-26 09:03:06 [1277] DEBUG: aborting startup due to startup process failure

I did the following to get the system running again:

- ran a new initdb in another data directory
- created the database again
- restored the data from the last available nightly dump

Is there a better way to get the system running again? Would there have been any way to access the old system? These steps took about 45 minutes, which is quite long (because the dump is rather large), and if there had been any important data written since the last dump, it would have been lost...

TIA,
peter
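The segment file names in the log encode the numbers PostgreSQL prints beside them: the first eight hex digits are the log file number and the last eight are the segment number (0x1A = 26, 0xC9 = 201). A minimal Python sketch that decodes such a name, assuming the 16-hex-digit naming used by this PostgreSQL version (later releases prepend a timeline ID, giving 24-digit names). The function name `decode_xlog_name` is hypothetical, for illustration only:

```python
def decode_xlog_name(name: str) -> tuple[int, int]:
    """Split a 16-hex-digit pg_xlog file name into (log file, segment)."""
    if len(name) != 16:
        raise ValueError(f"expected 16 hex digits, got {name!r}")
    log_id = int(name[:8], 16)  # high half: log file number
    seg_no = int(name[8:], 16)  # low half: segment number within that log file
    return log_id, seg_no


# The two files the startup process could not open.
for missing in ("0000001A000000C9", "0000001A000000C8"):
    log_id, seg_no = decode_xlog_name(missing)
    print(f"{missing}: log file {log_id}, segment {seg_no}")
```

This prints log file 26, segment 201 and log file 26, segment 200, the same numbers shown in the log messages, so both the primary and the secondary checkpoint pointed into segment files that were no longer on disk.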
> 2003-02-26 09:03:06 [1291] DEBUG: invalid primary checkpoint record
> 2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C8 (log file 26, segment 200) failed
>
> I did the following steps to get the system running again:
>
> - a new initdb in another data-directory
> - create the database again
> - restore the data from the last available nightly dump
>
> Is there a better way to get the system running again? Had there been
> any way to access the old system again? The steps I did took about 45
> min which is quite long (cause the db-dump is rather large) and if there
> had been some important data it had been lost...

pg_resetxlog from contrib

Regards,
Bjoern
Thanks a lot, Bjoern. Just wanted to mention that I found pg_resetxlog is available by default in PostgreSQL 7.3.2.

>-----Original Message-----
>From: Björn Metzdorf [mailto:bm@turtle-entertainment.de]
>Sent: Wednesday, 26 February 2003 10:25
>To: Peter Alberer; pgsql-general@postgresql.org
>Subject: Re: [GENERAL] how to recover after harddisk error
>
>pg_resetxlog from contrib
"Peter Alberer" <h9351252@obelix.wu-wien.ac.at> writes:
> 2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C9 (log file 26, segment 201) failed: No such file or directory
> 2003-02-26 09:03:06 [1291] DEBUG: invalid primary checkpoint record
> 2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C8 (log file 26, segment 200) failed: No such file or directory
> 2003-02-26 09:03:06 [1291] DEBUG: invalid secondary checkpoint record
> 2003-02-26 09:03:06 [1291] FATAL 2: unable to locate a valid checkpoint record

Assuming you haven't wiped the old database directory yet...

What file name(s) are actually present in /usr/local/pgsql/data/pg_xlog/ ? What does pg_controldata show --- do the other fields of pg_control look sane?

pg_resetxlog would have allowed you to restart, but at the price of losing any consistency guarantees about the results of recently-committed transactions. So I consider it a very last resort. What I'd like to understand first is why the system couldn't restart normally.

regards, tom lane
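The check Tom asks for can be partly mechanized: pg_controldata reports the latest and prior checkpoint locations as two hex numbers separated by a slash, and the segment file holding each checkpoint is the log file number followed by the offset divided by the segment size. A minimal Python sketch, assuming the default 16 MB segment size and the 16-digit file names seen in the log. The helper names, the checkpoint locations and the directory listing below are all hypothetical, chosen only to be consistent with the segments named in the log; they are not values from Peter's system:

```python
import re

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def segment_for_location(location: str) -> str:
    """Map a checkpoint location such as '1A/C9000120' to its pg_xlog file name."""
    match = re.fullmatch(r"([0-9A-Fa-f]+)/([0-9A-Fa-f]+)", location.strip())
    if match is None:
        raise ValueError(f"not an xlog location: {location!r}")
    log_id = int(match.group(1), 16)  # log file number (high half)
    offset = int(match.group(2), 16)  # byte offset within that log file
    return f"{log_id:08X}{offset // XLOG_SEG_SIZE:08X}"


def missing_segments(locations: list[str], present: list[str]) -> list[str]:
    """Return segment files needed by the given checkpoints but absent from pg_xlog."""
    have = {name.upper() for name in present}
    needed = [segment_for_location(loc) for loc in locations]
    return [name for name in needed if name not in have]


# Hypothetical inputs: latest and prior checkpoint locations as pg_controldata
# might print them, and a pg_xlog listing (e.g. from os.listdir) lacking both.
checkpoints = ["1A/C9000120", "1A/C8FFF000"]
listing = ["0000001A000000C6", "0000001A000000C7"]
print(missing_segments(checkpoints, listing))
```

If the checkpoint fields point at segments that are plausible but simply absent, the problem is lost WAL files; if the fields themselves look garbled, pg_control was damaged too, which is the distinction Tom's question is after.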
Too bad. I had intended to keep the old database instance around, but I had to remove the files a few hours ago after running low on disk space...

ciao,
peter