Thread: 7.1 startup recovery failure
Hi,

There's a report of startup recovery failure in Japan.
Redo done but ...
Unfortunately I have no time today.

regards,
Hiroshi Inoue

KAMI wrote:
>
> DEBUG: database system shutdown was interrupted at 2001-04-26 22:15:00 JST
> DEBUG: CheckPoint record at (1, 3923829232)
> DEBUG: Redo record at (1, 3923829232); Undo record at (0, 0); Shutdown TRUE
> DEBUG: NextTransactionId: 7473265; NextOid: 2550911
> DEBUG: database system was not properly shut down; automatic recovery in progress...
> DEBUG: redo starts at (1, 3923829296)
> DEBUG: ReadRecord: record with zero len at (1, 3923880136)
> DEBUG: redo done at (1, 3923880100)
> FATAL 2: XLogFlush: request is not satisfied
> postmaster: Startup proc 4228 exited with status 512 - abort
> There's a report of startup recovery failure in Japan.
> Redo done but ...
> Unfortunately I have no time today.

Please ask to start up with wal_debug = 1...

Vadim
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> There's a report of startup recovery failure in Japan.
>
>> DEBUG: redo done at (1, 3923880100)
>> FATAL 2: XLogFlush: request is not satisfied
>> postmaster: Startup proc 4228 exited with status 512 - abort

Is this person using 7.1 release, or a beta/RC version? That looks
just like the last WAL bug Vadim fixed before final ...

regards, tom lane
> > There's a report of startup recovery failure in Japan.
> >
> >> DEBUG: redo done at (1, 3923880100)
> >> FATAL 2: XLogFlush: request is not satisfied
> >> postmaster: Startup proc 4228 exited with status 512 - abort
>
> Is this person using 7.1 release, or a beta/RC version? That looks
> just like the last WAL bug Vadim fixed before final ...

No, it doesn't. That bug was related to cases when there is no room
on last log page for startup checkpoint. ~5k is free in this case.

Vadim
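The "~5k is free" figure can be checked against the reported log with a little arithmetic. The sketch below is only an illustration: it assumes that the second number in each `(id, offset)` pair is a byte offset into the log and that WAL pages are 8192 bytes, neither of which is stated in the thread. The function name is made up for the example.

```python
# Assumption (not stated in the thread): WAL pages are 8192 bytes.
WAL_PAGE_SIZE = 8192


def free_bytes_on_page(offset, page_size=WAL_PAGE_SIZE):
    """Return how many bytes remain on the page that contains `offset`."""
    used = offset % page_size  # bytes already consumed on this page
    return page_size - used


# Offset from "ReadRecord: record with zero len at (1, 3923880136)",
# i.e. the point where the log ends.
end_of_log = 3923880136

print(end_of_log % WAL_PAGE_SIZE)      # 2248 bytes used on the last page
print(free_bytes_on_page(end_of_log))  # 5944 bytes still free
```

Under those assumptions roughly 5.8k remains on the last page, which is consistent with the remark that this is not the "no room on last log page" case.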
"Mikheev, Vadim" wrote: > > > > There's a report of startup recovery failure in Japan. > > > > > >> DEBUG: redo done at (1, 3923880100) > > >> FATAL 2: XLogFlush: request is not satisfied > > >> postmaster: Startup proc 4228 exited with status 512 - abort > > > > Is this person using 7.1 release, or a beta/RC version? That looks > > just like the last WAL bug Vadim fixed before final ... > > No, it doesn't. That bug was related to cases when there is no room > on last log page for startup checkpoint. ~5k is free in this case. > I haven't gotten any reply from him yet. Many people are on vacation now in Japan. Probably we couldn't expect too much of him. regards, Hiroshi Inoue
Vadim Mikheev wrote:
>
> > There's a report of startup recovery failure in Japan.
> > Redo done but ...
> > Unfortunately I have no time today.
>
> Please ask to start up with wal_debug = 1...
>

Isn't it very difficult for dbas to leave the
corrupted database as it is?
ISTM we could hardly expect to get the log with
wal_debug = 1 unless we automatically force the
log in case of recovery failures.

regards,
Hiroshi Inoue
Corrupted or not, after a crash take a snapshot of the data tree
before firing it back up again. Doesn't take that much time
(especially with a netapp filer) and it allows for a virtually
unlimited number of attempts to solve the trouble or debug.

--
Rod Taylor
BarChord Entertainment Inc.

----- Original Message -----
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Vadim Mikheev" <vmikheev@sectorbase.com>
Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Monday, April 30, 2001 11:02 PM
Subject: Re: [HACKERS] 7.1 startup recovery failure

> Vadim Mikheev wrote:
> >
> > > There's a report of startup recovery failure in Japan.
> > > Redo done but ...
> > > Unfortunately I have no time today.
> >
> > Please ask to start up with wal_debug = 1...
> >
>
> Isn't it very difficult for dbas to leave the
> corrupted database as it is?
> ISTM we could hardly expect to get the log with
> wal_debug = 1 unless we automatically force the
> log in case of recovery failures.
>
> regards,
> Hiroshi Inoue
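The advice to snapshot the data tree before restarting could look something like the sketch below. It is a minimal illustration, not a recommended backup procedure: the function name and the example paths are hypothetical, and it assumes the server is stopped and that the backup location lies outside the data directory.

```python
import shutil
import time
from pathlib import Path


def snapshot_data_dir(data_dir, backup_root):
    """Copy a stopped server's data directory into a timestamped folder.

    Both arguments are hypothetical example paths; backup_root must not
    be located inside data_dir.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such data directory: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree creates dest (and missing parents) and fails if dest
    # already exists, so an earlier snapshot is never overwritten.
    shutil.copytree(src, dest, symlinks=True)
    return dest


# Example call (paths are placeholders):
# snapshot_data_dir("/usr/local/pgsql/data", "/backup/pg-crash")
```

With a copy like this set aside, the original tree can be restarted with wal_debug = 1, and if that attempt changes anything the pre-restart state is still available for another try.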
* Rod Taylor <rbt@barchord.com> [010430 22:10] wrote:
> Corrupted or not, after a crash take a snapshot of the data tree
> before firing it back up again. Doesn't take that much time
> (especially with a netapp filer) and it allows for a virtually
> unlimited number of attempts to solve the trouble or debug.
>

You run your database over NFS? They must be made of steel. :)

--
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/