Thread: 7.1 startup recovery failure

7.1 startup recovery failure

From
Hiroshi Inoue
Date:
Hi,
There's a report of startup recovery failure in Japan.
Redo done but ...
Unfortunately I have no time today.

regards,
Hiroshi Inoue

KAMI wrote:
> 
> 
> DEBUG:  database system shutdown was interrupted at 2001-04-26 22:15:00 JST
> DEBUG:  CheckPoint record at (1, 3923829232)
> DEBUG:  Redo record at (1, 3923829232); Undo record at (0, 0); Shutdown TRUE
> DEBUG:  NextTransactionId: 7473265; NextOid: 2550911
> DEBUG:  database system was not properly shut down; automatic recovery in progress...
> DEBUG:  redo starts at (1, 3923829296)
> DEBUG:  ReadRecord: record with zero len at (1, 3923880136)
> DEBUG:  redo done at (1, 3923880100)
> FATAL 2:  XLogFlush: request is not satisfied
> postmaster: Startup proc 4228 exited with status 512 - abort


Re: 7.1 startup recovery failure

From
"Vadim Mikheev"
Date:
> There's a report of startup recovery failure in Japan.
> Redo done but ...
> Unfortunately I have no time today.

Please ask him to start up with wal_debug = 1...

Vadim




Re: 7.1 startup recovery failure

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> There's a report of startup recovery failure in Japan.
>
>> DEBUG:  redo done at (1, 3923880100)
>> FATAL 2:  XLogFlush: request is not satisfied
>> postmaster: Startup proc 4228 exited with status 512 - abort

Is this person using 7.1 release, or a beta/RC version?  That looks
just like the last WAL bug Vadim fixed before final ...
        regards, tom lane


RE: 7.1 startup recovery failure

From
"Mikheev, Vadim"
Date:
> > There's a report of startup recovery failure in Japan.
> >
> >> DEBUG:  redo done at (1, 3923880100)
> >> FATAL 2:  XLogFlush: request is not satisfied
> >> postmaster: Startup proc 4228 exited with status 512 - abort
> 
> Is this person using 7.1 release, or a beta/RC version?  That looks
> just like the last WAL bug Vadim fixed before final ...

No, it doesn't. That bug was related to cases when there is no room
on last log page for startup checkpoint. ~5k is free in this case.
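
The "~5k" figure can be roughly checked against the log quoted above. This is only a sketch: it assumes 8 kB WAL pages (the default block size, which the thread does not state) and ignores page and record headers. The function name is made up for illustration.

```python
# Assumption: WAL pages are 8192 bytes (default block size); not stated in the thread.
WAL_PAGE_SIZE = 8192


def free_bytes_on_page(xrecoff, page_size=WAL_PAGE_SIZE):
    """Return the bytes left on the page that contains byte offset xrecoff."""
    used = xrecoff % page_size
    return page_size - used


# End of valid log, from "ReadRecord: record with zero len at (1, 3923880136)".
end_of_log = 3923880136
print(free_bytes_on_page(end_of_log))  # 5944
```

Under those assumptions about 5.8 kB of the last page is still unused, which is consistent with the estimate.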

Vadim


Re: 7.1 startup recovery failure

From
Hiroshi Inoue
Date:
"Mikheev, Vadim" wrote:
> 
> > > There's a report of startup recovery failure in Japan.
> > >
> > >> DEBUG:  redo done at (1, 3923880100)
> > >> FATAL 2:  XLogFlush: request is not satisfied
> > >> postmaster: Startup proc 4228 exited with status 512 - abort
> >
> > Is this person using 7.1 release, or a beta/RC version?  That looks
> > just like the last WAL bug Vadim fixed before final ...
> 
> No, it doesn't. That bug was related to cases when there is no room
> on last log page for startup checkpoint. ~5k is free in this case.
> 

I haven't gotten any reply from him yet.
Many people in Japan are on vacation now,
so we probably can't expect much from him for the time being.

regards,
Hiroshi Inoue


Re: 7.1 startup recovery failure

From
Hiroshi Inoue
Date:
Vadim Mikheev wrote:
> 
> > There's a report of startup recovery failure in Japan.
> > Redo done but ...
> > Unfortunately I have no time today.
> 
> Please ask to start up with wal_debug = 1...
> 

Isn't it very difficult for DBAs to leave a
corrupted database as it is?
ISTM we can hardly expect to get a log with
wal_debug = 1 unless we automatically force that
logging when recovery fails.

regards,
Hiroshi Inoue


Re: 7.1 startup recovery failure

From
"Rod Taylor"
Date:
Corrupted or not, after a crash take a snapshot of the data tree
before firing it back up again.  Doesn't take that much time
(especially with a netapp filer) and it allows for a virtually
unlimited number of attempts to solve the trouble or debug.
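
A minimal sketch of that snapshot step, assuming the postmaster is stopped and an ordinary file copy is acceptable (a filer-level snapshot would replace the copy). The function name and the example paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot_data_dir(data_dir, backup_root):
    """Copy the whole data tree to a timestamped directory under backup_root.

    Run this only while the postmaster is stopped, before retrying startup.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data directory not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest already exists.
    shutil.copytree(src, dest, symlinks=True)
    return dest


# Example (hypothetical paths):
# snapshot_data_dir("/usr/local/pgsql/data", "/backup/pg-snapshots")
```

Each failed recovery attempt can then be retried against a fresh copy, for example with wal_debug = 1 enabled.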

--
Rod Taylor  BarChord Entertainment Inc.
----- Original Message -----
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Vadim Mikheev" <vmikheev@sectorbase.com>
Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Monday, April 30, 2001 11:02 PM
Subject: Re: [HACKERS] 7.1 startup recovery failure


> Vadim Mikheev wrote:
> >
> > > There's a report of startup recovery failure in Japan.
> > > Redo done but ...
> > > Unfortunately I have no time today.
> >
> > Please ask to start up with wal_debug = 1...
> >
>
> Isn't it very difficult for dbas to leave the
> corrupted database as it is ?
> ISTM we could hardly expect to get the log with
> wal_debug = 1 unless we automatically force the
> log in case of recovery failures.
>
> regards,
> Hiroshi Inoue



Re: 7.1 startup recovery failure

From
Alfred Perlstein
Date:
* Rod Taylor <rbt@barchord.com> [010430 22:10] wrote:
> Corrupted or not, after a crash take a snapshot of the data tree
> before firing it back up again.  Doesn't take that much time
> (especially with a netapp filer) and it allows for a virtually
> unlimited number of attempts to solve the trouble or debug.
> 

You run your database over NFS?  They must be made of steel. :)

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/