Re: Theory about XLogFlush startup failures - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject Re: Theory about XLogFlush startup failures
Date
Msg-id 3C4392B0.637CF161@tpf.co.jp
In response to Theory about XLogFlush startup failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Theory about XLogFlush startup failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> 
> I just spent some time trying to understand the mechanism behind the
> "XLogFlush: request is not satisfied" startup errors we've seen reported
> occasionally with 7.1.  The only apparent way for this to happen is for
> XLogFlush to be given a garbage WAL record pointer (ie, one pointing
> beyond the current end of WAL), which presumably must be coming from
> a corrupted LSN field in a data page.  Well, that's not too hard to
> believe during normal operation: say the disk drive drops some bits in
> the LSN field, and we read the page in, and don't have any immediate
> need to change it (which would cause the LSN to be overwritten); but we
> do find some transaction status hint bits to set, so the page gets
> marked dirty.  Then when the page is written out, bufmgr will try to
> flush xlog using the corrupted LSN pointer.

I agree with you, at least on the point that we had better
continue FlushBufferPool() even if a STOP error occurs.
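
To make that concrete, here is a minimal toy sketch in C of a flush
loop that keeps going when one page carries an LSN beyond the end of
WAL. It is not the actual bufmgr or xlog code: ToyLSN, ToyPage,
toy_xlog_flush() and toy_flush_buffer_pool() are hypothetical names
invented for illustration, and skipping the suspect page (rather than
writing it anyway) is only an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are PostgreSQL's real types. */
typedef uint64_t ToyLSN;

typedef struct ToyPage
{
    ToyLSN lsn;      /* LSN stored in the page header (may be corrupted) */
    bool   dirty;    /* set even when only hint bits were changed */
    bool   written;  /* set once the page has been "written out" */
} ToyPage;

/* A flush request can only be satisfied up to the current end of WAL. */
static bool
toy_xlog_flush(ToyLSN request, ToyLSN end_of_wal)
{
    return request <= end_of_wal;
}

/*
 * Write out every dirty page.  A page whose LSN points beyond the end
 * of WAL is counted and skipped (assumption of this sketch) instead of
 * aborting the whole loop.  Returns the number of suspect pages.
 */
static size_t
toy_flush_buffer_pool(ToyPage *pages, size_t npages, ToyLSN end_of_wal)
{
    size_t suspect = 0;

    for (size_t i = 0; i < npages; i++)
    {
        if (!pages[i].dirty)
            continue;

        if (!toy_xlog_flush(pages[i].lsn, end_of_wal))
        {
            /* "request is not satisfied": note it and carry on */
            suspect++;
            continue;
        }

        pages[i].written = true;
        pages[i].dirty = false;
    }
    return suspect;
}
```

With three dirty pages and one garbage LSN, the other two pages still
get written and the caller learns that one page was suspect.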

BTW, doesn't the LSN corruption imply that other parts
(e.g. pg_log) may be corrupted as well?

regards,
Hiroshi Inoue

