Re: Expose checkpoint start/finish times into SQL. - Mailing list pgsql-patches

From Greg Smith
Subject Re: Expose checkpoint start/finish times into SQL.
Date
Msg-id Pine.GSO.4.64.0804040200560.2256@westnet.com
In response to Re: Expose checkpoint start/finish times into SQL.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Expose checkpoint start/finish times into SQL.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Expose checkpoint start/finish times into SQL.  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
On Fri, 4 Apr 2008, Tom Lane wrote:

> (And you still didn't tell me what the actual failure case was.)

Database stops checkpointing.  WAL files pile up.  In the middle of a
backup, the system finally dies, and when it starts recovery there's a bad
record in the WAL files--of which there are now thousands to apply--and
the bad one is 4 hours of replay in.  Believe it or not, it goes downhill
from there.

It's what kicked off the first step that's the big mystery.  The only code
path I could think of that can block checkpoints like this is when the
archive_command isn't working anymore, and archiving wasn't in use here.
Given some of the other corruption found later and the bad memory issues
discovered, a bit flip in the "do I need to checkpoint now?" code or
data seems just as likely as any other explanation.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
