Re: Expose checkpoint start/finish times into SQL. - Mailing list pgsql-patches

From Tom Lane
Subject Re: Expose checkpoint start/finish times into SQL.
Msg-id 6413.1207290995@sss.pgh.pa.us
In response to Re: Expose checkpoint start/finish times into SQL.  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Expose checkpoint start/finish times into SQL.  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-patches
Greg Smith <gsmith@gregsmith.com> writes:
> On Fri, 4 Apr 2008, Tom Lane wrote:
>> (And you still didn't tell me what the actual failure case was.)

> Database stops checkpointing.  WAL files pile up.  In the middle of
> backup, system finally dies, and when it starts recovery there's a bad
> record in the WAL files--which there are now thousands of to apply, and
> the bad one is 4 hours of replay in.  Believe it or not, it goes downhill
> from there.

> It's what kicked off the first step that's the big mystery.

Indeed :-(.  But given those observations, I'd still have about zero
faith in the usefulness of this patch.  If the bgwriter is not able to
complete checkpoints, is it able to tell you the truth about what it's
doing?

The actual advice I'd give to a DBA faced with such a case is to
kill -ABRT the bgwriter and send the stack trace to -hackers.
That's not in the proposed patch though...

            regards, tom lane
