Re: Background writer and checkpointer in crash recovery - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Background writer and checkpointer in crash recovery
Date
Msg-id CANP8+jLqwRoKQEvAi=iRhSwgL1JWNxiUhx+RyG-D82CE_qrDew@mail.gmail.com
In response to Re: Background writer and checkpointer in crash recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Background writer and checkpointer in crash recovery  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Sun, 30 Aug 2020 at 01:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Once we had the checkpointer running, we could also consider making
> > the end-of-recovery checkpoint optional, or at least the wait for it
> > to complete.  I haven't shown that in this patch, but it's just
> > different checkpoint request flags and possibly an end-of-recovery
> > record.  What problems do you foresee with that?  Why should we have
> > "fast" promotion but not "fast" crash recovery?
>
> I think that the rationale for that had something to do with trying
> to reduce the costs of repeated crashes.  If you've had one crash,
> you might well have another one in your near future, due to the same
> problem recurring.  Therefore, avoiding multiple replays of the same WAL
> is attractive; and to do that we have to complete a checkpoint before
> giving users a chance to crash us again.

Agreed. That rationale is shown in comments and in the commit message.

"We could launch it during crash recovery as well, but it seems better to keep
that codepath as simple as possible, for the sake of robustness. And it
couldn't do any restartpoints during crash recovery anyway, so it wouldn't be
that useful."

Having said that, we did raise the maximum checkpoint_timeout by a lot, so the
situation today might be quite different. With the right workload, a large
checkpoint_timeout could eventually overflow shared buffers.

We don't have any stats to show whether this patch is worthwhile or
not, so I suggest adding the attached instrumentation patch as well so
we can see on production systems whether checkpoint_timeout is too
high by comparison with pg_stat_bgwriter. The patch is written in the
style of log_checkpoints.
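
To illustrate the kind of comparison meant here, the following is a rough
sketch only, not what the attached patch does. It assumes one row of
pg_stat_bgwriter has already been fetched into a dict keyed by the view's
column names (checkpoints_timed, checkpoints_req, buffers_checkpoint,
buffers_clean, buffers_backend). The function name summarize_bgwriter and the
idea of reading a high backend write share as a hint that checkpoints are too
far apart are assumptions made for illustration.

```python
def summarize_bgwriter(stats):
    """Summarize who wrote dirty buffers, given one pg_stat_bgwriter row.

    `stats` is a plain dict keyed by pg_stat_bgwriter column names.
    The function name and the interpretation of the result are
    illustrative assumptions, not part of PostgreSQL.
    """
    written = {
        "checkpointer": stats["buffers_checkpoint"],
        "bgwriter": stats["buffers_clean"],
        "backends": stats["buffers_backend"],
    }
    total = sum(written.values())
    # Share of buffer writes done by each process type (0.0 if no writes yet).
    shares = {who: (n / total if total else 0.0) for who, n in written.items()}

    checkpoints = stats["checkpoints_timed"] + stats["checkpoints_req"]
    # Share of checkpoints that were requested rather than triggered by
    # checkpoint_timeout expiring.
    requested_share = (
        stats["checkpoints_req"] / checkpoints if checkpoints else 0.0
    )
    return {
        "write_shares": shares,
        "requested_checkpoint_share": requested_share,
    }


if __name__ == "__main__":
    # Made-up counters, purely for demonstration.
    sample = {
        "checkpoints_timed": 8,
        "checkpoints_req": 2,
        "buffers_checkpoint": 600,
        "buffers_clean": 100,
        "buffers_backend": 300,
    }
    print(summarize_bgwriter(sample))
```

If backends account for a large share of the writes, that may suggest dirty
buffers are being evicted between checkpoints, but what counts as "large" is
workload dependent and would need real production numbers to judge.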

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment
