Re: A few new options for CHECKPOINT - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: A few new options for CHECKPOINT
Date
Msg-id 20201204221739.GY16415@tamriel.snowman.net
Whole thread Raw
In response to Re: A few new options for CHECKPOINT  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: A few new options for CHECKPOINT
List pgsql-hackers
Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> I think starting a spread checkpoint has some usefulness, if your
> checkpoint interval is very large but your completion target is not very
> close to 1.  In that case, you're expressing that you want a checkpoint
> to start now and not impact production unduly, so that you know when it
> finishes and therefore when is it a good time to start a backup.  (You
> will still have some WAL to replay, but it won't be as much as if you
> just ignored checkpoint considerations completely.)

You could view an immediate checkpoint as more-or-less being a 'spread'
checkpoint with a checkpoint completion target approaching 0.  In the
end, it's all about how much time you're going to spend trying to get
the data written out, because the WAL that's generated during that time
is what's going to have to get replayed.

If the goal is to not end up with an increase in IO from this, then you
want to spread things out as much as you can over as much time as you're
able to- but that then means that you're going to have that much WAL to
replay.  If you're alright with performing IO to get the amount of WAL
to replay to be minimal, then you just run 'CHECKPOINT;' before your
backup and you're good to go (and is why that's in the documentation as
a way to reduce your WAL replay time- because it reduces it as much as
possible given your IO capabilities).

If you don't mind the increased amount of IO and WAL, you could just
reduce checkpoint_timeout and then crash recovery and snapshot-based
backup recovery will also be reduced, no matter when you actually take
the snapshot.

> On the subject of measuring replay times for backups taking while
> pgbench is pounding the database, I think a realistic test does *not*
> have pgbench running at top speed; rather you have some non-maximal
> "-R xyz" option.  You would probably determine a value to use by running
> without -R, observing what's a typical transaction rate, and using some
> fraction (say, half) of that in the real run.

That'd halve the amount of WAL being generated per unit time, but I
don't think it really changes much when it comes to this particular
analysis..?

If you generate 16MB of WAL per minute, and the checkpoint timeout is 5
minutes, with a checkpoint target of 0.9, then at more-or-less any point
in time you've got ~5 minutes worth of WAL outstanding, or around 80MB.
If your completion target is 0.5 then, really, you might as well make it
0.9 and have your timeout be 2.5m, so that you've got a steady-state of
around 40MB of WAL outstanding.

What I'm getting around to is that the only place this kind of thing
makes sense is where you're front-loading all your IO during the
checkpoint because your checkpoint completion target is less than 0.9
and then, sure, there's a difference between snapshotting right when the
checkpoint completes vs. later- because if you wait around to snapshot,
we aren't actually doing IO during that time and just letting the WAL
build up, but that's an argument to remove checkpoint completion target
as an option that doesn't really make much sense in the first place,
imv, and recommend folks tune checkpoint timeout for the amount of
outstanding WAL they want to have when they are doing recovery (either
from a crash or from a snapshot).

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Removal of operator_precedence_warning
Next
From: Tom Lane
Date:
Subject: Re: Removal of operator_precedence_warning