Thread: A few new options for CHECKPOINT

A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
Hi hackers,

I've attached a patch to add a few new options to CHECKPOINT that
might be useful.  Specifically, it adds the FORCE, IMMEDIATE, and WAIT
options.  All of these options are already implemented internally, but
it is not possible to adjust them for manually triggered checkpoints.
The bulk of this change is dedicated to expanding the syntax of
CHECKPOINT and adding detailed documentation.

I've mostly followed the pattern set forth by the options recently
added to VACUUM.  With this patch, CHECKPOINT takes an optional set of
parameters surrounded by parentheses.  The new parameters are enabled by
default but can be disabled by specifying FALSE, OFF, or 0.
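
For illustration, usage would look roughly like this (hypothetical examples
based on the description above, not copied verbatim from the patch):

        CHECKPOINT;                    -- unchanged default behavior
        CHECKPOINT (IMMEDIATE FALSE);  -- request a spread (throttled) checkpoint
        CHECKPOINT (WAIT OFF);         -- return before the checkpoint completes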

The main purpose of this patch is to give users more control over
their manually requested checkpoints or restartpoints.  I suspect the
most useful option is IMMEDIATE, which can help avoid checkpoint-
related IO spikes.  However, I didn't see any strong reason to prevent
users from also adjusting FORCE and WAIT.

Nathan



RE: A few new options for CHECKPOINT

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bossart, Nathan <bossartn@amazon.com>
> The main purpose of this patch is to give users more control over their manually
> requested checkpoints or restartpoints.  I suspect the most useful option is
> IMMEDIATE, which can help avoid checkpoint-related IO spikes.  However, I
> didn't see any strong reason to prevent users from also adjusting FORCE and
> WAIT.

I think just IMMEDIATE would suffice, too.  But could you tell us why you came to want to give users more control?
Could we know some concrete example situations where users would want to perform CHECKPOINT with options?
 


Regards
Takayuki Tsunakawa


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 11/24/20, 4:03 PM, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:
> From: Bossart, Nathan <bossartn@amazon.com>
>> The main purpose of this patch is to give users more control over their manually
>> requested checkpoints or restartpoints.  I suspect the most useful option is
>> IMMEDIATE, which can help avoid checkpoint-related IO spikes.  However, I
>> didn't see any strong reason to prevent users from also adjusting FORCE and
>> WAIT.
>
> I think just IMMEDIATE would suffice, too.  But could you tell us why you came to want to give users more control?
> Could we know some concrete example situations where users would want to perform CHECKPOINT with options?
 

It may be useful for backups taken with the "consistent snapshot"
approach.  As noted in the documentation [0], running CHECKPOINT
before taking the snapshot can reduce recovery time.  However, users
might wish to avoid the IO spike caused by an immediate checkpoint.
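
A rough sketch of how that might look with this patch (hypothetical; the
option name follows the v1 patch described above):

        CHECKPOINT (IMMEDIATE FALSE);  -- spread the checkpoint I/O
        -- ...take the filesystem-level snapshot once this returns...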

Nathan

[0] https://www.postgresql.org/docs/devel/backup-file.html


RE: A few new options for CHECKPOINT

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bossart, Nathan <bossartn@amazon.com>
> It may be useful for backups taken with the "consistent snapshot"
> approach.  As noted in the documentation [0], running CHECKPOINT
> before taking the snapshot can reduce recovery time.  However, users
> might wish to avoid the IO spike caused by an immediate checkpoint.
> 
> [0] https://www.postgresql.org/docs/devel/backup-file.html

Ah, understood.  I agree that the slow or spread manual checkpoint is good to have.


Regards
Takayuki Tsunakawa


Re: A few new options for CHECKPOINT

From
Michael Paquier
Date:
On Wed, Nov 25, 2020 at 01:07:47AM +0000, tsunakawa.takay@fujitsu.com wrote:
> From: Bossart, Nathan <bossartn@amazon.com>
>> It may be useful for backups taken with the "consistent snapshot"
>> approach.  As noted in the documentation [0], running CHECKPOINT
>> before taking the snapshot can reduce recovery time.  However, users
>> might wish to avoid the IO spike caused by an immediate checkpoint.
>>
>> [0] https://www.postgresql.org/docs/devel/backup-file.html
>
> Ah, understood.  I agree that the slow or spread manual checkpoint is good to have.

I can see the use case for IMMEDIATE, but I fail to see the use cases
for WAIT and FORCE.  CHECKPOINT_FORCE is internally implied for the
end-of-recovery and shutdown checkpoints.  WAIT could be a dangerous
thing if disabled, as clients could pile up requests to the
checkpointer for no real purpose.
--
Michael


Re: A few new options for CHECKPOINT

From
Fujii Masao
Date:

On 2020/11/25 13:47, Michael Paquier wrote:
> On Wed, Nov 25, 2020 at 01:07:47AM +0000, tsunakawa.takay@fujitsu.com wrote:
>> From: Bossart, Nathan <bossartn@amazon.com>
>>> It may be useful for backups taken with the "consistent snapshot"
>>> approach.  As noted in the documentation [0], running CHECKPOINT
>>> before taking the snapshot can reduce recovery time.  However, users
>>> might wish to avoid the IO spike caused by an immediate checkpoint.
>>>
>>> [0] https://www.postgresql.org/docs/devel/backup-file.html
>>
>> Ah, understood.  I agree that the slow or spread manual checkpoint is good to have.
> 
> I can see the use case for IMMEDIATE, but I fail to see the use cases
> for WAIT and FORCE.  CHECKPOINT_FORCE is internally implied for the
> end-of-recovery and shutdown checkpoints.  WAIT could be a dangerous
> thing if disabled, as clients could pile up requests to the
> checkpointer for no real purpose.

We may want to disable WAIT (or specify a wait timeout?) when doing a
checkpoint with IMMEDIATE disabled, to avoid a long-running command.
OTOH, if we support disabling WAIT, I'd like to have a way to see
whether the checkpoint has completed or not. We can do that
by using log_checkpoints, but that's not convenient.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: A few new options for CHECKPOINT

From
Bernd Helmle
Date:
Am Mittwoch, den 25.11.2020, 13:47 +0900 schrieb Michael Paquier:
> I can see the use case for IMMEDIATE, but I fail to see the use cases
> for WAIT and FORCE.  CHECKPOINT_FORCE is internally implied for the
> end-of-recovery and shutdown checkpoints.  WAIT could be a dangerous
> thing if disabled, as clients could pile up requests to the
> checkpointer for no real purpose.

Wouldn't it be more convenient to use "FAST" for an immediate checkpoint,
defaulting to "FAST ON"? That would be along the lines of the parameter
used in the streaming protocol command BASE_BACKUP, where "FAST" disables
lazy checkpointing.

I agree that the other options don't seem reasonable for exposing to
SQL.


-- 
Thanks,
    Bernd





Re: A few new options for CHECKPOINT

From
Laurenz Albe
Date:
On Wed, 2020-11-25 at 11:41 +0100, Bernd Helmle wrote:
> Am Mittwoch, den 25.11.2020, 13:47 +0900 schrieb Michael Paquier:
> 
> > I can see the use case for IMMEDIATE, but I fail to see the use cases
> > for WAIT and FORCE.  CHECKPOINT_FORCE is internally implied for the
> > end-of-recovery and shutdown checkpoints.  WAIT could be a dangerous
> > thing if disabled, as clients could pile up requests to the
> > checkpointer for no real purpose.
> 
> Wouldn't it be more convenient to use "FAST" for an immediate checkpoint,
> defaulting to "FAST ON"? That would be along the lines of the parameter
> used in the streaming protocol command BASE_BACKUP, where "FAST" disables
> lazy checkpointing.

+1

That would also match pg_basebackup's "-c fast|spread".

Yours,
Laurenz Albe




Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 11/24/20, 4:03 PM, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:
> > From: Bossart, Nathan <bossartn@amazon.com>
> >> The main purpose of this patch is to give users more control over their manually
> >> requested checkpoints or restartpoints.  I suspect the most useful option is
> >> IMMEDIATE, which can help avoid checkpoint-related IO spikes.  However, I
> >> didn't see any strong reason to prevent users from also adjusting FORCE and
> >> WAIT.
> >
> > I think just IMMEDIATE would suffice, too.  But could you tell us why you came to want to give users more control?
> > Could we know some concrete example situations where users would want to perform CHECKPOINT with options?
>
> It may be useful for backups taken with the "consistent snapshot"
> approach.  As noted in the documentation [0], running CHECKPOINT
> before taking the snapshot can reduce recovery time.  However, users
> might wish to avoid the IO spike caused by an immediate checkpoint.

I'm a bit confused by the idea here..  The whole point of running a
CHECKPOINT is to get the immediate behavior with the IO spike to get
things flushed out to disk so that, on crash recovery, there's less
outstanding WAL to replay.

Avoiding the IO spike implies that you're waiting for a regular
checkpoint and that additional WAL is building up since that started and
therefore you're going to have to replay that WAL during crash recovery
and so you won't end up reducing your recovery time, so I'm failing to
see the point..?  I don't think you really get to have both..  pay the
price at backup time, or pay it during crash recovery.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
Thanks to all for the feedback.  I've attached v2 of the patch.  I've
removed the WAIT and FORCE options and renamed IMMEDIATE to FAST.

On 11/25/20, 7:52 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> I'm a bit confused by the idea here..  The whole point of running a
> CHECKPOINT is to get the immediate behavior with the IO spike to get
> things flushed out to disk so that, on crash recovery, there's less
> outstanding WAL to replay.
>
> Avoiding the IO spike implies that you're waiting for a regular
> checkpoint and that additional WAL is building up since that started and
> therefore you're going to have to replay that WAL during crash recovery
> and so you won't end up reducing your recovery time, so I'm failing to
> see the point..?  I don't think you really get to have both..  pay the
> price at backup time, or pay it during crash recovery.

It's true that you might not lower recovery time as much as if you did
an immediate checkpoint, but presumably there can still be some
benefit from doing a non-immediate checkpoint.  I think a similar
argument can be made for pg_start_backup(), which AFAICT is presently
the only way to manually request a non-immediate checkpoint.
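
For reference, that existing route looks something like this (a sketch; the
exclusive backup form is shown for brevity):

        SELECT pg_start_backup('label', fast => false);  -- spread checkpoint
        -- ...snapshot or copy the data directory...
        SELECT pg_stop_backup();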

Nathan



Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> Thanks to all for the feedback.  I've attached v2 of the patch.  I've
> removed the WAIT and FORCE options and renamed IMMEDIATE to FAST.
>
> On 11/25/20, 7:52 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > I'm a bit confused by the idea here..  The whole point of running a
> > CHECKPOINT is to get the immediate behavior with the IO spike to get
> > things flushed out to disk so that, on crash recovery, there's less
> > outstanding WAL to replay.
> >
> > Avoiding the IO spike implies that you're waiting for a regular
> > checkpoint and that additional WAL is building up since that started and
> > therefore you're going to have to replay that WAL during crash recovery
> > and so you won't end up reducing your recovery time, so I'm failing to
> > see the point..?  I don't think you really get to have both..  pay the
> > price at backup time, or pay it during crash recovery.
>
> It's true that you might not lower recovery time as much as if you did
> an immediate checkpoint, but presumably there can still be some
> benefit from doing a non-immediate checkpoint.  I think a similar
> argument can be made for pg_start_backup(), which AFAICT is presently
> the only way to manually request a non-immediate checkpoint.

If there isn't any actual outstanding WAL since the last checkpoint (the
only case where the fact that pg_start_backup includes CHECKPOINT_FORCE
is relevant), then performing a checkpoint isn't going to actually reduce
your crash recovery time.

Also note that, in all other cases (that is, when there *is* outstanding
WAL since the last checkpoint), pg_start_backup actually just waits for
the existing checkpoint to complete- and while it's waiting for that to
happen, there'll be additional WAL building up since that checkpoint
started that will have to be replayed as part of crash recovery, just as
if you took a snapshot of the system at any other time.

So, either there won't be any WAL outstanding, in which case running a
CHECKPOINT FORCE just ends up creating more WAL without actually being
useful, or there's WAL outstanding and the only thing this does is delay
the snapshot being taken but doesn't actually reduce the amount of WAL
that's going to end up being outstanding and which will have to be
replayed during crash recovery.

Maybe there's a useful reason to have these options, but at least the
stated one isn't it and I wouldn't want to encourage people who are
using snapshot-based backups to use these options since they aren't
going to work the way that this thread is implying they would.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 11/27/20, 8:29 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> Also note that, in all other cases (that is, when there *is* outstanding
> WAL since the last checkpoint), pg_start_backup actually just waits for
> the existing checkpoint to complete- and while it's waiting for that to
> happen, there'll be additional WAL building up since that checkpoint
> started that will have to be replayed as part of crash recovery, just as
> if you took a snapshot of the system at any other time.
>
> So, either there won't be any WAL outstanding, in which case running a
> CHECKPOINT FORCE just ends up creating more WAL without actually being
> useful, or there's WAL outstanding and the only thing this does is delay
> the snapshot being taken but doesn't actually reduce the amount of WAL
> that's going to end up being outstanding and which will have to be
> replayed during crash recovery.
>
> Maybe there's a useful reason to have these options, but at least the
> stated one isn't it and I wouldn't want to encourage people who are
> using snapshot-based backups to use these options since they aren't
> going to work the way that this thread is implying they would.

I don't think it's true that pg_start_backup() just waits for the
existing checkpoint to complete.  It calls RequestCheckpoint() with
CHECKPOINT_WAIT, which should wait for a new checkpoint to start.

                /* Wait for a new checkpoint to start. */
                ConditionVariablePrepareToSleep(&CheckpointerShmem->start_cv);
                for (;;)
                {

I also tried running pg_start_backup() while an automatic checkpoint
was ongoing, and it seemed to create a new one.

        psql session:
                postgres=# SELECT now(); SELECT pg_start_backup('test'); SELECT now();
                now
                -------------------------------
                2020-11-27 16:52:31.958124+00
                (1 row)

                pg_start_backup
                -----------------
                0/D3D24F0
                (1 row)

                now
                -------------------------------
                2020-11-27 16:52:50.113372+00
                (1 row)

        logs:
                2020-11-27 16:52:20.129 UTC [16029] LOG:  checkpoint starting: time
                2020-11-27 16:52:35.121 UTC [16029] LOG:  checkpoint complete...
                2020-11-27 16:52:35.122 UTC [16029] LOG:  checkpoint starting: force wait
                2020-11-27 16:52:50.110 UTC [16029] LOG:  checkpoint complete...

The patch I've submitted does the same thing.

        psql session:
                postgres=# SELECT now(); CHECKPOINT (FAST FALSE); SELECT now();
                now
                -------------------------------
                2020-11-27 16:46:39.346131+00
                (1 row)

                CHECKPOINT
                now
                -------------------------------
                2020-11-27 16:47:05.083944+00
                (1 row)

        logs:
                2020-11-27 16:46:35.056 UTC [16029] LOG:  checkpoint starting: time
                2020-11-27 16:46:50.099 UTC [16029] LOG:  checkpoint complete...
                2020-11-27 16:46:50.099 UTC [16029] LOG:  checkpoint starting: force wait
                2020-11-27 16:47:05.083 UTC [16029] LOG:  checkpoint complete...

Even if it did simply wait for the existing checkpoint to complete,
isn't it still preferable to take a snapshot right after a checkpoint
completes, even if it is non-immediate?  You'll need to replay WAL in
either case, and it's true that you could need to replay less WAL if
you take an immediate checkpoint versus a non-immediate checkpoint.
However, if you take a snapshot without a checkpoint, you might need
to replay up to checkpoint_timeout + (time it takes for a non-
immediate checkpoint to complete) worth of WAL.  For the logs just
above this paragraph, if I take a snapshot at 16:47:04, I'd need to
replay 29 seconds of WAL.  However, if I take the snapshot at
16:47:06, I only need to replay 16 seconds of WAL.

I apologize if I'm missing something obvious here.

Nathan


Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 11/27/20, 8:29 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > Also note that, in all other cases (that is, when there *is* outstanding
> > WAL since the last checkpoint), pg_start_backup actually just waits for
> > the existing checkpoint to complete- and while it's waiting for that to
> > happen, there'll be additional WAL building up since that checkpoint
> > started that will have to be replayed as part of crash recovery, just as
> > if you took a snapshot of the system at any other time.
> >
> > So, either there won't be any WAL outstanding, in which case running a
> > CHECKPOINT FORCE just ends up creating more WAL without actually being
> > useful, or there's WAL outstanding and the only thing this does is delay
> > the snapshot being taken but doesn't actually reduce the amount of WAL
> > that's going to end up being outstanding and which will have to be
> > replayed during crash recovery.
> >
> > Maybe there's a useful reason to have these options, but at least the
> > stated one isn't it and I wouldn't want to encourage people who are
> > using snapshot-based backups to use these options since they aren't
> > going to work the way that this thread is implying they would.
>
> I don't think it's true that pg_start_backup() just waits for the
> existing checkpoint to complete.  It calls RequestCheckpoint() with
> CHECKPOINT_WAIT, which should wait for a new checkpoint to start.

erm... right?  pg_start_backup waits for a new checkpoint to start.  I'm
confused how that's different from "waits for the existing checkpoint to
complete".  The entire point is that it *doesn't* force a checkpoint to
happen immediately.

>                 /* Wait for a new checkpoint to start. */
>                 ConditionVariablePrepareToSleep(&CheckpointerShmem->start_cv);
>                 for (;;)
>                 {
>
> I also tried running pg_start_backup() while an automatic checkpoint
> was ongoing, and it seemed to create a new one.
>
>         psql session:
>                 postgres=# SELECT now(); SELECT pg_start_backup('test'); SELECT now();
>                 now
>                 -------------------------------
>                 2020-11-27 16:52:31.958124+00
>                 (1 row)
>
>                 pg_start_backup
>                 -----------------
>                 0/D3D24F0
>                 (1 row)
>
>                 now
>                 -------------------------------
>                 2020-11-27 16:52:50.113372+00
>                 (1 row)
>
>         logs:
>                 2020-11-27 16:52:20.129 UTC [16029] LOG:  checkpoint starting: time
>                 2020-11-27 16:52:35.121 UTC [16029] LOG:  checkpoint complete...
>                 2020-11-27 16:52:35.122 UTC [16029] LOG:  checkpoint starting: force wait
>                 2020-11-27 16:52:50.110 UTC [16029] LOG:  checkpoint complete...

Yes- pg_start_backup specifies 'force' because it wants a checkpoint to
happen even if there isn't any outstanding WAL, but it doesn't make the
existing checkpoint go faster, which is the point that I'm trying to
make here.

If you'd really like to test and see what happens, run a pgbench that
loads the system up while doing pg_start_backup and see how long it
takes before pg_start_backup returns, and see how much outstanding WAL
there is from the starting checkpoint from pg_start_backup and the time
it returns.  To make it really clear, you should really also set
checkpoint_completion_target to 0.9 and make sure your max_wal_size is
set to a pretty large value, and make sure that the pgbench is
generating enough to have pretty large checkpoints while still allowing
them to happen due to 'time'.

> The patch I've submitted does the same thing.

Yes, I'm not surprised by that.  That doesn't change anything about the
point I'm trying to make..

> Even if it did simply wait for the existing checkpoint to complete,
> isn't it still preferable to take a snapshot right after a checkpoint
> completes, even if it is non-immediate?  You'll need to replay WAL in
> either case, and it's true that you could need to replay less WAL if
> you take an immediate checkpoint versus a non-immediate checkpoint.

Why is it preferable?  Your argument here is that it's preferable
because there'll be less outstanding WAL, but what I'm pointing out is
that that's not the case, so that isn't a reason for it to be preferable
and so I'm asking: what other reason is it preferable..?  I'm not aware
of one..

> However, if you take a snapshot without a checkpoint, you might need
> to replay up to checkpoint_timeout + (time it takes for a non-
> immediate checkpoint to complete) worth of WAL.  For the logs just
> above this paragraph, if I take a snapshot at 16:47:04, I'd need to
> replay 29 seconds of WAL.  However, if I take the snapshot at
> 16:47:06, I only need to replay 16 seconds of WAL.

This is exactly the point I'm making- if you're using a deferred
checkpoint then you're still going to have up to checkpoint_timeout
worth of WAL to replay.  Again, these tests aren't really worth much
because the system clearly isn't under any load..

> I apologize if I'm missing something obvious here.

If you'd like to show that I'm wrong, and it's entirely possible that I
am, then retry the above with actual load on the system, and also
actually look at how much outstanding WAL you end up with given the
different scenarios which has to be replayed during crash recovery.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> If you'd like to show that I'm wrong, and it's entirely possible that I
> am, then retry the above with actual load on the system, and also
> actually look at how much outstanding WAL you end up with given the
> different scenarios which has to be replayed during crash recovery.

I did a little experiment to show the behavior I'm referring to.  I
used these settings:

        checkpoint_completion_target = 0.9
        checkpoint_timeout = 30s
        max_wal_size = 20GB
        WAL segment size is 64MB

I ran the following pgbench command for a few minutes before each
test:

        pgbench postgres -T 3600 -c 64 -j 64 -N

For the first test, I killed Postgres just before an automatic, non-
immediate checkpoint completed.

        2020-11-28 00:31:57 UTC::@:[51770]:LOG:  checkpoint complete...
        2020-11-28 00:32:00 UTC::@:[51770]:LOG:  checkpoint starting: time

        Killed Postgres at 00:32:26 UTC, 29 seconds after latest
        checkpoint completed.

        2020-11-28 00:32:42 UTC::@:[77256]:LOG:  redo starts at 3CF/FD6B8BD0
        2020-11-28 00:32:56 UTC::@:[77256]:LOG:  redo done at 3D0/C94D1D00

        Recovery took 14 seconds and replayed ~3.2 GB of WAL.

        postgres=> SELECT pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0');
         pg_wal_lsn_diff
        -----------------
              3420557616
        (1 row)

For the second test, I killed Postgres just after an automatic, non-
immediate checkpoint completed.

        2020-11-28 00:41:26 UTC::@:[77475]:LOG:  checkpoint complete...

        Killed Postgres at 00:41:26 UTC, just after latest checkpoint
        completed.

        2020-11-28 00:41:42 UTC::@:[8599]:LOG:  redo starts at 3D3/152EDD78
        2020-11-28 00:41:49 UTC::@:[8599]:LOG:  redo done at 3D3/78358A40

        Recovery took 7 seconds and replayed ~1.5 GB of WAL.

        postgres=> SELECT pg_wal_lsn_diff('3D3/78358A40', '3D3/152EDD78');
         pg_wal_lsn_diff
        -----------------
              1661381832
        (1 row)
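
As an aside, the byte counts above can also be shown in human-readable units:

        -- reports roughly "3262 MB" for the first test above
        SELECT pg_size_pretty(pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0'));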

Granted, I used a rather aggressive checkpoint_timeout, but I think
this demonstrates that waiting for a non-immediate checkpoint to
complete can lower the amount of WAL needed for recovery, even though
it might not lower it as much as waiting for an immediate checkpoint
would.

Nathan


Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > If you'd like to show that I'm wrong, and it's entirely possible that I
> > am, then retry the above with actual load on the system, and also
> > actually look at how much outstanding WAL you end up with given the
> > different scenarios which has to be replayed during crash recovery.
>
> I did a little experiment to show the behavior I'm referring to.  I

I'm rather confused why you didn't use your patch to show the actual
behavior that you'll get as a result of this change..?  That seems like
what would be meaningful here.  I appreciate that the analysis you did
might correlate but I don't really get why we'd try to use a proxy for
this.

> used these settings:
>
>         checkpoint_completion_target = 0.9
>         checkpoint_timeout = 30s
>         max_wal_size = 20GB
>         WAL segment size is 64MB

That's an exceedingly short and very uncommon checkpoint timeout in my
experience..  If anything, folks increase checkpoint timeout from the
default (in order to reduce WAL traffic and because they have a replica
they can flip to in the event of a crash, avoiding having to go through
WAL replay on recovery) and so I'm not really sure that it's a sensible
thing to look at?  Even so though...

>         Recovery took 14 seconds and replayed ~3.2 GB of WAL.

[ ... ]

>         Recovery took 7 seconds and replayed ~1.5 GB of WAL.

This is showing more-or-less what I expected: there's still a large
amount of outstanding WAL, even if you use a very low and unusual
timeout and attempt to time it perfectly.  A question that is still not
clear is what happens when you actually do an immediate checkpoint-
there would likely still be some outstanding WAL even in that case but
I'd expect it to be a whole lot less, which is why that comment exists
in the documentation in the first place.

> Granted, I used a rather aggressive checkpoint_timeout, but I think
> this demonstrates that waiting for a non-immediate checkpoint to
> complete can lower the amount of WAL needed for recovery, even though
> it might not lower it as much as waiting for an immediate checkpoint
> would.

The difference here feels like orders of magnitude to me, between an
immediate checkpoint and a non-immediate one, vs. a much smaller
difference as you've shown here (though, still, kill'ing the postmaster
isn't exactly the same as what your patch would be doing, so I don't
really like using this particular analysis to answer this question...).

If the use-case here was just that you wanted to add more options to the
CHECKPOINT command because we have them internally and maybe they'd be
useful to expose then these things probably wouldn't matter so much, but
to imply that this is really going to cut down on the amount of WAL
replay required a lot isn't really coming through even with these
results.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 11/28/20, 9:50 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
>> Granted, I used a rather aggressive checkpoint_timeout, but I think
>> this demonstrates that waiting for a non-immediate checkpoint to
>> complete can lower the amount of WAL needed for recovery, even though
>> it might not lower it as much as waiting for an immediate checkpoint
>> would.
>
> The difference here feels like orders of magnitude to me, between an
> immediate checkpoint and a non-immediate one, vs. a much smaller
> difference as you've shown here (though, still, kill'ing the postmaster
> isn't exactly the same as what your patch would be doing, so I don't
> really like using this particular analysis to answer this question...).

I agree that using an immediate checkpoint is the best for reducing
recovery time.  IMO reducing the amount of WAL to recover by up to 50%
from doing no checkpoint at all is also a reasonable use case,
especially if avoiding an IO spike is important.

> If the use-case here was just that you wanted to add more options to the
> CHECKPOINT command because we have them internally and maybe they'd be
> useful to expose then these things probably wouldn't matter so much, but
> to imply that this is really going to cut down on the amount of WAL
> replay required a lot isn't really coming through even with these
> results.

This is how I initially presented this patch.  I only included this
use case because I was asked for concrete examples of how it might be
useful.

Nathan


Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 11/28/20, 9:50 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> >> Granted, I used a rather aggressive checkpoint_timeout, but I think
> >> this demonstrates that waiting for a non-immediate checkpoint to
> >> complete can lower the amount of WAL needed for recovery, even though
> >> it might not lower it as much as waiting for an immediate checkpoint
> >> would.
> >
> > The difference here feels like orders of magnitude to me, between an
> > immediate checkpoint and a non-immediate one, vs. a much smaller
> > difference as you've shown here (though, still, kill'ing the postmaster
> > isn't exactly the same as what your patch would be doing, so I don't
> > really like using this particular analysis to answer this question...).
>
> I agree that using an immediate checkpoint is the best for reducing
> recovery time.  IMO reducing the amount of WAL to recover by up to 50%
> from doing no checkpoint at all is also a reasonable use case,
> especially if avoiding an IO spike is important.

Checkpoints are always happening though, that's kind of my point..?
Sure, you get lucky sometimes that the time you snapshot might have less
outstanding WAL than at some other time, but I'm not convinced that this
patch is really going to give a given user that much less WAL to replay
than just randomly timing the snapshot, and if it's not clearly better,
always, then I don't think we
can reasonably document it as such or imply that this is how folks
should implement snapshot-based backups.

> > If the use-case here was just that you wanted to add more options to the
> > CHECKPOINT command because we have them internally and maybe they'd be
> > useful to expose then these things probably wouldn't matter so much, but
> > to imply that this is really going to cut down on the amount of WAL
> > replay required a lot isn't really coming through even with these
> > results.
>
> This is how I initially presented this patch.  I only included this
> use case because I was asked for concrete examples of how it might be
> useful.

Did you have other concrete examples that we could reference as to when
it would be useful to use these options?

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 11/29/20, 7:21 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
> Checkpoints are always happening though, that's kind of my point..?
> Sure, you get lucky sometimes that the time you snapshot might have less
> outstanding WAL than at some other time, but I'm not convinced that this
> patch is really going to give a given user that much less WAL to replay
> than just randomly timing the snapshot, and if it's not clearly better,
> always, then I don't think we
> can reasonably document it as such or imply that this is how folks
> should implement snapshot-based backups.

I see your point.  Using a non-immediate checkpoint might help you
keep your recovery time slightly more consistent, but you're right
that it's likely not going to be a dramatic improvement.  Your
recovery time will be 1 minute versus 1-2 minutes or 2 minutes versus
2-4 minutes, not 3 seconds versus 5 minutes.

> Did you have other concrete examples that we could reference as to when
> it would be useful to use these options?

I don't have any at the moment.  I figured that if we're going to
allow users to manually trigger checkpoints, we might as well allow
them to configure it to avoid things like IO spikes.

Nathan


Re: A few new options for CHECKPOINT

From
Alvaro Herrera
Date:
On the UI of this patch, you're proposing to add the option FAST.  I'm
not a fan of this option name and propose that (if we have it) we use
the name SPREAD instead (defaults to false).

Now we don't actually explain the term "spread" much in the documentation;
we just say "the writes are spread".  But it seems more natural to build
on that adjective rather than "fast/slow".


I think starting a spread checkpoint has some usefulness, if your
checkpoint interval is very large but your completion target is not very
close to 1.  In that case, you're expressing that you want a checkpoint
to start now and not impact production unduly, so that you know when it
finishes and therefore when it is a good time to start a backup.  (You
will still have some WAL to replay, but it won't be as much as if you
just ignored checkpoint considerations completely.)


On the subject of measuring replay times for backups taken while
pgbench is pounding the database, I think a realistic test does *not*
have pgbench running at top speed; rather you have some non-maximal
"-R xyz" option.  You would probably determine a value to use by running
without -R, observing what's a typical transaction rate, and using some
fraction (say, half) of that in the real run.



Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> I think starting a spread checkpoint has some usefulness, if your
> checkpoint interval is very large but your completion target is not very
> close to 1.  In that case, you're expressing that you want a checkpoint
> to start now and not impact production unduly, so that you know when it
> finishes and therefore when is it a good time to start a backup.  (You
> will still have some WAL to replay, but it won't be as much as if you
> just ignored checkpoint considerations completely.)

You could view an immediate checkpoint as more-or-less being a 'spread'
checkpoint with a checkpoint completion target approaching 0.  In the
end, it's all about how much time you're going to spend trying to get
the data written out, because the WAL that's generated during that time
is what's going to have to get replayed.

If the goal is to not end up with an increase in IO from this, then you
want to spread things out as much as you can over as much time as you're
able to- but that then means that you're going to have that much WAL to
replay.  If you're alright with performing IO to get the amount of WAL
to replay to be minimal, then you just run 'CHECKPOINT;' before your
backup and you're good to go (and is why that's in the documentation as
a way to reduce your WAL replay time- because it reduces it as much as
possible given your IO capabilities).

If you don't mind the increased amount of IO and WAL, you could just
reduce checkpoint_timeout and then crash recovery and snapshot-based
backup recovery will also be reduced, no matter when you actually take
the snapshot.

> On the subject of measuring replay times for backups taken while
> pgbench is pounding the database, I think a realistic test does *not*
> have pgbench running at top speed; rather you have some non-maximal
> "-R xyz" option.  You would probably determine a value to use by running
> without -R, observing what's a typical transaction rate, and using some
> fraction (say, half) of that in the real run.

That'd halve the amount of WAL being generated per unit time, but I
don't think it really changes much when it comes to this particular
analysis..?

If you generate 16MB of WAL per minute, and the checkpoint timeout is 5
minutes, with a checkpoint target of 0.9, then at more-or-less any point
in time you've got ~5 minutes worth of WAL outstanding, or around 80MB.
If your completion target is 0.5 then, really, you might as well make it
0.9 and have your timeout be 2.5m, so that you've got a steady-state of
around 40MB of WAL outstanding.

What I'm getting around to is that the only place this kind of thing
makes sense is where you're front-loading all your IO during the
checkpoint because your checkpoint completion target is less than 0.9
and then, sure, there's a difference between snapshotting right when the
checkpoint completes vs. later- because if you wait around to snapshot,
we aren't actually doing IO during that time and just letting the WAL
build up, but that's an argument to remove checkpoint completion target
as an option that doesn't really make much sense in the first place,
imv, and recommend folks tune checkpoint timeout for the amount of
outstanding WAL they want to have when they are doing recovery (either
from a crash or from a snapshot).
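
In configuration terms, the example above amounts to something like this
(a sketch only, using the numbers above):

        ALTER SYSTEM SET checkpoint_completion_target = 0.9;
        ALTER SYSTEM SET checkpoint_timeout = '150s';  -- i.e. 2.5 minutes
        SELECT pg_reload_conf();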

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 12/4/20, 1:47 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> On the UI of this patch, you're proposing to add the option FAST.  I'm
> not a fan of this option name and propose that (if we have it) we use
> the name SPREAD instead (defaults to false).
>
> Now we don't actually explain the term "spread" much in the documentation;
> we just say "the writes are spread".  But it seems more natural to build
> on that adjective rather than "fast/slow".

Here is a version of the patch that uses SPREAD instead of FAST.

Nathan



Re: A few new options for CHECKPOINT

From
Alvaro Herrera
Date:
On 2020-Dec-04, Bossart, Nathan wrote:

> On 12/4/20, 1:47 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> > On the UI of this patch, you're proposing to add the option FAST.  I'm
> > not a fan of this option name and propose that (if we have it) we use
> > the name SPREAD instead (defaults to false).
> >
> > Now we don't actually explain the term "spread" much in the documentation;
> > we just say "the writes are spread".  But it seems more natural to build
> > on that adjective rather than "fast/slow".
> 
> Here is a version of the patch that uses SPREAD instead of FAST.

WFM.

Instead of adding checkpt_option_list, how about utility_option_list?
It seems intended for reuse.



Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 12/4/20, 3:33 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> Instead of adding checkpt_option_list, how about utility_option_list?
> It seems intended for reuse.

Ah, good call.  That simplifies the grammar changes quite a bit.

Nathan



Re: A few new options for CHECKPOINT

From
Michael Paquier
Date:
On Sat, Dec 05, 2020 at 12:11:13AM +0000, Bossart, Nathan wrote:
> On 12/4/20, 3:33 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
>> Instead of adding checkpt_option_list, how about utility_option_list?
>> It seems intended for reuse.

+1.  It is intended for reuse.

> Ah, good call.  That simplifies the grammar changes quite a bit.

+CHECKPOINT;
+CHECKPOINT (SPREAD);
+CHECKPOINT (SPREAD FALSE);
+CHECKPOINT (SPREAD ON);
+CHECKPOINT (SPREAD 0);
+CHECKPOINT (SPREAD 2);
+ERROR:  spread requires a Boolean value
+CHECKPOINT (NONEXISTENT);
+ERROR:  unrecognized CHECKPOINT option "nonexistent"
+LINE 1: CHECKPOINT (NONEXISTENT);
Testing for negative cases like those last two is fine by me, but
I don't much like the idea of running 5 checkpoints as part of the
main regression test suite (think installcheck with a large shared
buffer pool for example).

--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -15,6 +15,8 @@
 #ifndef _BGWRITER_H
 #define _BGWRITER_H

+#include "nodes/parsenodes.h"
+#include "parser/parse_node.h"
I don't think you need to include parsenodes.h here.

+void
+ExecCheckPointStmt(ParseState *pstate, CheckPointStmt *stmt)
+{
Nit: perhaps this could just be ExecCheckPoint()?  See the existing
ExecVacuum().

+   flags = CHECKPOINT_WAIT |
+           (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE) |
+           (spread ? 0 : CHECKPOINT_IMMEDIATE);
The handling done for CHECKPOINT_FORCE and CHECKPOINT_WAIT deserves
a comment.
--
Michael


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
Thanks for reviewing.

On 12/4/20, 5:44 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Sat, Dec 05, 2020 at 12:11:13AM +0000, Bossart, Nathan wrote:
>> On 12/4/20, 3:33 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
>>> Instead of adding checkpt_option_list, how about utility_option_list?
>>> It seems intended for reuse.
>
> +1.  It is intended for reuse.
>
>> Ah, good call.  That simplifies the grammar changes quite a bit.
>
> +CHECKPOINT;
> +CHECKPOINT (SPREAD);
> +CHECKPOINT (SPREAD FALSE);
> +CHECKPOINT (SPREAD ON);
> +CHECKPOINT (SPREAD 0);
> +CHECKPOINT (SPREAD 2);
> +ERROR:  spread requires a Boolean value
> +CHECKPOINT (NONEXISTENT);
> +ERROR:  unrecognized CHECKPOINT option "nonexistent"
> +LINE 1: CHECKPOINT (NONEXISTENT);
> Testing for negative cases like those last two is fine by me, but
> I don't much like the idea of running 5 checkpoints as part of the
> main regression test suite (think installcheck with a large shared
> buffer pool for example).
>
> --- a/src/include/postmaster/bgwriter.h
> +++ b/src/include/postmaster/bgwriter.h
> @@ -15,6 +15,8 @@
>  #ifndef _BGWRITER_H
>  #define _BGWRITER_H
>
> +#include "nodes/parsenodes.h"
> +#include "parser/parse_node.h"
> I don't think you need to include parsenodes.h here.
>
> +void
> +ExecCheckPointStmt(ParseState *pstate, CheckPointStmt *stmt)
> +{
> Nit: perhaps this could just be ExecCheckPoint()?  See the existing
> ExecVacuum().
>
> +   flags = CHECKPOINT_WAIT |
> +           (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE) |
> +           (spread ? 0 : CHECKPOINT_IMMEDIATE);
> The handling done for CHECKPOINT_FORCE and CHECKPOINT_WAIT deserves
> a comment.

This all seems reasonable to me.  I've attached a new version of the
patch.

Nathan



Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> This all seems reasonable to me.  I've attached a new version of the
> patch.

diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 2afee6d7b5..2b1e56fbd7 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
+  <para>
+   Note that the server may consolidate concurrently requested checkpoints or
+   restartpoints.  Such consolidated requests will contain a combined set of
+   options.  For example, if one session requested a spread checkpoint and
+   another session requested a fast checkpoint, the server may combine these
+   requests and perform one fast checkpoint.
+  </para>

[ ... ]

+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><literal>SPREAD</literal></term>
+    <listitem>
+     <para>
+      Specifies that checkpoint activity should be throttled based on the
+      setting for the <xref linkend="guc-checkpoint-completion-target"/>
+      parameter.  If the option is turned off, <command>CHECKPOINT</command>
+      creates a checkpoint or restartpoint as fast as possible.  By default,
+      <literal>SPREAD</literal> is turned off, and the checkpoint activity is
+      not throttled.
+     </para>
+    </listitem>
+   </varlistentry>

So- just to be clear, CHECKPOINTs are more-or-less always happening in
PG, and running this command might do something or might end up doing
nothing depending on if a checkpoint is already in progress and this
request just gets consolidated into an existing one, and it won't
actually reduce the amount of WAL replay except in the case where
checkpoint completion target is set to make a checkpoint happen in less
time than checkpoint timeout, which ultimately isn't a great way to run
the system anyway.

Assuming we actually want to do this, which I still generally don't
agree with since it isn't really clear if it'll actually end up doing
something, or not, wouldn't it make more sense to have a command that
just sits and waits for the currently running (or next) checkpoint to
complete..?  For the use-case that was explained, at least, we don't
actually need to cause another checkpoint to happen, we just want to
know when a checkpoint has completed, right?

Is there some other reason for this that isn't being explained..?

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 12/5/20, 6:41 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> Assuming we actually want to do this, which I still generally don't
> agree with since it isn't really clear if it'll actually end up doing
> something, or not, wouldn't it make more sense to have a command that
> just sits and waits for the currently running (or next) checkpoint to
> complete..?  For the use-case that was explained, at least, we don't
> actually need to cause another checkpoint to happen, we just want to
> know when a checkpoint has completed, right?

If it's enough to just wait for the current checkpoint to complete or
to wait for the next one to complete, I suppose you could just poll
pg_control_checkpoint().  I think the only downside is that you could
end up sitting idle for a while, especially if checkpoint_timeout is
high and checkpoint_completion_target is low.  But, as you point out,
that may not be a typically recommended way to configure the system.
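
Something like this from psql would do that polling (a sketch, not part of
the patch):

        -- remember the current checkpoint time...
        SELECT checkpoint_time AS prev_ckpt FROM pg_control_checkpoint() \gset
        -- ...then re-run this periodically until it returns true
        SELECT checkpoint_time > :'prev_ckpt' FROM pg_control_checkpoint();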

Nathan


Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 12/5/20, 6:41 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > Assuming we actually want to do this, which I still generally don't
> > agree with since it isn't really clear if it'll actually end up doing
> > something, or not, wouldn't it make more sense to have a command that
> > just sits and waits for the currently running (or next) checkpoint to
> > complete..?  For the use-case that was explained, at least, we don't
> > actually need to cause another checkpoint to happen, we just want to
> > know when a checkpoint has completed, right?
>
> If it's enough to just wait for the current checkpoint to complete or
> to wait for the next one to complete, I suppose you could just poll
> pg_control_checkpoint().  I think the only downside is that you could
> end up sitting idle for a while, especially if checkpoint_timeout is
> high and checkpoint_completion_target is low.  But, as you point out,
> that may not be a typically recommended way to configure the system.

Maybe I missed something, but aren't you going to be waiting a while
with this patch given that it's asking for a spread checkpoint too..?

I agree that you could just monitor for the next checkpoint using
pg_control_checkpoint(), which is why I'm wondering again what the
point is behind this patch...  I'm trying to understand why we'd be
encouraging people to increase the number of checkpoints that are
happening when they're still going to be waiting around for that spread
(in other words, non-immediate) checkpoint to happen (just as if they'd
just waited until the next regular checkpoint), and they're still going
to have a fair bit of WAL to replay because it'll be however much WAL
has been written since we started the spread/non-immediate checkpoint.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
Alvaro Herrera
Date:
On 2020-Dec-05, Stephen Frost wrote:

> So- just to be clear, CHECKPOINTs are more-or-less always happening in
> PG, and running this command might do something or might end up doing
> nothing depending on if a checkpoint is already in progress and this
> request just gets consolidated into an existing one, and it won't
> actually reduce the amount of WAL replay except in the case where
> checkpoint completion target is set to make a checkpoint happen in less
> time than checkpoint timeout, which ultimately isn't a great way to run
> the system anyway.

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default?  Should we change
that?

> Assuming we actually want to do this, which I still generally don't
> agree with since it isn't really clear if it'll actually end up doing
> something, or not, wouldn't it make more sense to have a command that
> just sits and waits for the currently running (or next) checkpoint to
> complete..?  For the use-case that was explained, at least, we don't
> actually need to cause another checkpoint to happen, we just want to
> know when a checkpoint has completed, right?

Yes, I agree that the use case for this is unclear.



Re: A few new options for CHECKPOINT

From
Stephen Frost
Date:
Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> On 2020-Dec-05, Stephen Frost wrote:
> > So- just to be clear, CHECKPOINTs are more-or-less always happening in
> > PG, and running this command might do something or might end up doing
> > nothing depending on if a checkpoint is already in progress and this
> > request just gets consolidated into an existing one, and it won't
> > actually reduce the amount of WAL replay except in the case where
> > checkpoint completion target is set to make a checkpoint happen in less
> > time than checkpoint timeout, which ultimately isn't a great way to run
> > the system anyway.
>
> You keep making this statement, and I don't necessarily disagree, but if
> that is the case, please explain why don't we have
> checkpoint_completion_target set to 0.9 by default?  Should we change
> that?

Yes, I do think we should change that..  In fact, I'd argue that we can
probably get rid of checkpoint_completion_target entirely as an option.
The main argument against that is that it could be annoying for people
upgrading, but changing the default to 0.9 would definitely be an
improvement.

Thanks,

Stephen


Re: A few new options for CHECKPOINT

From
Michael Paquier
Date:
On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
>> You keep making this statement, and I don't necessarily disagree, but if
>> that is the case, please explain why don't we have
>> checkpoint_completion_target set to 0.9 by default?  Should we change
>> that?
>
> Yes, I do think we should change that..

Agreed.  FWIW, no idea for others, but it is one of those parameters I
keep telling people to update after a default installation.

> In fact, I'd argue that we can
> probably get rid of checkpoint_completion_target entirely as an option.
> The main argument against that is that it could be annoying for people
> upgrading, but changing the default to 0.9 would definitely be an
> improvement.

Not sure there is enough ground to do that though.
--
Michael


Re: A few new options for CHECKPOINT

From
"Bossart, Nathan"
Date:
On 12/5/20, 9:11 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> * Bossart, Nathan (bossartn@amazon.com) wrote:
>> On 12/5/20, 6:41 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
>> > Assuming we actually want to do this, which I still generally don't
>> > agree with since it isn't really clear if it'll actually end up doing
>> > something, or not, wouldn't it make more sense to have a command that
>> > just sits and waits for the currently running (or next) checkpoint to
>> > complete..?  For the use-case that was explained, at least, we don't
>> > actually need to cause another checkpoint to happen, we just want to
>> > know when a checkpoint has completed, right?
>> 
>> If it's enough to just wait for the current checkpoint to complete or
>> to wait for the next one to complete, I suppose you could just poll
>> pg_control_checkpoint().  I think the only downside is that you could
>> end up sitting idle for a while, especially if checkpoint_timeout is
>> high and checkpoint_completion_target is low.  But, as you point out,
>> that may not be a typically recommended way to configure the system.
>
> Maybe I missed something, but aren't you going to be waiting a while
> with this patch given that it's asking for a spread checkpoint too..?
>
> I agree that you could just monitor for the next checkpoint using
> pg_control_checkpoint(), which is why I'm wondering again what the
> point is behind this patch...  I'm trying to understand why we'd be
> encouraging people to increase the number of checkpoints that are
> happening when they're still going to be waiting around for that spread
> (in other words, non-immediate) checkpoint to happen (just as if they'd
> just waited until the next regular checkpoint), and they're still going
> to have a fair bit of WAL to replay because it'll be however much WAL
> has been written since we started the spread/non-immediate checkpoint.

I plan to mark this patch as withdrawn after the next commitfest
unless anyone objects.

Nathan


Re: A few new options for CHECKPOINT

From
Bruce Momjian
Date:
On Mon, Dec  7, 2020 at 11:22:01AM +0900, Michael Paquier wrote:
> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> > * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> >> You keep making this statement, and I don't necessarily disagree, but if
> >> that is the case, please explain why don't we have
> >> checkpoint_completion_target set to 0.9 by default?  Should we change
> >> that?
> > 
> > Yes, I do think we should change that..
> 
> Agreed.  FWIW, no idea for others, but it is one of those parameters I
> keep telling people to update after a default installation.

+1 for making it 0.9.  FYI, Robert Haas has argued that our estimation
of completion isn't great, which is why it defaults to 0.5.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee