Re: A few new options for CHECKPOINT - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: A few new options for CHECKPOINT
Date
Msg-id 20201127185742.GL16415@tamriel.snowman.net
Whole thread Raw
In response to Re: A few new options for CHECKPOINT  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: A few new options for CHECKPOINT  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 11/27/20, 8:29 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > Also note that, in all other cases (that is, when there *is* outstanding
> > WAL since the last checkpoint), pg_start_backup actually just waits for
> > the existing checkpoint to complete- and while it's waiting for that to
> > happen, there'll be additional WAL building up since that checkpoint
> > started that will have to be replayed as part of crash recovery, just as
> > if you took a snapshot of the system at any other time.
> >
> > So, either there won't be any WAL outstanding, in which case running a
> > CHECKPOINT FORCE just ends up creating more WAL without actually being
> > useful, or there's WAL outstanding and the only thing this does is delay
> > the snapshot being taken but doesn't actually reduce the amount of WAL
> > that's going to end up being outstanding and which will have to be
> > replayed during crash recovery.
> >
> > Maybe there's a useful reason to have these options, but at least the
> > stated one isn't it and I wouldn't want to encourage people who are
> > using snapshot-based backups to use these options since they aren't
> > going to work the way that this thread is implying they would.
>
> I don't think it's true that pg_start_backup() just waits for the
> existing checkpoint to complete.  It calls RequestCheckpoint() with
> CHECKPOINT_WAIT, which should wait for a new checkpoint to start.

erm... right?  pg_start_backup waits for a new checkpoint to start.  I'm
confused how that's different from "waits for the existing checkpoint to
complete".  The entire point is that it *doesn't* force a checkpoint to
happen immediately.

>                 /* Wait for a new checkpoint to start. */
>                 ConditionVariablePrepareToSleep(&CheckpointerShmem->start_cv);
>                 for (;;)
>                 {
>
> I also tried running pg_start_backup() while an automatic checkpoint
> was ongoing, and it seemed to create a new one.
>
>         psql session:
>                 postgres=# SELECT now(); SELECT pg_start_backup('test'); SELECT now();
>                 now
>                 -------------------------------
>                 2020-11-27 16:52:31.958124+00
>                 (1 row)
>
>                 pg_start_backup
>                 -----------------
>                 0/D3D24F0
>                 (1 row)
>
>                 now
>                 -------------------------------
>                 2020-11-27 16:52:50.113372+00
>                 (1 row)
>
>         logs:
>                 2020-11-27 16:52:20.129 UTC [16029] LOG:  checkpoint starting: time
>                 2020-11-27 16:52:35.121 UTC [16029] LOG:  checkpoint complete...
>                 2020-11-27 16:52:35.122 UTC [16029] LOG:  checkpoint starting: force wait
>                 2020-11-27 16:52:50.110 UTC [16029] LOG:  checkpoint complete...

Yes- pg_start_backup specifies 'force' because it wants a checkpoint to
happen even if there isn't any outstanding WAL, but it doesn't make the
existing checkpoint go faster, which is the point that I'm trying to
make here.

If you'd really like to test and see what happens, run a pgbench that
loads the system up while doing pg_start_backup and see how long it
takes before pg_start_backup returns, and see how much outstanding WAL
there is from the starting checkpoint from pg_start_backup and the time
it returns.  To make it really clear, you should really also set
checkpoint completion timeout to 0.9 and make sure you max_wal_size is
set to a pretty large value, and make sure that the pgbench is
generating enough to have pretty large checkpoints while still allowing
them to happen due to 'time'.

> The patch I've submitted does the same thing.

Yes, I'm not surprised by that.  That doesn't change anything about the
point I'm trying to make..

> Even if it did simply wait for the existing checkpoint to complete,
> isn't it still preferable to take a snapshot right after a checkpoint
> completes, even if it is non-immediate?  You'll need to replay WAL in
> either case, and it's true that you could need to replay less WAL if
> you take an immediate checkpoint versus a non-immediate checkpoint.

Why is it preferable?  Your argument here is that it's preferable
because there'll be less outstanding WAL, but what I'm pointing out is
that that's not the case, so that isn't a reason for it to be preferable
and so I'm asking: what other reason is it preferable..?  I'm not aware
of one..

> However, if you take a snapshot without a checkpoint, you might need
> to replay up to checkpoint_timeout + (time it takes for a non-
> immediate checkpoint to complete) worth of WAL.  For the logs just
> above this paragraph, if I take a snapshot at 16:47:04, I'd need to
> replay 29 seconds of WAL.  However, if I take the snapshot at
> 16:47:06, I only need to replay 16 seconds of WAL.

This is exactly the point I'm making- if you're using a deferred
checkpoint then you're still going to have up to checkpoint_timeout
worth of WAL to replay.  Again, these tests aren't really worth much
because the system clearly isn't under any load..

> I apologize if I'm missing something obvious here.

If you'd like to show that I'm wrong, and it's entirely possible that I
am, then retry the above with actual load on the system, and also
actually look at how much outstanding WAL you end up with given the
different scenarios which has to be replayed during crash recovery.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Patrick Handja
Date:
Subject: Re: Setof RangeType returns
Next
From: Peter Geoghegan
Date:
Subject: Re: [PATCH] LWLock self-deadlock detection