Re: A few new options for CHECKPOINT - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: A few new options for CHECKPOINT
Date
Msg-id 121FB996-26ED-450A-B04C-7EDC771C449A@amazon.com
Whole thread Raw
In response to Re: A few new options for CHECKPOINT  (Stephen Frost <sfrost@snowman.net>)
Responses Re: A few new options for CHECKPOINT  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 11/27/20, 8:29 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> Also note that, in all other cases (that is, when there *is* outstanding
> WAL since the last checkpoint), pg_start_backup actually just waits for
> the existing checkpoint to complete- and while it's waiting for that to
> happen, there'll be additional WAL building up since that checkpoint
> started that will have to be replayed as part of crash recovery, just as
> if you took a snapshot of the system at any other time.
>
> So, either there won't be any WAL outstanding, in which case running a
> CHECKPOINT FORCE just ends up creating more WAL without actually being
> useful, or there's WAL outstanding and the only thing this does is delay
> the snapshot being taken but doesn't actually reduce the amount of WAL
> that's going to end up being outstanding and which will have to be
> replayed during crash recovery.
>
> Maybe there's a useful reason to have these options, but at least the
> stated one isn't it and I wouldn't want to encourage people who are
> using snapshot-based backups to use these options since they aren't
> going to work the way that this thread is implying they would.

I don't think it's true that pg_start_backup() just waits for the
existing checkpoint to complete.  It calls RequestCheckpoint() with
CHECKPOINT_WAIT, which should wait for a new checkpoint to start.

                /* Wait for a new checkpoint to start. */
                ConditionVariablePrepareToSleep(&CheckpointerShmem->start_cv);
                for (;;)
                {

I also tried running pg_start_backup() while an automatic checkpoint
was ongoing, and it seemed to create a new one.

        psql session:
                postgres=# SELECT now(); SELECT pg_start_backup('test'); SELECT now();
                now
                -------------------------------
                2020-11-27 16:52:31.958124+00
                (1 row)

                pg_start_backup
                -----------------
                0/D3D24F0
                (1 row)

                now
                -------------------------------
                2020-11-27 16:52:50.113372+00
                (1 row)

        logs:
                2020-11-27 16:52:20.129 UTC [16029] LOG:  checkpoint starting: time
                2020-11-27 16:52:35.121 UTC [16029] LOG:  checkpoint complete...
                2020-11-27 16:52:35.122 UTC [16029] LOG:  checkpoint starting: force wait
                2020-11-27 16:52:50.110 UTC [16029] LOG:  checkpoint complete...

The patch I've submitted does the same thing.

        psql session:
                postgres=# SELECT now(); CHECKPOINT (FAST FALSE); SELECT now();
                now
                -------------------------------
                2020-11-27 16:46:39.346131+00
                (1 row)

                CHECKPOINT
                now
                -------------------------------
                2020-11-27 16:47:05.083944+00
                (1 row)

        logs:
                2020-11-27 16:46:35.056 UTC [16029] LOG:  checkpoint starting: time
                2020-11-27 16:46:50.099 UTC [16029] LOG:  checkpoint complete...
                2020-11-27 16:46:50.099 UTC [16029] LOG:  checkpoint starting: force wait
                2020-11-27 16:47:05.083 UTC [16029] LOG:  checkpoint complete...

Even if it did simply wait for the existing checkpoint to complete,
isn't it still preferable to take a snapshot right after a checkpoint
completes, even if it is non-immediate?  You'll need to replay WAL in
either case, and it's true that you could need to replay less WAL if
you take an immediate checkpoint versus a non-immediate checkpoint.
However, if you take a snapshot without a checkpoint, you might need
to replay up to checkpoint_timeout + (time it takes for a non-
immediate checkpoint to complete) worth of WAL.  For the logs just
above this paragraph, if I take a snapshot at 16:47:04, I'd need to
replay 29 seconds of WAL.  However, if I take the snapshot at
16:47:06, I only need to replay 16 seconds of WAL.

I apologize if I'm missing something obvious here.

Nathan


pgsql-hackers by date:

Previous
From: Paul Förster
Date:
Subject: Re: configure and DocBook XML
Next
From: Tom Lane
Date:
Subject: Re: configure and DocBook XML