Home > mailing lists

Re: A few new options for CHECKPOINT - Mailing list pgsql-hackers

From	Bossart, Nathan
Subject	Re: A few new options for CHECKPOINT
Date	November 28, 2020 01:50:54
Msg-id	A07F8329-4870-40B2-9991-5016491AA11A@amazon.com Whole thread Raw
In response to	Re: A few new options for CHECKPOINT (Stephen Frost <sfrost@snowman.net>)
Responses	Re: A few new options for CHECKPOINT
List	pgsql-hackers

Tree view

On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> If you'd like to show that I'm wrong, and it's entirely possible that I
> am, then retry the above with actual load on the system, and also
> actually look at how much outstanding WAL you end up with given the
> different scenarios which has to be replayed during crash recovery.

I did a little experiment to show the behavior I'm referring to.  I
used these settings:

        checkpoint_completion_target = 0.9
        checkpoint_timeout = 30s
        max_wal_size = 20GB
        WAL segment size is 64MB

I ran the following pgbench command for a few minutes before each
test:

        pgbench postgres -T 3600 -c 64 -j 64 -N

For the first test, I killed Postgres just before an automatic, non-
immediate checkpoint completed.

        2020-11-28 00:31:57 UTC::@:[51770]:LOG:  checkpoint complete...
        2020-11-28 00:32:00 UTC::@:[51770]:LOG:  checkpoint starting: time

        Killed Postgres at 00:32:26 UTC, 29 seconds after latest
        checkpoint completed.

        2020-11-28 00:32:42 UTC::@:[77256]:LOG:  redo starts at 3CF/FD6B8BD0
        2020-11-28 00:32:56 UTC::@:[77256]:LOG:  redo done at 3D0/C94D1D00

        Recovery took 14 seconds and replayed ~3.2 GB of WAL.

        postgres=> SELECT pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0');
         pg_wal_lsn_diff
        -----------------
              3420557616
        (1 row)

For the second test, I killed Postgres just after an automatic, non-
immediate checkpoint completed.

        2020-11-28 00:41:26 UTC::@:[77475]:LOG:  checkpoint complete...

        Killed Postgres at 00:41:26 UTC, just after latest checkpoint
        completed.

        2020-11-28 00:41:42 UTC::@:[8599]:LOG:  redo starts at 3D3/152EDD78
        2020-11-28 00:41:49 UTC::@:[8599]:LOG:  redo done at 3D3/78358A40

        Recovery took 7 seconds and replayed ~1.5 GB of WAL.

        postgres=> SELECT pg_wal_lsn_diff('3D3/78358A40', '3D3/152EDD78');
         pg_wal_lsn_diff
        -----------------
              1661381832
        (1 row)

Granted, I used a rather aggressive checkpoint_timeout, but I think
this demonstrates that waiting for a non-immediate checkpoint to
complete can lower the amount of WAL needed for recovery, even though
it might not lower it as much as waiting for an immediate checkpoint
would.

Nathan

pgsql-hackers by date:

From: Tomas Vondra
Date: 28 November 2020, 01:37:17
Subject: Re: [PoC] Non-volatile WAL buffer

From: Tomas Vondra
Date: 28 November 2020, 02:10:40
Subject: Re: POC: postgres_fdw insert batching

Re: A few new options for CHECKPOINT - Mailing list pgsql-hackers

Previous

Next