Re: Handing off SLRU fsyncs to the checkpointer - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Handing off SLRU fsyncs to the checkpointer
Date
Msg-id CA+hUKG+WPWgLyO59+ADoDQ9ar0mpiH3jFYqJXSKpvx6igA16Pg@mail.gmail.com
In response to Re: Handing off SLRU fsyncs to the checkpointer  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Handing off SLRU fsyncs to the checkpointer
List pgsql-hackers
On Tue, Aug 4, 2020 at 6:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> ... speedup of around 6% ...

I did some better testing.  OS: Linux, storage: consumer SSD.  I
repeatedly ran crash recovery on 3.3GB worth of WAL generated with 8M
pgbench transactions.  I tested 3 different builds 7 times each and
used "ministat" to compare the recovery times.  It told me that:

* Master is around 11% faster than it was last week, before commit
c5315f4f "Cache smgrnblocks() results in recovery."
* This patch gives a similar speedup on top of that, bringing the total
to around 25% faster than last week (the recovery time is ~20% less,
so the WAL processing speed is ~1.25x).
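
To make the relationship between "time is ~20% less" and "speed is
~1.25x" concrete, here is a small sketch that derives both figures from
the average recovery times in the ministat output at the end of this
message.  The helper names are made up for illustration; only the three
averages come from the test runs.

```python
# Average recovery times in seconds, copied from the ministat "Avg" column.
LASTWEEK = 48.904286
MASTER = 43.815286
PATCHED = 39.134857


def time_saved(old: float, new: float) -> float:
    """Fraction of wall-clock time saved going from old to new."""
    return (old - new) / old


def speedup(old: float, new: float) -> float:
    """Throughput ratio: how many times faster the same WAL is replayed."""
    return old / new


for name, old, new in [
    ("master vs lastweek", LASTWEEK, MASTER),
    ("patched vs master", MASTER, PATCHED),
    ("patched vs lastweek", LASTWEEK, PATCHED),
]:
    print(f"{name}: time saved {time_saved(old, new):.1%}, "
          f"speed {speedup(old, new):.3f}x")
```

The two measures differ because one divides by the old time and the
other by the new time: a speed ratio of r corresponds to a time saving
of 1 - 1/r.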

My test fit in RAM and was all cached.  With the patch, the recovery
process used 100% of a single core the whole time and stayed on that
core, and the variance was low.  In the other builds, CPU usage hovered
around 90%, the process hopped between cores as it kept getting
rescheduled, and the variance was higher.

Of course, SLRU fsyncs aren't the only I/O stalls in a real system;
among others, there are also reads from faulting in referenced pages
that don't have full page images in the WAL.  I'm working on that
separately, but that's a tad more complicated than this stuff.

Added to commit fest.

=== ministat output showing recovery times in seconds ===

x patched.dat
+ master.dat
* lastweek.dat
+------------------------------------------------------------------------------+
|                                                                          *   |
|    x                             +                                       *   |
|x x xx                  +         +  ++ +    +              *           ****  |
|  |AM|                        |_____AM____|                       |_____A_M__||
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   7        38.655        39.406        39.218     39.134857    0.25188849
+   7        42.128        45.068        43.958     43.815286    0.91387758
Difference at 95.0% confidence
    4.68043 +/- 0.780722
    11.9597% +/- 1.99495%
    (Student's t, pooled s = 0.670306)
*   7        47.187        49.404        49.203     48.904286    0.76793483
Difference at 95.0% confidence
    9.76943 +/- 0.665613
    24.9635% +/- 1.70082%
    (Student's t, pooled s = 0.571477)
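
For anyone who wants to check the "Difference at 95.0% confidence"
lines by hand, the sketch below recomputes them from the summary rows
above using a pooled-variance Student's t interval.  This is my reading
of what the output reports, not ministat's actual code; the function
name is hypothetical, and the critical value 2.179 is the usual
two-sided 95% table value for 12 degrees of freedom (7 + 7 - 2), so it
applies only to these sample sizes.

```python
import math

# Two-sided 95% Student's t critical value for 12 degrees of freedom.
# Assumption: taken from a standard t table, valid only for n1 + n2 - 2 == 12.
T_CRIT_95_DF12 = 2.179


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=T_CRIT_95_DF12):
    """Return (difference, half-width of the interval, pooled s)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    pooled_s = math.sqrt(pooled_var)
    diff = mean2 - mean1
    half_width = t_crit * pooled_s * math.sqrt(1 / n1 + 1 / n2)
    return diff, half_width, pooled_s


# Summary rows copied from the ministat output: (N, Avg, Stddev).
patched = (7, 39.134857, 0.25188849)
master = (7, 43.815286, 0.91387758)
lastweek = (7, 48.904286, 0.76793483)

for label, other in [("master", master), ("lastweek", lastweek)]:
    diff, half, s = pooled_difference(*patched, *other)
    base = patched[1]  # percentages are relative to the patched average
    print(f"{label}: {diff:.5f} +/- {half:.6f} "
          f"({diff / base:.4%} +/- {half / base:.5%}), pooled s = {s:.6f}")
```

Since neither interval comes close to zero, both differences are well
outside the run-to-run noise for these 7-run samples.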