Home > mailing lists

Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Online checksums verification in the backend
Date	September 16, 2020 06:45:58
Msg-id	20200916034558.GB11110@paquier.xyz Whole thread Raw
In response to	Re: Online checksums verification in the backend (Julien Rouhaud <rjuju123@gmail.com>)
Responses	Re: Online checksums verification in the backend
List	pgsql-hackers

Tree view

On Fri, Sep 11, 2020 at 09:49:16AM +0200, Julien Rouhaud wrote:
> Thanks!

I got some numbers out of my pocket, using the following base
configuration:
wal_level = minimal
fsync = off
shared_buffers = '300MB' # also tested with 30MB and 60MB
checksum_cost_delay = 0 # default in patch

And for the test I have used pgbench initialized at a scale of 250, to
have close to 3.5GB of data, so as it gives us a sort of 90% buffer
eviction, with all the data cached in the OS (we may want to look as
well at the case where the OS cache does not hold all the relation
pages).  I have also run some tests with 30MB and 60MB of shared
buffers, for similar results.

I also applied some prewarming on the cluster:
create extension pg_prewarm
select pg_prewarm(oid) from pg_class where oid > 16000;

Then, I have done five runs using that:
pgbench -S -M prepared -c 64/128/256 -n -T 60
In parallel of that, I got this stuff running in parallel all the
time:
select pg_check_relation('pgbench_accounts');
\watch 0.1

Here are some TPS numbers with the execution time of pg_check_relation.
In the five runs, I removed the highest and lowest ones, then took an
average of the remaining three.  I have also tested two modes: with
and without the optimization, that requires a one-liner in checksum.c
as per your latest patch:
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -84,7 +84,7 @@ check_one_block(Relation relation, ForkNumber forknum, BlockNumber blkno,
                 uint16 *chk_expected, uint16 *chk_found)
 {
     char        buffer[BLCKSZ];
-    bool        force_lock = false;
+    bool        force_lock = true;
     bool        found_in_sb;

Within parenthesis is the amount of time taken by pg_relation_check()
for a single check.  This is not completely exact and I saw some
variations, just to give an impression:
Conns                      64                 128               256
force_lock=true     60590 (7~8s)  55652 (10.2~10.5s)     46812 (9~10s)
force_lock=false   58637 (5s)        54131 (6~7s)  37091 (1.1~1.2s)

For connections higher than 128, I was kind of surprised to see
pg_relation_check being more aggressive without the optimization, with
much less contention on the buffer mapping LWLock actually, but that's
an impression from looking at pg_stat_activity.

Looking at the wait events for each run, I saw much more hiccups with
the buffer mapping LWLock when forcing the lock rather than not, still
I was able to see some contention when also not forcing the lock.  Not
surprising as this rejects a bunch of pages from shared buffers.

> I used all default settings, but by default checksum_cost_delay is 0
> so there shouldn't be any throttling.

Thanks, so did I.  From what I can see, there could be as well
benefits in not using the optimization by default so as the relation
check applies some natural throttling by making the checks actually
slower (there is a link between the individual runtime of
pg_relation_time and the TPS).  So it also seems to me that the
throttling parameters would be beneficial, but it looks to me that
there is as well a point to not include any throttling in a first
version if the optimization to go full speed is not around.  Using
three new GUCs for those function calls is still too much IMO, so
there is also the argument to move all this stuff into a new contrib/
module, and have a bgworker implementation as part of it as it would
need shared_preload_libraries anyway.

Also, I have been putting some thoughts into the APIs able to fetch a
buffer without going through the shared buffers.  And neither
checksum.c, because it should be a place that those APIs depends on
and include only the most-internal logic for checksum algorithm and
computation, nor checksumfuncs.c, because we need to think about the
case of a background worker as well (that could spawn a set of dynamic
workers connecting to different databases able to do checksum
verifications?).  It would be good to keep the handling of the buffer
mapping lock as well as the calls to smgrread() into a single place.
ReadBuffer_common() is a natural place for that, though it means for
our use case the addition of three new options:
- Being able to pass down directly a buffer pointer to save the page
contents.
- Being able to not verify directly a page, leaving the verification
to the caller upthread.
- Addition of a new mode, that I am calling here RBM_PRIVATE, where we
actually read the page from disk if not yet in shared buffers, except
that we fill in the contents of the page using the pointer given by
the caller.  That's different than the use of local buffers, as we
don't need this much amount of complications like temporary tables of
course for per-page checks.

Another idea would be to actually just let ReadBuffer_common just do
the check by itself, with a different mode like a kind of
RBM_VALIDATE, where we just return a verification state of the page
that can be consumed by callers.

This also comes with some more advantages:
- Tracking of reads from disk with track_io_timing.
- Addition of some private stats dedicated to this private mode, with
new fields in pgBufferUsage, all in a single place
--
Michael

Attachment

signature.asc

pgsql-hackers by date:

From: Tom Lane
Date: 16 September 2020, 06:44:38
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Amit Kapila
Date: 16 September 2020, 06:49:36
Subject: Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS

Re: Online checksums verification in the backend - Mailing list pgsql-hackers

Attachment

Previous

Next