Re: Online checksums verification in the backend - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Online checksums verification in the backend |
Date | |
Msg-id | 20200916034558.GB11110@paquier.xyz |
In response to | Re: Online checksums verification in the backend (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Online checksums verification in the backend
|
List | pgsql-hackers |
On Fri, Sep 11, 2020 at 09:49:16AM +0200, Julien Rouhaud wrote:
> Thanks!

I ran some benchmarks, using the following base configuration:
wal_level = minimal
fsync = off
shared_buffers = '300MB' # also tested with 30MB and 60MB
checksum_cost_delay = 0 # default in patch

For the test I used pgbench initialized at a scale of 250, which gives close to 3.5GB of data and roughly 90% buffer eviction, with all the data cached in the OS (we may want to look as well at the case where the OS cache does not hold all the relation pages). I also ran some tests with 30MB and 60MB of shared buffers, with similar results.

I also applied some prewarming on the cluster:
create extension pg_prewarm;
select pg_prewarm(oid) from pg_class where oid > 16000;

Then I did five runs of each of the following:
pgbench -S -M prepared -c 64/128/256 -n -T 60

In parallel, I kept this running the whole time:
select pg_check_relation('pgbench_accounts');
\watch 0.1

Here are some TPS numbers, with the execution time of pg_check_relation. Out of the five runs, I removed the highest and the lowest, then took the average of the remaining three. I tested two modes, with and without the optimization. Disabling the optimization requires a one-liner in checksum.c on top of your latest patch:
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -84,7 +84,7 @@ check_one_block(Relation relation, ForkNumber forknum, BlockNumber blkno,
 uint16 *chk_expected, uint16 *chk_found)
 {
 char buffer[BLCKSZ];
- bool force_lock = false;
+ bool force_lock = true;
 bool found_in_sb;

Within parentheses is the amount of time taken by pg_check_relation() for a single check.
This is not completely exact, and I saw some variations, so take it only as an impression:

Conns               64              128                  256
force_lock=true     60590 (7~8s)    55652 (10.2~10.5s)   46812 (9~10s)
force_lock=false    58637 (5s)      54131 (6~7s)         37091 (1.1~1.2s)

For connection counts higher than 128, I was kind of surprised to see pg_check_relation being more aggressive without the optimization, with much less contention on the buffer mapping LWLock actually, but that's an impression from looking at pg_stat_activity.

Looking at the wait events for each run, I saw many more hiccups with the buffer mapping LWLock when forcing the lock than when not forcing it. Still, I was able to see some contention when not forcing the lock as well. Not surprising, as this rejects a bunch of pages from shared buffers.

> I used all default settings, but by default checksum_cost_delay is 0
> so there shouldn't be any throttling.

Thanks, so did I. From what I can see, there could also be benefits in not using the optimization by default, so that the relation check applies some natural throttling by making the checks actually slower (there is a link between the individual runtime of pg_check_relation and the TPS). So it also seems to me that the throttling parameters would be beneficial, but it looks to me that there is also a case for not including any throttling in a first version if the optimization to go full speed is not around. Using three new GUCs for those function calls is still too much IMO, so there is also the argument for moving all this stuff into a new contrib/ module, and having a bgworker implementation as part of it, as it would need shared_preload_libraries anyway.

Also, I have been giving some thought to the APIs able to fetch a buffer without going through the shared buffers.
Neither checksum.c, because it should be a place that those APIs depend on and should include only the most-internal logic for the checksum algorithm and computation, nor checksumfuncs.c, because we need to think about the case of a background worker as well (one that could spawn a set of dynamic workers connecting to different databases able to do checksum verifications?), looks like the right place for them. It would be good to keep the handling of the buffer mapping lock as well as the calls to smgrread() in a single place. ReadBuffer_common() is a natural place for that, though for our use case it means the addition of three new options:
- Being able to pass down directly a buffer pointer to save the page contents.
- Being able to not verify a page directly, leaving the verification to the caller upthread.
- Addition of a new mode, which I am calling here RBM_PRIVATE, where we actually read the page from disk if it is not yet in shared buffers, except that we fill in the contents of the page using the pointer given by the caller. That's different from the use of local buffers, as per-page checks of course don't need the amount of complication that things like temporary tables require.

Another idea would be to just let ReadBuffer_common() do the check by itself, with a different mode like a kind of RBM_VALIDATE, where we just return a verification state of the page that can be consumed by callers.

This also comes with some more advantages:
- Tracking of reads from disk with track_io_timing.
- Addition of some private stats dedicated to this private mode, with new fields in pgBufferUsage, all in a single place.
--
Michael
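To make the RBM_PRIVATE idea a bit more concrete, here is a rough, standalone toy model of the behavior described above. It is only a sketch and not PostgreSQL code: every name in it (ToyReadMode, TOY_RBM_PRIVATE, toy_read_block(), toy_lookup(), the toy_* arrays and counters) is hypothetical, and locking, pinning and checksum verification are left out entirely. The only point it tries to show is that a private read copies the page into the caller's buffer, reports whether the page was found in shared buffers, and never evicts anything.

```c
/*
 * Toy model of a "private" read mode.  NOT PostgreSQL code: all names here
 * are hypothetical and exist only for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BLCKSZ   16		/* tiny page size, for illustration only */
#define TOY_NBLOCKS  4		/* blocks in the toy "relation file" */
#define TOY_NBUFFERS 2		/* slots in the toy "shared buffers" */

typedef enum ToyReadMode
{
	TOY_RBM_NORMAL,			/* load the page into a shared slot if missing */
	TOY_RBM_PRIVATE			/* never touch shared slots on a miss */
} ToyReadMode;

static char toy_disk[TOY_NBLOCKS][TOY_BLCKSZ];
static char toy_shared[TOY_NBUFFERS][TOY_BLCKSZ];
static int	toy_shared_blkno[TOY_NBUFFERS] = {-1, -1};	/* -1 means empty */
static int	toy_next_victim = 0;
static long toy_disk_reads = 0;	/* stands in for dedicated I/O stats */

/* Return the shared slot holding blkno, or -1 if it is not cached. */
int
toy_lookup(int blkno)
{
	for (int i = 0; i < TOY_NBUFFERS; i++)
	{
		if (toy_shared_blkno[i] == blkno)
			return i;
	}
	return -1;
}

/*
 * Copy block blkno into dst.  Returns true if the page was found in the toy
 * shared buffers.  Verifying the page is left to the caller.
 */
bool
toy_read_block(int blkno, ToyReadMode mode, char *dst)
{
	int			slot;

	assert(dst != NULL && blkno >= 0 && blkno < TOY_NBLOCKS);

	slot = toy_lookup(blkno);
	if (slot >= 0)
	{
		memcpy(dst, toy_shared[slot], TOY_BLCKSZ);
		return true;
	}

	toy_disk_reads++;
	if (mode == TOY_RBM_PRIVATE)
	{
		/* Private mode: read straight into the caller's buffer. */
		memcpy(dst, toy_disk[blkno], TOY_BLCKSZ);
		return false;
	}

	/* Normal mode: evict a slot round-robin and cache the page. */
	slot = toy_next_victim;
	toy_next_victim = (toy_next_victim + 1) % TOY_NBUFFERS;
	memcpy(toy_shared[slot], toy_disk[blkno], TOY_BLCKSZ);
	toy_shared_blkno[slot] = blkno;
	memcpy(dst, toy_shared[slot], TOY_BLCKSZ);
	return false;
}
```

A caller doing per-page checks would pass its own page-sized buffer with the private mode and then run whatever verification it wants on the copy. In this toy, the shared slots are left untouched on a miss, which is the property that would avoid evicting pages on behalf of the checker.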