Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Online checksums verification in the backend
Date
Msg-id 20200909092524.GC57691@nol
In response to Re: Online checksums verification in the backend  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Online checksums verification in the backend  (Michael Paquier <michael@paquier.xyz>)
Re: Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Mon, Sep 07, 2020 at 05:50:38PM +0900, Michael Paquier wrote:
> On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> > Did you mean creating a new checksumfuncs.c file? I don't find any
> > such file in the current tree.
> 
> Your patch adds checksumfuncs.c, so the subroutines grabbing a given
> block could just be moved there.
> 

Sorry, I was in the middle of a rebase for another patch and missed the new
files added in this one.  I added a new checksumfuncs.h for the required
includes that should not be seen by client code.  I kept checksumfuncs.c and
checksums.c so that the SQL-visible declarations are separated from the rest
of the implementation, as this is what we already do elsewhere I think.  If
that's a problem I'll change it and put everything in checksumfuncs.[ch].
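For illustration, the split looks roughly like this (a sketch only: the
function names pg_check_relation() and check_relation_checksums(), and the
header path, are hypothetical, not necessarily the patch's actual symbols).
checksumfuncs.c keeps just the fmgr-callable wrapper, while the block
verification logic stays in checksums.c behind a backend-only declaration:

    /* checksumfuncs.c -- only the SQL-visible entry point (illustrative) */
    #include "postgres.h"

    #include "fmgr.h"
    #include "storage/checksumfuncs.h"  /* hypothetical backend-only header */

    Datum
    pg_check_relation(PG_FUNCTION_ARGS)
    {
        Oid         relid = PG_GETARG_OID(0);

        /* All of the actual block reading/verification is in checksums.c */
        PG_RETURN_BOOL(check_relation_checksums(relid));
    }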

I also moved the tap tests in src/test/modules and renamed the file with a
3-digit prefix.  For the record, I initially copied src/test/modules/brin, and
this is apparently the only subdir that uses a 2-digit pattern.

I also added a new WAIT_EVENT_CHECK_DELAY wait event.
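Presumably the event gets reported around the throttling sleep, something like
the following sketch (the pgstat calls are the standard wait-event API; the
surrounding delay code and the msec variable are illustrative):

    /* Report the throttling sleep so it shows up in pg_stat_activity */
    pgstat_report_wait_start(WAIT_EVENT_CHECK_DELAY);
    pg_usleep(msec * 1000L);    /* msec: illustrative cost-based delay */
    pgstat_report_wait_end();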

> > I'm not sure I understand.  Unless I missed something this approach
> > *cannot* raise a false positive.  What it does is force a 2nd check
> > with stronger lock *to make sure it's actually a corruption*, so we
> > don't raise a false positive.  The only report that can happen in this
> > 1st loop is if smgrread raises an error, which AFAICT can only happen
> > (at least with mdread) if the whole block couldn't be read, which is a
> > sign of a very bad problem.  This should clearly be reported, as this
> > cannot be caused by the locking heuristics used here.
> 
> We don't know how much this optimization matters though?  Could it be
> possible to get an idea of that?  For example, take the case of one
> relation with a fixed size in a read-only workload and a read-write
> workload (as long as autovacuum and updates make the number of
> relation blocks rather constant for the read-write case), doing a
> checksum verification in parallel of multiple clients working on the
> relation concurrently.  Assuming that the relation is fully in the OS
> cache, we could get an idea of the impact with multiple
> (shared_buffers / relation size) rates to make the eviction more
> aggressive?  The buffer partition locks, knowing that
> NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it
> seems to me that it would be good to see if we have a difference.
> What do you think?

I assumed that the odds of having to check the buffer twice were low enough,
and that avoiding holding a bufmapping lock while doing some IO was an
uncontroversial enough optimisation, but maybe that's only wishful thinking.
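To spell out the heuristic, here's a minimal sketch of the two-pass check.
It's simplified and illustrative, not the patch's actual code: in particular
block_is_valid() is a made-up name, and the real patch is more careful about
dirty buffers already present in shared buffers.

    #include "postgres.h"

    #include "storage/buf_internals.h"
    #include "storage/bufpage.h"
    #include "storage/lwlock.h"
    #include "storage/smgr.h"

    /*
     * Verify one block, rechecking under the buffer-mapping partition
     * lock before reporting corruption, so that a concurrent eviction
     * racing with our unlocked read can't produce a false positive.
     */
    static bool
    block_is_valid(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
    {
        char        buffer[BLCKSZ];
        BufferTag   tag;
        uint32      hash;
        LWLock     *partLock;

        /* Optimistic pass: plain read, no buffer-mapping lock held. */
        smgrread(smgr, forknum, blkno, buffer);
        if (PageIsVerified((Page) buffer, blkno))
            return true;

        /*
         * Possible corruption, but it could also be a torn read: a
         * concurrent backend may have evicted and rewritten the block
         * while we were reading it.  Take the buffer-mapping partition
         * lock so the block can't be evicted under us, and check again.
         */
        INIT_BUFFERTAG(tag, smgr->smgr_rnode.node, forknum, blkno);
        hash = BufTableHashCode(&tag);
        partLock = BufMappingPartitionLock(hash);

        LWLockAcquire(partLock, LW_SHARED);
        if (BufTableLookup(&tag, hash) >= 0)
        {
            /*
             * The block is now in shared buffers: the on-disk version we
             * read was transient and will be overwritten at the next
             * flush, so don't report corruption (the real patch handles
             * this case with more nuance).
             */
            LWLockRelease(partLock);
            return true;
        }

        /* Still on disk only: re-read and re-verify under the lock. */
        smgrread(smgr, forknum, blkno, buffer);
        LWLockRelease(partLock);

        return PageIsVerified((Page) buffer, blkno);
    }

The point of the optimisation is that the IO in the first pass happens with no
bufmapping lock held; the lock is only taken for the (hopefully rare) recheck.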

I'll do some benchmarking and see if I can get some figures, but it'll
probably take some time.  In the meantime I'm attaching v11 of the patch,
which should address all other comments.

