On Mon, Sep 07, 2020 at 05:50:38PM +0900, Michael Paquier wrote:
> On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> > Did you mean creating a new checksumfuncs.c file? I don't find any
> > such file in the current tree.
>
> Your patch adds checksumfuncs.c, so the subroutines grabbing a given
> block could just be moved there.
>
Sorry, I was in the middle of a rebase for another patch and missed the new
files added in this one. I added a new checksumfuncs.h for the required
includes that should not be seen by client code. I kept checksumfuncs.c and
checksums.c so that the SQL-visible functions are separated from the rest of
the implementation, as I think that's what we already do elsewhere. If that's
a problem I'll change it and put everything in checksumfuncs.[ch].
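
To illustrate the kind of split I mean, here is a rough sketch (not the actual
patch code, the function and helper names are just illustrative):

/* checksumfuncs.c: only the SQL-visible wrapper lives here */
#include "postgres.h"

#include "fmgr.h"

/* declared in checksumfuncs.h, implemented in checksums.c */
extern bool check_relation_checksums(Oid relid);

PG_FUNCTION_INFO_V1(pg_check_relation);

Datum
pg_check_relation(PG_FUNCTION_ARGS)
{
	Oid		relid = PG_GETARG_OID(0);

	/* all the real work happens in the internal implementation file */
	PG_RETURN_BOOL(check_relation_checksums(relid));
}
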
I also moved the TAP tests to src/test/modules and renamed the test file to
use a 3-digit prefix. For the record, I initially copied src/test/modules/brin,
which is apparently the only subdirectory using a 2-digit pattern.
I also added a new WAIT_EVENT_CHECK_DELAY wait event.
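The idea is to report it around the throttling sleep, using the usual
pgstat_report_wait_start()/pgstat_report_wait_end() pattern, roughly like this
(sketch only, the helper name and parameter are illustrative):

#include "postgres.h"

#include "pgstat.h"

/* sleep for the configured delay, reporting the new wait event meanwhile */
static void
check_delay_point(int cost_delay_ms)
{
	if (cost_delay_ms > 0)
	{
		pgstat_report_wait_start(WAIT_EVENT_CHECK_DELAY);
		pg_usleep(cost_delay_ms * 1000L);
		pgstat_report_wait_end();
	}
}
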
> > I'm not sure I understand. Unless I missed something this approach
> > *cannot* raise a false positive. What it does is force a 2nd check
> > with stronger lock *to make sure it's actually a corruption*, so we
> > don't raise false positives. The only report that can happen in this
> > 1st loop is if smgrread raises an error, which AFAICT can only happen
> > (at least with mdread) if the whole block couldn't be read, which is a
> > sign of a very bad problem. This should clearly be reported, as this
> > cannot be caused by the locking heuristics used here.
>
> We don't know how much this optimization matters though? Could it be
> possible to get an idea of that? For example, take the case of one
> relation with a fixed size in a read-only workload and a read-write
> workload (as long as autovacuum and updates make the number of
> relation blocks rather constant for the read-write case), doing a
> checksum verification in parallel of multiple clients working on the
> relation concurrently. Assuming that the relation is fully in the OS
> cache, we could get an idea of the impact with multiple
> (shared_buffers / relation size) rates to make the eviction more
> aggressive? The buffer partition locks, knowing that
> NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it
> seems to me that it would be good to see if we have a difference.
> What do you think?
I assumed that the odds of having to check the buffer twice were low enough,
and that avoiding holding a bufmapping lock while doing some IO was an
uncontroversial enough optimisation, but maybe that's only wishful thinking.
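
To make that a bit more concrete, the idea is roughly the following (heavily
simplified sketch with illustrative names, not the actual patch code; the real
version also has to deal with blocks that are dirty in shared buffers and with
the reporting itself):

#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/bufpage.h"
#include "storage/lwlock.h"
#include "storage/smgr.h"

static bool
check_one_block(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
{
	PGAlignedBlock buffer;
	BufferTag	tag;
	uint32		hash;
	LWLock	   *partLock;

	/*
	 * First pass: read and verify the block without holding the buffer
	 * mapping partition lock, so the IO doesn't block concurrent backends
	 * that need to map buffers in the same partition.
	 */
	smgrread(smgr, forknum, blkno, buffer.data);
	if (PageIsVerified((Page) buffer.data, blkno))
		return true;

	/*
	 * The block looked corrupted, but it may simply have been evicted and
	 * rewritten while we were reading it.  Second pass: hold the mapping
	 * partition lock so the page cannot be concurrently evicted, and
	 * re-read it to confirm the corruption before reporting anything.
	 */
	INIT_BUFFERTAG(tag, smgr->smgr_rnode.node, forknum, blkno);
	hash = BufTableHashCode(&tag);
	partLock = BufMappingPartitionLock(hash);

	LWLockAcquire(partLock, LW_SHARED);

	if (BufTableLookup(&tag, hash) >= 0)
	{
		/* the block is now in shared buffers; the on-disk copy may be stale */
		LWLockRelease(partLock);
		return true;
	}

	smgrread(smgr, forknum, blkno, buffer.data);
	LWLockRelease(partLock);

	return PageIsVerified((Page) buffer.data, blkno);
}

The bufmapping partition lock is only taken, and only in shared mode, when the
first read found something that looks corrupted, which should be rare.
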
I'll do some benchmarking and see if I can get some figures, but it'll
probably take some time. In the meantime I'm attaching v11 of the patch, which
should address all the other comments.