Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Online checksums verification in the backend
Msg-id 20200910180610.GA62228@nol
In response to Re: Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Online checksums verification in the backend  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Sep 10, 2020 at 09:47:23AM +0200, Julien Rouhaud wrote:
> On Wed, Sep 09, 2020 at 03:41:30PM +0200, Julien Rouhaud wrote:
> > On Wed, Sep 09, 2020 at 11:25:29AM +0200, Julien Rouhaud wrote:
> > > 
> > > I'll do some benchmarking and see if I can get some figures, but it'll probably
> > > take some time.  In the meantime I'm attaching v11 of the patch that should
> > > address all other comments.
> > 
> > I just realized that I forgot to update one of the Makefiles when moving the TAP
> > test folder.  v12 attached should fix that.
> 
> 
> And the cfbot just reported a new error for the Windows build.  Attached v13 should
> fix that.


I did some benchmarking using the following environment:

- 16MB shared buffers
- 490MB table (10M rows)
- synchronize_seqscans set to off
- table in OS cache

I don't have a big machine, so I went with a very small shared_buffers and a
small table.  This makes sure that all the data is in the OS cache while the
table is still more than an order of magnitude larger than shared_buffers, to
simulate a plausible environment.

I used a simple read-only query that performs a sequential scan of the table (a
simple SELECT * FROM table), run using 10 concurrent connections, 5 runs of 700
seconds.  I ran it in three configurations: without any other activity, with the
original pg_check_relation function called repeatedly using \watch .1, and with
a modified version of that function without the optimisation, still using
\watch .1.

The overall TPS is obviously extremely bad, but I can see that the submitted
version adds an overhead of ~3.9% (average of 5 runs), while the version
without the optimisation adds an overhead of ~6.57%.
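
For reference, here is a minimal sketch of how such an overhead figure can be
derived from per-run TPS numbers.  It assumes the overhead is the relative drop
of the average TPS compared to the run without any other activity; the exact
formula used for the figures above isn't stated here, and the function name and
the TPS values below are purely hypothetical placeholders, not measurements.

```python
from statistics import mean


def overhead_pct(baseline_tps, measured_tps):
    """Relative loss of average TPS versus the baseline, in percent.

    Assumption: overhead = (avg(baseline) - avg(measured)) / avg(baseline).
    """
    base = mean(baseline_tps)
    return (base - mean(measured_tps)) / base * 100.0


if __name__ == "__main__":
    # Hypothetical per-run TPS values (5 runs each), for illustration only.
    no_activity = [10.0, 10.0, 10.0, 10.0, 10.0]
    with_check = [9.0, 9.0, 9.0, 9.0, 9.0]
    print(f"overhead: {overhead_pct(no_activity, with_check):.2f}%")
```

Averaging the per-run overheads instead of the per-run TPS would give a
slightly different number when the runs vary a lot, so the choice should be
stated alongside any reported figure.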

This is supposed to be a relatively fair benchmark, as all the data is cached
on the OS side, so the IO done while holding the bufmapping lock isn't too long,
but we can see that we already get a non-negligible benefit from this
optimisation.  Should I do additional benchmarking, like dropping the OS cache
and/or adding some write activity?  This would probably only make the
unoptimized version perform even worse.


