Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Online checksums verification in the backend
Date
Msg-id CAOBaU_ZV9hEBo2z=0htszBw2PFSkp=kLmjUTV5PzxQjCKB9zHA@mail.gmail.com
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Online checksums verification in the backend
List pgsql-hackers
On Thu, Oct 1, 2020 at 1:07 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 25, 2020 at 06:11:47PM +0800, Julien Rouhaud wrote:
> > Thanks a lot for the tests!  I'm not surprised that forcing the lock
> > will slow down the pg_check_relation() execution, but I'm a bit
> > surprised that holding the buffer mapping lock longer in a workload
> > that has a lot of evictions actually makes things faster.  Do you have
> > any idea why that's the case?
>
> That's still a bit unclear to me, but I have not spent much time
> thinking about this particular point either.
>
> > I'm assuming that you prefer to remove both the optimization and the
> > throttling part?  I'll do that with the next version unless there's
> > objections.
>
> Yeah, any tests I have done tends to show that.  It would be good to
> also check some perf profiles here, at least for the process running
> the relation check in a loop.
>
> > I agree that putting the code nearby ReadBuffer_common() would be a
> > good idea.  However, that means that I can't move all the code to
> > contrib/  I'm wondering what you'd like to see going there.  I can see
> > some values in also having the SQL functions available in core rather
> > than contrib, e.g. if you need to quickly check a relation on a
> > standby, so without requiring to create the extension on the primary
> > node first.
>
> Good point.  This could make the user experience worse.
>
> >  Then, I'm a bit worried about adding this code in ReadBuffer_common.
> > What this code does is quite different, and I'm afraid that it'll make
> > ReadBuffer_common more complex than needed, which  is maybe not a good
> > idea for something as critical as this function.
> >
> > What do you think?
>
> Yeah, I have been looking at ReadBuffer_common() and it is true that
> it is complicated enough so we may not really need an extra mode or
> more options, for a final logic that is actually different than what a
> buffer read does: we just want to know if a page has a valid checksum
> or not.  An idea that I got here would be to add a new, separate
> function to do the page check directly in bufmgr.c, but that's what
> you mean.  Now only the prefetch routine and ReadBuffer_common use
> partition locks, but getting that done in the same file looks like a
> good compromise to me.  It would be also possible to keep the BLCKSZ
> buffer used to check the page directly in this routine, so as any
> caller willing to do a check don't need to worry about any
> allocation.

I made all the suggested modifications in attached v14:

- moved the C code in bufmgr.c nearby ReadBuffer
- removed the GUC and throttling options
- removed the dubious optimization

All documentation and comments are updated to reflect those changes.
I also split the commit in two, one for the backend infrastructure and
one for the SQL wrappers.

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Next
From: Thomas Munro
Date:
Subject: Re: kevent latch paths don't handle postmaster death well