Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Online checksums verification in the backend
Date
Msg-id 20201102010525.GA27101@paquier.xyz
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Andres Freund <andres@anarazel.de>)
Responses Re: Online checksums verification in the backend  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> I think its pretty much *never* OK to do IO while holding a buffer
> mapping lock. You're locking a significant fraction of shared buffers
> over IO. That's just not OK.  Don't think there's any place doing so
> currently either.

There is no place doing that on HEAD.

This specific point was mentioned in the first message of this thread,
7th paragraph.  That was a long thread, so it is easy to miss:
https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com

I am wondering what you have in mind regarding the use of
BM_IO_IN_PROGRESS or a similar flag.  Wouldn't that imply some
consequences for other existing buffers in the table, like a possible
eviction?  I'd like to think that we should not do any manipulation of
the buffer tables in this case.  Hence, in order to prevent a
concurrent activity to load in shared buffers the page currently
checked on disk, I got to think that we would need something new here,
like a filtering hash table that would be checked each time a backend
tries to insert an entry into the buffer tables.  That's something I
was wondering here:
https://www.postgresql.org/message-id/20200316030638.GA2331@paquier.xyz
I called that a preemptive lock, but you could also call that a
discard filter or a virtual pin, just something to mean that a page
locked this way cannot be loaded into the shared buffers.  I'd like to
think that this should not touch the existing buffer table, but it
would impact the performance when attempting to insert an entry in the
tables, as anything would need to be pre-checked.

Assuming that we could make this thing work without holding the
partition lock, and assuming that we only hold a share lock on the
relation, we have two cases:
1) If the buffer is in shared buffers, we have the APIs to solve that
by using a pin, unlock the partition, and then do the I/O.  (Still
that's unsafe with the smgrwrite() argument?)
2) If the buffer is not in shared buffers, we don't have what it takes
to solve the problem yet.  But even if we solve this problem, we will
never really be sure that this is entirely safe, as per the argument
with concurrent smgrwrite() calls.  Current in-core code assumes that
this can happen only for init forks of unlogged relations which would
not be visible yet in the backend doing a page check, still it can be
really easy to break this assumption with any new code added by a new
feature.

These arguments bring down to reduce the scope of CheckBuffer() as
follows:
- Use an AEL on the relation, pass down a Relation instead of
SMgrRelation, and add on the way an assertion to make sure that the
caller holds an AEL on the relation.  I wanted to study the possiblity
to use that stuff for base backups, but if you bring the concurrent
smgrwrite() calls into the set of possibilities this shuts down the
whole study from the start.
- It is still useful to check that a page is in shared buffers IMO, so
as if it is dirty we just discard it from the checks and rely on the
next checkpoint to do a flush.  It is also useful to check the state
of the on-disk data is good or not if the page is not dirty, as the
page could have gone rogue on-disk while a system was up for weeks.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Parallel Append can break run-time partition pruning
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)