Home > mailing lists

Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Online checksums verification in the backend
Date	November 2, 2020 01:05:25
Msg-id	20201102010525.GA27101@paquier.xyz Whole thread Raw
In response to	Re: Online checksums verification in the backend (Andres Freund <andres@anarazel.de>)
Responses	Re: Online checksums verification in the backend
List	pgsql-hackers

Tree view

On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> I think its pretty much *never* OK to do IO while holding a buffer
> mapping lock. You're locking a significant fraction of shared buffers
> over IO. That's just not OK. Don't think there's any place doing so
> currently either.

There is no place doing that on HEAD.

This specific point was mentioned in the first message of this thread,
7th paragraph. That was a long thread, so it is easy to miss:
https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com

I am wondering what you have in mind regarding the use of
BM_IO_IN_PROGRESS or a similar flag. Wouldn't that imply some
consequences for other existing buffers in the table, like a possible
eviction? I'd like to think that we should not do any manipulation of
the buffer tables in this case. Hence, in order to prevent a
concurrent activity to load in shared buffers the page currently
checked on disk, I got to think that we would need something new here,
like a filtering hash table that would be checked each time a backend
tries to insert an entry into the buffer tables. That's something I
was wondering here:
https://www.postgresql.org/message-id/20200316030638.GA2331@paquier.xyz
I called that a preemptive lock, but you could also call that a
discard filter or a virtual pin, just something to mean that a page
locked this way cannot be loaded into the shared buffers. I'd like to
think that this should not touch the existing buffer table, but it
would impact the performance when attempting to insert an entry in the
tables, as anything would need to be pre-checked.

Assuming that we could make this thing work without holding the
partition lock, and assuming that we only hold a share lock on the
relation, we have two cases:
1) If the buffer is in shared buffers, we have the APIs to solve that
by using a pin, unlock the partition, and then do the I/O. (Still
that's unsafe with the smgrwrite() argument?)
2) If the buffer is not in shared buffers, we don't have what it takes
to solve the problem yet. But even if we solve this problem, we will
never really be sure that this is entirely safe, as per the argument
with concurrent smgrwrite() calls. Current in-core code assumes that
this can happen only for init forks of unlogged relations which would
not be visible yet in the backend doing a page check, still it can be
really easy to break this assumption with any new code added by a new
feature.

These arguments bring down to reduce the scope of CheckBuffer() as
follows:
- Use an AEL on the relation, pass down a Relation instead of
SMgrRelation, and add on the way an assertion to make sure that the
caller holds an AEL on the relation. I wanted to study the possiblity
to use that stuff for base backups, but if you bring the concurrent
smgrwrite() calls into the set of possibilities this shuts down the
whole study from the start.
- It is still useful to check that a page is in shared buffers IMO, so
as if it is dirty we just discard it from the checks and rely on the
next checkpoint to do a flush. It is also useful to check the state
of the on-disk data is good or not if the page is not dirty, as the
page could have gone rogue on-disk while a system was up for weeks.
--
Michael

Attachment

signature.asc

pgsql-hackers by date:

From: David Rowley
Date: 02 November 2020, 00:50:57
Subject: Re: Parallel Append can break run-time partition pruning

From: Kyotaro Horiguchi
Date: 02 November 2020, 01:36:10
Subject: Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)

Re: Online checksums verification in the backend - Mailing list pgsql-hackers

Attachment

Previous

Next