Re: Online checksums verification in the backend - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Online checksums verification in the backend |
Date | |
Msg-id | 20201102010525.GA27101@paquier.xyz Whole thread Raw |
In response to | Re: Online checksums verification in the backend (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Online checksums verification in the backend
|
List | pgsql-hackers |
On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote: > I think its pretty much *never* OK to do IO while holding a buffer > mapping lock. You're locking a significant fraction of shared buffers > over IO. That's just not OK. Don't think there's any place doing so > currently either. There is no place doing that on HEAD. This specific point was mentioned in the first message of this thread, 7th paragraph. That was a long thread, so it is easy to miss: https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com I am wondering what you have in mind regarding the use of BM_IO_IN_PROGRESS or a similar flag. Wouldn't that imply some consequences for other existing buffers in the table, like a possible eviction? I'd like to think that we should not do any manipulation of the buffer tables in this case. Hence, in order to prevent a concurrent activity to load in shared buffers the page currently checked on disk, I got to think that we would need something new here, like a filtering hash table that would be checked each time a backend tries to insert an entry into the buffer tables. That's something I was wondering here: https://www.postgresql.org/message-id/20200316030638.GA2331@paquier.xyz I called that a preemptive lock, but you could also call that a discard filter or a virtual pin, just something to mean that a page locked this way cannot be loaded into the shared buffers. I'd like to think that this should not touch the existing buffer table, but it would impact the performance when attempting to insert an entry in the tables, as anything would need to be pre-checked. Assuming that we could make this thing work without holding the partition lock, and assuming that we only hold a share lock on the relation, we have two cases: 1) If the buffer is in shared buffers, we have the APIs to solve that by using a pin, unlock the partition, and then do the I/O. (Still that's unsafe with the smgrwrite() argument?) 2) If the buffer is not in shared buffers, we don't have what it takes to solve the problem yet. But even if we solve this problem, we will never really be sure that this is entirely safe, as per the argument with concurrent smgrwrite() calls. Current in-core code assumes that this can happen only for init forks of unlogged relations which would not be visible yet in the backend doing a page check, still it can be really easy to break this assumption with any new code added by a new feature. These arguments bring down to reduce the scope of CheckBuffer() as follows: - Use an AEL on the relation, pass down a Relation instead of SMgrRelation, and add on the way an assertion to make sure that the caller holds an AEL on the relation. I wanted to study the possiblity to use that stuff for base backups, but if you bring the concurrent smgrwrite() calls into the set of possibilities this shuts down the whole study from the start. - It is still useful to check that a page is in shared buffers IMO, so as if it is dirty we just discard it from the checks and rely on the next checkpoint to do a flush. It is also useful to check the state of the on-disk data is good or not if the page is not dirty, as the page could have gone rogue on-disk while a system was up for weeks. -- Michael
Attachment
pgsql-hackers by date: