Re: Online checksums verification in the backend - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Online checksums verification in the backend
Date:
Msg-id: 20201102013940.tk5e2co2maedzksw@alap3.anarazel.de
In response to: Re: Online checksums verification in the backend (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Online checksums verification in the backend
List: pgsql-hackers
Hi,

I'm a bit limited writing - one handed for a while following an injury
on Friday...

On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> > I think it's pretty much *never* OK to do IO while holding a buffer
> > mapping lock. You're locking a significant fraction of shared buffers
> > over IO. That's just not OK. Don't think there's any place doing so
> > currently either.
>
> There is no place doing that on HEAD.

Err?

    /* see if the block is in the buffer pool or not */
    LWLockAcquire(partLock, LW_SHARED);
    buf_id = BufTableLookup(&buf_tag, buf_hash);
    if (buf_id >= 0)
    {
        ...
        /* Read the buffer from disk, with the I/O lock still held */
        smgrread(smgr, forknum, blkno, buffer);
        LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
    }
    else
    {
        /*
         * Simply read the buffer.  There's no risk of modification on it as
         * we are holding the buffer pool partition mapping lock.
         */
        smgrread(smgr, forknum, blkno, buffer);
    }
    /* buffer lookup done, so now do its check */
    LWLockRelease(partLock);

How is this not doing IO while holding a buffer mapping lock?

> This specific point was mentioned in the first message of this thread,
> 7th paragraph.  That was a long thread, so it is easy to miss:
> https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com

The code clearly doesn't implement it that way.

> I am wondering what you have in mind regarding the use of
> BM_IO_IN_PROGRESS or a similar flag.  Wouldn't that imply some
> consequences for other existing buffers in the table, like a possible
> eviction?

You'd need exactly one empty buffer for that - it can be reused for the
next to-be-checked buffer.

> I'd like to think that we should not do any manipulation of
> the buffer tables in this case.

Why? It's the way we lock buffers - why is this so special that we need
to do it differently?
> Hence, in order to prevent a concurrent activity to load in shared
> buffers the page currently checked on disk, I got to think that we
> would need something new here, like a filtering hash table that would
> be checked each time a backend tries to insert an entry into the
> buffer tables.

That's going to slow down everything a bit - the mapping already is a
bottleneck.

> 1) If the buffer is in shared buffers, we have the APIs to solve that
> by using a pin, unlock the partition, and then do the I/O.  (Still
> that's unsafe with the smgrwrite() argument?)

That's why you need an appropriate relation lock... Something
CheckBuffer didn't bother to document. It's a restriction, but one we
probably can live with.

> 2) If the buffer is not in shared buffers, we don't have what it takes
> to solve the problem yet.

We do. Set up enough state for the case to be otherwise the same as the
in-s_b case.

> But even if we solve this problem, we will never really be sure that
> this is entirely safe, as per the argument with concurrent smgrwrite()
> calls.  Current in-core code assumes that this can happen only for
> init forks of unlogged relations which would not be visible yet in the
> backend doing a page check, still it can be really easy to break this
> assumption with any new code added by a new feature.

It also happens in a few other cases than just init forks. But
visibility & relation locking can take care of that. But you need to
document that.

If the locking allows concurrent readers - and especially concurrent
writers, then you can't really use smgrwrite for anything but relation
extension.

Greetings,

Andres Freund