Re: Online checksums verification in the backend - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Online checksums verification in the backend |
Date | |
Msg-id | 20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de Whole thread Raw |
In response to | Re: Online checksums verification in the backend (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Online checksums verification in the backend
Re: Online checksums verification in the backend Re: Online checksums verification in the backend Re: Online checksums verification in the backend |
List | pgsql-hackers |
Hi, On 2020-10-28 14:08:52 +0900, Michael Paquier wrote: > Thanks for confirming. I have gone through the whole set today, > splitted the thing into two commits and applied them. We had > buildfarm member florican complain about a mistake in one of the > GetDatum() calls that I took care of already, and there is nothing > else on my radar. The code does IO while holding the buffer mapping lock. That seems *entirely* unacceptable to me. That basically locks 1/128 of shared buffers against concurrent mapping changes, while reading data that is likely not to be on disk? Seriously? /* see if the block is in the buffer pool or not */ LWLockAcquire(partLock, LW_SHARED); buf_id = BufTableLookup(&buf_tag, buf_hash); if (buf_id >= 0) { uint32 buf_state; /* * Found it. Now, retrieve its state to know what to do with it, and * release the pin immediately. We do so to limit overhead as much as * possible. We keep the shared LWLock on the target buffer mapping * partition for now, so this buffer cannot be evicted, and we acquire * an I/O Lock on the buffer as we may need to read its contents from * disk. */ bufdesc = GetBufferDescriptor(buf_id); LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED); buf_state = LockBufHdr(bufdesc); UnlockBufHdr(bufdesc, buf_state); /* If the page is dirty or invalid, skip it */ if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) == 0) { LWLockRelease(BufferDescriptorGetIOLock(bufdesc)); LWLockRelease(partLock); return true; } /* Read the buffer from disk, with the I/O lock still held */ smgrread(smgr, forknum, blkno, buffer); LWLockRelease(BufferDescriptorGetIOLock(bufdesc)); } else { /* * Simply read the buffer. There's no risk of modification on it as * we are holding the buffer pool partition mapping lock. */ smgrread(smgr, forknum, blkno, buffer); } The justification in the in-shared-buffers case seems to completely mis-judge costs too: * Found it. Now, retrieve its state to know what to do with it, and * release the pin immediately. We do so to limit overhead as much as * possible. We keep the shared LWLock on the target buffer mapping * partition for now, so this buffer cannot be evicted, and we acquire * an I/O Lock on the buffer as we may need to read its contents from * disk. a pin is cheap. Holding the partition lock is not. Also, using char[BLCKSZ] as a buffer isn't ok. This should use PGAlignedBlock: /* * Use this, not "char buf[BLCKSZ]", to declare a field or local variable * holding a page buffer, if that page might be accessed as a page and not * just a string of bytes. Otherwise the variable might be under-aligned, * causing problems on alignment-picky hardware. (In some places, we use * this to declare buffers even though we only pass them to read() and * write(), because copying to/from aligned buffers is usually faster than * using unaligned buffers.) We include both "double" and "int64" in the * union to ensure that the compiler knows the value must be MAXALIGN'ed * (cf. configure's computation of MAXIMUM_ALIGNOF). */ typedef union PGAlignedBlock I think this needs to be quickly reworked or reverted. Greetings, Andres Freund
pgsql-hackers by date: