Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Online checksums verification in the backend
Date
Msg-id 20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Online checksums verification in the backend
Re: Online checksums verification in the backend
Re: Online checksums verification in the backend
Re: Online checksums verification in the backend
List pgsql-hackers
Hi,

On 2020-10-28 14:08:52 +0900, Michael Paquier wrote:
> Thanks for confirming.  I have gone through the whole set today,
> splitted the thing into two commits and applied them.  We had
> buildfarm member florican complain about a mistake in one of the
> GetDatum() calls that I took care of already, and there is nothing
> else on my radar.

The code does IO while holding the buffer mapping lock. That seems
*entirely* unacceptable to me. That basically locks 1/128 of shared
buffers against concurrent mapping changes, while reading data that is
likely not to be on disk?  Seriously?


    /* see if the block is in the buffer pool or not */
    LWLockAcquire(partLock, LW_SHARED);
    buf_id = BufTableLookup(&buf_tag, buf_hash);
    if (buf_id >= 0)
    {
        uint32        buf_state;

        /*
         * Found it.  Now, retrieve its state to know what to do with it, and
         * release the pin immediately.  We do so to limit overhead as much as
         * possible.  We keep the shared LWLock on the target buffer mapping
         * partition for now, so this buffer cannot be evicted, and we acquire
         * an I/O Lock on the buffer as we may need to read its contents from
         * disk.
         */
        bufdesc = GetBufferDescriptor(buf_id);

        LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
        buf_state = LockBufHdr(bufdesc);
        UnlockBufHdr(bufdesc, buf_state);

        /* If the page is dirty or invalid, skip it */
        if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) == 0)
        {
            LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
            LWLockRelease(partLock);
            return true;
        }

        /* Read the buffer from disk, with the I/O lock still held */
        smgrread(smgr, forknum, blkno, buffer);
        LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
    }
    else
    {
        /*
         * Simply read the buffer.  There's no risk of modification on it as
         * we are holding the buffer pool partition mapping lock.
         */
        smgrread(smgr, forknum, blkno, buffer);
    }


The justification in the in-shared-buffers case seems to completely
mis-judge costs too:
         * Found it.  Now, retrieve its state to know what to do with it, and
         * release the pin immediately.  We do so to limit overhead as much as
         * possible.  We keep the shared LWLock on the target buffer mapping
         * partition for now, so this buffer cannot be evicted, and we acquire
         * an I/O Lock on the buffer as we may need to read its contents from
         * disk.
a pin is cheap. Holding the partition lock is not.


Also, using char[BLCKSZ] as a buffer isn't ok. This should use
PGAlignedBlock:
/*
 * Use this, not "char buf[BLCKSZ]", to declare a field or local variable
 * holding a page buffer, if that page might be accessed as a page and not
 * just a string of bytes.  Otherwise the variable might be under-aligned,
 * causing problems on alignment-picky hardware.  (In some places, we use
 * this to declare buffers even though we only pass them to read() and
 * write(), because copying to/from aligned buffers is usually faster than
 * using unaligned buffers.)  We include both "double" and "int64" in the
 * union to ensure that the compiler knows the value must be MAXALIGN'ed
 * (cf. configure's computation of MAXIMUM_ALIGNOF).
 */
typedef union PGAlignedBlock


I think this needs to be quickly reworked or reverted.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Autovacuum worker doesn't immediately exit on postmaster death
Next
From: Andres Freund
Date:
Subject: Re: Online checksums verification in the backend