Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: refactoring relation extension and BufferAlloc(), faster COPY |
Date | |
Msg-id | 8185d3a4-5123-10ca-db33-30b4a03fff7e@iki.fi Whole thread Raw |
In response to | Re: refactoring relation extension and BufferAlloc(), faster COPY (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On 11/02/2023 23:36, Andres Freund wrote: > On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote: >> * I don't understand this comment: >> >>> /* >>> * Clear out the buffer's tag and flags and usagecount. We must do >>> * this to ensure that linear scans of the buffer array don't think >>> * the buffer is valid. >>> * >>> * XXX: This is a pre-existing comment I just moved, but isn't it >>> * entirely bogus with regard to the tag? We can't do anything with >>> * the buffer without taking BM_VALID / BM_TAG_VALID into >>> * account. Likely doesn't matter because we're already dirtying the >>> * cacheline, but still. >>> * >>> */ >>> ClearBufferTag(&buf_hdr->tag); >>> buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK); >>> UnlockBufHdr(buf_hdr, buf_state); >> >> What exactly is wrong with clearing the tag? What does dirtying the >> cacheline have to do with the correctness here? > > There's nothing wrong with clearing out the tag, but I don't think it's a hard > requirement today, and certainly not for the reason stated above. > > Validity of the buffer isn't determined by the tag, it's determined by > BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID). > > Without either having pinned the buffer, or holding the buffer header > spinlock, the tag can change at any time. And code like DropDatabaseBuffers() > knows that, and re-checks the the tag after locking the buffer header > spinlock. > > Afaict, there'd be no correctness issue with removing the > ClearBufferTag(). There would be an efficiency issue though, because when > encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(), > which'd find that BM_[TAG_]VALID isn't set, and not to anything. Okay, gotcha. > Even though it's not a correctness issue, it seems to me that > DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID, > before doing anything further. Particularly in DropRelationsAllBuffers(), the > check we do for each buffer isn't cheap. Doing it for buffers that don't even > have a tag seems .. not smart. Depends on what percentage of buffers are valid, I guess. If all buffers are valid, checking BM_TAG_VALID first would lose. In practice, I doubt it makes any measurable difference either way. Since we're micro-optimizing, I noticed that BufTagMatchesRelFileLocator() compares the fields in order "spcOid, dbOid, relNumber". Before commit 82ac34db20, we used RelFileLocatorEqual(), which has this comment: /* * Note: RelFileLocatorEquals and RelFileLocatorBackendEquals compare relNumber * first since that is most likely to be different in two unequal * RelFileLocators. It is probably redundant to compare spcOid if the other * fields are found equal, but do it anyway to be sure. Likewise for checking * the backend ID in RelFileLocatorBackendEquals. */ So we lost that micro-optimization. Should we reorder the checks in BufTagMatchesRelFileLocator()? - Heikki
pgsql-hackers by date: