Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: refactoring relation extension and BufferAlloc(), faster COPY
Date
Msg-id 8185d3a4-5123-10ca-db33-30b4a03fff7e@iki.fi
Whole thread Raw
In response to Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 11/02/2023 23:36, Andres Freund wrote:
> On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote:
>> * I don't understand this comment:
>>
>>>     /*
>>>      * Clear out the buffer's tag and flags and usagecount.  We must do
>>>      * this to ensure that linear scans of the buffer array don't think
>>>      * the buffer is valid.
>>>      *
>>>      * XXX: This is a pre-existing comment I just moved, but isn't it
>>>      * entirely bogus with regard to the tag? We can't do anything with
>>>      * the buffer without taking BM_VALID / BM_TAG_VALID into
>>>      * account. Likely doesn't matter because we're already dirtying the
>>>      * cacheline, but still.
>>>      *
>>>      */
>>>     ClearBufferTag(&buf_hdr->tag);
>>>     buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
>>>     UnlockBufHdr(buf_hdr, buf_state);
>>
>> What exactly is wrong with clearing the tag? What does dirtying the
>> cacheline have to do with the correctness here?
> 
> There's nothing wrong with clearing out the tag, but I don't think it's a hard
> requirement today, and certainly not for the reason stated above.
> 
> Validity of the buffer isn't determined by the tag, it's determined by
> BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID).
> 
> Without either having pinned the buffer, or holding the buffer header
> spinlock, the tag can change at any time. And code like DropDatabaseBuffers()
> knows that, and re-checks the the tag after locking the buffer header
> spinlock.
> 
> Afaict, there'd be no correctness issue with removing the
> ClearBufferTag(). There would be an efficiency issue though, because when
> encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(),
> which'd find that BM_[TAG_]VALID isn't set, and not to anything.

Okay, gotcha.

> Even though it's not a correctness issue, it seems to me that
> DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID,
> before doing anything further.  Particularly in DropRelationsAllBuffers(), the
> check we do for each buffer isn't cheap. Doing it for buffers that don't even
> have a tag seems .. not smart.

Depends on what percentage of buffers are valid, I guess. If all buffers 
are valid, checking BM_TAG_VALID first would lose. In practice, I doubt 
it makes any measurable difference either way.

Since we're micro-optimizing, I noticed that 
BufTagMatchesRelFileLocator() compares the fields in order "spcOid, 
dbOid, relNumber". Before commit 82ac34db20, we used 
RelFileLocatorEqual(), which has this comment:

/*
  * Note: RelFileLocatorEquals and RelFileLocatorBackendEquals compare 
relNumber
  * first since that is most likely to be different in two unequal
  * RelFileLocators.  It is probably redundant to compare spcOid if the 
other
  * fields are found equal, but do it anyway to be sure.  Likewise for 
checking
  * the backend ID in RelFileLocatorBackendEquals.
  */

So we lost that micro-optimization. Should we reorder the checks in 
BufTagMatchesRelFileLocator()?

- Heikki




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Missing llvm_leave_fatal_on_oom() call
Next
From: Heikki Linnakangas
Date:
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY