Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: refactoring relation extension and BufferAlloc(), faster COPY
Date
Msg-id 20230211213651.pu4ns2764fxpyit5@awork3.anarazel.de
Whole thread Raw
In response to Re: refactoring relation extension and BufferAlloc(), faster COPY  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
Re: refactoring relation extension and BufferAlloc(), faster COPY  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hi,

On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote:
> > v2-0005-bufmgr-Acquire-and-clean-victim-buffer-separately.patch
> This can be applied separately from the rest of the patches, which is nice.
> Some small comments on it:

Thanks for looking at these!


> * Needs a rebase, it conflicted slightly with commit f30d62c2fc.

Will work on that.


> * GetVictimBuffer needs a comment to explain what it does. In particular,
> mention that it returns a buffer that's pinned and known !BM_TAG_VALID.

Will add.


> * I suggest renaming 'cur_buf' and other such local variables in
> GetVictimBufffer to just 'buf'. 'cur' prefix suggests that there is some
> other buffer involved too, but there is no 'prev' or 'next' or 'other'
> buffer. The old code called it just 'buf' too, and before this patch it
> actually was a bit confusing because there were two buffers involved. But
> with this patch, GetVictimBuffer only deals with one buffer at a time.

Hm. Yea. I probably ended up with these names because initially
GetVictimBuffer() wasn't a separate function, and I indeed constantly got
confused by which buffer was referenced.


> * This FIXME:
>
> >         /* OK, do the I/O */
> >         /* FIXME: These used the wrong smgr before afaict? */
> >         {
> >             SMgrRelation smgr = smgropen(BufTagGetRelFileLocator(&buf_hdr->tag),
> >                                          InvalidBackendId);
> >
> >             TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(buf_hdr->tag.forkNum,
> >                                                       buf_hdr->tag.blockNum,
> >                                                       smgr->smgr_rlocator.locator.spcOid,
> >                                                       smgr->smgr_rlocator.locator.dbOid,
> >                                                       smgr->smgr_rlocator.locator.relNumber);
> >
> >             FlushBuffer(buf_hdr, smgr, IOOBJECT_RELATION, io_context);
> >             LWLockRelease(content_lock);
> >
> >             ScheduleBufferTagForWriteback(&BackendWritebackContext,
> >                                           &buf_hdr->tag);
> >
> >             TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(buf_hdr->tag.forkNum,
> >                                                      buf_hdr->tag.blockNum,
> >                                                      smgr->smgr_rlocator.locator.spcOid,
> >                                                      smgr->smgr_rlocator.locator.dbOid,
> >                                                      smgr->smgr_rlocator.locator.relNumber);
> >         }
>
> I believe that was intentional. The probes previously reported the block and
> relation whose read *caused* the eviction. It was not just the smgr but also
> the blockNum and forkNum that referred to the block that was being read.

You're probably right.  It's certainly not understandable from our docs
though:

    <row>
     <entry><literal>buffer-write-dirty-start</literal></entry>
     <entry><literal>(ForkNumber, BlockNumber, Oid, Oid, Oid)</literal></entry>
     <entry>Probe that fires when a server process begins to write a dirty
      buffer.  (If this happens often, it implies that
      <xref linkend="guc-shared-buffers"/> is too
      small or the background writer control parameters need adjustment.)
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.</entry>
    </row>


> I see that reporting the evicted page is more convenient with this patch,
> otherwise you'd need to pass the smgr and blocknum of the page that's being
> read to InvalidateVictimBuffer(). IMHO you can just remove these probe
> points. We don't need to bend over backwards to maintain specific probe
> points.

Agreed.


> * InvalidateVictimBuffer reads the buffer header with an atomic read op,
> just to check if BM_TAG_VALID is set.

It's not a real atomic op, in the sense of being special instruction. It does
force the compiler to actually read from memory, but that's it.

But you're right, even that is unnecessary. I think it ended up that way
because I also wanted the full buf_hdr, and it seemed somewhat error prone to
pass in both.



> * I don't understand this comment:
>
> >     /*
> >      * Clear out the buffer's tag and flags and usagecount.  We must do
> >      * this to ensure that linear scans of the buffer array don't think
> >      * the buffer is valid.
> >      *
> >      * XXX: This is a pre-existing comment I just moved, but isn't it
> >      * entirely bogus with regard to the tag? We can't do anything with
> >      * the buffer without taking BM_VALID / BM_TAG_VALID into
> >      * account. Likely doesn't matter because we're already dirtying the
> >      * cacheline, but still.
> >      *
> >      */
> >     ClearBufferTag(&buf_hdr->tag);
> >     buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
> >     UnlockBufHdr(buf_hdr, buf_state);
>
> What exactly is wrong with clearing the tag? What does dirtying the
> cacheline have to do with the correctness here?

There's nothing wrong with clearing out the tag, but I don't think it's a hard
requirement today, and certainly not for the reason stated above.

Validity of the buffer isn't determined by the tag, it's determined by
BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID).

Without either having pinned the buffer, or holding the buffer header
spinlock, the tag can change at any time. And code like DropDatabaseBuffers()
knows that, and re-checks the the tag after locking the buffer header
spinlock.

Afaict, there'd be no correctness issue with removing the
ClearBufferTag(). There would be an efficiency issue though, because when
encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(),
which'd find that BM_[TAG_]VALID isn't set, and not to anything.


Even though it's not a correctness issue, it seems to me that
DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID,
before doing anything further.  Particularly in DropRelationsAllBuffers(), the
check we do for each buffer isn't cheap. Doing it for buffers that don't even
have a tag seems .. not smart.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: proposal: psql: psql variable BACKEND_PID
Next
From: Corey Huinker
Date:
Subject: Re: Improving inferred query column names