Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: refactoring relation extension and BufferAlloc(), faster COPY |
Date | |
Msg-id | 20230211213651.pu4ns2764fxpyit5@awork3.anarazel.de Whole thread Raw |
In response to | Re: refactoring relation extension and BufferAlloc(), faster COPY (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: refactoring relation extension and BufferAlloc(), faster COPY
Re: refactoring relation extension and BufferAlloc(), faster COPY |
List | pgsql-hackers |
Hi, On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote: > > v2-0005-bufmgr-Acquire-and-clean-victim-buffer-separately.patch > This can be applied separately from the rest of the patches, which is nice. > Some small comments on it: Thanks for looking at these! > * Needs a rebase, it conflicted slightly with commit f30d62c2fc. Will work on that. > * GetVictimBuffer needs a comment to explain what it does. In particular, > mention that it returns a buffer that's pinned and known !BM_TAG_VALID. Will add. > * I suggest renaming 'cur_buf' and other such local variables in > GetVictimBufffer to just 'buf'. 'cur' prefix suggests that there is some > other buffer involved too, but there is no 'prev' or 'next' or 'other' > buffer. The old code called it just 'buf' too, and before this patch it > actually was a bit confusing because there were two buffers involved. But > with this patch, GetVictimBuffer only deals with one buffer at a time. Hm. Yea. I probably ended up with these names because initially GetVictimBuffer() wasn't a separate function, and I indeed constantly got confused by which buffer was referenced. > * This FIXME: > > > /* OK, do the I/O */ > > /* FIXME: These used the wrong smgr before afaict? */ > > { > > SMgrRelation smgr = smgropen(BufTagGetRelFileLocator(&buf_hdr->tag), > > InvalidBackendId); > > > > TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(buf_hdr->tag.forkNum, > > buf_hdr->tag.blockNum, > > smgr->smgr_rlocator.locator.spcOid, > > smgr->smgr_rlocator.locator.dbOid, > > smgr->smgr_rlocator.locator.relNumber); > > > > FlushBuffer(buf_hdr, smgr, IOOBJECT_RELATION, io_context); > > LWLockRelease(content_lock); > > > > ScheduleBufferTagForWriteback(&BackendWritebackContext, > > &buf_hdr->tag); > > > > TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(buf_hdr->tag.forkNum, > > buf_hdr->tag.blockNum, > > smgr->smgr_rlocator.locator.spcOid, > > smgr->smgr_rlocator.locator.dbOid, > > smgr->smgr_rlocator.locator.relNumber); > > } > > I believe that was intentional. The probes previously reported the block and > relation whose read *caused* the eviction. It was not just the smgr but also > the blockNum and forkNum that referred to the block that was being read. You're probably right. It's certainly not understandable from our docs though: <row> <entry><literal>buffer-write-dirty-start</literal></entry> <entry><literal>(ForkNumber, BlockNumber, Oid, Oid, Oid)</literal></entry> <entry>Probe that fires when a server process begins to write a dirty buffer. (If this happens often, it implies that <xref linkend="guc-shared-buffers"/> is too small or the background writer control parameters need adjustment.) arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation.</entry> </row> > I see that reporting the evicted page is more convenient with this patch, > otherwise you'd need to pass the smgr and blocknum of the page that's being > read to InvalidateVictimBuffer(). IMHO you can just remove these probe > points. We don't need to bend over backwards to maintain specific probe > points. Agreed. > * InvalidateVictimBuffer reads the buffer header with an atomic read op, > just to check if BM_TAG_VALID is set. It's not a real atomic op, in the sense of being special instruction. It does force the compiler to actually read from memory, but that's it. But you're right, even that is unnecessary. I think it ended up that way because I also wanted the full buf_hdr, and it seemed somewhat error prone to pass in both. > * I don't understand this comment: > > > /* > > * Clear out the buffer's tag and flags and usagecount. We must do > > * this to ensure that linear scans of the buffer array don't think > > * the buffer is valid. > > * > > * XXX: This is a pre-existing comment I just moved, but isn't it > > * entirely bogus with regard to the tag? We can't do anything with > > * the buffer without taking BM_VALID / BM_TAG_VALID into > > * account. Likely doesn't matter because we're already dirtying the > > * cacheline, but still. > > * > > */ > > ClearBufferTag(&buf_hdr->tag); > > buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK); > > UnlockBufHdr(buf_hdr, buf_state); > > What exactly is wrong with clearing the tag? What does dirtying the > cacheline have to do with the correctness here? There's nothing wrong with clearing out the tag, but I don't think it's a hard requirement today, and certainly not for the reason stated above. Validity of the buffer isn't determined by the tag, it's determined by BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID). Without either having pinned the buffer, or holding the buffer header spinlock, the tag can change at any time. And code like DropDatabaseBuffers() knows that, and re-checks the the tag after locking the buffer header spinlock. Afaict, there'd be no correctness issue with removing the ClearBufferTag(). There would be an efficiency issue though, because when encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(), which'd find that BM_[TAG_]VALID isn't set, and not to anything. Even though it's not a correctness issue, it seems to me that DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID, before doing anything further. Particularly in DropRelationsAllBuffers(), the check we do for each buffer isn't cheap. Doing it for buffers that don't even have a tag seems .. not smart. Greetings, Andres Freund
pgsql-hackers by date: