Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Msg-id: 20230329034754.elzpfwacqmapv6pj@awork3.anarazel.de
In response to: Re: refactoring relation extension and BufferAlloc(), faster COPY (Melanie Plageman <melanieplageman@gmail.com>)
Responses: Re: refactoring relation extension and BufferAlloc(), faster COPY
List: pgsql-hackers
Hi,

On 2023-03-26 17:42:45 -0400, Melanie Plageman wrote:
> Below is my review of a slightly older version than you just posted --
> much of it you may have already addressed.

Far from all of it is already addressed - thanks for the review!


> From 3a6c3f41000e057bae12ab4431e6bb1c5f3ec4b0 Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Mon, 20 Mar 2023 21:57:40 -0700
> Subject: [PATCH v5 01/15] createdb-using-wal-fixes
>
> This could use a more detailed commit message -- I don't really get what
> it is doing

It's a fix for a bug that I encountered while hacking on this; it has since
been committed:

commit 5df319f3d55d09fadb4f7e4b58c5b476a3aeceb4
Author: Andres Freund <andres@anarazel.de>
Date:   2023-03-20 21:57:40 -0700

    Fix memory leak and inefficiency in CREATE DATABASE ... STRATEGY WAL_LOG

    RelationCopyStorageUsingBuffer() did not free the strategies used to access
    the source / target relation. The memory was released at the end of the
    transaction, but when using a template database with a lot of relations, the
    temporary leak can become prohibitively big.

    RelationCopyStorageUsingBuffer() acquired the buffer for the target relation
    with RBM_NORMAL, therefore requiring a read of a block guaranteed to be
    zero. Use RBM_ZERO_AND_LOCK instead.

    Reviewed-by: Robert Haas <robertmhaas@gmail.com>
    Discussion: https://postgr.es/m/20230321070113.o2vqqxogjykwgfrr@awork3.anarazel.de
    Backpatch: 15-, where STRATEGY WAL_LOG was introduced


> From 6faba69c241fd5513022bb042c33af09d91e84a6 Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Wed, 1 Jul 2020 19:06:45 -0700
> Subject: [PATCH v5 02/15] Add some error checking around pinning
>
> ---
>  src/backend/storage/buffer/bufmgr.c | 40 ++++++++++++++++++++---------
>  src/include/storage/bufmgr.h        |  1 +
>  2 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 95212a3941..fa20fab5a2 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -4283,6 +4287,25 @@ ConditionalLockBuffer(Buffer buffer)
>                          LW_EXCLUSIVE);
>  }
>
> +void
> +BufferCheckOneLocalPin(Buffer buffer)
> +{
> +    if (BufferIsLocal(buffer))
> +    {
> +        /* There should be exactly one pin */
> +        if (LocalRefCount[-buffer - 1] != 1)
> +            elog(ERROR, "incorrect local pin count: %d",
> +                 LocalRefCount[-buffer - 1]);
> +    }
> +    else
> +    {
> +        /* There should be exactly one local pin */
> +        if (GetPrivateRefCount(buffer) != 1)
>
> I'd rather this be an else if (was already like this, but, no reason not
> to change it now)

I don't like that much - it'd break the symmetry between local / non-local.


> +/*
> + * Zero a region of the file.
> + *
> + * Returns 0 on success, -1 otherwise. In the latter case errno is set to the
> + * appropriate error.
> + */
> +int
> +FileZero(File file, off_t offset, off_t amount, uint32 wait_event_info)
> +{
> +    int         returnCode;
> +    ssize_t     written;
> +
> +    Assert(FileIsValid(file));
> +    returnCode = FileAccess(file);
> +    if (returnCode < 0)
> +        return returnCode;
> +
> +    pgstat_report_wait_start(wait_event_info);
> +    written = pg_pwrite_zeros(VfdCache[file].fd, amount, offset);
> +    pgstat_report_wait_end();
> +
> +    if (written < 0)
> +        return -1;
> +    else if (written != amount)
>
> this doesn't need to be an else if

You mean it could be a "bare" if instead? I don't really think that's
clearer.
> +    {
> +        /* if errno is unset, assume problem is no disk space */
> +        if (errno == 0)
> +            errno = ENOSPC;
> +        return -1;
> +    }

> +int
> +FileFallocate(File file, off_t offset, off_t amount, uint32 wait_event_info)
> +{
> +#ifdef HAVE_POSIX_FALLOCATE
> +    int         returnCode;
> +
> +    Assert(FileIsValid(file));
> +    returnCode = FileAccess(file);
> +    if (returnCode < 0)
> +        return returnCode;
> +
> +    pgstat_report_wait_start(wait_event_info);
> +    returnCode = posix_fallocate(VfdCache[file].fd, offset, amount);
> +    pgstat_report_wait_end();
> +
> +    if (returnCode == 0)
> +        return 0;
> +
> +    /* for compatibility with %m printing etc */
> +    errno = returnCode;
> +
> +    /*
> +     * Return in cases of a "real" failure, if fallocate is not supported,
> +     * fall through to the FileZero() backed implementation.
> +     */
> +    if (returnCode != EINVAL && returnCode != EOPNOTSUPP)
> +        return returnCode;
>
> I'm pretty sure you can just delete the below if statement
>
> +    if (returnCode == 0 ||
> +        (returnCode != EINVAL && returnCode != EINVAL))
> +        return returnCode;

Hm. I don't see how - wouldn't that lead us to call FileZero(), even if
FileFallocate() succeeded or failed (rather than not being supported)?


> +/*
> + * mdzeroextend() -- Add ew zeroed out blocks to the specified relation.
>
> not sure what ew is

A hurried new :)


> + *
> + *      Similar to mdrextend(), except the relation can be extended by
>
> mdrextend->mdextend

> + *      multiple blocks at once, and that the added blocks will be filled with
>
> I would lose the comma and just say "and the added blocks will be filled..."

Done.


> +void
> +mdzeroextend(SMgrRelation reln, ForkNumber forknum,
> +             BlockNumber blocknum, int nblocks, bool skipFsync)
>
> So, I think there are a few too many local variables in here, and it
> actually makes it more confusing.
> Assuming you would like to keep the input parameters blocknum and
> nblocks unmodified for debugging/other reasons, here is a suggested
> refactor of this function

I'm mostly adopting this.


> Also, I think you can combine the two error cases (I don't know if the
> user cares what you were trying to extend the file with).

Hm. I do find it a somewhat useful distinction for figuring out problems - we
haven't used posix_fallocate for files so far, it seems plausible we'd hit
some portability issues. We could make it an errdetail(), I guess?


> void
> mdzeroextend(SMgrRelation reln, ForkNumber forknum,
>              BlockNumber blocknum, int nblocks, bool skipFsync)
> {
>     MdfdVec    *v;
>     BlockNumber curblocknum = blocknum;
>     int         remblocks = nblocks;
>
>     Assert(nblocks > 0);
>
>     /* This assert is too expensive to have on normally ... */
> #ifdef CHECK_WRITE_VS_EXTEND
>     Assert(blocknum >= mdnblocks(reln, forknum));
> #endif
>
>     /*
>      * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
>      * more --- we mustn't create a block whose number actually is
>      * InvalidBlockNumber or larger.
>      */
>     if ((uint64) blocknum + nblocks >= (uint64) InvalidBlockNumber)
>         ereport(ERROR,
>                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>                  errmsg("cannot extend file \"%s\" beyond %u blocks",
>                         relpath(reln->smgr_rlocator, forknum),
>                         InvalidBlockNumber)));
>
>     while (remblocks > 0)
>     {
>         int         segstartblock = curblocknum % ((BlockNumber) RELSEG_SIZE);

Hm - this shouldn't be an int - that's my fault, not yours...
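
To spell the intended FileFallocate()/FileZero() fallback out in plain POSIX
terms, here's a standalone sketch (not the patch's code - the real functions
add the vfd and wait-event plumbing on top of this): posix_fallocate()
failing with EINVAL/EOPNOTSUPP means "not supported here", and only that case
falls through to writing zeroes; a "real" failure like ENOSPC is reported.

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success, -1 on error with errno set. */
static int
zero_extend_fd(int fd, off_t offset, off_t amount)
{
#ifdef HAVE_POSIX_FALLOCATE
    int         rc = posix_fallocate(fd, offset, amount);

    if (rc == 0)
        return 0;               /* space allocated, done */

    if (rc != EINVAL && rc != EOPNOTSUPP)
    {
        /* a "real" failure, e.g. ENOSPC - report it to the caller */
        errno = rc;
        return -1;
    }
    /* not supported for this fd / filesystem: fall through, write zeroes */
#endif

    {
        char        zbuf[8192] = {0};
        off_t       done = 0;

        while (done < amount)
        {
            size_t      chunk = (amount - done) < (off_t) sizeof(zbuf)
                ? (size_t) (amount - done) : sizeof(zbuf);
            ssize_t     written = pwrite(fd, zbuf, chunk, offset + done);

            if (written < 0)
            {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += written;
        }
    }

    return 0;
}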
> From ad7cd10a6c340d7f7d0adf26d5e39224dfd8439d Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Wed, 26 Oct 2022 12:05:07 -0700
> Subject: [PATCH v5 05/15] bufmgr: Add Pin/UnpinLocalBuffer()
>
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index fa20fab5a2..6f50dbd212 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -4288,18 +4268,16 @@ ConditionalLockBuffer(Buffer buffer)
>  }
>
>  void
> -BufferCheckOneLocalPin(Buffer buffer)
> +BufferCheckWePinOnce(Buffer buffer)
>
> This name is weird. Who is we?

The current backend. I.e. the function checks that the current backend pins
the buffer exactly once, rather than checking that *any* backend pins it
once.

I now see that BufferIsPinned() is, IMO, misleadingly named in that more
general way, even though it also just applies to pins by the current backend.


> diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
> index 5325ddb663..798c5b93a8 100644
> --- a/src/backend/storage/buffer/localbuf.c
> +++ b/src/backend/storage/buffer/localbuf.c

> +bool
> +PinLocalBuffer(BufferDesc *buf_hdr, bool adjust_usagecount)
> +{
> +    uint32      buf_state;
> +    Buffer      buffer = BufferDescriptorGetBuffer(buf_hdr);
> +    int         bufid = -(buffer + 1);
>
> You do
>   int bufid = -buffer - 1;
> in UnpinLocalBuffer()
> They should be consistent.
>
>   int bufid = -(buffer + 1);
>
> I think this version is better:
>
>   int buffid = -buffer - 1;
>
> Since if buffer is INT_MAX, then the -(buffer + 1) version invokes
> undefined behavior while the -buffer - 1 version doesn't.

You are right! Not sure what I was doing there...

Ah - turns out there's pre-existing code in localbuf.c that does it that way
:( See at least MarkLocalBufferDirty().

We really need to wrap this in something, rather than open coding it all over
bufmgr.c/localbuf.c. I really dislike this indexing :(.


> From a0228218e2ac299aac754eeb5b2be7ddfc56918d Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Fri, 17 Feb 2023 18:26:34 -0800
> Subject: [PATCH v5 07/15] bufmgr: Acquire and clean victim buffer separately
>
> Previously we held buffer locks for two buffer mapping partitions at the same
> time to change the identity of buffers. Particularly for extending relations
> needing to hold the extension lock while acquiring a victim buffer is
> painful. By separating out the victim buffer acquisition, future commits will
> be able to change relation extensions to scale better.
>
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 3d0683593f..ea423ae484 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1200,293 +1200,111 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>
>     /*
>      * Buffer contents are currently invalid.  Try to obtain the right to
>      * start I/O.  If StartBufferIO returns false, then someone else managed
>      * to read it before we did, so there's nothing left for BufferAlloc() to
>      * do.
>      */
> -   if (StartBufferIO(buf, true))
> +   if (StartBufferIO(victim_buf_hdr, true))
>         *foundPtr = false;
>     else
>         *foundPtr = true;
>
> I know it was already like this, but since you edited the line already,
> can we just make this this now?
>
>     *foundPtr = !StartBufferIO(victim_buf_hdr, true);

Hm, I do think it's easier to review if largely unchanged code is just moved
around, rather than also rewritten. So I'm hesitant to do that here.
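
FWIW, to make the "wrap this in something" wish above concrete, I'm thinking
of something along these lines (hypothetical names, written standalone here;
in the tree it'd use Assert()/BufferIsLocal() and live next to the other
local-buffer helpers):

#include <assert.h>

typedef int Buffer;             /* as in PostgreSQL: local buffers use negative values */

/* Map a local buffer's Buffer value to its index into LocalRefCount etc. */
static inline int
LocalBufferIndex(Buffer buffer)
{
    assert(buffer < 0);         /* i.e. BufferIsLocal(buffer) */
    /* spelled -buffer - 1, not -(buffer + 1), per the overflow note above */
    return -buffer - 1;
}

/* And the inverse, from an index back to the Buffer value. */
static inline Buffer
LocalBufferIndexToBuffer(int bufid)
{
    assert(bufid >= 0);
    return -bufid - 1;
}

Then PinLocalBuffer(), UnpinLocalBuffer(), MarkLocalBufferDirty() etc. would
all go through one definition of the arithmetic instead of open-coding it.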
> @@ -1595,6 +1413,237 @@ retry:
>     StrategyFreeBuffer(buf);
>  }
>
> +/*
> + * Helper routine for GetVictimBuffer()
> + *
> + * Needs to be called on a buffer with a valid tag, pinned, but without the
> + * buffer header spinlock held.
> + *
> + * Returns true if the buffer can be reused, in which case the buffer is only
> + * pinned by this backend and marked as invalid, false otherwise.
> + */
> +static bool
> +InvalidateVictimBuffer(BufferDesc *buf_hdr)
> +{
> +    /*
> +     * Clear out the buffer's tag and flags and usagecount.  This is not
> +     * strictly required, as BM_TAG_VALID/BM_VALID needs to be checked before
> +     * doing anything with the buffer. But currently it's beneficial as the
> +     * pre-check for several linear scans of shared buffers just checks the
> +     * tag.
>
> I don't really understand the above comment -- mainly the last sentence.

To start with, it's s/checks/check/

"linear scans" is a reference to functions like DropRelationBuffers(), which
iterate over all buffers, and just check the tag for a match. If we leave the
tag around, it'll still work, as InvalidateBuffer() etc will figure out that
the buffer is invalid. But of course that's slower than just skipping the
buffer "early on".
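
Or, to show rather than tell, the relevant scan in DropRelationBuffers() is
approximately this (paraphrased from memory, fork/block-range checks elided):

    for (int i = 0; i < NBuffers; i++)
    {
        BufferDesc *bufHdr = GetBufferDescriptor(i);
        uint32      buf_state;

        /*
         * Cheap, unlocked pre-check on the tag alone. Clearing the tag when
         * a victim buffer is invalidated lets such buffers be skipped right
         * here, instead of passing this check and paying for the spinlock
         * below just to discover they're invalid.
         */
        if (!BufTagMatchesRelFileLocator(&bufHdr->tag, &rlocator))
            continue;

        /* possible match - recheck under the buffer header spinlock */
        buf_state = LockBufHdr(bufHdr);
        if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rlocator))
            InvalidateBuffer(bufHdr);   /* releases spinlock */
        else
            UnlockBufHdr(bufHdr, buf_state);
    }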
> Also, I realize that existing code in this file has the extraneous
> parantheses, but maybe it isn't worth staying consistent with that?
> as in: &(owner->bufferarr)

I personally don't find it worth being consistent with that, but if you /
others think it is, I'd be ok with adapting to that.


> From f26d1fa7e528d04436402aa8f94dc2442999dde3 Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Wed, 1 Mar 2023 13:24:19 -0800
> Subject: [PATCH v5 09/15] bufmgr: Move relation extension handling into
>  ExtendBufferedRel{By,To,}
>
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 3c95b87bca..4e07a5bc48 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
>
> +/*
> + * Extend relation by multiple blocks.
> + *
> + * Tries to extend the relation by extend_by blocks. Depending on the
> + * availability of resources the relation may end up being extended by a
> + * smaller number of pages (unless an error is thrown, always by at least one
> + * page). *extended_by is updated to the number of pages the relation has been
> + * extended to.
> + *
> + * buffers needs to be an array that is at least extend_by long. Upon
> + * completion, the first extend_by array elements will point to a pinned
> + * buffer.
> + *
> + * If EB_LOCK_FIRST is part of flags, the first returned buffer is
> + * locked. This is useful for callers that want a buffer that is guaranteed to
> + * be empty.
>
> This should document what the returned BlockNumber is.

Ok.


> Also, instead of having extend_by and extended_by, how about just having
> one which is set by the caller to the desired number to extend by and
> then overwritten in this function to the value it successfully extended
> by.

I had it that way at first - but I think it turned out to be more confusing.


> It would be nice if the function returned the number it extended by
> instead of the BlockNumber.

It's not actually free to get the block number from a buffer (it causes more
sharing of the BufferDesc cacheline, which then makes modifications of the
cacheline more expensive). We should work on removing all those
BufferGetBlockNumber() calls, so I don't want to introduce a new function
that requires using BufferGetBlockNumber(). Hence I don't think this would be
an improvement.


> +    Assert((eb.rel != NULL) ^ (eb.smgr != NULL));
>
> Can we turn these into !=
>
>     Assert((eb.rel != NULL) != (eb.smgr != NULL));
>
> since it is easier to understand.

Done.


> + * Extend the relation so it is at least extend_to blocks large, read buffer
>
> Use of "read buffer" here is confusing. We only read the block if, after
> we try extending the relation, someone else already did so and we have
> to read the block they extended in, right?

That's one case, yes. I think there's also some unfortunate other case that
I'd like to get rid of.
See my archeology at
https://postgr.es/m/20230223010147.32oir7sb66slqnjk%40awork3.anarazel.de


> +        uint32      num_pages = lengthof(buffers);
> +        BlockNumber first_block;
> +
> +        if ((uint64) current_size + num_pages > extend_to)
> +            num_pages = extend_to - current_size;
> +
> +        first_block = ExtendBufferedRelCommon(eb, fork, strategy, flags,
> +                                              num_pages, extend_to,
> +                                              buffers, &extended_by);
> +
> +        current_size = first_block + extended_by;
> +        Assert(current_size <= extend_to);
> +        Assert(num_pages != 0 || current_size >= extend_to);
> +
> +        for (int i = 0; i < extended_by; i++)
> +        {
> +            if (first_block + i != extend_to - 1)
>
> Is there a way we could avoid pinning these other buffers to begin with
> (e.g. passing a parameter to ExtendBufferedRelCommon())

We can't avoid pinning them. We could make ExtendBufferedRelCommon() release
them though - but I'm not sure that'd be an improvement. I actually had a
flag for that temporarily, but


> +    if (buffer == InvalidBuffer)
> +    {
> +        bool        hit;
> +
> +        Assert(extended_by == 0);
> +        buffer = ReadBuffer_common(eb.smgr, eb.relpersistence,
> +                                   fork, extend_to - 1, mode, strategy,
> +                                   &hit);
> +    }
> +
> +    return buffer;
> +}
>
> Do we use compound literals? Here, this could be:
>
>     buffer = ReadBuffer_common(eb.smgr, eb.relpersistence,
>                                fork, extend_to - 1, mode, strategy,
>                                &(bool) {0});
>
> To eliminate the extraneous hit variable.

We do use compound literals in a few places. However, I don't think it's a
good idea to pass a pointer to a temporary. At least I need to look up the
lifetime rules for those every time. And this isn't a huge win, so I wouldn't
go for it here.


> /*
>  * ReadBuffer_common -- common logic for all ReadBuffer variants
> @@ -801,35 +991,36 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>     bool        found;
>     IOContext   io_context;
>     IOObject    io_object;
> -   bool        isExtend;
>     bool        isLocalBuf = SmgrIsTemp(smgr);
>
>     *hit = false;
>
> +   /*
> +    * Backward compatibility path, most code should use
> +    * ExtendRelationBuffered() instead, as acquiring the extension lock
> +    * inside ExtendRelationBuffered() scales a lot better.
>
> Think these are old function names in the comment

Indeed.


> +static BlockNumber
> +ExtendBufferedRelShared(ExtendBufferedWhat eb,
> +                        ForkNumber fork,
> +                        BufferAccessStrategy strategy,
> +                        uint32 flags,
> +                        uint32 extend_by,
> +                        BlockNumber extend_upto,
> +                        Buffer *buffers,
> +                        uint32 *extended_by)
> +{
> +    BlockNumber first_block;
> +    IOContext   io_context = IOContextForStrategy(strategy);
> +
> +    LimitAdditionalPins(&extend_by);
> +
> +    /*
> +     * Acquire victim buffers for extension without holding extension lock.
> +     * Writing out victim buffers is the most expensive part of extending the
> +     * relation, particularly when doing so requires WAL flushes. Zeroing out
> +     * the buffers is also quite expensive, so do that before holding the
> +     * extension lock as well.
> +     *
> +     * These pages are pinned by us and not valid. While we hold the pin they
> +     * can't be acquired as victim buffers by another backend.
> +     */
> +    for (uint32 i = 0; i < extend_by; i++)
> +    {
> +        Block       buf_block;
> +
> +        buffers[i] = GetVictimBuffer(strategy, io_context);
> +        buf_block = BufHdrGetBlock(GetBufferDescriptor(buffers[i] - 1));
> +
> +        /* new buffers are zero-filled */
> +        MemSet((char *) buf_block, 0, BLCKSZ);
> +    }
> +
> +    /*
> +     * Lock relation against concurrent extensions, unless requested not to.
> +     *
> +     * We use the same extension lock for all forks.
> +     * That's unnecessarily restrictive, but currently extensions for forks
> +     * don't happen often enough to make it worth locking more granularly.
> +     *
> +     * Note that another backend might have extended the relation by the time
> +     * we get the lock.
> +     */
> +    if (!(flags & EB_SKIP_EXTENSION_LOCK))
> +    {
> +        LockRelationForExtension(eb.rel, ExclusiveLock);
> +        eb.smgr = RelationGetSmgr(eb.rel);
> +    }
> +
> +    /*
> +     * If requested, invalidate size cache, so that smgrnblocks asks the
> +     * kernel.
> +     */
> +    if (flags & EB_CLEAR_SIZE_CACHE)
> +        eb.smgr->smgr_cached_nblocks[fork] = InvalidBlockNumber;
>
> I don't see this in master, is it new?

Not really - it's just elsewhere. See vm_extend() and fsm_extend(). I could
move this part back into "Convert a few places to ExtendBufferedRelTo", but I
don't think that'd be better.


> + * Flags influencing the behaviour of ExtendBufferedRel*
> + */
> +typedef enum ExtendBufferedFlags
> +{
> +    /*
> +     * Don't acquire extension lock. This is safe only if the relation isn't
> +     * shared, an access exclusive lock is held or if this is the startup
> +     * process.
> +     */
> +    EB_SKIP_EXTENSION_LOCK = (1 << 0),
> +
> +    /* Is this extension part of recovery? */
> +    EB_PERFORMING_RECOVERY = (1 << 1),
> +
> +    /*
> +     * Should the fork be created if it does not currently exist? This likely
> +     * only ever makes sense for relation forks.
> +     */
> +    EB_CREATE_FORK_IF_NEEDED = (1 << 2),
> +
> +    /* Should the first (possibly only) return buffer be returned locked? */
> +    EB_LOCK_FIRST = (1 << 3),
> +
> +    /* Should the smgr size cache be cleared? */
> +    EB_CLEAR_SIZE_CACHE = (1 << 4),
> +
> +    /* internal flags follow */
>
> I don't understand what this comment means ("internal flags follow")

Hm - just that the flags defined subsequently are for internal use, not for
callers to specify.


> + */
> +static int
> +heap_multi_insert_pages(HeapTuple *heaptuples, int done, int ntuples, Size saveFreeSpace)
> +{
> +    size_t      page_avail;
> +    int         npages = 0;
> +
> +    page_avail = BLCKSZ - SizeOfPageHeaderData - saveFreeSpace;
> +    npages++;
>
> can this not just be this:
>
>     size_t      page_avail = BLCKSZ - SizeOfPageHeaderData - saveFreeSpace;
>     int         npages = 1;

Yes.


> From 5d2be27caf8f4ee8f26841b2aa1674c90bd51754 Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres@anarazel.de>
> Date: Wed, 26 Oct 2022 14:14:11 -0700
> Subject: [PATCH v5 12/15] hio: Use ExtendBufferedRelBy()
> ---
>  src/backend/access/heap/hio.c | 285 +++++++++++++++++-----------------
>  1 file changed, 146 insertions(+), 139 deletions(-)
>
> diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
> index 65886839e7..48cfcff975 100644
> --- a/src/backend/access/heap/hio.c
> +++ b/src/backend/access/heap/hio.c
> @@ -354,6 +270,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
>
> so in RelationGetBufferForTuple() up above where your changes start,
> there is this code
>
>     /*
>      * We first try to put the tuple on the same page we last inserted a tuple
>      * on, as cached in the BulkInsertState or relcache entry.  If that
>      * doesn't work, we ask the Free Space Map to locate a suitable page.
>      * Since the FSM's info might be out of date, we have to be prepared to
>      * loop around and retry multiple times. (To insure this isn't an infinite
>      * loop, we must update the FSM with the correct amount of free space on
>      * each page that proves not to be suitable.)  If the FSM has no record of
>      * a page with enough free space, we give up and extend the relation.
>      *
>      * When use_fsm is false, we either put the tuple onto the existing target
>      * page or extend the relation.
>      */
>     if (bistate && bistate->current_buf != InvalidBuffer)
>     {
>         targetBlock = BufferGetBlockNumber(bistate->current_buf);
>     }
>     else
>         targetBlock = RelationGetTargetBlock(relation);
>
>     if (targetBlock == InvalidBlockNumber && use_fsm)
>     {
>         /*
>          * We have no cached target page, so ask the FSM for an initial
>          * target.
>          */
>         targetBlock = GetPageWithFreeSpace(relation, targetFreeSpace);
>     }
>
> And, I was thinking how, ReadBufferBI() only has one caller now
> (RelationGetBufferForTuple()) and, this caller basically already has
> checked for the case in the inside of ReadBufferBI() (the code I pasted
> above)
>
>     /* If we have the desired block already pinned, re-pin and return it */
>     if (bistate->current_buf != InvalidBuffer)
>     {
>         if (BufferGetBlockNumber(bistate->current_buf) == targetBlock)
>         {
>             /*
>              * Currently the LOCK variants are only used for extending
>              * relation, which should never reach this branch.
>              */
>             Assert(mode != RBM_ZERO_AND_LOCK &&
>                    mode != RBM_ZERO_AND_CLEANUP_LOCK);
>
>             IncrBufferRefCount(bistate->current_buf);
>             return bistate->current_buf;
>         }
>         /* ... else drop the old buffer */
>
> So, I was thinking maybe there is some way to inline the logic for
> ReadBufferBI(), because I think it would feel more streamlined to me.

I don't really see how - I'd welcome suggestions?


> @@ -558,18 +477,46 @@ loop:
>         ReleaseBuffer(buffer);
>     }
>
> Oh, and I forget which commit introduced BulkInsertState->next_free and
> last_free, but I remember thinking that it didn't seem to fit with the
> other parts of that commit.

I'll move it into the one using ExtendBufferedRelBy().


> -        /* Without FSM, always fall out of the loop and extend */
> -        if (!use_fsm)
> -            break;
> +        if (bistate
> +            && bistate->next_free != InvalidBlockNumber
> +            && bistate->next_free <= bistate->last_free)
> +        {
> +            /*
> +             * We bulk extended the relation before, and there are still some
> +             * unused pages from that extension, so we don't need to look in
> +             * the FSM for a new page. But do record the free space from the
> +             * last page, somebody might insert narrower tuples later.
> +             */
>
> Why couldn't we have found out that we bulk-extended before and get the
> block from there up above the while loop?

I'm not quite sure I follow - above the loop there might still have been
space on the prior page? We also need the ability to loop if the space has
been used since.

I guess there's an argument for also checking above the loop, but I don't
think that'd currently ever be reachable.


> +            {
> +                bistate->next_free = InvalidBlockNumber;
> +                bistate->last_free = InvalidBlockNumber;
> +            }
> +            else
> +                bistate->next_free++;
> +        }
> +        else if (!use_fsm)
> +        {
> +            /* Without FSM, always fall out of the loop and extend */
> +            break;
> +        }
>
> It would be nice to have a comment explaining why this is in its own
> else if instead of breaking earlier (i.e. !use_fsm is still a valid case
> in the if branch above it)

I'm not quite following. Breaking where earlier? Note that that branch is old
code, it's just that a new way of getting a page that potentially has free
space is preceding it.
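
In case it's easier to see in isolation, here's the next_free/last_free
bookkeeping boiled down to a standalone sketch (hypothetical names, same idea
as the hunk above): hand out the pre-extended blocks one at a time, and only
once they're used up does the caller go back to the FSM / extend again.

#include <stdbool.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */

typedef struct BulkExtendState
{
    uint32_t    next_free;      /* next unused block from the last bulk extension */
    uint32_t    last_free;      /* last block of that extension */
} BulkExtendState;

/*
 * Returns true and sets *block if an unused pre-extended block is available;
 * returns false if the caller should consult the FSM or extend the relation.
 */
static bool
take_preextended_block(BulkExtendState *st, uint32_t *block)
{
    if (st->next_free == INVALID_BLOCK || st->next_free > st->last_free)
        return false;

    *block = st->next_free;

    if (st->next_free == st->last_free)
    {
        /* range exhausted, forget it */
        st->next_free = INVALID_BLOCK;
        st->last_free = INVALID_BLOCK;
    }
    else
        st->next_free++;

    return true;
}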
> we can get rid of needLock and waitcount variables like this
>
> +#define MAX_BUFFERS 64
> +        Buffer      victim_buffers[MAX_BUFFERS];
> +        BlockNumber firstBlock = InvalidBlockNumber;
> +        BlockNumber firstBlockFSM = InvalidBlockNumber;
> +        BlockNumber curBlock;
> +        uint32      extend_by_pages;
> +        uint32      no_fsm_pages;
> +        uint32      waitcount;
> +
> +        extend_by_pages = num_pages;
> +
> +        /*
> +         * Multiply the number of pages to extend by the number of waiters. Do
> +         * this even if we're not using the FSM, as it does relieve
> +         * contention. Pages will be found via bistate->next_free.
> +         */
> +        if (needLock)
> +            waitcount = RelationExtensionLockWaiterCount(relation);
> +        else
> +            waitcount = 0;
> +        extend_by_pages += extend_by_pages * waitcount;
>
>     if (!RELATION_IS_LOCAL(relation))
>         extend_by_pages += extend_by_pages *
>             RelationExtensionLockWaiterCount(relation);

I guess I find it useful to be able to quickly add logging messages for stuff
like this. I don't think local variables are as bad as you make them out to
be :)


Thanks for the review!

Andres