Re: Optimize single tuple fetch from nbtree index - Mailing list pgsql-hackers
From: Anastasia Lubennikova
Subject: Re: Optimize single tuple fetch from nbtree index
Msg-id: 4b660433-d74e-2075-28d9-907e09df9b9a@postgrespro.ru
In response to: Re: Optimize single tuple fetch from nbtree index (Floris Van Nee <florisvannee@Optiver.com>)
List: pgsql-hackers
05.08.2019 14:34, Floris Van Nee wrote:
> I have one further question about these index offsets. There are several comments in master that indicate that it's impossible that an item moves 'left' on a page, if we continuously hold a pin on the page. For example, _bt_killitems has a comment like this:
>
> * Note that if we hold a pin on the target page continuously from initially
> * reading the items until applying this function, VACUUM cannot have deleted
> * any items from the page, and so there is no need to search left from the
> * recorded offset. (This observation also guarantees that the item is still
> * the right one to delete, which might otherwise be questionable since heap
> * TIDs can get recycled.) This holds true even if the page has been modified
> * by inserts and page splits, so there is no need to consult the LSN.
>
> Still, exactly this case happens in practice. In my tests I was able to get behavior like:
> 1) pin + lock a page in _bt_first
> 2) read a tuple, record indexOffset (for example offset=100) and heap tid
> 3) unlock page, but *keep* the pin (end of _bt_first of my patch)
> 4) lock page again in _bt_next (we still hold the pin, so vacuum shouldn't have occurred)
> 5) look inside the current page for the heap tid that we registered earlier
> 6) we find that we can now find this tuple at indexOffset=98, i.e. it moved left. This should not be possible.
>
> This case sometimes randomly happens when running 'make check', which is why I added code in my patch to also look left on the page from the previous indexOffset.
>
> However, this is in contradiction with the comments (and code) of _bt_killitems.
> Is the comment incorrect/outdated, or is there a bug in vacuum or any other part of Postgres that might move index items left even though others hold a pin?
Hello, welcome to hackers with your first patch)

As far as I understood from the thread above, the design of this optimization is still under discussion, so I didn't review the proposed patch itself. However, I got interested in the comment inconsistency you found. I added a debug message into this code branch of the patch and was able to see it in regression.diffs after 'make check'.

Speaking of your patch, it seems that the buffer was unpinned and pinned again between the two reads, so the condition of holding the pin continuously was not met. I didn't dig into the code, but this line looks suspicious (see my findings about BTScanPosIsPinned below):

    /* bump pin on current buffer for assignment to mark buffer */
    if (BTScanPosIsPinned(so->currPos))
        IncrBufferRefCount(so->currPos.buf);

While reading the code to answer your question, I noticed that the BTScanPosIsPinned macro name is misleading. It calls BufferIsValid(), not BufferIsPinned() as one could expect, and the comment on BufferIsValid in bufmgr.h explicitly states that it shouldn't be confused with BufferIsPinned. The same goes for BTScanPosUnpinIfPinned().

I propose that we update the BTScanPosIsPinned macro, or at least write a comment explaining why its current behavior is fine. There are a few existing callers that clearly expect this macro to check for a pin, which it doesn't do. I don't quite understand whether that already causes a subtle bug, or whether the current algorithm is fine. Peter, Tom, what do you think?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company