Re: Index Skip Scan - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 20200204200205.ip5c2mzy3i2okqey@localhost Whole thread Raw |
In response to | RE: Index Skip Scan (Floris Van Nee <florisvannee@Optiver.com>) |
Responses |
RE: Index Skip Scan
|
List | pgsql-hackers |
> On Mon, Jan 27, 2020 at 02:00:39PM +0000, Floris Van Nee wrote: > > Thanks for the new patch! I tested it and managed to find a case that causes > some issues. Here's how to reproduce: So, after a bit of investigation I found out the issue (it was actually there even in the previous version). In this only case of moving forward and reading backward, exactly scenarious you've described above, current implementation was not ignoring deleted tuples. My first idea to fix this was to use _bt_readpage when necessary and put couple of _bt_killitems when we leave a page while jumping before, so that deleted tuples will be ignored. To demonstrate it visually, let's say we want to go backward on a cursor over an ORDER BY a DESC, b DESC query, i.e. return: (1,100), (2, 100), (3, 100) etc. To achieve that we jump from (1,1) to (1,100), from (2,1) to (2,100) and so on. If some values are deleted, we need to read backward. E.g. if (3,100) is deleted, we need to return (3,99). +---------------+---------------+---------------+---------------+ | | | | | | 1,1 ... 1,100 | 2,1 ... 2,100 | 3,1 ... 3,100 | 4,1 ... 4,100 | | | | | | +---------------+---------------+---------------+---------------+ | ^ | ^ | ^ | ^ | | | | | | | | +-------------+ +-------------+ +-------------+ +-------------+ If it happened that a whole value series is deleted, we return to the previous value and need to detect such situation. E.g. if all the values from (3,1) to (3,100) were deleted, we will return to (2,100). +---------------+---------------+ +---------------+ | | | | | | 1,1 ... 1,100 | 2,1 ... 2,100 |<--------------+ 4,1 ... 4,100 | | | | | | +---------------+---------------+ +---------------+ ^ | ^ | ^ | ^ | | | | | | | | +-------------+ +-------------+ +-------------+ | +-----------------------------+ This all is implemented inside _bt_skip. Unfortunately as I see it now the idea of relying on ability to skip dead index tuples without checking a heap tuple is not reliable, since an index tuple will be added into killedItems and can be marked as dead only when not a single transaction can see it anymore. Potentially there are two possible solutions: * Adjust the code in nodeIndexOnlyscan to perform a proper visibility check and understand if we returned back. Obviously it will make the patch more invasive. * Reduce scope of the patch, and simply do not apply jumping in this case. This means less functionality but hopefully still brings some value. At this point me and Jesper inclined to go with the second option. But maybe I'm missing something, are there any other suggestions?
pgsql-hackers by date: