Re: Index Skip Scan - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan
Date
Msg-id 20200204200205.ip5c2mzy3i2okqey@localhost
Whole thread Raw
In response to RE: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses RE: Index Skip Scan
List pgsql-hackers
> On Mon, Jan 27, 2020 at 02:00:39PM +0000, Floris Van Nee wrote:
>
> Thanks for the new patch! I tested it and managed to find a case that causes
> some issues. Here's how to reproduce:

So, after a bit of investigation I found out the issue (it was actually there
even in the previous version). In this only case of moving forward and reading
backward, exactly scenarious you've described above, current implementation was
not ignoring deleted tuples.

My first idea to fix this was to use _bt_readpage when necessary and put
couple of _bt_killitems when we leave a page while jumping before, so that
deleted tuples will be ignored. To demonstrate it visually, let's say we
want to go backward on a cursor over an ORDER BY a DESC, b DESC query,
i.e. return:

    (1,100), (2, 100), (3, 100) etc.

To achieve that we jump from (1,1) to (1,100), from (2,1) to (2,100) and so on.
If some values are deleted, we need to read backward. E.g. if (3,100) is
deleted, we need to return (3,99).
                                                                             
   +---------------+---------------+---------------+---------------+         
   |               |               |               |               |         
   | 1,1 ... 1,100 | 2,1 ... 2,100 | 3,1 ... 3,100 | 4,1 ... 4,100 |         
   |               |               |               |               |         
   +---------------+---------------+---------------+---------------+         
                                                                             
   |             ^ |             ^ |             ^ |             ^           
   |             | |             | |             | |             |           
   +-------------+ +-------------+ +-------------+ +-------------+           
                                                                             
If it happened that a whole value series is deleted, we return to the
previous value and need to detect such situation. E.g. if all the values
from (3,1) to (3,100) were deleted, we will return to (2,100).
                                                                             
   +---------------+---------------+               +---------------+         
   |               |               |               |               |         
   | 1,1 ... 1,100 | 2,1 ... 2,100 |<--------------+ 4,1 ... 4,100 |         
   |               |               |               |               |         
   +---------------+---------------+               +---------------+         
                                                                 ^           
   |             ^ |             ^ |             ^               |           
   |             | |             | |             |               |           
   +-------------+ +-------------+ +-------------+               |           
                                   +-----------------------------+ 
                                                                             
This all is implemented inside _bt_skip. Unfortunately as I see it now the idea
of relying on ability to skip dead index tuples without checking a heap tuple
is not reliable, since an index tuple will be added into killedItems and can be
marked as dead only when not a single transaction can see it anymore.

Potentially there are two possible solutions:

* Adjust the code in nodeIndexOnlyscan to perform a proper visibility check and
  understand if we returned back. Obviously it will make the patch more invasive.

* Reduce scope of the patch, and simply do not apply jumping in this case. This
  means less functionality but hopefully still brings some value.

At this point me and Jesper inclined to go with the second option. But maybe
I'm missing something, are there any other suggestions?



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index only scan and ctid
Next
From: Tom Lane
Date:
Subject: Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno