Re: Index Skip Scan - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan
Msg-id 20200122160441.ograupmzyytde2mi@localhost
In response to RE: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses Re: Index Skip Scan
List pgsql-hackers
> On Wed, Jan 22, 2020 at 07:50:30AM +0000, Floris Van Nee wrote:
>
> Anyone please correct me if I'm wrong, but I think one case where the
> current patch relies on some data from the page it has locked before it
> in checking this hi/lo key. I think it's possible for the following
> sequence to happen. Suppose we have a very simple one leaf-page btree
> containing four elements: leaf page 1 = [2,4,6,8]
 
> We do a backwards index skip scan on this and have just returned our
> first tuple (8). The buffer is left pinned but unlocked. Now, someone
> else comes in and inserts a tuple (value 5) into this page, but suppose
> the page happens to be full. So a page split occurs. As far as I know, a
> page split could happen at any random element in the page. One of the
> situations we could be left with is:
 
> Leaf page 1 = [2,4]
> Leaf page 2 = [5,6,8]
> However, our scan is still pointing to leaf page 1.

If we have just returned a tuple, the next action would be either to
check the next page for another key or to search down the tree. Maybe
I'm missing something in your scenario, but the latter will land us on
the required page (we do not point to any leaf here), and before the
former there is a check for the high/low key. Is there anything else
missing?
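
To make the scenario concrete, here is a toy model of the split and of a
high-key style check. It is only a sketch, not the nbtree code or the
patch: LeafPage, insert_with_split and page_still_covers are hypothetical
names, and the high key is simplified to "first key of the right sibling".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeafPage:
    keys: List[int]
    high_key: Optional[int] = None       # None: no upper bound (rightmost page)
    right: Optional["LeafPage"] = None   # right sibling link


def insert_with_split(page: LeafPage, key: int, split_at: int) -> LeafPage:
    """Insert key, then split the page at index split_at (toy model)."""
    items = sorted(page.keys + [key])
    # The new right sibling inherits the old upper bound and right link.
    right = LeafPage(items[split_at:], page.high_key, page.right)
    page.keys = items[:split_at]
    # Simplification: the left page's high key is the first key on the right.
    page.high_key = right.keys[0]
    page.right = right
    return right


def page_still_covers(page: LeafPage, key: int) -> bool:
    """True if key still falls under the page's current upper bound."""
    return page.high_key is None or key < page.high_key


page1 = LeafPage([2, 4, 6, 8])
last_returned = 8                        # the scan has just returned this key

# A concurrent insert of 5 splits the full page into [2,4] and [5,6,8].
page2 = insert_with_split(page1, 5, split_at=2)

# The scan still references page1, but the bound check shows it is stale.
stale = not page_still_covers(page1, last_returned)
```

In this model the check on the page bound is what tells the scan that
page 1 no longer covers key 8, so it cannot trust what it remembered
about that page.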

> Now that I look at the patch again, I fear there currently may also be
> such a dependency in the "Advance forward but read backward"-case. It
> saves the offset number of a tuple in a variable, then does a _bt_search
> (releasing the lock and pin on the page). At this point, anything can
> happen to the tuples on this page - the page may be compacted by vacuum
> such that the offset number you have in your variable does not match the
> actual offset number of the tuple on the page anymore. Then, at the check
> for (nextOffset == startOffset) later, there's a possibility the offsets
> are different even though they relate to the same tuple.
 

Interesting point. The original idea here was to check that we have not
returned to the same position after jumping, so maybe instead of
comparing offsets we can compare the tuple we found.
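
A small illustration of the difference between the two checks, again as
a toy model rather than the actual page layout: a page is a list of
slots, None stands for a dead item, and compact() is a hypothetical
stand-in for what vacuum may do while we hold neither lock nor pin.

```python
def compact(slots):
    """Drop dead slots (None); the offsets of later items shift down."""
    return [s for s in slots if s is not None]


# (key, heap TID) pairs; the TID strings are made up for illustration.
page = [(2, "tid-a"), None, (4, "tid-b"), (6, "tid-c")]

start_offset = 3                  # offset remembered before the jump
start_tuple = page[start_offset]  # the tuple at that offset: (6, "tid-c")

# The page is compacted while the scan is away doing its search.
page = compact(page)

# Suppose the search lands on the very same tuple again.
next_offset = page.index(start_tuple)

same_by_offset = next_offset == start_offset      # offsets now differ
same_by_tuple = page[next_offset] == start_tuple  # tuple identity still matches
```

Comparing offsets reports a different position here even though the scan
is back on the same tuple, while comparing the tuple itself (key plus
heap TID in this sketch) gives the intended answer.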


