Re: Index Skip Scan - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan
Date
Msg-id 20200122213603.ndlm6t4ryivdwxs7@localhost
Whole thread Raw
In response to Re: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses Re: Index Skip Scan  (Peter Geoghegan <pg@bowt.ie>)
Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
> On Wed, Jan 22, 2020 at 05:24:43PM +0000, Floris Van Nee wrote:
>
> > > Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the
pageit has locked before it in checking this hi/lo key. I think it's possible for the following sequence to happen.
Supposewe have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
 
> > > We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned
butunlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to
befull. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of
thesituations we could be left with is:
 
> > > Leaf page 1 = [2,4]
> > > Leaf page 2 = [5,6,8]
> > > However, our scan is still pointing to leaf page 1.
> 
> > In case if we just returned a tuple, the next action would be either
> >> check the next page for another key or search down to the tree. Maybe
> 
> But it won't look at the 'next page for another key', but rather at the 'same page or another key', right? In the
_bt_scankey_within_pageshortcut we're taking, there's no stepping to a next page involved. It just locks the page again
thatit previously also locked.
 

Yep, it would look only on the same page. Not sure what do you mean by
"another key", if the current key is not found within the current page
at the first stage, we restart from the root.

> > I'm missing something in your scenario, but the latter will land us on a
> > required page (we do not point to any leaf here), and before the former
> > there is a check for high/low key. Is there anything else missing?
> 
> Let me try to clarify. After we return the first tuple, so->currPos.buf is pointing to page=1 in my example (it's the
onlypage after all). We've returned item=8. Then the split happens and the items get rearranged as in my example. We're
stillpointing with so->currPos.buf to page=1, but the page now contains [2,4]. The split happened to the right, so
there'sa page=2 with [5,6,8], however the ongoing index scan is unaware of that.
 
> Now _bt_skip gets called to fetch the next tuple. It starts by checking _bt_scankey_within_page(scan,
so->skipScanKey,so->currPos.buf, dir), the result of which will be 'true': we're comparing the skip key to the low key
ofthe page. So it thinks the next key can be found on the current page. It locks the page and does a _binsrch, finding
item=4to be returned.
 
> 
> The problem here is that _bt_scankey_within_page mistakenly returns true, thereby limiting the search to just the
pagethat it's pointing to already.
 
> It may be fine to just fix this function to return the proper value (I guess it'd also need to look at the high key
inthis example). It could also be fixed by not looking at the lo/hi key of the page, but to use the local tuple buffer
instead.We already did a _read_page once, so if we have any matching tuples on that specific page, we have them locally
inthe buffer already. That way we never need to lock the same page twice.
 

Oh, that's what you mean. Yes, I was somehow tricked by the name of this
function and didn't notice that it checks only one boundary, so in case
of backward scan it returns wrong result. I think in the situation
you've describe it would actually not find any item on the current page
and restart from the root, but nevertheless we need to check for both
keys in _bt_scankey_within_page.



pgsql-hackers by date:

Previous
From: Floris Van Nee
Date:
Subject: RE: Index Skip Scan
Next
From: Dmitry Dolgov
Date:
Subject: Re: Index Skip Scan