Re: Index Skip Scan - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Index Skip Scan
Date
Msg-id 20190131.153153.83078698.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Re: Index Skip Scan  (James Coleman <jtc331@gmail.com>)
Re: Index Skip Scan  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Hello.

At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
<CA+q6zcVP18wYiO=aa+fz3GuncuTF52q1sufB7ise37TJPSDK1w@mail.gmail.com>
> A bit of adjustment after nodes/relation -> nodes/pathnodes.

I had a look on this.

The name "index skip scan" is a different feature from the
feature with the name on other prodcuts, which means "index scan
with postfix key (of mainly of multi column key) that scans
ignoring the prefixing part" As Thomas suggested I'd suggest that
we call it "index hop scan". (I can accept Hopscotch, either:p)

Also as mentioned upthread by Peter Geoghegan, this could easly
give worse plan by underestimation. So I also suggest that this
has dynamic fallback function.  In such perspective it is not
suitable for AM API level feature.

If all leaf pages are on the buffer and the average hopping
distance is less than (expectedly) 4 pages (the average height of
the tree), the skip scan will lose. If almost all leaf pages are
staying on disk, we could win only by 2-pages step (skipping over
one page).

=====
As I'm writing the above, I came to think that it's better
implement this as an pure executor optimization.

Specifically, let _bt_steppage count the ratio of skipped pages
so far then if the ratio exceeds some threshold (maybe around
3/4) it gets into hopscotching mode, where it uses index scan to
find the next page (rather than traversing). As mentioned above,
I think using skip scan to go beyond the next page is a good
bet. If the success ration of skip scans goes below some
threshold (I'm not sure for now), we should fall back to
traversing.

Any opinions?

====

Some comments on the patch below.

+     skip scan approach, which is based on the idea of
+     <ulink url="https://wiki.postgresql.org/wiki/Free_Space_Map_Problems">
+     Loose index scan</ulink>. Rather than scanning all equal values of a key,
+     as soon as a new value is found, it will search for a larger value on the

I'm not sure it is a good thing to put a pointer to rather
unstable source in the documentation.


This adds a new AM method but it seems avaiable only for ordered
indexes, specifically btree. And it seems that the feature can be
implemented in btgettuple since btskip apparently does the same
thing. (I agree to Robert in the point in [1]).

[1] https://www.postgresql.org/message-id/CA%2BTgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr%3DOK5_kC_SSg%40mail.gmail.com


Related to the above, it seems better that the path generation of
skip scan is a part of index scan. Whether skip scan or not is a
matter of index scan itself.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Arseny Sher
Date:
Subject: Too rigorous assert in reorderbuffer.c
Next
From: Michael Paquier
Date:
Subject: Re: Unused parameters & co in code