Re: Index Skip Scan - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Index Skip Scan
Date
Msg-id 20190725.201737.192223037.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hello.

At Wed, 24 Jul 2019 22:49:32 +0200, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
<CA+q6zcXgwDMiowOGbr7gimTY3NV-LbcwP=rbma_L56pc+9p1Xw@mail.gmail.com>
> > On Mon, Jul 22, 2019 at 7:10 PM Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> >
> > On 7/22/19 1:44 AM, David Rowley wrote:
> > > Here are the comments I noted down during the review:
> > >
> > > cost_index:
> > >
> > > I know you've not finished here, but I think it'll need to adjust
> > > tuples_fetched somehow to account for estimate_num_groups() on the
> > > Path's unique keys. Any Eclass with an ec_has_const = true does not
> > > need to be part of the estimate there as there can only be at most one
> > > value for these.
> > >
> > > For example, in a query such as:
> > >
> > > SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;
> > >
> > > you only need to perform estimate_num_groups() on "y".
> > >
> > > I'm really not quite sure on what exactly will be required from
> > > amcostestimate() here.  The cost of the skip scan is not the same as
> > > the normal scan. So other that API needs adjusted to allow the caller
> > > to mention that we want skip scans estimated, or there needs to be
> > > another callback.
> > >
> >
> > I think this part will become more clear once the index skip scan patch
> > is rebased, as we got the uniquekeys field on the Path, and the
> > indexskipprefixy info on the IndexPath node.
> 
> Here is what I came up with to address the problems, mentioned above in this
> thread. It passes tests, but I haven't tested it yet more thoughtful (e.g. it
> occurred to me, that `_bt_read_closest` probably wouldn't work, if a next key,
> that passes an index condition is few pages away - I'll try to tackle it soon).
> Just another small step forward, but I hope it's enough to rebase on top of it
> planner changes.
> 
> Also I've added few tags, mostly to mention reviewers contribution.

I have some comments.

+           * The order of columns in the index should be the same, as for
+           * unique distincs pathkeys, otherwise we cannot use _bt_search
+           * in the skip implementation - this can lead to a missing
+           * records.

It seems that it is enough that distinct pathkeys is contained in
index pathkeys. If it's right, that is almost checked in existing
code:

> if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))

It is perfect when needed_pathkeys is distinct_pathkeys.  So
additional check is required if and only if it is not the case.

>        if (enable_indexskipscan &&
>          IsA(path, IndexPath) &&
>          ((IndexPath *) path)->indexinfo->amcanskip &&
>          (path->pathtype == T_IndexOnlyScan ||
>           path->pathtype == T_IndexScan) &&
>          (needed_pathkeys == root->distinct_pathkeys ||
>           pathkeys_contained_in(root->distinct_pathkeys,
>                       path->pathkeys)))

path->pathtype is always one of T_IndexOnlyScan or T_IndexScan so
the check against them isn't needed. If you have concern on that,
we can add that as Assert().

I feel uncomfortable to look into indexinfo there. Couldnd't we
use indexskipprefix == -1 to signal !amcanskip from
create_index_path?


+          /*
+           * XXX: In case of index scan quals evaluation happens after
+           * ExecScanFetch, which means skip results could be fitered out
+           */

Why can't we use skipscan path if having filter condition?  If
something bad happens, the reason must be written here instead of
what we do.


+          if (path->pathtype == T_IndexScan &&
+            parse->jointree != NULL &&
+            parse->jointree->quals != NULL &&
+            ((List *)parse->jointree->quals)->length != 0)

It's better to use list_length instead of peeping inside. It
handles the NULL case as well. (The structure has recently
changed but .length is not, though.)


+     * If advancing direction is different from index direction, we must
+     * skip right away, but _bt_skip requires a starting point.

It doesn't seem needed to me. Could you elaborate on the reason
for that?


+     * If advancing direction is different from index direction, we must
+     * skip right away, but _bt_skip requires a starting point.
+     */
+    if (direction * indexonlyscan->indexorderdir < 0 &&
+      !node->ioss_FirstTupleEmitted)

I'm confused by this. "direction" there is the physical scan
direction (fwd/bwd) of index scan, which is already compensated
by indexorderdir. Thus the condition means we do that when
logical ordering (ASC/DESC) is DESC. (Though I'm not sure what
"index direction" exactly means...)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Next
From: Rafia Sabih
Date:
Subject: Re: Initdb failure