Re: Index Skip Scan - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | CA+hUKGKW4dXTP9G+WBskjT09tzD+9aMWEm=Fpeb6RS5SXfPyKw@mail.gmail.com Whole thread Raw |
In response to | Re: Index Skip Scan (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
On Fri, Mar 1, 2019 at 11:23 AM Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in <CA+q6zcVP18wYiO=aa+fz3GuncuTF52q1sufB7ise37TJPSDK1w@mail.gmail.com> >> > A bit of adjustment after nodes/relation -> nodes/pathnodes. >> >> I had a look on this. >> >> The name "index skip scan" is a different feature from the >> feature with the name on other prodcuts, which means "index scan >> with postfix key (of mainly of multi column key) that scans >> ignoring the prefixing part" As Thomas suggested I'd suggest that >> we call it "index hop scan". (I can accept Hopscotch, either:p) > > > I think that what we have proposed here is just an incomplete implementation of what other products call a skip scan, nota fundamentally different thing. They don't ignore the prefix part, they use that part in a way to cancel itself outto give the same answer, but faster. I think they would also use this skip method to get distinct values if that is whatis requested. But they would go beyond that to also use it to do something similar to the plan we get with this: Hi Jeff, "Hop scan" was just a stupid joke that occurred to me when I saw that DB2 had gone for "jump scan". I think "skip scan" is a perfectly good name and it's pretty widely used by now (for example, by our friends over at SQLite to blow us away at these kinds of queries). Yes, simple distinct value scans are indeed just the easiest kind of thing to do with this kind of scan-with-fast-forward. As discussed already in this thread and the earlier one there is a whole family of tricks you can do, and the thing that most people call an "index skip scan" is indeed the try-every-prefix case where you can scan an index on (a, b) given a WHERE clause b = x. Perhaps the simple distinct scan could appear as "Distinct Index Skip Scan"? And perhaps the try-every-prefix-scan could appear as just "Index Skip Scan"? Whether these are the same executor node is a good question; at one point I proposed a separate nest-loop like node for the try-every-prefix-scan, but Robert shot that down pretty fast. I now suspect (as he said) that all of this belongs in the index scan node, as different modes. The behaviour is overlapping; for "Distinct Index Skip Scan" you skip to each distinct prefix and emit one tuple, whereas for "Index Skip Scan" you skip to each distinct prefix and then perform a regular scan for the prefix + the suffix emitting matches. > (which currently takes 200ms, rather than the 0.9ms taken for the one benefiting from skip scan) Nice. > I don't think we should give it a different name, just because our initial implementation is incomplete. +1 > Or do you think our implementation of one feature does not really get us closer to implementing the other? My question when lobbing the earlier sketch patches into the mailing list a few years back was: is this simple index AM interface and implementation (once debugged) powerful enough for various kinds of interesting skip-based plans? So far I have the impression that it does indeed work for Distinct Index Skip Scan (demonstrated), Index Skip Scan (no one has yet tried that), and special cases of extrema aggregate queries (foo, MIN(bar) can be performed by skip scan of index on (foo, bar)), but may not work for the semi-related merge join trickery mentioned in a paper posted some time back (though I don't recall exactly why). Another question is whether it should all be done by the index scan node, and I think the answer is yet. -- Thomas Munro https://enterprisedb.com
pgsql-hackers by date: