Re: Index Skip Scan - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan
Date
Msg-id CA+q6zcWvEr1EphXnBXuViGYeqrByyn0BoC+GcaeiTm1F-DxgXg@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Jesper Pedersen <jesper.pedersen@redhat.com>)
List pgsql-hackers
>  On Thu, Jul 11, 2019 at 2:40 AM Floris Van Nee <florisvannee@optiver.com> wrote:
> I verified that the backwards index scan is indeed functioning now. However,
> I'm afraid it's not that simple, as I think the cursor case is broken now. I
> think having just the 'scan direction' in the btree code is not enough to get
> this working properly, because we need to know whether we want the minimum or
> maximum element of a certain prefix. There are basically four cases:
>
> - Forward Index Scan + Forward cursor: we want the minimum element within a
>   prefix and we want to skip 'forward' to the next prefix
>
> - Forward Index Scan + Backward cursor: we want the minimum element within a
>   prefix and we want to skip 'backward' to the previous prefix
>
> - Backward Index Scan + Forward cursor: we want the maximum element within a
>   prefix and we want to skip 'backward' to the previous prefix
>
> - Backward Index Scan + Backward cursor: we want the maximum element within a
>   prefix and we want to skip 'forward' to the next prefix
>
> These cases make it rather complicated unfortunately. They do somewhat tie in
> with the previous discussion on this thread about being able to skip to the
> min or max value. If we ever want to support a sort of minmax scan, we'll
> encounter the same issues.

Yes, these four cases are indeed a very good point. I've prepared a new version
of the patch, where they + an index condition and handling of situations when
it eliminated one or more unique elements are addressed. It seems fixes issues
and works also for those hypothetical examples you've mentioned above, but of
course it looks pretty complicated and I need to polish it a bit before
posting.

> On Thu, Jul 11, 2019 at 12:13 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Thu, 11 Jul 2019 at 19:41, Floris Van Nee <florisvannee@optiver.com> wrote:
> > SELECT DISTINCT ON (a) a,b,c FROM a WHERE c = 2 (with an index on a,b,c)
> > Data (imagine every tuple here actually occurs 10.000 times in the index to
> > see the benefit of skipping):
> > 1,1,1
> > 1,1,2
> > 1,2,2
> > 1,2,3
> > 2,2,1
> > 2,2,3
> > 3,1,1
> > 3,1,2
> > 3,2,2
> > 3,2,3
> >
> > Creating a cursor on this query and then moving forward, you should get
> > (1,1,2), (3,1,2). In the current implementation of the patch, after
> > bt_first, it skips over (1,1,2) to (2,2,1). It checks quals and moves
> > forward one-by-one until it finds a match. This match only comes at (3,1,2)
> > however. Then it skips to the end.
> >
> > If you move the cursor backwards from the end of the cursor, you should
> > still get (3,1,2) (1,1,2). A possible implementation would start at the end
> > and do a skip to the beginning of the prefix: (3,1,1). Then it needs to
> > move forward one-by-one in order to find the first matching (minimum) item
> > (3,1,2). When it finds it, it needs to skip backwards to the beginning of
> > prefix 2 (2,2,1). It needs to move forwards to find the minimum element,
> > but should stop as soon as it detects that the prefix doesn't match anymore
> > (because there is no match for prefix 2, it will move all the way from
> > (2,2,1) to (3,1,1)). It then needs to skip backwards again to the start of
> > prefix 1: (1,1,1) and scan forward to find (1,1,2).
> > Perhaps anyone can think of an easier way to implement it?
>
> One option is just don't implement it and just change
> ExecSupportsBackwardScan() so that it returns false for skip index
> scans, or perhaps at least implement an index am method to allow the
> planner to be able to determine if the index implementation supports
> it... amcanskipbackward

Yep, it was discussed few times in this thread, and after we've discovered
(thanks to Floris) so many issues I was also one step away from implementing
this idea. But at the time time as Thomas correctly noticed, our implementation
needs to be extensible to handle future use cases, and this particular cursor
juggling seems already like a pretty good example of such "future use case". So
I hope by dealing with it we can also figure out what needs to be extensible.

> On Tue, Jul 16, 2019 at 6:53 PM Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
>
> Here is a patch more in that direction.
>
> Some questions:
>
> 1) Do we really need the UniqueKey struct ?  If it only contains the
> EquivalenceClass field then we could just have a list of those instead.
> That would make the patch simpler.
>
> 2) Do we need both canon_uniquekeys and query_uniquekeys ?  Currently
> the patch only uses canon_uniquekeys because the we attach the list
> directly on the Path node.
>
> I'll clean the patch up based on your feedback, and then start to rebase
> v21 on it.

Thanks! I'll also take a look as soon an I'm finished with the last updates.



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [RFC] Removing "magic" oids
Next
From: Alexander Korotkov
Date:
Subject: Re: SQL/JSON path issues/questions