Re: Index Skip Scan - Mailing list pgsql-hackers

From James Coleman
Subject Re: Index Skip Scan
Date
Msg-id CAAaqYe8ATjgwJJnUe5O2KBy8HAAm1bPfYaGvaZrBh56-XP21AQ@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan
List pgsql-hackers
On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
> > So in this case the skip scan is ~15x slower than the usual plan (index
> > only scan + unique). The reason why this happens is pretty simple - to
> > estimate the number of groups we multiply the ndistinct estimates for
> > the two columns (which both have n_distinct = 10000), but then we cap
> > the estimate to 10% of the table. But when the columns are independent
> > with high cardinalities that under-estimates the actual value, making
> > the cost for skip scan much lower than it should be.
>
> The current implementation checks if we can find the next value on the
> same page to do a shortcut instead of tree traversal and improve such
> kind of situations, but I can easily imagine that it's still not enough
> in some extreme situations.

This is almost certainly rehashing already covered ground, but since I
doubt it's been discussed recently, would you be able to summarize
that choice (not to always get the next tuple by scanning from the top
of the tree again) and the performance/complexity tradeoffs?

Thanks,
James



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Marking some contrib modules as trusted extensions
Next
From: Masahiko Sawada
Date:
Subject: Re: Internal key management system