Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id 6e1bdf1a-7728-4320-8762-1017eb1b4389@postgrespro.ru
Whole thread Raw
In response to Re: Adding skip scan (including MDAM style range skip scan) to nbtree  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On 03.04.2025 02:32, Peter Geoghegan wrote:
On Tue, Apr 1, 2025 at 3:08 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
I think it would be useful to show information that we used an index scan but at the same time we skipped the "region" column and I assume we should output how many distinct values the "region" column had.

For example it will look like this "Skip Scan on region (4 distinct values)":

What do you think?
I don't see much value in that. We can sometimes have data skew that
makes the number of distinct values far from representative of how
many index searches were required. We can have 3 distinct prefix
column values within 90% of all leaf pages, while the remaining 10%
all have unique values. Skip scan will work quite well here (at least
compared to a traditional full index scan), but the number of distinct
values makes it look really bad.
Yes, I agree — this could give a misleading impression when trying to evaluate the effectiveness of the feature.
I’ve spent quite some time thinking about whether there’s a better way to present this information, but I haven’t come up with a solid alternative.

To be honest, I’m starting to think that simply displaying the name of the skipped column might be sufficient.
* The availability of opclass skipsupport, which makes skip arrays
generate their element values by addition/subtraction from the current
array element, rather than using NEXT/PRIOR sentinel keys.

The sentinel keys act as probes that get the next real (non-sentinel)
value that we need to look up next. Whereas skip support can often
successfully guess that (for example) the next value in the index
after 268 is 269, saving a primitive scan each time (this might not
happen at all, or it might work only some of the time, or it might
work all of the time).

To be honest, I missed your point here. If possible, could you explain it in more detail?
So, many types don't (and probably can't) offer skip support. Skip
scan still works there. The most common example of this is "text".

We're still using skip arrays when skipping using (say) a text column.
Conceptually, it's still "WHERE a = ANY(<every possible 'a' value>)",
even with these continuous data types. It is useful to "pretend that
we're using a discrete data type", so that everything can work in the
usual way (remember, I hate special cases). We need to invent another
way to "increment" a text datum, that works in the same way (but
doesn't really require understanding the semantics of text, or
whatever the data type may be).

See my explanation about this here:

https://postgr.es/m/CAH2-WznKyHq_W7heu87z80EHyZepQeWbGuAZebcxZHvOXWCU-w@mail.gmail.com

See the part of my email that begins with "I think that you're
probably still a bit confused because the terminology in this area is
a little confusing. There are two ways of explaining the situation
with types like text and numeric (types that lack skip support)...."

I understand it. I agree with you that it should be extended to other types but I'm not sure how.

Maybe we can add an abstract iterator that will helps to get the next distinct value adapted to the type or
it needs to be added similar functions for each type. I think this topic is also for a separate thread)

I agree with you, this is an improvement. "Index Searches: N" shows the number of individual index searches, but
it is still not clear enough. Here you can additionally determine what happened based on the information about
the number of scanned pages, but with large amounts of data this is difficult.
The benefit of using skip scan comes from all of the pages that we
*aren't* reading, that we'd usually have to read. Hard to show that.
I noticed statistics on the number of hit buffers, read buffers, for example, "Buffers: shared hit=3 read=52",
are you talking about this?
Although I am still learning to understand correctly this information in the explain.
By the way, I have long wanted to ask, maybe you can advise something else to read on this topic?
Maybe not in this thread, so as not to overload this.
Let's talk about it off list.
Okay. Thank you)
I think it is worth adding "skip scan" information, without it it is difficult in my opinion to evaluate
whether this index is effective in comparison with another, looking only at the information on
scanned blocks or Index search or I missed something?
I think that it's particularly worth adding something to EXPLAIN
ANALYZE that makes it obvious that the index in question might not be
what the user thinks that it is. It might be an index that does some
skipping, but is far from optimal. They might have simply overlooked
the fact that there is an "extra" column between the columns that
their query predicate actually uses, which is far slower than what is
possible with a separate index that also omits that column.

Basically, I think that the most important goal should be to try to
help the user to understand when they have completely the wrong idea
about the index. I think that it's much less important to help users
to understand exactly how well a skip scan performs, relative to some
theoretical ideal. The theoretical ideal is just too complicated.
Yes, I agree — this information can be valuable, especially for those investigating query performance issues.
In particular, for example, it helps in cases where the optimizer chooses a suboptimal index, and it's not obvious why.

I believe it’s important to display not just what expressions were used during planning, but also what was actually used during execution.
That information might be important when analyzing data skew, deciding whether extended statistics are needed, and
understanding how the planner's assumptions played out. 

But this is a separate thread for discussion.

-- 
Regards,
Alena Rybakina
Postgres Professional

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: pg_recvlogical cannot create slots with failover=true
Next
From: Melanie Plageman
Date:
Subject: Re: Parallel heap vacuum