Re: Improve EXPLAIN output for multicolumn B-Tree Index - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Improve EXPLAIN output for multicolumn B-Tree Index
Date
Msg-id CAH2-Wzm6KRA3Fa6-SSJTXaLm=3vtQY1uyLH=kYkjB-cpui_7cQ@mail.gmail.com
Whole thread Raw
In response to Re: Improve EXPLAIN output for multicolumn B-Tree Index  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Responses RE: Improve EXPLAIN output for multicolumn B-Tree Index
Re: Improve EXPLAIN output for multicolumn B-Tree Index
List pgsql-hackers
On Thu, Jun 27, 2024 at 4:46 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> > It's also possible that we should just do something simple, like your
> > patch, even though technically it won't really be accurate in cases
> > where skip scan is used to good effect. Maybe showing the "default
> > working assumption" about how the scan keys/clauses will behave at
> > runtime is actually the right thing to do. Maybe I am just
> > overthinking it.
>
> IIUC, you're saying that your skip scan will improve the situation
> Masahiro describes dramatically in some/most cases.

"Most cases" seems likely to be overstating it. Overall, I doubt that
it makes sense to try to generalize like that.

The breakdown of the cases that we see in the field right now
(whatever it really is) is bound to be strongly influenced by the
current capabilities of Postgres. If something is intolerably slow,
then it just isn't tolerated. If something works adequately, then
users don't usually care why it is so.

> But it still won't
> be as good as a pure index "prefix" scan.

Typically, no, it won't be. But there's really no telling for sure.
The access patterns for a composite index on '(a, b)' with a qual
"WHERE b = 5" are identical to a qual explicitly written "WHERE a =
any(<every possible value in 'a'>) AND b = 5".

> If that's the case then I do think you're overthinking this a bit.
> Because then you'd still want to see this difference between the
> prefix-scan keys and the skip-scan keys. I think the main thing that
> the introduction of the skip scan changes is the name that we should
> show, e.g. instead of "Non-key Filter" we might want to call it "Skip
> Scan Cond"

What about cases where we legitimately have to vary our strategy
during the same index scan? We might very well be able to skip over
many leaf pages when scanning through a low cardinality subset of the
index (low cardinality in respect of a leading column 'a'). Then we
might find that there are long runs on leaf pages where no skipping is
possible.

I don't expect this to be uncommon. I do expect it to happen when the
optimizer wasn't particularly expecting it. Like when a full index
scan was the fastest plan anyway. Or when a skip scan wasn't quite as
good as expected, but nevertheless turned out to be the fastest plan.

> I do think though that in addition to a "Skip Scan Filtered" count for
> ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count
> (if that's possible to measure/estimate somehow). This would allow
> users to determine how effective the skip scan was, i.e. were they
> able to skip over large swaths of the index? Or did they skip over
> nothing because the second column of the index (on which there was no
> filter) was unique within the table

Yeah, EXPLAIN ANALYZE should probably be showing something about
skipping. That provides us with a way of telling the user what really
happened, which could help when EXPLAIN output alone turns out to be
quite misleading.

In fact, that'd make sense even today, without skip scan (just with
the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
scans, the number of primitive scans is hard to predict, and quite
indicative of what's really going on with the scan.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: Proposal: Document ABI Compatibility
Next
From: Peter Smith
Date:
Subject: Re: walsender.c comment with no context is hard to understand