Thread: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Aug 15, 2024 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Attached patch has EXPLAIN ANALYZE display the total number of
> > primitive index scans for all 3 kinds of index scan node.
>
> Attached is v2, which fixes bitrot.
>
> v2 also uses new terminology. EXPLAIN ANALYZE will now show "Index
> Searches: N", not "Primitive Index Scans: N". Although there is
> limited precedent for using the primitive scan terminology, I think
> that it's a bit unwieldy.

I do like "Index Searches" better than "Primitive Index Scans."

But I think Matthias had some good points about this being
btree-specific. I'm not sure whether he was completely correct, but
you seemed to just dismiss his argument and say "well, that can't be
done," which doesn't seem convincing to me at all. If, for non-btree
indexes, the number of index searches will always be the same as the
loop count, then surely there is some way to avoid cluttering the
output for non-btree indexes with output that can never be of any use.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Tue, Aug 27, 2024 at 1:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I do like "Index Searches" better than "Primitive Index Scans."
>
> But I think Matthias had some good points about this being
> btree-specific.

It's not B-Tree specific -- not really. Any index scan that can at
least non-natively support ScalarArrayOps (i.e. SAOP scans that the
executor manages using ExecIndexEvalArrayKeys() + bitmap scans) will
show information that is exactly equivalent to what B-Tree will show,
given a similar ScalarArrayOps query.

There is at best one limited sense in which the information shown is
B-Tree specific: it tends to be more interesting in the case of B-Tree
index scans. You cannot trivially derive the number based on the
number of array keys for B-Tree scans, since nbtree is now clever
about not needlessly searching the index anew. It's quite possible
that other index AMs will in the future be enhanced in about the same
way as nbtree was in commit 5bf748b86b, at which point even this will
no longer apply. (Tom speculated about adding something like that to
GiST recently).

> I'm not sure whether he was completely correct, but
> you seemed to just dismiss his argument and say "well, that can't be
> done," which doesn't seem convincing to me at all.

To be clear, any variation that you can think of *can* be done without
much difficulty. I thought that Matthias was unclear about what he
even wanted, is all.

The problem isn't that there aren't any alternatives. The problem, if
any, is that there are a huge number of slightly different
alternatives. There are hopelessly subjective questions about what the
best trade-off between redundancy and consistency is. I'm absolutely
not set on doing things in exactly the way I've laid out.

What do you think should be done? Note that the number of loops
matters here, in addition to the number of SAOP primitive
scans/searches. If you want to suppress the information shown in the
typical "nsearches == 1" case, what does that mean for the less common
"nsearches == 0" case?

> If, for non-btree
> indexes, the number of index searches will always be the same as the
> loop count, then surely there is some way to avoid cluttering the
> output for non-btree indexes with output that can never be of any use.

Even if we assume that a given index/index AM will never use SAOPs,
it's still possible to show more than one "Index Search" per executor
node execution. For example, when an index scan node is the inner side
of a nestloop join.

I see value in making it obvious to users when and how
pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
to EXPLAIN ANALYZE output is useful, independent of whether or not
SAOPs happen to be used. That's probably the single best argument in
favor of showing "Index Searches: N" unconditionally. But I'm
certainly not going to refuse to budge over that.

--
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> I see value in making it obvious to users when and how
> pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
> to EXPLAIN ANALYZE output is useful, independent of whether or not
> SAOPs happen to be used. That's probably the single best argument in
> favor of showing "Index Searches: N" unconditionally. But I'm
> certainly not going to refuse to budge over that.

TBH, I'm afraid that this patch basically is exposing numbers that
nobody but Peter Geoghegan and maybe two or three other hackers
will understand, and even fewer people will find useful (since the
how-many-primitive-scans behavior is not something users have any
control over, IIUC).  I doubt that "it lines up with
pg_stat_all_indexes.idx_scan" is enough to justify the additional
clutter in EXPLAIN.  Maybe we should be going the other direction
and trying to make pg_stat_all_indexes count in a less detailed but
less surprising way, ie once per indexscan plan node invocation.

            regards, tom lane



On Tue, Aug 27, 2024 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> TBH, I'm afraid that this patch basically is exposing numbers that
> nobody but Peter Geoghegan and maybe two or three other hackers
> will understand, and even fewer people will find useful (since the
> how-many-primitive-scans behavior is not something users have any
> control over, IIUC).

You can make about the same argument against showing "Buffers". It's
not really something that you can address directly, either. It's
helpful only in the context of a specific problem.

> I doubt that "it lines up with
> pg_stat_all_indexes.idx_scan" is enough to justify the additional
> clutter in EXPLAIN.

The scheme laid out in the patch is just a starting point for
discussion. I just think that it's particularly important that we have
this for skip scan -- that's the part that I feel strongly about.

With skip scan in place, every scan of the kind we'd currently call a
"full index scan" will be eligible to skip. Whether and to what extent
we actually skip is determined at runtime. We really need some way of
determining how much skipping has taken place. (There are many
disadvantages to having a dedicated skip scan index path, which I can
go into if you want.)

> Maybe we should be going the other direction
> and trying to make pg_stat_all_indexes count in a less detailed but
> less surprising way, ie once per indexscan plan node invocation.

Is that less surprising, though? I think that it's more surprising.

--
Peter Geoghegan



On Tue, Aug 27, 2024 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I see value in making it obvious to users when and how
> > pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
> > to EXPLAIN ANALYZE output is useful, independent of whether or not
> > SAOPs happen to be used. That's probably the single best argument in
> > favor of showing "Index Searches: N" unconditionally. But I'm
> > certainly not going to refuse to budge over that.
>
> TBH, I'm afraid that this patch basically is exposing numbers that
> nobody but Peter Geoghegan and maybe two or three other hackers
> will understand, and even fewer people will find useful (since the
> how-many-primitive-scans behavior is not something users have any
> control over, IIUC).  I doubt that "it lines up with
> pg_stat_all_indexes.idx_scan" is enough to justify the additional
> clutter in EXPLAIN.  Maybe we should be going the other direction
> and trying to make pg_stat_all_indexes count in a less detailed but
> less surprising way, ie once per indexscan plan node invocation.

I kind of had that reaction too initially, but I think that was mostly
because "Primitive Index Scans" seemed extremely unclear. I think
"Index Searches" is pretty comprehensible, honestly. Why shouldn't
someone be able to figure out what that means?

Might make sense to restrict this to VERBOSE mode, too.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, Aug 28, 2024 at 9:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Might make sense to restrict this to VERBOSE mode, too.

If we have to make the new output appear selectively, I'd prefer to do
it this way.

There are lots of small problems with selectively displaying less/no
information based on rules applied against the number of index
searches/loops/whatever. While that general approach works quite well
in the case of the "Buffers" instrumentation, it won't really work
here. After all, the base case is that there is one index search per
index scan node -- not zero searches.

--
Peter Geoghegan



On Wed, Aug 28, 2024 at 9:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 27, 2024 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I kind of had that reaction too initially, but I think that was mostly
> because "Primitive Index Scans" seemed extremely unclear. I think
> "Index Searches" is pretty comprehensible, honestly. Why shouldn't
> someone be able to figure out what that means?

Worth noting that Lukas Fittl made a point of prominently highlighting
the issue with how this works when he explained the Postgres 17 nbtree
work:

https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans

And no, I wasn't asked to give any input to the blog post. Lukas has a
general interest in making the system easier to understand for
ordinary users. Presumably that's why he zeroed in on this one aspect
of the work. It's far from an esoteric implementation detail.

--
Peter Geoghegan