Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date
Msg-id CAH2-Wz=WOtzBpVM8Kw9CpZOQ===Go-7zRsXE2R7nOMzVTGomMg@mail.gmail.com
Whole thread Raw
In response to Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
List pgsql-hackers
On Tue, Aug 27, 2024 at 5:03 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> If the counter was put into the BTScanOpaque, rather than the
> IndexScanDesc, then this could trivially be used in an explain AM
> callback, as IndexScanDesc and ->opaque are both still available while
> building the explain output.

Right, "trivial". Except in that it requires inventing a whole new
general purpose infrastructure. Meanwhile, Tom is arguing against even
showing this very basic information in EXPLAIN ANALYZE.You see the
problem?

> As a result, it wouldn't bloat the
> IndexScanDesc for other index AMs who might not be interested in this
> metric.

Why do you persist with the idea that it isn't useful for other index
AMs? I mean it literally works in exactly the same way! It's literally
indistinguishable to users, and works in a way that's consistent with
historical behavior/definitions.

> I don't want anything, or anything done about it, but your statement
> that all index AMs support SAOP (potentially non-natively) is not
> true, as the non-native SAOP support is only for bitmap index scans,
> and index AMs aren't guaranteed to support bitmap index scans (e.g.
> pgvector's IVFFLAT and HNSW are good examples, as they only support
> amgettuple).

Yes, there are some very minor exceptions -- index AMs where even
non-native SAOPs won't be used. What difference does it make?

> > > And, in this case,
> > > the use case seems quite index-specific, at least for IS/IOS nodes.
> >
> > I disagree. It's an existing concept, exposed in system views, and now
> > in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.
>
> To be precise, it is not precisely that, because it's a different
> counter that an AM must update when the pgstat data is updated if it
> wants the explain output to reflect the stats counter accurately.

Why does that matter? I could easily move the counter to the opaque
struct, but that would make the patch longer and more complicated, for
absolutely no benefit.

> When an AM forgets to update one of these metrics (or fails to realize they
> have to both be updated) then they'd be out-of-sync. I'd prefer if an
> AM didn't have to account for it's statistics in more than one place.

I could easily change the pgstat_count_index_scan macro so that index
AMs were forced to do both, or neither. (Not that this is a real
problem.)

> > > Can't you pull that data, instead of inventing a new place
> > > every AMs needs to touch for it's metrics?
> >
> > No. At least not in a way that's scoped to a particular index scan.
>
> Similar per-node counter data is pulled for the global (!) counters of
> pgBufferUsage, so why would it be less possible to gather such metrics
> for just one index's stats here?

I told you why already, when we talked about this privately: there is
no guarantee that it's the index indicated by the scan
instrumentation. For example, due to syscache lookups. There's also
the question of how we maintain the count for things like nestloop
joins, where invocations of different index scan nodes may be freely
woven together. So it just won't work.

Besides, I thought that you wanted me to use some new field in
BTScanOpaque? But now you want me to use a global counter. Which is
it?

> While I do think it won't be easy to
> find a good way to integrate this into EXPLAIN's Instrumentation, I
> imagine other systems (e.g. table scans) may benefit from a better
> integration and explanation of pgstat statistics in EXPLAIN, too. E.g.
> I'd love to be able to explain how many times which function was
> called in a plans' projections, and what the relevant time expendature
> for those functions is in my plans.

Seems completely unrelated.

> Alternatively, you could update the patch so that only the field in
> IndexScan would need to be updated by the index AM by making the
> executor responsible to update the relation's stats at once at the end
> of the query with the data from the IndexScanDesc.

I don't understand why this is an alternative to the other thing that
you said. Or even why it's desirable.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Next
From: Jeff Davis
Date:
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM