Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date
Msg-id CAH2-WznHJZKUGL5T96TOcqxYHGXg8MnBwrYSs+j4-PBoRRtWEw@mail.gmail.com
In response to Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
List pgsql-hackers
On Wed, Mar 5, 2025 at 9:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Committed just now. Thanks again.

I had to revert this for now, due to issues with debug_parallel_query.
Apologies for the inconvenience.

The immediate problem is that when the parallel leader doesn't
participate, there is no valid IndexScanDescData in planstate to work
off of. There isn't an obvious way to get to shared memory from the
leader process, since that all goes through the
IndexScanDescData.parallel_scan -- there is nothing that points to
shared memory in any of the relevant planstate structs (namely
IndexScanState, IndexOnlyScanState, and BitmapIndexScanState). I was
hoping that you'd be able to provide some guidance on how best to fix
this.

I think that the problem here is similar to the problem with hash
joins and their HashInstrumentation struct -- at least in the
parallel-oblivious case. Here are the points of similarity:

* The information in question is for the node execution as a whole --
it is orthogonal to what might have happened in each individual
worker, and displays the same basic operation-level stats. It is
independent of whether or not the scan happened to use parallel
workers.

* For the most part, when running a parallel hash join, it doesn't
matter which worker EXPLAIN gets its stats from -- they should all
agree on the details (in the parallel-oblivious case, though the
parallel-aware case is still fairly similar). Comments in
show_hash_info explain this.

* However, there are important exceptions: cases where the parallel
leader didn't participate at all, or showed up late, never building
its own hash table. We have to be prepared to get the information from
all workers, iff the leader doesn't have it.

I failed to account for this last point. I wonder if I can fix this
using an approach like the one from bugfix commit 5bcf389ecf. Note
that show_hash_info has since changed; at the time of the commit we
only had parallel-oblivious hash joins, so it made sense to loop
through SharedHashInfo for workers and go with the details taken from
the first worker that successfully built a hash table (the hash tables
must be identical anyway).
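
As a rough illustration of that fallback (not the actual show_hash_info
code), the logic can be sketched with simplified stand-in types. Every
Demo* name below is hypothetical, and treating nbatch > 0 as the sign
that a process built a hash table is an assumption made for the sketch:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_WORKERS 8

/* Hypothetical stand-in for per-process hash table instrumentation. */
typedef struct DemoHashInstrumentation
{
    int nbuckets;
    int nbatch;     /* assumed: 0 means no hash table was ever built */
} DemoHashInstrumentation;

/* Hypothetical stand-in for the per-worker array kept in shared memory. */
typedef struct DemoSharedHashInfo
{
    int num_workers;
    DemoHashInstrumentation hinstrument[DEMO_MAX_WORKERS];
} DemoSharedHashInfo;

/*
 * Prefer the leader's own stats. If the leader never built a hash table,
 * fall back to the first worker that did. Returns NULL when nobody did.
 */
static const DemoHashInstrumentation *
demo_pick_hash_stats(const DemoHashInstrumentation *leader,
                     const DemoSharedHashInfo *shared)
{
    if (leader != NULL && leader->nbatch > 0)
        return leader;

    if (shared == NULL)
        return NULL;

    assert(shared->num_workers >= 0 &&
           shared->num_workers <= DEMO_MAX_WORKERS);

    for (int i = 0; i < shared->num_workers; i++)
    {
        if (shared->hinstrument[i].nbatch > 0)
            return &shared->hinstrument[i];
    }

    return NULL;
}
```

Taking the first worker that qualifies is only reasonable because, in
the parallel-oblivious case, every participant builds an identical hash
table.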

As I said, a sticking point for this approach is that there is no
existing way to get to someplace in shared memory from the parallel
leader when it never participated. Parallel index scans have their
ParallelIndexScanDesc state stored when they call
index_beginscan_parallel. But that's not happening in a parallel
leader that never participates. Parallel hash join doesn't have that
problem, I think, because the leader will reliably get a pointer to
shared state when ExecHashInitializeDSM() is called. As comments in
its ExecParallelInitializeDSM caller put it, ExecHashInitializeDSM is
called "even when not parallel-aware, for EXPLAIN ANALYZE" -- this
makes it like a few other kinds of nodes, but not like index scan
nodes.
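
To make the contrast concrete, here is a minimal sketch of what "the
leader reliably gets a pointer to shared state" could look like for an
index scan node. This is not a proposed patch: the Demo* types, the
shared_info field, and both functions are hypothetical, and the sketch
only assumes that some DSM-initialization hook runs in the leader
whether or not the leader ever starts scanning:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter living in shared memory. */
typedef struct DemoSharedScanInfo
{
    long nsearches;
} DemoSharedScanInfo;

/* Stand-in for a scan descriptor; only exists once a scan has begun. */
typedef struct DemoScanDesc
{
    DemoSharedScanInfo *parallel_scan;
} DemoScanDesc;

/* Stand-in for a planstate struct such as IndexScanState. */
typedef struct DemoIndexScanState
{
    DemoScanDesc *scandesc;          /* NULL if this process never scanned */
    DemoSharedScanInfo *shared_info; /* hypothetical: set at DSM init */
} DemoIndexScanState;

/* Runs in the leader while shared memory is set up, before any scan. */
static void
demo_initialize_dsm(DemoIndexScanState *node, DemoSharedScanInfo *shm)
{
    assert(node != NULL && shm != NULL);
    shm->nsearches = 0;
    node->shared_info = shm;
}

/* What EXPLAIN would call: works even if scandesc was never created. */
static bool
demo_get_nsearches(const DemoIndexScanState *node, long *result)
{
    if (node->shared_info != NULL)
    {
        *result = node->shared_info->nsearches;
        return true;
    }
    if (node->scandesc != NULL && node->scandesc->parallel_scan != NULL)
    {
        *result = node->scandesc->parallel_scan->nsearches;
        return true;
    }
    return false;               /* nothing reachable from planstate */
}
```

Without the shared_info pointer, a non-participating leader has a NULL
scandesc and demo_get_nsearches() returns false, which mirrors the
failure seen under debug_parallel_query.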

--
Peter Geoghegan