Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Msg-id: CAH2-WznHJZKUGL5T96TOcqxYHGXg8MnBwrYSs+j4-PBoRRtWEw@mail.gmail.com
In response to: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
List: pgsql-hackers
On Wed, Mar 5, 2025 at 9:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Committed just now. Thanks again.

I had to revert this for now, due to issues with debug_parallel_query. Apologies for the inconvenience.

The immediate problem is that when the parallel leader doesn't participate, there is no valid IndexScanDescData in planstate to work off of. There is no obvious way to get to shared memory from the leader process, since that all goes through IndexScanDescData.parallel_scan -- nothing points to shared memory in any of the relevant planstate structs (namely IndexScanState, IndexOnlyScanState, and BitmapIndexScanState). I was hoping that you'd be able to provide some guidance on how best to fix this.

I think that the problem here is similar to the problem with hash joins and their HashInstrumentation struct -- at least in the parallel-oblivious case. Here are the points of similarity:

* The information in question is for the node execution as a whole -- it is orthogonal to what might have happened in each individual worker, and displays the same basic operation-level stats. It is independent of whether the scan happened to use parallel workers.

* For the most part, when running a parallel hash join it doesn't matter which worker EXPLAIN gets its stats from -- they should all agree on the details (in the parallel-oblivious case, though the parallel-aware case is still fairly similar). Comments in show_hash_info explain this.

* However, there are important exceptions: cases where the parallel leader didn't participate at all, or showed up late, never building its own hash table. We have to be prepared to get the information from the workers, iff the leader doesn't have it.

I failed to account for this last point. I wonder if I can fix this using an approach like the one from bugfix commit 5bcf389ecf.
Note that show_hash_info has since changed; at the time of that commit we only had parallel-oblivious hash joins, so it made sense to loop through SharedHashInfo's workers and go with the details taken from the first worker that successfully built a hash table (the hash tables must be identical anyway).

As I said, a sticking point for this approach is that there is no existing way to get to someplace in shared memory from the parallel leader when it never participated. Parallel index scans have their ParallelIndexScanDesc state stored when they call index_beginscan_parallel. But that's not happening in a parallel leader that never participates.

Parallel hash join doesn't have that problem, I think, because the leader will reliably get a pointer to shared state when ExecHashInitializeDSM() is called. As comments in its ExecParallelInitializeDSM caller put it, ExecHashInitializeDSM is called "even when not parallel-aware, for EXPLAIN ANALYZE" -- this makes it like a few other kinds of nodes, but not like index scan nodes.

--
Peter Geoghegan