EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation - Mailing list pgsql-hackers
From | Maciek Sakrejda |
---|---|
Subject | EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation |
Date | |
Msg-id | CAOtHd0Bw6hsKV-TowuqYeTFhouER+Nds0i-oEd=AFzrKifGNiA@mail.gmail.com Whole thread Raw |
Responses |
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation |
List | pgsql-hackers |
Hello, I had some questions about the behavior of some accounting in parallel EXPLAIN plans. Take the following plan: ``` Gather (cost=1000.43..750173.74 rows=2 width=235) (actual time=1665.122..1665.122 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=27683 read=239573 I/O Timings: read=687.358 -> Nested Loop (cost=0.43..749173.54 rows=1 width=235) (actual time=1660.095..1660.095 rows=0 loops=3) Inner Unique: true Buffers: shared hit=77579 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=1657.443..1657.443 rows=0 loops=1 Buffers: shared hit=23645 read=208365 I/O Timings: read=702.270 Worker 1: actual time=1658.277..1658.277 rows=0 loops=1 Buffers: shared hit=26251 read=209909 I/O Timings: read=700.560 -> Parallel Seq Scan on public.schema_indices (cost=0.00..748877.88 rows=35 width=235) (actual time=136.744..1659.629 rows=32 loops=3) Filter: ((schema_indices.invalidated_at_snapshot_id IS NULL) AND (NOT schema_indices.is_valid)) Rows Removed by Filter: 703421 Buffers: shared hit=77193 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=69.248..1656.950 rows=32 loops=1 Buffers: shared hit=23516 read=208365 I/O Timings: read=702.270 Worker 1: actual time=260.188..1657.875 rows=27 loops=1 Buffers: shared hit=26140 read=209909 I/O Timings: read=700.560 -> Index Scan using schema_tables_pkey on public.schema_tables (cost=0.43..8.45 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=95) Index Cond: (schema_tables.id = schema_indices.table_id) Filter: (schema_tables.database_id = 123) Rows Removed by Filter: 1 Buffers: shared hit=386 Worker 0: actual time=0.011..0.011 rows=0 loops=32 Buffers: shared hit=129 Worker 1: actual time=0.011..0.011 rows=0 loops=27 Buffers: shared hit=111 Planning Time: 0.429 ms Execution Time: 1667.373 ms ``` The Nested Loop here aggregates data for metrics like `buffers read` from its workers, and to calculate a metric like `buffers read` for the parallel leader, we can subtract the values recorded in each individual worker. This happens in the Seq Scan and Index Scan children, as well. However, the Gather node appears to only include values from its direct parallel leader child (excluding that child's workers). This leads to the odd situation that the Gather has lower values for some of these metrics than its child (because the child node reporting includes the worker metrics) even though the values are supposed to be cumulative. This is even more surprising for something like I/O timing, where the Gather has a lower `read` value than one of the Nested Loop workers, which doesn't make sense in terms of wall-clock time. Is this behavior intentional? If so, is there an explanation of the reasoning or the trade-offs involved? Would it not make sense to propagate those cumulative parallel costs up the tree all the way to the root, instead of only using the parallel leader metrics under Gather? Thanks, Maciek
pgsql-hackers by date: