EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation - Mailing list pgsql-hackers

From Maciek Sakrejda
Subject EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Date
Msg-id CAOtHd0Bw6hsKV-TowuqYeTFhouER+Nds0i-oEd=AFzrKifGNiA@mail.gmail.com
Whole thread Raw
Responses Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
List pgsql-hackers
Hello,

I had some questions about the behavior of some accounting in parallel
EXPLAIN plans. Take the following plan:

```
Gather  (cost=1000.43..750173.74 rows=2 width=235) (actual
time=1665.122..1665.122 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=27683 read=239573
  I/O Timings: read=687.358
  ->  Nested Loop  (cost=0.43..749173.54 rows=1 width=235) (actual
time=1660.095..1660.095 rows=0 loops=3)
        Inner Unique: true
        Buffers: shared hit=77579 read=657847
        I/O Timings: read=2090.189
        Worker 0: actual time=1657.443..1657.443 rows=0 loops=1
          Buffers: shared hit=23645 read=208365
          I/O Timings: read=702.270
        Worker 1: actual time=1658.277..1658.277 rows=0 loops=1
          Buffers: shared hit=26251 read=209909
          I/O Timings: read=700.560
        ->  Parallel Seq Scan on public.schema_indices
(cost=0.00..748877.88 rows=35 width=235) (actual
time=136.744..1659.629 rows=32 loops=3)
              Filter: ((schema_indices.invalidated_at_snapshot_id IS
NULL) AND (NOT schema_indices.is_valid))
              Rows Removed by Filter: 703421
              Buffers: shared hit=77193 read=657847
              I/O Timings: read=2090.189
              Worker 0: actual time=69.248..1656.950 rows=32 loops=1
                Buffers: shared hit=23516 read=208365
                I/O Timings: read=702.270
              Worker 1: actual time=260.188..1657.875 rows=27 loops=1
                Buffers: shared hit=26140 read=209909
                I/O Timings: read=700.560
        ->  Index Scan using schema_tables_pkey on
public.schema_tables  (cost=0.43..8.45 rows=1 width=8) (actual
time=0.011..0.011 rows=0 loops=95)
              Index Cond: (schema_tables.id = schema_indices.table_id)
              Filter: (schema_tables.database_id = 123)
              Rows Removed by Filter: 1
              Buffers: shared hit=386
              Worker 0: actual time=0.011..0.011 rows=0 loops=32
                Buffers: shared hit=129
              Worker 1: actual time=0.011..0.011 rows=0 loops=27
                Buffers: shared hit=111
Planning Time: 0.429 ms
Execution Time: 1667.373 ms
```

The Nested Loop here aggregates data for metrics like `buffers read`
from its workers, and to calculate a metric like `buffers read` for
the parallel leader, we can subtract the values recorded in each
individual worker. This happens in the Seq Scan and Index Scan
children, as well. However, the Gather node appears to only include
values from its direct parallel leader child (excluding that child's
workers).

This leads to the odd situation that the Gather has lower values for
some of these metrics than its child (because the child node reporting
includes the worker metrics) even though the values are supposed to be
cumulative. This is even more surprising for something like I/O
timing, where the Gather has a lower `read` value than one of the
Nested Loop workers, which doesn't make sense in terms of wall-clock
time.

Is this behavior intentional? If so, is there an explanation of the
reasoning or the trade-offs involved? Would it not make sense to
propagate those cumulative parallel costs up the tree all the way to
the root, instead of only using the parallel leader metrics under
Gather?

Thanks,
Maciek



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Decomposing xml into table
Next
From: Robert Haas
Date:
Subject: Re: Default setting for enable_hashagg_disk