Re: Proposal: Progressive explain - Mailing list pgsql-hackers
From: Rafael Thofehrn Castro
Subject: Re: Proposal: Progressive explain
Msg-id: CAG0ozMpr67NqfsktboC9o=JJ7jSZpMWXpNn45uJ7aZOuT4vhYQ@mail.gmail.com
In response to: Re: Proposal: Progressive explain (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
My bad, I mistakenly ran the tests without assertions enabled over the last
two days, so I couldn't catch that assertion failure. I was able to reproduce
it, thanks.
I guess that when the code was designed we were not expecting explains to be
running inside parallel workers.
One comment: this has nothing to do with instrumentation, so the hacks done
with the instrument objects are not to blame here.
What is interesting is that removing that Assert (not that I am saying we
should) fixes it, and no other Asserts seem to complain anywhere. The feature
works as expected and there are no crashes.
> What I think this means is that this patch needs significant
> rethinking to cope with parallel query. I don't think users are going
> to be happy with a progressive EXPLAIN that just ignores what the
> workers are doing, and I don't think they're going to be happy with N
> separate progressive explains that they have to merge together to get
> an overall picture of what the query is doing, and I'm absolutely
> positive that they're not going to be happy with something that
> crashes. I think there may be a way to come up with a good design here
> that avoids these problems, but we definitely don't have time before
> feature freeze (not that we were necessarily in a great place to think
> of committing this before feature freeze anyway, but it definitely
> doesn't make sense now that I understand this problem).
Yeah, that is a fair point. But I actually envisioned this feature targeting
parallel workers too, from the start. I see a lot of value in being able to
debug what each parallel worker is doing inside its black box. It could be,
for example, that a worker is lagging behind the others because it is dealing
with non-cached data blocks.
According to my tests, parallel workers can actually push instrumentation to
the parent while still running. It all depends on the types of operations
being performed in the plan.
For example, if a parallel worker is doing a parallel seq scan, it will
continuously send chunks of rows to the parent, and instrumentation goes with
the data. So we don't actually need to wait for the workers to finish before
instrumentation on the parent gets updated. Here is what I got from
Torikoshi's test after removing that Assert and enabling instrumented
progressive explain (progressive_explain_interval = '10ms'):
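For reference, the setup and query behind the output below can be sketched
roughly as follows. Only progressive_explain_interval is spelled out in this
thread; the enabling GUC value and the view name are my reconstruction from
the patch under discussion and the column names in the output, so treat them
as assumptions if you are looking at a different revision:

```sql
-- Hedged sketch: GUC and view names beyond progressive_explain_interval
-- are assumptions, not confirmed by this thread.
SET progressive_explain = 'analyze';        -- hypothetical: instrumented mode
SET progressive_explain_interval = '10ms';  -- stated in the thread

-- Queried from another session, in psql expanded mode (\x), producing the
-- per-process records shown below (leader plus each parallel worker):
SELECT datid, datname, pid, last_update, query_plan
  FROM pg_stat_progress_explain;            -- assumed view name
```

Each backend (the leader and every parallel worker) shows up as its own
record, with query_plan refreshed at roughly the configured interval.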
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169555
last_update | 2025-04-01 12:44:07.36775-03
query_plan | Gather (cost=0.02..137998.03 rows=10000002 width=8) (actual time=0.715..868.619 rows=9232106.00 loops=1) (current) +
| Workers Planned: 2 +
| Workers Launched: 2 +
| InitPlan 1 +
| -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1.00 loops=1) +
| InitPlan 2 +
| -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1.00 loops=1) +
| -> Parallel Append (cost=0.00..137998.01 rows=4166669 width=8) (actual time=0.364..264.804 rows=1533011.00 loops=1) +
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00 rows=1 width=8) (never executed) +
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) +
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00 rows=1 width=8) (never executed) +
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) +
| -> Parallel Seq Scan on ab_a1_b1 ab_1 (cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.361..192.896 rows=1533011.00 loops=1)+
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) +
|
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169596
last_update | 2025-04-01 12:44:07.361846-03
query_plan | Parallel Append (cost=0.00..137998.01 rows=4166669 width=8) (actual time=0.990..666.845 rows=3839515.00 loops=1) (current) +
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00 rows=1 width=8) (never executed) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00 rows=1 width=8) (never executed) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
| -> Parallel Seq Scan on ab_a1_b1 ab_1 (cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.985..480.531 rows=3839515.00 loops=1) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
|
-[ RECORD 3 ]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169597
last_update | 2025-04-01 12:44:07.36181-03
query_plan | Parallel Append (cost=0.00..137998.01 rows=4166669 width=8) (actual time=1.003..669.398 rows=3830293.00 loops=1) (current) +
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00 rows=1 width=8) (never executed) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00 rows=1 width=8) (actual time=0.019..0.019 rows=0.00 loops=1) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
| -> Parallel Seq Scan on ab_a1_b1 ab_1 (cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.979..482.420 rows=3830293.00 loops=1) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1))) +
|
But yeah, if a parallel worker does a hash join and the Hash node sits on top
of a huge block of sub-operations, then yes, upstream will not see anything
until the hash table is built. That is why, IMO, having visibility into
background workers has value too.
I even designed a query visualizer that combines the plans of the parent and
the parallel workers in side-by-side windows, including other stats
(per-process wait events, CPU, memory consumption, temp file generation),
allowing us to correlate them with the specific operations being done in the
plans.
But Robert is right, the engineering required to make this happen isn't easy.
Changing that Assert could open a Pandora's box of other issues, and we have
no time to do that for PG18.
Rafael.