Re: Proposal: Progressive explain - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Proposal: Progressive explain |
Date | |
Msg-id | CA+Tgmobd5RH156_QEmTqHtgbtqDy=G2WJ1aJ0m-BOW4qqWVAfA@mail.gmail.com Whole thread Raw |
In response to | Re: Proposal: Progressive explain (torikoshia <torikoshia@oss.nttdata.com>) |
Responses |
Re: Proposal: Progressive explain
|
List | pgsql-hackers |
On Tue, Apr 1, 2025 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > Could you please check if you can reproduce this? I can, and I now see that this patch has a pretty big design problem. The assertion failure occurs when a background worker tries to call ruleutils.c's get_parameter(), which tries to find the expression from which the parameter was computed. To do that, it essentially looks upward in the plan tree until it finds the source of the parameter. For example, if the parameter came from the bar_id column of relation foo, we would then deparse the parameter as foo.bar_id, rather than as $1 or similar. However, this doesn't work when the deparsing is attempted from within a parallel worker, because the parallel worker only gets the portion of the plan it's attempting to execute, not the whole thing. In your test case, the whole plan looks like this: Aggregate InitPlan 1 -> Result InitPlan 2 -> Result -> Gather Workers Planned: 2 -> Parallel Append -> Parallel Seq Scan on ab_a1_b2 ab_1 Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) -> Parallel Seq Scan on ab_a2_b2 ab_2 Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) -> Parallel Seq Scan on ab_a3_b2 ab_3 Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR (a = (InitPlan 2).col1))) Those references to (InitPlan 1).col1 are actually Params. I think what's happening here is approximately that the worker tries to find the source of those Params, but (InitPlan 1) is above the Gather node and thus not available to the worker, and so the worker can't find it and the assertion fails. In one sense, it is not hard to fix this: the workers shouldn't really be doing progressive_explain at all, because then we get a progressive_explain from each process individually instead of one for the query as a whole, so we could just think of having the workers ignore the progressive_explain GUC. However, one thing I realized earlier this morning is that the progressive EXPLAIN can't show any of the instrumentation that is relayed back from workers to the leader only at the end of the execution. See the code in ParallelQueryMain() just after ExecutorFinish(). What I think this means is that this patch needs significant rethinking to cope with parallel query. I don't think users are going to be happy with a progressive EXPLAIN that just ignores what the workers are doing, and I don't think they're going to be happy with N separate progressive explains that they have to merge together to get an overall picture of what the query is doing, and I'm absolutely positive that they're not going to be happy with something that crashes. I think there may be a way to come up with a good design here that avoids these problems, but we definitely don't have time before feature freeze (not that we were necessarily in a great place to think of committing this before feature freeze anyway, but it definitely doesn't make sense now that I understand this problem). -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: