Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: RFC: Logging plan of the running query |
Date | |
Msg-id | CA+TgmoatMzMedKRYAK5s3euxUohd6m00szL4Q6GGQj+NhF813A@mail.gmail.com Whole thread Raw |
In response to | Re: RFC: Logging plan of the running query (torikoshia <torikoshia@oss.nttdata.com>) |
Responses |
Re: RFC: Logging plan of the running query
Re: RFC: Logging plan of the running query |
List | pgsql-hackers |
On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > Rebased it again. > > On 2025-03-10 14:10, torikoshia wrote: > > BTW the patch adds about 400 lines to explain.c and it may be better > > to split the file as well as 9173e8b6046, but I leave it as it is for > > now. > > This part remains unchanged. Looking at ExplainAssembleLogOutput() is making me realize that auto_explain is in serious need of some cleanup. That's not really the fault of this patch, but the hack whereby we overwrite the [] that would have surrounded the JSON output with {} is not very nice. I also think that the auto_explain GUCs need rethinking. In theory, new EXPLAIN options should be mirrored into auto_explain, but if you compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options that are known to regular EXPLAIN aren't known to auto_explain, and any customizable options that use explain_per_plan_hook won't be able to work with auto_explain, either. Changing this is non-trivial because SERIALIZE, for example, can't work the same way for auto_explain as it does for EXPLAIN, and a full solution might also require user-visible changes like replacing auto_explain.log_<whatever> with auto_explain.explain, so I don't really know. Maybe we should just live with it the way it is for now, but it doesn't look very nice. Rafael and I were just discussing off-list to what extent the parallel query problems with his patch also apply to yours. His design makes the problem easier to hit, I think, because setting progressive_explain = on makes every backend attempt to dump a query plan, whereas you'd have to hit the worker process with a signal and just the right time. But the fundamental problem appears to be the same. Another interesting wrinkle is that he settled on showing the outermost running query whereas you settled on the innermost. If you were showing the outermost, perhaps you could just prohibit dump-the-query-plan on a parallel worker, but you don't really want to prohibit it categorically, because the worker could be running an inner query that could be dumped. So maybe you want to avoid setting ActiveQueryDesc at the toplevel and then set/clear it for inner queries. However, that's a bit weird, too. If we wanted to undertake a bigger redesign here, we could try pushing the entire query plan (including all subqueries) down to the worker, and just tell it which part it's actually supposed to execute. However, that would have some overhead and would arguably open up some risk of future bugs, since passing subqueries as NULL is, I think, intended partly as a guard against accidentally executing the wrong subqueries. I don't think it's a good idea to put the logic to update ActiveQueryDesc into ExecutorRun. I think that function should really just call the hook, or standard_ExecutorRun. I don't know for sure whether this work should be moved down into ExecutorRun or up into the caller, but I don't think it should stay where it is. My comment about the naming of WrapMultiExecProcNodesWithExplain() on the other thread also applies here: MultiExecProcNode() is an unrelated function. Likewise, WrapExecProcNodeWithExplain() misses walking the node's subplan list. Also, I don't think an ereport(DEBUG1, ...) is appropriate here. Do we really need es->signaled? Couldn't we test es->analyze instead? Do we really need ExecProcNodeOriginal? Can we find some way to reuse ExecProcNodeReal instead of making the structure bigger? I do think we should try to move some of this new code out into a separate source file, but I'm not yet sure what we should call it. We might want to share infrastructure with something what Rafael's patch, which he called progressive EXPLAIN but which is really closer to a general query progress facility, albeit perhaps too expensive to have enabled by default. Does the documentation typically mention when a function is superuser-only? If so, it should do so here, too, using similar wording. It seems unnecessary to remark that you can't log a query plan after a subtransaction abort, because the query wouldn't be running any more. It's also true that you can't log a query after a toplevel abort, even if you're still waiting for the aborted transaction to roll back. But it also seems like once the aborted subtransaction is popped off the stack and we return to the parent transaction, we should be able to log any query running at the outer level. If this is meant to imply that something doesn't work in this scenario, perhaps there is something about the design that needs fixing. Does ActiveQueryDesc really need to be exposed to the whole system? Could it be a file-level variable? -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: