Re: Proposal: Progressive explain - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Proposal: Progressive explain |
Date | |
Msg-id | CA+TgmobrzeDep+Z1BPQqGNsCqTQ8M58wNNKJB_8Lwpwbqbz3GQ@mail.gmail.com Whole thread Raw |
In response to | Re: Proposal: Progressive explain (Rafael Thofehrn Castro <rafaelthca@gmail.com>) |
Responses |
Re: Proposal: Progressive explain
|
List | pgsql-hackers |
On Wed, Mar 26, 2025 at 5:58 PM Rafael Thofehrn Castro <rafaelthca@gmail.com> wrote: > So implementation was done based on transaction nested level. Cleanup is only > performed when AbortSubTransaction() is called in the same transaction nested > level as the one where the query is running. This covers both PL/pgSQL exception > blocks and savepoints. Right. I think this is much closer to being correct. However, I actually think it should look more like this: void AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth) { if (activeQueryDesc != NULL && activeQueryXactNestLevel >= nestDepth) { if (isCommit) elog(WARNING, "leaked progressive explain query descriptor"); ProgressiveExplainCleanup(NULL); } } By including the is-commit case in there, we can catch any bugs where things aren't cleaned up properly before a transaction is committed. We generally want to test >= nestDepth instead of == nestDepth in case multiple subtransaction levels abort all at once; I'm not sure it matters here, but even if it isn't, it's best to be consistent with the practice elsewhere. Having {Commit,Abort}SubTransaction pass the nestDepth instead of calling GetCurrentTransactionNestLevel() also has precedent e.g. see AtEOSubXact_HashTables. As a further refinement, consider initializing activeQueryXactNestLevel to -1 and resetting it to that value each time you end a progressive EXPLAIN, so that activeQueryDesc != NULL if and only if activeQueryXactNestLevel >= 0. Then the test above can be simplified to just if (activeQueryXactNestLevel >= nestDepth). The comment for ProgressiveExplainIsActive() is a copy-paste from another function that you forgot to update. Also, the function body could be reduced to a one-liner: return queryDesc == activeQueryDesc; There is a comment in ExplainOnePlan() that says "Handle progressive explain cleanup manually if enabled as it is not called as part of ExecutorFinish," but standard_ExecutorFinish does indeed call ProgressiveExplainFinish(), so either the comment is misleading or the code is wrong. standard_ExecutorFinish() makes its call to ProgressiveExplainFinish() dependent on the value of the progressive_explain GUC, but that GUC could be changed in mid-query via set_config() or a plpgsql function that calls SET, which could result in skipping the cleanup even when it's needed. I think you should make the call unconditional and make it entirely the job of ProgressiveExplainFinish() to decide whether any cleanup is needed. ProgressiveExplainFinish() calls ProgressiveExplainCleanup() in most cases, but sometimes just disables the timeout instead. I think this is weird. I think you should just always call ProgressiveExplainCleanup() and make sure it's robust and cleans up however much or little is appropriate in all cases. On the flip side, I can't really see why dsa_detach(queryDesc->pestate->pe_a) needs to be done while holding ProgressiveExplainHashLock. Why not just have ProgressiveExplainFinish() call ProgressiveExplainCleanup(), and then afterwards it can do the dsa_detach() in the caller? Then ProgressiveExplainCleanup() no longer needs an argument. ProgressiveExplainPrint() can save a level of indentation in a large chunk of the function by understanding that elog(ERROR) does not return. You don't need to wrap everything that follows in else {}. In the documentation table called pg-stat-progress-explain-view, some descriptions end with a period and others do not. Calling ProgressiveExplainTrigger() directly from ProgressiveExplainStartupTimeoutHandler() seems extremely scary -- we've tried hard to make our signal handlers do only very simple things like setting a flag, and this one runs around the entire PlanState tree modifying Very Important Things. I fear that's going to be hard to make robust. Like, what happens if we're going around trying to change ExecProcNode pointers while the calling code was also going around trying to change ExecProcNode pointers? I can't quite see how this won't result in chaos. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: