Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: RFC: Logging plan of the running query |
Date | |
Msg-id | c25ae6015be96a1964eddd964657660b@oss.nttdata.com Whole thread Raw |
In response to | Re: RFC: Logging plan of the running query (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: RFC: Logging plan of the running query
|
List | pgsql-hackers |
On 2023-11-06 15:32, Ashutosh Bapat wrote: Thanks for the suggestion and example. > On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: >> >> I have following questions related to the functionality. (Please point >> me to the relevant discussion if this has been already discussed.) >> >> 1. When a backend is running nested queries, we will see the plan of >> the innermost query. That query may not be the actual culprit if the >> user query is running slowly. E.g a query being run as part of inner >> side nested loop when the nested loop itself is the bottleneck. I >> think it will be useful to print plans of all the whole query stack. This was discussed in previous threads[1] and we thought it'd be useful but since it needed some extra works, we stopped widening the scope. > To further explain this point, consider following scenario > > -- A completely useless function which executes two SQL statements > which take small amount individually > #create function multstmt() returns int > language sql as $$ > select count(*) from pg_class where reltype = 12345; > select count(*) from pg_type limit 10; > $$; > > -- force a suboptimal plan > #set enable_hashjoin to false; > #set enable_mergejoin to false; > > -- A completely useless but long running query > #select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) > = multstmt(t.oid); > > This take a few minutes on my laptop. > > In a separate session query pg_stat_activity. We will see the original > query > #select pid, query, backend_type from pg_stat_activity where pid = > 349330; > pid | query > | backend_type > --------+-----------------------------------------------------------------------------------------+---------------- > 349330 | select c.oid, t.oid from pg_class c, pg_type t where > multstmt(c.oid) = multstmt(t.oid); | client backend > (1 row) > > Run the plan output function a few times > #select pg_log_query_plan(349330); > > You will observe different plans based on which of the innermost query > is runnning > LOG: query plan running on backend with PID 349330 is: > Query Text: > select count(*) from pg_class where reltype = typeid; > select count(*) from pg_type where oid = typeid; > > Query Parameters: $1 = '600' > Aggregate (cost=18.18..18.19 rows=1 width=8) > Output: count(*) > -> Seq Scan on pg_catalog.pg_class (cost=0.00..18.18 rows=2 > width=0) > Output: oid, relname, relnamespace, reltype, > reloftype, relowner, relam, relfilenode, reltablespace, relpages, > reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, > relpersistence, relkind, relnatts, relchecks, relhasrules, > relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, > relispopulated, relreplident, relispartition, relrewrite, > relfrozenxid, relminmxid, relacl, reloptions, relpartbound > Filter: (pg_class.reltype = $1) > Settings: enable_hashjoin = 'off', enable_mergejoin = 'off' > 2023-11-06 11:52:25.610 IST [349330] LOG: query plan running on > backend with PID 349330 is: > Query Text: > select count(*) from pg_class where reltype = typeid; > select count(*) from pg_type where oid = typeid; > > Query Parameters: $1 = '2203' > Aggregate (cost=4.30..4.31 rows=1 width=4) > Output: count(*) > -> Index Only Scan using pg_type_oid_index on > pg_catalog.pg_type (cost=0.28..4.29 rows=1 width=0) > Output: oid > Index Cond: (pg_type.oid = $1) > Settings: enable_hashjoin = 'off', enable_mergejoin = 'off' > > Individual pieces are confusing here since the query run by the > backend is not the one being EXPLAINed. A user may not know that the > queries being EXPLAINed arise from the function call multstmt(). So > they won't be able to stitch the pieces together unless they see plan > of the outermost query with loops and costs. What might help if we > explain each query in the hierarchy. > > I think we can start with what auto_explain is doing. Always print the > plan of the outermost query; the query found in pg_stat_activity. In a > later version we might find a way to print plans of all the queries in > the stack and do so in a readable manner. Agreed there are cases printing plan of the outermost query is more useful. > > This makes tracking activeQueryDesc a bit tricky. My guess is that the > outermost query's descriptor will be available through ActivePortal > most of the time. But there are cases when ExecutorRun is called by > passing a queryDesc directly. So may be there are some cases where > that's not true. Yeah, actually the original version of the patch got the plan from ActivePortal, but it failed logging plan when the query was something like this[2]: DO $$ BEGIN PERFORM pg_sleep(100); END$$; > 2. When a query is running in parallel worker do we want to print that > query? It may or may not be interesting given the situation. If the > overall plan itself is faulty, output of the parallel worker query is > not useful. If the plan is fine but a given worker's leg is running > slowly it may be interesting. I think it can be useful. I'm wondering if we can add this after the first patch for this feature is committed. > As a side note, you may want to fix the indentation in > ExplainAssembleLogOutput(). Thanks, modified it. Since the documentation was obsoleted, attached patch also updated it. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Attachment
pgsql-hackers by date: