Home > mailing lists

Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From	torikoshia
Subject	Re: RFC: Logging plan of the running query
Date	November 9, 2023 09:33:36
Msg-id	c25ae6015be96a1964eddd964657660b@oss.nttdata.com Whole thread Raw
In response to	Re: RFC: Logging plan of the running query (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses	Re: RFC: Logging plan of the running query
List	pgsql-hackers

Tree view

On 2023-11-06 15:32, Ashutosh Bapat wrote:

Thanks for the suggestion and example.

> On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>> 
>> I have following questions related to the functionality. (Please point
>> me to the relevant discussion if this has been already discussed.)
>> 
>> 1. When a backend is running nested queries, we will see the plan of
>> the innermost query. That query may not be the actual culprit if the
>> user query is running slowly. E.g a query being run as part of inner
>> side nested loop when the nested loop itself is the bottleneck. I
>> think it will be useful to print plans of all the whole query stack.

This was discussed in previous threads[1] and we thought it'd be useful 
but since it needed some extra works, we stopped widening the scope.

> To further explain this point, consider following scenario
> 
> -- A completely useless function which executes two SQL statements
> which take small amount individually
> #create function multstmt() returns int
> language sql as $$
> select count(*) from pg_class where reltype = 12345;
> select count(*) from pg_type limit 10;
> $$;
> 
> -- force a suboptimal plan
> #set enable_hashjoin to false;
> #set enable_mergejoin to false;
> 
> -- A completely useless but long running query
> #select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid)
> = multstmt(t.oid);
> 
> This take a few minutes on my laptop.
> 
> In a separate session query pg_stat_activity. We will see the original 
> query
> #select pid, query, backend_type from pg_stat_activity where pid = 
> 349330;
>   pid   |                                          query
>                            |  backend_type
> --------+-----------------------------------------------------------------------------------------+----------------
>  349330 | select c.oid, t.oid from pg_class c, pg_type t where
> multstmt(c.oid) = multstmt(t.oid); | client backend
> (1 row)
> 
> Run the plan output function a few times
> #select pg_log_query_plan(349330);
> 
> You will observe different plans based on which of the innermost query
> is runnning
> LOG:  query plan running on backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
> 
>         Query Parameters: $1 = '600'
>         Aggregate  (cost=18.18..18.19 rows=1 width=8)
>           Output: count(*)
>           ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 
> width=0)
>                 Output: oid, relname, relnamespace, reltype,
> reloftype, relowner, relam, relfilenode, reltablespace, relpages,
> reltuples, relallvisible, reltoastrelid, relhasindex, relisshared,
> relpersistence, relkind, relnatts, relchecks, relhasrules,
> relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity,
> relispopulated, relreplident, relispartition, relrewrite,
> relfrozenxid, relminmxid, relacl, reloptions, relpartbound
>                 Filter: (pg_class.reltype = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
> 2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on
> backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
> 
>         Query Parameters: $1 = '2203'
>         Aggregate  (cost=4.30..4.31 rows=1 width=4)
>           Output: count(*)
>           ->  Index Only Scan using pg_type_oid_index on
> pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
>                 Output: oid
>                 Index Cond: (pg_type.oid = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
> 
> Individual pieces are confusing here since the query run by the
> backend is not the one being EXPLAINed. A user may not know that the
> queries being EXPLAINed arise from the function call multstmt(). So
> they won't be able to stitch the pieces together unless they see plan
> of the outermost query with loops and costs. What might help if we
> explain each query in the hierarchy.
> 
> I think we can start with what auto_explain is doing. Always print the
> plan of the outermost query; the query found in pg_stat_activity. In a
> later version we might find a way to print plans of all the queries in
> the stack and do so in a readable manner.

Agreed there are cases printing plan of the outermost query is more 
useful.

> 
> This makes tracking activeQueryDesc a bit tricky. My guess is that the
> outermost query's descriptor will be available through ActivePortal
> most of the time. But there are cases when ExecutorRun is called by
> passing a queryDesc directly. So may be there are some cases where
> that's not true.

Yeah, actually the original version of the patch got the plan from 
ActivePortal, but it failed logging plan when the query was something 
like this[2]:

      DO $$
      BEGIN
      PERFORM pg_sleep(100);
      END$$;

> 2. When a query is running in parallel worker do we want to print that
> query? It may or may not be interesting given the situation. If the
> overall plan itself is faulty, output of the parallel worker query is
> not useful. If the plan is fine but a given worker's leg is running
> slowly it may be interesting.

I think it can be useful.
I'm wondering if we can add this after the first patch for this feature 
is committed.

> As a side note, you may want to fix the indentation in
> ExplainAssembleLogOutput().

Thanks, modified it.
Since the documentation was obsoleted, attached patch also updated it.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v33-0001-Add-function-to-log-the-plan-of-the-query.patch

pgsql-hackers by date:

From: Amit Kapila
Date: 09 November 2023, 09:30:59
Subject: Re: A recent message added to pg_upgade

From: Kyotaro Horiguchi
Date: 09 November 2023, 09:42:30
Subject: Re: A recent message added to pg_upgade

Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

Attachment

Previous

Next