Re: Planning counters in pg_stat_statements (using pgss_store) - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Planning counters in pg_stat_statements (using pgss_store) |
Date | |
Msg-id | 20200326132242.GA80836@nol Whole thread Raw |
In response to | Re: Planning counters in pg_stat_statements (using pgss_store) (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Planning counters in pg_stat_statements (using pgss_store)
|
List | pgsql-hackers |
On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: > > On 2020/03/25 22:45, Julien Rouhaud wrote: > > On Wed, Mar 25, 2020 at 10:09:37PM +0900, Fujii Masao wrote: > > > + /* > > > + * We can't process the query if no query_text is provided, as pgss_store > > > + * needs it. We also ignore query without queryid, as it would be treated > > > + * as a utility statement, which may not be the case. > > > + */ > > > > > > Could you tell me why the planning stats are not tracked when executing > > > utility statements? In some utility statements like REFRESH MATERIALIZED VIEW, > > > the planner would work. > > > > I explained that in [1]. The problem is that the underlying statement doesn't > > get the proper stmt_location and stmt_len, so you eventually end up with two > > different entries. > > It's not problematic to have two different entries in that case. Right? I will unnecessarily bloat the entries, and makes users life harder too. This example is quite easy to deal with, but if the application is sending multi-query statements, you'll just end up with a mess impossible to properly handle. > The actual problem is that the statements reported in those entries are > very similar? For example, when "create table test as select 1;" is executed, > it's strange to get the following two entries, as you explained. > > create table test as select 1; > create table test as select 1 > > But it seems valid to get the following two entries in that case? > > select 1 > create table test as select 1 > > The former is the nested statement and the latter is the top statement. I think that there should only be 1 entry, the utility command. It seems easy to correlate the planning time to the underlying query, but I'm not entirely sure that the execution counters won't be impacted by the fact is being run in a utilty statements. Also, for now current pgss behavior is to always merge underlying optimisable statements in the utility command, and it seems a bit late in this release cycle to revisit that. I'd be happy to work on improving that for the next release, among other things. For instance the total lack of normalization for utility commands [2] is also something that has been bothering me for a long time. In some workloads, you can end up with the entries almost entirely filled with 1-time-execution commands, just because it's using random identifiers, so you have no other choice than to disable track_utility, although it would have been useful for other commands. > Here are other comments. > > - if (jstate) > + if (kind == PGSS_JUMBLE) > > Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. > > If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead > and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? Yes, we could be using jstate here. I originally used that to avoid passing PGSS_EXEC (or the other one) as a way to say "ignore this information as there's the jstate which says it's yet another meaning". If that's not an issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" all over the place. > + <entry><structfield>total_time</structfield></entry> > + <entry><type>double precision</type></entry> > + <entry></entry> > + <entry> > + Total time spend planning and executing the statement, in milliseconds > + </entry> > + </row> > > pg_stat_statements view has this column but the function not. > We should make both have the column or not at all, for consistency? > I'm not sure if it's good thing to expose the sum of total_plan_time > and total_exec_time as total_time. If some users want that, they can > easily calculate it from total_plan_time and total_exec_time by using > their own logic. I think we originally added it as a way to avoid too much compatibility break, and also because it seems like a field most users will be interested in anyway. Now that I'm thinking about it again, I indeed think it was a mistake to have that in view part only. Not mainly for consistency, but for users who would be interested in the total_time field while not wanting to pay the overhead of retrieving the query text if they don't need it. So I'll change that! > + nested_level++; > + PG_TRY(); > > In old thread [1], Tom Lane commented the usage of nested_level > in the planner hook. There seems no reply to that so far. What's > your opinion about that comment? > > [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's concern, and I think that having a specific nesting level variable for the planner is the best workaround, so I'll implement that. [2] https://twitter.com/fujii_masao/status/1242978261572837377
pgsql-hackers by date: