RE: Planning counters in pg_stat_statements (using pgss_store) - Mailing list pgsql-hackers

From imai.yoshikazu@fujitsu.com
Subject RE: Planning counters in pg_stat_statements (using pgss_store)
Date
Msg-id OSBPR01MB46160534A6DF796D180FA88E94700@OSBPR01MB4616.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Planning counters in pg_stat_statements (using pgss_store)  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Planning counters in pg_stat_statements (using pgss_store)
List pgsql-hackers
On Wed, Nov 13, 2019 at 10:50 AM, Julien Rouhaud wrote:
> On Tue, Nov 12, 2019 at 5:41 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote:
> >
> > On Sat, Nov 9, 2019 at 1:36 PM, Julien Rouhaud wrote:
> > >
> > > I attach v3 patches implementing those counters.
> >
> > Thanks for updating the patch! Now I can see min/max/mean/stddev plan time.
> >
> >
> > > Note that to avoid duplicating some code (related to Welford's
> > > method), I switched some of the counters to arrays rather than
> > > scalar variables.  It unfortunately makes
> > > pg_stat_statements_internal() a little bit messy, but I hope that it's still acceptable.
> >
> > Yeah, I also think it's acceptable, but I think the codes like below
> > one is more understandable than using for loop in
> > pg_stat_statements_internal(), although some codes will be duplicated.
> >
> > pg_stat_statements_internal():
> >
> > if (api_version >= PGSS_V1_8)
> > {
> >     kind = PGSS_PLAN;
> >     values[i++] = Int64GetDatumFast(tmp.calls[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.min_time[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.max_time[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.mean_time[kind]);
> >     values[i++] = Float8GetDatumFast(stddev(tmp)); }
> >
> > kind = PGSS_EXEC;
> > values[i++] = Int64GetDatumFast(tmp.calls[kind]);
> > values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
> > if (api_version >= PGSS_V1_3)
> > {
> >     values[i++] = Float8GetDatumFast(tmp.min_time[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.max_time[kind]);
> >     values[i++] = Float8GetDatumFast(tmp.mean_time[kind]);
> >     values[i++] = Float8GetDatumFast(stddev(tmp)); }
> >
> >
> > stddev(Counters counters)
> > {
> >     /*
> >      * Note we are calculating the population variance here, not the
> >      * sample variance, as we have data for the whole population, so
> >      * Bessel's correction is not used, and we don't divide by
> >      * tmp.calls - 1.
> >      */
> >     if (counters.calls[kind] > 1)
> >         return stddev = sqrt(counters.sum_var_time[kind] / counters.calls[kind]);
> >
> >     return 0.0;
> > }
> 
> Yes, that's also a possibility (though this should also take the
> "kind" as parameter).  I wanted to avoid adding a new function and
> save some redundant code, but I can change it in the next version of
> the patch if needed.

Okay. It's not much a serious problem, so we can leave it as it is.


> > > While doing this refactoring
> > > I saw that previous patches were failing to accumulate the buffers used during planning, which is now fixed.
> >
> > Checked.
> > Now buffer usage columns include buffer usage during planning and executing,
> > but if we turn off track_planning, buffer usage during planning is also not
> > tracked which is good for users who don't want to take into account of that.
> 
> Indeed.  Note that there's a similar discussion on adding planning
> buffer counters to explain in [1].  I'm unsure if merging planning and
> execution counters in the same columns is ok or not.

Storing buffer usage to different columns is useful to detect the cause of the problems if there are the cases many
buffersare used during planning, but I'm also unsure those cases actually exist. 
 


> > What I'm concerned about is column names will not be backward-compatible.
> > {total, min, max, mean, stddev}_{plan, exec}_time are the best names which
> > correctly show the meaning of its value, but we can't use
> > {total, min, max, mean, stddev}_time anymore and it might be harmful for
> > some users.
> > I don't come up with any good idea for that...
> 
> Well, perhaps keeping the old {total, min, max, mean, stddev}_time
> would be ok, as those historically meant "execution".  I don't have a
> strong opinion here.

Actually I also don't have strong opinion but I thought someone would complain about renaming of those columns and also
sometools like monitoring which use those columns will not work. If we use {total, min, max, mean, stddev}_time,
someonemight mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of
planningand execution. 
 
If I need to choose {total, min, max, mean, stddev}_time or {total, min, max, mean, stddev}_exec_time, I choose former
onebecause choosing best name is not worth destructing the existing scripts or tools.
 

Thanks.
--
Yoshikazu Imai 

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: format of pg_upgrade loadable_libraries warning
Next
From: Michael Paquier
Date:
Subject: Re: SKIP_LOCKED test causes random buildfarm failures