Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: pg_stat_statements: calls under-estimation propagation
Date
Msg-id CAAZKuFY=pxC=+imBkMjF+vk2WK2v+ZyaQpg4dFx08tGdf-9ukA@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements: calls under-estimation propagation  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_stat_statements: calls under-estimation propagation  (Sameer Thakur <samthakur74@gmail.com>)
List pgsql-hackers
On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74@gmail.com> wrote:
>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:
>>>>
>>>> Looks pretty good. Do you want to package up the patch with your
>>>> change and do the honors and re-submit it? Thanks for helping out so
>>>> much!
>>> Sure, will do. Need to add a bit of documentation explaining
>>> statistics session as well.
>>> I did some more basic testing around pg_stat_statements.max, now that
>>> we have clarity from Peter about its value being legitimate below 100.
>>> Seems to work fine, with pg_stat_statements =4 the max unique queries
>>> in the view are 4. On the 5th query the view holds just the latest
>>> unique query discarding the previous 4. Fujii had reported a
>>> segmentation fault in this scenario.
>>> Thank you for the patch
>>
>> Please find the patch attached
>
> Thanks for the patch! Here are the review comments:
>
> +    OUT session_start timestamptz,
> +    OUT introduced timestamptz,
>
> The patch exposes these columns in pg_stat_statements view.
> These should be documented.
>
> I don't think that session_start should be exposed in every
> rows in pg_stat_statements because it's updated only when
> all statistics are reset, i.e., session_start of all entries
> in pg_stat_statements indicate the same.

Dunno.  I agree it'd be less query traffic and noise.  Maybe hidden
behind a UDF?  I thought "stats_reset" on pg_database may be prior
art, but realized that the statistics there differ depending on stats
resets per database (maybe a name change of 'session' to 'stats_reset'
would be useful to avoid too much in-cohesion, though).

I didn't want to bloat the taxonomy of exposed API/symbols too much
for pg_stat_statements, but perhaps in this instance it is reasonable.Also, isn't the interlock with the result set is
perhapsmore
 
precise/fine-grained with the current solution?  Yet, that's awfully
corner-casey.

I'm on the fence because the simplicity and precision of the current
regime for aggregation tools is nice, but avoiding the noise for
inspecting humans in the common case is also nice.  I don't see a
reason right now to go strongly either way, so if you feel moderately
strongly that the repetitive column should be stripped then I am happy
to relent there and help out.  Let me know of your detailed thoughts
(or modify the patch) with your idea.

> +    OUT query_id int8,
>
> query_id or queryid? I like the latter. Also the document
> uses the latter.

Not much opinion.  I prefer expending the _ character because most
compound words in postgres performance statistics catalogs seem to use
an underscore, though.



pgsql-hackers by date:

Previous
From: Kohei KaiGai
Date:
Subject: Re: Custom Plan node
Next
From: Amit Kapila
Date:
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode