Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers

From Sameer Thakur
Subject Re: pg_stat_statements: calls under-estimation propagation
Date
Msg-id CABzZFEuxkqd7_5HrEVYA1TuUrkX_iYqNJqo6EJmraUsNQ6yvGw@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements: calls under-estimation propagation  (Daniel Farina <daniel@heroku.com>)
Responses Re: pg_stat_statements: calls under-estimation propagation  (Sameer Thakur <samthakur74@gmail.com>)
List pgsql-hackers
On Sat, Oct 5, 2013 at 1:38 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74@gmail.com> wrote:
>>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:
>>>>>
>>>>> Looks pretty good. Do you want to package up the patch with your
>>>>> change and do the honors and re-submit it? Thanks for helping out so
>>>>> much!
>>>> Sure, will do. Need to add a bit of documentation explaining
>>>> statistics session as well.
>>>> I did some more basic testing around pg_stat_statements.max, now that
>>>> we have clarity from Peter about its value being legitimate below 100.
>>>> Seems to work fine, with pg_stat_statements =4 the max unique queries
>>>> in the view are 4. On the 5th query the view holds just the latest
>>>> unique query discarding the previous 4. Fujii had reported a
>>>> segmentation fault in this scenario.
>>>> Thank you for the patch
>>>
>>> Please find the patch attached
>>
>> Thanks for the patch! Here are the review comments:
>>
>> +    OUT session_start timestamptz,
>> +    OUT introduced timestamptz,
>>
>> The patch exposes these columns in pg_stat_statements view.
>> These should be documented.
>>
>> I don't think that session_start should be exposed in every
>> rows in pg_stat_statements because it's updated only when
>> all statistics are reset, i.e., session_start of all entries
>> in pg_stat_statements indicate the same.
>
> Dunno.  I agree it'd be less query traffic and noise.  Maybe hidden
> behind a UDF?  I thought "stats_reset" on pg_database may be prior
> art, but realized that the statistics there differ depending on stats
> resets per database (maybe a name change of 'session' to 'stats_reset'
> would be useful to avoid too much in-cohesion, though).
>
> I didn't want to bloat the taxonomy of exposed API/symbols too much
> for pg_stat_statements, but perhaps in this instance it is reasonable.
>  Also, isn't the interlock with the result set is perhaps more
> precise/fine-grained with the current solution?  Yet, that's awfully
> corner-casey.
>
> I'm on the fence because the simplicity and precision of the current
> regime for aggregation tools is nice, but avoiding the noise for
> inspecting humans in the common case is also nice.  I don't see a
> reason right now to go strongly either way, so if you feel moderately
> strongly that the repetitive column should be stripped then I am happy
> to relent there and help out.  Let me know of your detailed thoughts
> (or modify the patch) with your idea.
>

Thinking a bit more, if its just a question of a repeating value we
have the same situation for userid and dbid. They would be the same
for a user across multiple queries in same statistics session. So
userid,dbid and session_start do repeat across rows. Not sure why
treatment for session_start be different. I also checked pg_stat_plans
@ https://github.com/2ndQuadrant/pg_stat_plans and did not see any
special treatment given for a particular field in terms of access i.e.
the granularity of api wrt pg_stat_statements has been maintained.

regards
Sameer



pgsql-hackers by date:

Previous
From: Oskari Saarenmaa
Date:
Subject: Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones
Next
From: Tomas Vondra
Date:
Subject: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption