Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: pg_stat_statements: calls under-estimation propagation
Date
Msg-id CAAZKuFZweuupnQqRhW8pKPoPkUMEMcS85tgV=HksmvhScZE16w@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements: calls under-estimation propagation  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina <daniel@heroku.com> wrote:
>> Probably.
>>
>> The idea is that without those fields it's, to wit, impossible to
>> explain non-monotonic movement in metrics of those queries for precise
>> use in tools that insist on monotonicity of the fields, which are all
>> cumulative to date I think.
>>
>> pg_stat_statements_reset() or crashing loses the session, so one
>> expects "ncalls" may decrease.  In addition, a query that is bouncing
>> in and out of the hash table will have its statistics be lost, so its
>> statistics will also decrease.  This can cause un-resolvable artifact
>> in analysis tools.
>>
>> The two fields allow for precise understanding of when the statistics
>> for a given query_id are continuous since the last time a program
>> inspected it.
>
> Thanks for elaborating them! Since 'introduced' is reset even when
> the statistics is reset, maybe we can do without 'session_start' for
> that purpose?

There is a small loss of precision.

The original reason was that if one wanted to know, given two samples
of pg_stat_statements, when the query_id is going to remain stable for
a given query.

For example:

If the session changes on account of a reset, then it is known that
all query_ids one's external program is tracking are no longer going
to be updated, and the program can take account for the fact that the
same query text is going to have a new query_id.

Given the new idea of mixing in the point release number:

If the code is changed to instead mixing in the full version of
Postgres, as you suggested recently, this can probably be removed.
The caveat there is then the client is going to have to do something a
bit weird like ask for the point release and perhaps even compile
options of Postgres to know when the query_id is going to have a
different value for a given query text.  But, maybe this is an
acceptable compromise to remove one field.

>>> +            /*
>>> +             * The role calling this function is unable to see
>>> +             * sensitive aspects of this tuple.
>>> +             *
>>> +             * Nullify everything except the "insufficient privilege"
>>> +             * message for this entry
>>> +             */
>>> +            memset(nulls, 1, sizeof nulls);
>>> +
>>> +            nulls[i]  = 0;
>>> +            values[i] = CStringGetTextDatum("<insufficient privilege>");
>>>
>>> Why do we need to hide *all* the fields in pg_stat_statements, when
>>> it's accessed by improper user? This is a big change of pg_stat_statements
>>> behavior, and it might break the compatibility.
>>
>> It seems like an information leak that grows more major if query_id is
>> exposed and, at any point, one can determine the query_id for a query
>> text.
>
> So hiding only query and query_id is enough?

Yeah, I think so.  The other fields feel a bit weird to leave hanging
around as well, so I thought I'd just "fix" it in one shot, but doing
the minimum or only applying this idea to new fields is safer.  It's
shorter to hit all the fields, though, which is why I was tempted to
do that.

Perhaps not a good economy for potential subtle breaks in the next version.



pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem
Next
From: Robert Haas
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem