Thread: Sample values for pg_stat_statements

Sample values for pg_stat_statements

From
Vik Fearing
Date:
Often when looking through pg_stat_statements, it would be nice to have
some sample values for the constants and parameters.  This patch
implements that by taking the values from the first execution of the
normalized query.

To keep things reasonable, there is a limit on how big the parameters
can be.

This patch is based off of 5303ffe71b.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Attachment

Re: Sample values for pg_stat_statements

From
Julian Markwort
Date:
Hi Vik,

this is my review of your patch. I hope I've ticked all the necessary
boxes.


Submission review:
Patch has context, applies cleanly, make and make check run
successfully, patch contains tests for the added functionality.
The patch doesn't seem to contain any documentation regarding the new
columns.

I think the patch could be shorter due to some not really necessary
whitespace changes, e.g. lines 414, ff. in the pgss.1.patch file.
Modifying the first test for '!entry' to '!entry || need_params' in
line 628 and adding another test for '!entry' later in the file leads
to many unneccessarily changed lines, because they are simply indented
one step further (Prominently noticeable with comments, eg. line 672-
678 and 733-739.
I'd like to see the check for 'need_params' have it's own block,
leaving the existing block largely intact.
This could happen after the original 'if(!entry)'' block, at which
point you can be sure that an entry already exists, so there is no need
for the duplicated code that stores params and consts in the query text
file and their references in the entry.


Usability review:
The patch fulfills it's goal. The new columns consist of arrays of text
as well as an array of regtypes. Unfortunately, this makes the
pg_stat_statements view even more cluttered and confusing. (The view
was very cluttered before your patch, the best solution is probably to
not 'SELECT * FROM pg_stat_statements;'...)

Regarding the security implications that I can think of, this patch
behaves in similar fashion to the rest of pg_stat_statements, showing
the consts, params, and param_types only to users with proper access
rights and if the showtext flag is set.


Feature test:
The feature works as advertised and does not seem to lead to any assert
failures or memory management errors.
Manual testing indicates that data is properly persisted through
database shutdowns and restarts.




If the intended purpose is to have some basic idea of the kinds of
values that are used with certain statements, I'd like to suggest that
you take a look at my own patch, which implements the tracking of good
and bad plans in pg_stat_statements, in the current commitfest. My
approach not only shows the values that where used when the statement
was executed for the first time (regarding the lifetime of the
pg_stat_statements tracked data), but it shows values of possibly more
current executions of the statements and offers the possibility to
identify values leading to very good or very poor performance.

Maybe we can combine our efforts; Here is one idea that came to my
mind:
- Store the parameters of a statement if the execution led to a new
slower or faster plan.
- Provide a function that allows users to take the jumbled query
expression and have the database explain it, based on the parameters
that were recorded previously.

Kind regards
Julian Markwort
Attachment

Re: Sample values for pg_stat_statements

From
Andres Freund
Date:
Hi,

On 2017-12-31 12:34:17 +0100, Vik Fearing wrote:
> Often when looking through pg_stat_statements, it would be nice to have
> some sample values for the constants and parameters.  This patch
> implements that by taking the values from the first execution of the
> normalized query.
> 
> To keep things reasonable, there is a limit on how big the parameters
> can be.

Hm. Isn't this going to blow up the size of the file in cases with a
number of parameters quite considerably, a file limit notwithstanding?
Wonder if the size limit wouldn't have to be across all params.

I'm also pretty sure that not everyone will be happy, for privacy / data
minimalism reasons, that some bind parameters are suddenly preserved?
That could very well include passwords and whatnot!  I don't think we
can have the same view as done already show these, without causing
issues for users that do want to grant wider access to pgss, but not
show bind parameters for everyone.

Greetings,

Andres Freund


Re: Sample values for pg_stat_statements

From
Vik Fearing
Date:
On 03/01/2018 07:26 PM, Andres Freund wrote:
> Hm. Isn't this going to blow up the size of the file in cases with a
> number of parameters quite considerably, a file limit notwithstanding?
> Wonder if the size limit wouldn't have to be across all params.

It is across all params (per queryid).
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Sample values for pg_stat_statements

From
Tomas Vondra
Date:
Hi,

I've looked at this patch today. I like the idea / intent in general, as
it helps with some investigation tasks. That being said, I have a couple
of questions/comments based on read through the patch:


1) I see you've renamed the .sql script from 1.4 to 1.6. I thought we've
abandoned that approach some time ago, and are now only doing the
upgrade scripts. That is, keep 1.4 script and then 1.4--1.5 and 1.5-1.6.
That's how other extensions are doing it now, at least - see btree_gin
for example. But maybe pg_stat_statements has to do it this way for some
reason, not sure?

2) The patch should have updated doc/src/sgml/pgstatstatements.sgml

3) Do we really need both collect_consts and collect_params? I can't
really imagine wanting to set only one of those options. In any case,
the names seem unnecessarily abbreviated - just use collect_constants
and collect_parameters.

4) The width_threshold GUC name seems rather weird. I mean, I wouldn't
use "threshold" in this context, and it's really unclear size of what is
it referring to. We do have a precedent, though, as pg_stat_activity has
track_activity_query_size, so I suggest using either parameters_size or
max_parameters_size (prefixed by "pg_stat_statements." of course).

5) I don't quite see why keeping the first set of parameter values we
happen to see would be particularly useful. For example, I'm way more
interested in values for the longest execution - why not to keep those?

6) I suggest to use the same naming style as the existing functions, so
for example CollectParams should be pgss_CollectParams (and it's missing
a comment too).

7) There are a couple of places where the code style violates project
rules, e.g. by placing {} around a single command in if-statement.

8) I see Andres mentioned possible privacy issues (not quite sure what
is 'data minimalism', mentioned by Andres). I'm not sure it's a problem,
considering it can be disabled and it's subject to the usual role check
(susperuser/role_read_all_stats). Unfortunately we can't use the same
approach as pg_stat_activity (only showing data for user's own queries).
Well, we could keep per-user samples, but that might considerably
inflate the file size.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Sample values for pg_stat_statements

From
Greg Stark
Date:
I've often wanted something similar. But I've struggled to come up
with a good way to decide which parameters to keep. And as someone
mentioned, there's the question of how to deal with very large
constants.

The other day I was poking around with pg_stat_statements and jsonlog
and I thought of another way to tackle this problem that I think may
be a better approach. If we logged the queryid in slow error messages
and slow query logs that would let you deal with larger data and also
keep more history without burdening the live system.

If the queryid was a column in the CSV logs (or a field in json logs,
etc) then you people who load their logs into a database for handling
would be able to index that column and quickly look up example
queries, sort them by time taken, or analyze them in other ways. Using
jsonlog you could do the same thing in Elasticsearch/Kibana.

I tried to hack this together quickly but it's actually a bit of a
pain for mundane reasons. Our current slow query logs are actually
slow *statement* logs which makes it a bit of an impedance mismatch
with pg_stat_statements which works per planned query. I think the
solution to this would be to drop the slow statement logs and have
pg_stat_statements log slow queries directly in the ExecutorEnd hook.

It would be nice to have the queryid be accessible for other logs as
well like debug_query_str is. I'm not sure the right way to do that
though. I tried just peeking in ActivePortal->plannedstmt but that's
not always set (and in particular is not set at the point slow
statements are logged). And it was causing crashes, presumably
ActivePortal is left uninitialized in some background worker that
doesn't need it?


Re: Re: Sample values for pg_stat_statements

From
David Steele
Date:
Hi Vik,

On 3/10/18 9:02 AM, Tomas Vondra wrote:
> 
> I've looked at this patch today. I like the idea / intent in general, as
> it helps with some investigation tasks. That being said, I have a couple
> of questions/comments based on read through the patch:

It looks like there are some privacy concerns with this patch as well as
the suggestions in the review from Tomas.

Will you be providing an updated patch for this CF?  If not, I think we
should mark the entry Returned with Feedback for now and let you
resubmit when you have an updated patch.

Regards,
-- 
-David
david@pgmasters.net


Re: Sample values for pg_stat_statements

From
legrand legrand
Date:
+1

If pgss had a PlanId column (just after QueryId), that would be wonderfull
;o)

Question: Is there a simple way to "un-normalize" the query (I mean rebuild
the original query as it was before normalization) ? 

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Sample values for pg_stat_statements

From
David Steele
Date:
On 3/21/18 1:31 PM, David Steele wrote:
> 
> On 3/10/18 9:02 AM, Tomas Vondra wrote:
>>
>> I've looked at this patch today. I like the idea / intent in general, as
>> it helps with some investigation tasks. That being said, I have a couple
>> of questions/comments based on read through the patch:
> 
> It looks like there are some privacy concerns with this patch as well as
> the suggestions in the review from Tomas.
> 
> Will you be providing an updated patch for this CF?  If not, I think we
> should mark the entry Returned with Feedback for now and let you
> resubmit when you have an updated patch.

Since it doesn't appear that a new patch will be ready for this CF I
have marked this entry Returned with Feedback.

Regards,
-- 
-David
david@pgmasters.net


Re: Sample values for pg_stat_statements

From
Vik Fearing
Date:
On 03/10/2018 03:02 PM, Tomas Vondra wrote:
> Hi,
> 
> I've looked at this patch today. I like the idea / intent in general, as
> it helps with some investigation tasks. That being said, I have a couple
> of questions/comments based on read through the patch:

Thanks!  Attached is a patch addressing your concerns.

> 1) I see you've renamed the .sql script from 1.4 to 1.6. I thought we've
> abandoned that approach some time ago, and are now only doing the
> upgrade scripts. That is, keep 1.4 script and then 1.4--1.5 and 1.5-1.6.
> That's how other extensions are doing it now, at least - see btree_gin
> for example. But maybe pg_stat_statements has to do it this way for some
> reason, not sure?

I wrote this over a year ago.  I have now changed it to conform to
modern style.

> 2) The patch should have updated doc/src/sgml/pgstatstatements.sgml

It did, but somehow missed inclusion in the patch.  That's fixed, too.

> 3) Do we really need both collect_consts and collect_params? I can't
> really imagine wanting to set only one of those options. In any case,
> the names seem unnecessarily abbreviated - just use collect_constants
> and collect_parameters.

I can't see wanting parameters without constants, but I can see wanting
constants without parameters, so I think the two are justified.
Abbreviations removed.

> 4) The width_threshold GUC name seems rather weird. I mean, I wouldn't
> use "threshold" in this context, and it's really unclear size of what is
> it referring to. We do have a precedent, though, as pg_stat_activity has
> track_activity_query_size, so I suggest using either parameters_size or
> max_parameters_size (prefixed by "pg_stat_statements." of course).

Fixed.

> 5) I don't quite see why keeping the first set of parameter values we
> happen to see would be particularly useful. For example, I'm way more
> interested in values for the longest execution - why not to keep those?

For one, it's much harder to keep those because by the time you know how
long it took, you've lost the values and would have to re-jumble.  At
least, that's the impression I got while writing the patch.  If I am
mistaken, I will gladly look at it more.

I think having a random example is still quite valuable, and the longest
version is quite likely in the logs, too.

> 6) I suggest to use the same naming style as the existing functions, so
> for example CollectParams should be pgss_CollectParams (and it's missing
> a comment too).

Fixed.

> 7) There are a couple of places where the code style violates project
> rules, e.g. by placing {} around a single command in if-statement.

I'm pretty sure there are places where we do that to distinguish the if
code from the if condition but I couldn't readily find one.  I do find
examples of using braces when there is also a comment in there, so I
just moved the comment.

> 8) I see Andres mentioned possible privacy issues (not quite sure what
> is 'data minimalism', mentioned by Andres). I'm not sure it's a problem,
> considering it can be disabled and it's subject to the usual role check
> (susperuser/role_read_all_stats). Unfortunately we can't use the same
> approach as pg_stat_activity (only showing data for user's own queries).
> Well, we could keep per-user samples, but that might considerably
> inflate the file size.

I think I handle that well enough with permission checking, but I'm open
to more debate on it.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Attachment

Re: Sample values for pg_stat_statements

From
Daniel Gustafsson
Date:
> On 17 Apr 2018, at 17:58, Vik Fearing <vik.fearing@2ndquadrant.com> wrote:

> Thanks!  Attached is a patch addressing your concerns.

I happened to notice that this patch was moved from Returned with Feedback to
Needs Review after the CF closed, which means it’s now sitting open in a closed
CF.  The intended flow after RWF is that the patch is resubmitted, rather than
status changed, but since it’s there now I guess we might as well move it to
the current CF to make sure it’s not overlooked (which it will be in the
current state)?

cheers ./daniel

Re: Sample values for pg_stat_statements

From
Michael Paquier
Date:
On Thu, Jun 21, 2018 at 12:28:01PM +0200, Daniel Gustafsson wrote:
> I happened to notice that this patch was moved from Returned with Feedback to
> Needs Review after the CF closed, which means it’s now sitting open in a closed
> CF.  The intended flow after RWF is that the patch is resubmitted, rather than
> status changed, but since it’s there now I guess we might as well move it to
> the current CF to make sure it’s not overlooked (which it will be in the
> current state)?

Indeed, done.
--
Michael

Attachment

Re: Sample values for pg_stat_statements

From
Michael Paquier
Date:
On Tue, Apr 17, 2018 at 05:58:44PM +0200, Vik Fearing wrote:
> I think I handle that well enough with permission checking, but I'm open
> to more debate on it.

The recent version bump in pg_stat_statements (Sorry my fault!) is
causing this patch to not apply anymore.  I have moved it to next CF
with waiting on author as status.
--
Michael

Attachment

Re: Sample values for pg_stat_statements

From
Dmitry Dolgov
Date:
> On Mon, Oct 1, 2018 at 8:30 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 17, 2018 at 05:58:44PM +0200, Vik Fearing wrote:
> > I think I handle that well enough with permission checking, but I'm open
> > to more debate on it.
>
> The recent version bump in pg_stat_statements (Sorry my fault!) is
> causing this patch to not apply anymore.  I have moved it to next CF
> with waiting on author as status.

Unfortunately, the patch is still needs to be rebased.


Re: Sample values for pg_stat_statements

From
Andres Freund
Date:
On 2018-11-29 16:58:05 +0100, Dmitry Dolgov wrote:
> > On Mon, Oct 1, 2018 at 8:30 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Apr 17, 2018 at 05:58:44PM +0200, Vik Fearing wrote:
> > > I think I handle that well enough with permission checking, but I'm open
> > > to more debate on it.
> >
> > The recent version bump in pg_stat_statements (Sorry my fault!) is
> > causing this patch to not apply anymore.  I have moved it to next CF
> > with waiting on author as status.
> 
> Unfortunately, the patch is still needs to be rebased.

As nothing has happened since, I'm marking this as returned with
feedback.

Greetings,

Andres Freund