Thread: pg_stat_statements

pg_stat_statements

From
"Dirschel, Steve"
Date:

Hello,

 

I’m not sure if this is the correct distribution list for this type of question but I’ll try anyways.  We have an app that uses multiple schemas.  It will do a set schema ‘schema_name’ and execute queries.  The queries executed are the same regardless of the schema the connection set.

 

In pg_stat_statements the exact same query will get a different queryid for each schema that executes the query. 

 

I’m unable to determine which queryid comes from which schema the query was executed under.  Is anyone aware of a way to determine this?

 

Thanks in advance.

 

Steve

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html

Re: pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Tue, Jan 11, 2022 at 03:04:14PM +0000, Dirschel, Steve wrote:
>
> I'm not sure if this is the correct distribution list for this type of
> question but I'll try anyways.  We have an app that uses multiple schemas.
> It will do a set schema 'schema_name' and execute queries.  The queries
> executed are the same regardless of the schema the connection set.
>
> In pg_stat_statements the exact same query will get a different queryid for
> each schema that executes the query.
>
> I'm unable to determine which queryid comes from which schema the query was
> executed under.  Is anyone aware of a way to determine this?
>

Unfortunately this is a known limitation.

There were some previous discussions (e.g. [1] and [2] more recently), but I
don't think there was a real consensus on how to solve that problem.

Storing a query text with fully qualified names (either the current query or a
new field) is not practical for performance purpose, but there were no
objections to storing additional information, like the active search_path when
the entry was created.  But as noted, while it technically gives the
information you need it's far from being convenient to use, which is probably
why it was never implemented.

[1] https://www.postgresql.org/message-id/flat/8f54c609-17c6-945b-fe13-8b07c0866420%40dalibo.com
[2] https://www.postgresql.org/message-id/flat/9baf5c06-d6ab-c688-010c-843348e3d98c%40gmail.com



Re: pg_stat_statements

From
Simon Riggs
Date:
On Wed, 12 Jan 2022 at 03:03, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Tue, Jan 11, 2022 at 03:04:14PM +0000, Dirschel, Steve wrote:
> >
> > I'm not sure if this is the correct distribution list for this type of
> > question but I'll try anyways.  We have an app that uses multiple schemas.
> > It will do a set schema 'schema_name' and execute queries.  The queries
> > executed are the same regardless of the schema the connection set.
> >
> > In pg_stat_statements the exact same query will get a different queryid for
> > each schema that executes the query.
> >
> > I'm unable to determine which queryid comes from which schema the query was
> > executed under.  Is anyone aware of a way to determine this?
> >
>
> Unfortunately this is a known limitation.

I see this as a beneficial feature.

If the same SQL is executed against different sets of tables, each
with different indexes, probably different data, the performance could
vary dramatically and might need different tuning on each. So having
separate rows in the pg_stat_statements output makes sense.

> There were some previous discussions (e.g. [1] and [2] more recently), but I
> don't think there was a real consensus on how to solve that problem.

To differentiate, run each schema using a different user, so you can
tell them apart.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, Jan 12, 2022 at 10:22:38AM +0000, Simon Riggs wrote:
> On Wed, 12 Jan 2022 at 03:03, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Unfortunately this is a known limitation.
> 
> I see this as a beneficial feature.
> 
> If the same SQL is executed against different sets of tables, each
> with different indexes, probably different data, the performance could
> vary dramatically and might need different tuning on each. So having
> separate rows in the pg_stat_statements output makes sense.

Yes, having different rows seems like a good thing.  But being unable to tell
which row apply to which schema is *not* a good thing.

> > There were some previous discussions (e.g. [1] and [2] more recently), but I
> > don't think there was a real consensus on how to solve that problem.
> 
> To differentiate, run each schema using a different user, so you can
> tell them apart.

This isn't always possible.  For instance, once you reach enough schema it will
be problematic to do proper pooling.



Re: pg_stat_statements

From
Simon Riggs
Date:
On Wed, 12 Jan 2022 at 10:31, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Wed, Jan 12, 2022 at 10:22:38AM +0000, Simon Riggs wrote:
> > On Wed, 12 Jan 2022 at 03:03, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > Unfortunately this is a known limitation.
> >
> > I see this as a beneficial feature.
> >
> > If the same SQL is executed against different sets of tables, each
> > with different indexes, probably different data, the performance could
> > vary dramatically and might need different tuning on each. So having
> > separate rows in the pg_stat_statements output makes sense.
>
> Yes, having different rows seems like a good thing.  But being unable to tell
> which row apply to which schema is *not* a good thing.
>
> > > There were some previous discussions (e.g. [1] and [2] more recently), but I
> > > don't think there was a real consensus on how to solve that problem.
> >
> > To differentiate, run each schema using a different user, so you can
> > tell them apart.
>
> This isn't always possible.  For instance, once you reach enough schema it will
> be problematic to do proper pooling.

True, perhaps we should fix SET SESSION AUTHORIZATION to be allowed by
non-superusers. Then set the user and search_path at same time.

I was going to suggest adding a comment to the front of each SQL that
contains the schema, but that doesn't work either (and looks like a
bug, but how normalization works is not documented).

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: pg_stat_statements

From
Simon Riggs
Date:
On Wed, 12 Jan 2022 at 13:17, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Wed, 12 Jan 2022 at 10:31, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Wed, Jan 12, 2022 at 10:22:38AM +0000, Simon Riggs wrote:
> > > On Wed, 12 Jan 2022 at 03:03, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > Unfortunately this is a known limitation.
> > >
> > > I see this as a beneficial feature.
> > >
> > > If the same SQL is executed against different sets of tables, each
> > > with different indexes, probably different data, the performance could
> > > vary dramatically and might need different tuning on each. So having
> > > separate rows in the pg_stat_statements output makes sense.
> >
> > Yes, having different rows seems like a good thing.  But being unable to tell
> > which row apply to which schema is *not* a good thing.
> >
> > > > There were some previous discussions (e.g. [1] and [2] more recently), but I
> > > > don't think there was a real consensus on how to solve that problem.
> > >
> > > To differentiate, run each schema using a different user, so you can
> > > tell them apart.
> >
> > This isn't always possible.  For instance, once you reach enough schema it will
> > be problematic to do proper pooling.
>
> True, perhaps we should fix SET SESSION AUTHORIZATION to be allowed by
> non-superusers. Then set the user and search_path at same time.

But SET ROLE works.

-- 
Simon Riggs                http://www.EnterpriseDB.com/