Thread: stored procedures vs pg_stat_statements

stored procedures vs pg_stat_statements

From

Merlin Moncure

Date:

24 December 2024, 00:31:32

Hello,

I've noticed that in pg_stat_statements, stored procedures are almost always given a unique query id unless the query is textually identical. pg_stat_statements.c makes this pretty clear as procedures are considered utility statements and no normalization analysis is done against them. Client side invoked statements appear to be able to be parameterized so that they might be normalized correctly, but AFAICT there is no way to do this from the server or any non-parameterizing client (say, dblink).

Suffice it to say, pg_stat_statements is an administrator's dream and heavily procedure wrapped databases might struggle to generate useful statistics leading to lack of insight.

From pg_stat_statements.c in ProcessUtility():

/*
* Force utility statements to get queryId zero. We do this even in cases
* where the statement contains an optimizable statement for which a
* queryId could be derived (such as EXPLAIN or DECLARE CURSOR). For such
* cases, runtime control will first go through ProcessUtility and then
* the executor, and we don't want the executor hooks to do anything,
* since we are already measuring the statement's costs at the utility
* level.
*
* Note that this is only done if pg_stat_statements is enabled and
* configured to track utility statements, in the unlikely possibility
* that user configured another extension to handle utility statements
* only.
*/

Can this simply be disabled for stored procedures as a special case? I'm hoping this might do something useful that is also safe. Curious if anyone has any thoughts on this.

merlin

Re: stored procedures vs pg_stat_statements

From

Tom Lane

Date:

24 December 2024, 00:58:13

Merlin Moncure <mmoncure@gmail.com> writes:
> Can this simply be disabled for stored procedures as a special case?  I'm
> hoping this might do something useful that is also safe.  Curious if anyone
> has any thoughts on this.

No, I don't think that would help.  The restriction on utility
statements would cover CREATE PROCEDURE/FUNCTION, not calls
of those things which is what I suppose you care about.

Do you have pg_stat_statements.track set to "all"?  That should
allow statements within stored procedures to be tracked, which
again is what I'm guessing you care about.

            regards, tom lane

Re: stored procedures vs pg_stat_statements

From

Merlin Moncure

Date:

24 December 2024, 07:06:58

On Mon, Dec 23, 2024 at 3:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:
> Can this simply be disabled for stored procedures as a special case? I'm
> hoping this might do something useful that is also safe. Curious if anyone
> has any thoughts on this.

No, I don't think that would help. The restriction on utility
statements would cover CREATE PROCEDURE/FUNCTION, not calls
of those things which is what I suppose you care about.

Do you have pg_stat_statements.track set to "all"? That should
allow statements within stored procedures to be tracked, which
again is what I'm guessing you care about.

I'm aware of that and will set it -- it's the only option if I'm following you. The way I've been doing things lately for bulk processing is a lot of orchestrated procedures that are organized for purposes of monitoring and easy administration, and telemetry would tend to be at that level, but more granular tracking will get the job done and ought to be perfectly fine as long as overhead is reasonable.

Mainly, I was curious if the behavior not to parse constants out of stored procedure invocations was an unintentional artifact of the utility statement approach. I guess it might be, but also that there is nothing to solve here. Thanks for taking the time to respond.

merlin

Re: stored procedures vs pg_stat_statements

From

Merlin Moncure

Date:

24 December 2024, 17:23:57

On Mon, Dec 23, 2024 at 11:01 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 23, 2024 at 10:06:58PM -0600, Merlin Moncure wrote:
> I'm aware of that and will set it -- it's the only option if I'm following
> you. The way I've been doing things lately for bulk processing is a lot
> of orchestrated procedures that are organized for purposes of monitoring
> and easy administration, and telemetry would tend to be at that level, but
> more granular tracking will get the job done and ought to be perfectly fine
> as long as overhead is reasonable.
>
> Mainly, I was curious if the behavior not to parse constants out of stored
> procedure invocations was an unintentional artifact of the
> utility statement approach. I guess it might be, but also that there is
> nothing to solve here.

Hmm. So you mean that a combination of track_utility = off and track
= 'all' leads to the internals of CALL to not be normalized on first
call, while if we have track_utility = on and track = 'all' then the
internals of CALL are normalized. The behavior is different depending
on if the procedure is cached or not, as well, the second call
following the traces of the first call and we ignore the combinations
of the two GUCs. This kind of inconsistency is what I would call a
bug. I'm pretty sure that it is saner to say that we should apply
normalization all the time if we can, not avoid normalization in some
cases like the one you are pointing at.

Actually, I hadn't gotten that far yet; I was just noticing that:

CALL foo(1,2,3);

CALL foo(2,3,4);

...resolved to different queryids and if that was expected, and if not, if some logic tune-ups in the extension improve behavior without deeper changes.

With client side preparation, you can work around this (e.g. CALL foo($1, $2)) but it's impossible from any non-paramaterizing client or the server (for top level) since explicit prepared statements are not allowed for stored procedures -- that's pretty limiting IMNSHO.

If there are issues with the non-top level approach -- that makes it worse. For my part, I'm going to tweak the extension to see if there's any relief there. Thanks for taking a look.

merlin