Thread: Display individual query in pg_stat_activity

Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

27 July 2020, 07:36:59

Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

For example:

psql -c "select pg_sleep(10);select pg_sleep(20);" is currently displayed as:

postgres=# select backend_type,query from pg_stat_activity;         backend_type         |                      query
------------------------------+-------------------------------------------------- client backend               | select pg_sleep(10);select pg_sleep(20);

Showing which statement is currently being executed would be more helpful.

Technical context and proposal:

There is 2 points in this patch:

modifying the current behavior in “exec_simple_query”
modifying the current behavior in “ExecInitParallelPlan”

So that we could see for example:

         backend_type         |                      query
------------------------------+-------------------------------------------------- client backend               | select pg_sleep(10);

and then

         backend_type         |                      query
------------------------------+-------------------------------------------------- client backend               | select pg_sleep(20);

instead of the multiple sql statement described in the “motivation” section.

Another example: parallel worker being triggered while executing a function:

create or replace function test()
returns void as $$select count() as "first" from foo;select pg_sleep(10);select count() as "second" from foo;select pg_sleep(11);select pg_sleep(10)
$$
language sql;

We currently see:

         backend_type         |                                                                query
------------------------------+-------------------------------------------------------------------------------------------------------------------------------------- client backend               | select test(); parallel worker              | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+                              | parallel worker              | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+                              |

while the attached patch would provide:

          backend_type         |                      query                                                                                                                                                                                                                                                  [217/1938]
------------------------------+-------------------------------------------------- client backend               | select test(); parallel worker              | select count(*) as "first" from foo; parallel worker              | select count(*) as "first" from foo;

and then:

         backend_type         |                      query
------------------------------+-------------------------------------------------- client backend               | select test(); parallel worker              | select count(*) as "second" from foo; parallel worker              | select count(*) as "second" from foo;

I will add this patch to the next commitfest. I look forward to your feedback about the idea and/or implementation.

Regards,
Bertrand

Attachment

v1-0001-pg_stat_activity_individual_query.patch

Re: Display individual query in pg_stat_activity

From

Dave Page

Date:

27 July 2020, 14:57:10

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?

Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: Display individual query in pg_stat_activity

From

Jeremy Schneider

Date:

27 July 2020, 15:28:07

On 7/27/20 07:57, Dave Page wrote:

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?

Might be helpful to give some specifics about circumstances where strings can appear in pg_stat_activity.query with multiple statements.

1) First of all, IIUC multiple statements are only supported in the first place by the simple protocol and PLs. Anyone using parameterized statements (bind variables) should be unaffected by this.

2) My read of the official pg JDBC driver is that even for batch operations it currently iterates and sends each statement individually. I don't think the JDBC driver has the capability to send multiple statements, so java apps using this driver should be unaffected.

3) psql -c will always send the string as a single "simple protocol" request. Scripts will be impacted.

4) PLs also seem to have a code path that can put multiple statements in pg_stat_activity when parallel slaves are launched. PL code will be impacted.

5) pgAdmin uses the simple protocol and when a user executes a block of statements, pgAdmin seems to send the whole block as a single "simple protocol" request. Tools like pgAdmin will be impacted.

At the application layer, it doesn't seem problematic to me if PostgreSQL reports each query one at a time. IMO most people will find this to be a more useful behavior and they will still find their queries in their app code or app logs.

However at the PostgreSQL logging layer this is a good call-out. I just did a quick test on 14devel to double-check my assumption and it does seem that PostgreSQL logs the entire combined query for psql -c. I think it would be better for PostgreSQL to report queries individually in the log too - for example pgBadger summaries will be even more useful if they report information for each individual query rather than a single big block of multiple queries.

Given how small this patch is, it seems worthwhile to at least investigate whether the logging component could be addressed just as easily.

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Re: Display individual query in pg_stat_activity

From

Dave Page

Date:

27 July 2020, 16:00:17

On Mon, Jul 27, 2020 at 4:28 PM Jeremy Schneider <schnjere@amazon.com> wrote:

On 7/27/20 07:57, Dave Page wrote:
I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?

Might be helpful to give some specifics about circumstances where strings can appear in pg_stat_activity.query with multiple statements.

1) First of all, IIUC multiple statements are only supported in the first place by the simple protocol and PLs. Anyone using parameterized statements (bind variables) should be unaffected by this.

2) My read of the official pg JDBC driver is that even for batch operations it currently iterates and sends each statement individually. I don't think the JDBC driver has the capability to send multiple statements, so java apps using this driver should be unaffected.

That is just one of a number of different popular drivers of course.

3) psql -c will always send the string as a single "simple protocol" request. Scripts will be impacted.

4) PLs also seem to have a code path that can put multiple statements in pg_stat_activity when parallel slaves are launched. PL code will be impacted.

5) pgAdmin uses the simple protocol and when a user executes a block of statements, pgAdmin seems to send the whole block as a single "simple protocol" request. Tools like pgAdmin will be impacted.

It does. It also prepends some queries with comments, specifically to allow users to filter them out when they're analysing logs (a feature requested by users, not just something we thought was a good idea). I'm assuming that this patch would also strip those?

At the application layer, it doesn't seem problematic to me if PostgreSQL reports each query one at a time. IMO most people will find this to be a more useful behavior and they will still find their queries in their app code or app logs.

I think there are arguments to be made for both approaches.

However at the PostgreSQL logging layer this is a good call-out. I just did a quick test on 14devel to double-check my assumption and it does seem that PostgreSQL logs the entire combined query for psql -c. I think it would be better for PostgreSQL to report queries individually in the log too - for example pgBadger summaries will be even more useful if they report information for each individual query rather than a single big block of multiple queries.

Given how small this patch is, it seems worthwhile to at least investigate whether the logging component could be addressed just as easily.

-Jeremy
-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: Display individual query in pg_stat_activity

From

Jim Nasby

Date:

29 July 2020, 19:24:45

On 7/27/20 9:57 AM, Dave Page wrote:

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

<snip>

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?

Perhaps turn query into text[]. That would make it easy to concatenate back together if desired.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

06 August 2020, 10:10:47

Hi,

On 7/27/20 4:57 PM, Dave Page wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?

Thanks for he feedback.

I like the idea of adding extra information without changing the current behavior.

A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.

I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).

I attached a mock up v2 patch that adds this new field.

Outcome Examples:

backend_type | query | individual_query
----------------+---------------------------------------------------------------------------------------------+----------------------
client backend | select backend_type, query, individual_query from pg_stat_activity where length(query) > 0; |
client backend | select pg_sleep(10);select pg_sleep(20); | select pg_sleep(20);

backend_type   |                                                                                              query                                                                                               |           individual_query
-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
client backend | select backend_type, query, individual_query from pg_stat_activity where length(query) > 0;                                                                                                      |
client backend | select test();                                                                                                                                                                                   |
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);create index bdtidx on foo(generate_series);select count(*) as "second" from foo;select pg_sleep(11);select count(*) as "third" from foo | select count(*) as "second" from foo;
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);create index bdtidx on foo(generate_series);select count(*) as "second" from foo;select pg_sleep(11);select count(*) as "third" from foo | select count(*) as "second" from foo;

As you can see the individual_query field is populated only when the query field is a multiple statements one.

Regards,

Bertrand

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Attachment

v2-0001-pg_stat_activity_individual_query.patch

Re: Display individual query in pg_stat_activity

From

Magnus Hagander

Date:

06 August 2020, 10:24:46

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,
On 7/27/20 4:57 PM, Dave Page wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).

Idle though without having considered it too much -- you might reduce the memory overhead by just storing a start/end offset into the combined query string instead of a copy of the query. That way the cost would only be paid when doing the reading of pg_stat_activity (by extracting the piece of the string), which I'd argue is done orders of magnitude fewer times than the query changes at least on busy systems. Care would have to be taken for the case of the current executing query actually being entirely past the end of the query string buffer of course, but I don't think that's too hard to define a useful behaviour for. (The user interface would stay the same, showing the actual string and thus not requiring the user to do any parsing)

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

17 August 2020, 05:49:12

Hi,

On 8/6/20 12:24 PM, Magnus Hagander wrote:

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi,
On 7/27/20 4:57 PM, Dave Page wrote:

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).

Idle though without having considered it too much -- you might reduce the memory overhead by just storing a start/end offset into the combined query string instead of a copy of the query.

Good point, thanks for the feedback.

The new attached patch is making use of stmt_len and stmt_location (instead of a copy of the query).

That way the cost would only be paid when doing the reading of pg_stat_activity (by extracting the piece of the string), which I'd argue is done orders of magnitude fewer times than the query changes at least on busy systems.

The individual query extraction (making use of stmt_len and stmt_location) has been moved to pg_stat_get_activity() in the new attached patch (as opposed to pgstat_report_activity() in the previous patch version).

Care would have to be taken for the case of the current executing query actually being entirely past the end of the query string buffer of course, but I don't think that's too hard to define a useful behaviour for. (The user interface would stay the same, showing the actual string and thus not requiring the user to do any parsing)

As a proposal the new attached patch does not display the individual query if length + location is greater than pgstat_track_activity_query_size (anyway it could not, as the query field that might contain multiple statements is already <= pgstat_track_activity_query_size in pg_stat_get_activity()).

Bertrand

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachment

v2-0002-pg_stat_activity_individual_query.patch

Re: Display individual query in pg_stat_activity

From

Masahiro Ikeda

Date:

18 August 2020, 06:54:02

Hi,

> I've attached a patch to display individual query in the
> pg_stat_activity query field when multiple SQL statements are
> currently displayed.
> 
> Motivation:
> 
> When multiple statements are displayed then we don’t know which
> one is currently running.
> 
> I'm not sure I'd want that to happen, as it could make it much
> harder to track the activity back to a query in the application
> layer or server logs.
> 
> Perhaps a separate field could be added for the current statement,
> or a value to indicate what the current statement number in the
> query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

-- 
Masahiro Ikeda
NTT DATA CORPORATION

Re: Display individual query in pg_stat_activity

From

Pavel Stehule

Date:

18 August 2020, 07:35:05

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com> napsal:

Hi,

> I've attached a patch to display individual query in the
> pg_stat_activity query field when multiple SQL statements are
> currently displayed.
>
> Motivation:
>
> When multiple statements are displayed then we don’t know which
> one is currently running.
>
> I'm not sure I'd want that to happen, as it could make it much
> harder to track the activity back to a query in the application
> layer or server logs.
>
> Perhaps a separate field could be added for the current statement,
> or a value to indicate what the current statement number in the
> query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

I am afraid of the significant performance impact of this feature. In this case you have to copy all nested queries to the stat collector process. Very common usage of PL is a glue of very fast queries. Sure, it is used like glue for very slow queries too.

Just I thinking about two features:

1. extra interface for auto_explain, that allows you to get a stack of statements assigned to some pid (probably these informations should be stored inside shared memory and collected before any query execution). Sometimes some slow function is slow due repeated execution of relatively fast queries. In this case, the deeper nested level is not too interesting. You need to see a stack of calls and you are searching the first slow level in the stack.

2. can be nice to have a status column in pg_stat_activity, and status GUC for sending a custom information from deep levels to the user. Now, users use application_name, but some special variables can be better for this purpose. This value of status can be refreshed periodically and can substitute some tags. So developer can set

BEGIN

-- before slow long query

SET status TO 'slow query calculation xxy %d';

...

It is a alternative to RAISE NOTICE, but with different format - with format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times of all levels from the call stack to get valuable information.

Regards

Pavel

p.s. pg_stat_activity is maybe too wide table already, and probably is not good to enhance this table too much

--
Masahiro Ikeda
NTT DATA CORPORATION

Re: Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

19 August 2020, 05:48:24

Hi,

On 8/18/20 9:35 AM, Pavel Stehule wrote:

Hi

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com> napsal:
Hi,

> I've attached a patch to display individual query in the
> pg_stat_activity query field when multiple SQL statements are
> currently displayed.
>
> Motivation:
>
> When multiple statements are displayed then we don’t know which
> one is currently running.
>
> I'm not sure I'd want that to happen, as it could make it much
> harder to track the activity back to a query in the application
> layer or server logs.
>
> Perhaps a separate field could be added for the current statement,
> or a value to indicate what the current statement number in the
> query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

Yeah I think it would be nice to have.

I also think it would be better to create a dedicated thread (specially looking at Pavel's comment below)

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

I am afraid of the significant performance impact of this feature. In this case you have to copy all nested queries to the stat collector process. Very common usage of PL is a glue of very fast queries. Sure, it is used like glue for very slow queries too.

Just I thinking about two features:

1. extra interface for auto_explain, that allows you to get a stack of statements assigned to some pid (probably these informations should be stored inside shared memory and collected before any query execution). Sometimes some slow function is slow due repeated execution of relatively fast queries. In this case, the deeper nested level is not too interesting. You need to see a stack of calls and you are searching the first slow level in the stack.

2. can be nice to have a status column in pg_stat_activity, and status GUC for sending a custom information from deep levels to the user. Now, users use application_name, but some special variables can be better for this purpose. This value of status can be refreshed periodically and can substitute some tags. So developer can set

BEGIN
-- before slow long query
SET status TO 'slow query calculation xxy %d';
...

It is a alternative to RAISE NOTICE, but with different format - with format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times of all levels from the call stack to get valuable information.

Regards

Pavel

p.s. pg_stat_activity is maybe too wide table already, and probably is not good to enhance this table too much

Thanks

Bertrand

--
Masahiro Ikeda
NTT DATA CORPORATION

Re: Display individual query in pg_stat_activity

From

Masahiro Ikeda

Date:

28 August 2020, 08:06:12

On 2020-08-19 14:48, Drouvot, Bertrand wrote:
> Hi,
> On 8/18/20 9:35 AM, Pavel Stehule wrote:
> 
>> Hi
>> 
>> út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> napsal:
>> 
>>> Hi,
>>> 
>>>> I've attached a patch to display individual query in the
>>>> pg_stat_activity query field when multiple SQL statements are
>>>> currently displayed.
>>>> 
>>>> Motivation:
>>>> 
>>>> When multiple statements are displayed then we don’t know
>>> which
>>>> one is currently running.
>>>> 
>>>> I'm not sure I'd want that to happen, as it could make it much
>>>> harder to track the activity back to a query in the application
>>>> layer or server logs.
>>>> 
>>>> Perhaps a separate field could be added for the current
>>> statement,
>>>> or a value to indicate what the current statement number in the
>>>> query is?
>>> 
>>> As a user, I think this feature is useful to users.
>>> 
>>> It would be nice that pg_stat_activity also show currently running
>>> query
>>> in a user defined function(PL/pgSQL) .
>>> 
>>> I understood that this patch is not for user defined functions.
>>> Please let me know if it's better to make another thread.
> 
> Yeah I think it would be nice to have.
> 
> I also think it would be better to create a dedicated thread
> (specially looking at Pavel's comment below)

Thank you. I will.

>>> In general, PL/pgSQL functions have multiple queries,
>>> and users want to know the progress of query execution, doesn't
>>> it?
>> 
>> I am afraid of the significant performance impact of this feature.
>> In this case you have to copy all nested queries to the stat
>> collector process. Very common usage of PL is a glue of very fast
>> queries. Sure, it is used like glue for very slow queries too.
>> Just I thinking about two features:

OK, thanks for much advice and show alternative solutions.

>> 1. extra interface for auto_explain, that allows you to get a stack
>> of statements assigned to some pid (probably these informations
>> should be stored inside shared memory and collected before any query
>> execution). Sometimes some slow function is slow due repeated
>> execution of relatively fast queries. In this case, the deeper
>> nested level is not too interesting. You need to see a stack of
>> calls and you are searching the first slow level in the stack.

Thanks. I didn't know auto_explain module.
I agreed when only requested, it copy the stack of statements.

>> 2. can be nice to have a status column in pg_stat_activity, and
>> status GUC for sending a custom information from deep levels to the
>> user. Now, users use application_name, but some special variables
>> can be better for this purpose.  This value of status can be
>> refreshed periodically and can substitute some tags. So developer
>> can set
>> 
>> BEGIN
>> -- before slow long query
>> SET status TO 'slow query calculation xxy %d';
>> ...
>> 
>> It is a alternative to RAISE NOTICE, but with different format -
>> with format that is special for reading from pg_stat_activity
>> 
>> For long (slow) queries usually you need to see the sum of all times
>> of all levels from the call stack to get valuable information.

In comparison to 1, user must implements logging statement to
their query but user can control what he/she wants to know.

I worry which solution is best.

>> p.s. pg_stat_activity is maybe too wide table already, and probably
>> is not good to enhance this table too much

Thanks. I couldn't think from this point of view.

After I make some PoC patches, I will create a dedicated thread.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Re: Display individual query in pg_stat_activity

From

Pavel Stehule

Date:

28 August 2020, 08:42:39

pá 28. 8. 2020 v 10:06 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com> napsal:

On 2020-08-19 14:48, Drouvot, Bertrand wrote:
> Hi,
> On 8/18/20 9:35 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> napsal:
>>
>>> Hi,
>>>
>>>> I've attached a patch to display individual query in the
>>>> pg_stat_activity query field when multiple SQL statements are
>>>> currently displayed.
>>>>
>>>> Motivation:
>>>>
>>>> When multiple statements are displayed then we don’t know
>>> which
>>>> one is currently running.
>>>>
>>>> I'm not sure I'd want that to happen, as it could make it much
>>>> harder to track the activity back to a query in the application
>>>> layer or server logs.
>>>>
>>>> Perhaps a separate field could be added for the current
>>> statement,
>>>> or a value to indicate what the current statement number in the
>>>> query is?
>>>
>>> As a user, I think this feature is useful to users.
>>>
>>> It would be nice that pg_stat_activity also show currently running
>>> query
>>> in a user defined function(PL/pgSQL) .
>>>
>>> I understood that this patch is not for user defined functions.
>>> Please let me know if it's better to make another thread.
>
> Yeah I think it would be nice to have.
>
> I also think it would be better to create a dedicated thread
> (specially looking at Pavel's comment below)

Thank you. I will.

>>> In general, PL/pgSQL functions have multiple queries,
>>> and users want to know the progress of query execution, doesn't
>>> it?
>>
>> I am afraid of the significant performance impact of this feature.
>> In this case you have to copy all nested queries to the stat
>> collector process. Very common usage of PL is a glue of very fast
>> queries. Sure, it is used like glue for very slow queries too.
>> Just I thinking about two features:

OK, thanks for much advice and show alternative solutions.

>> 1. extra interface for auto_explain, that allows you to get a stack
>> of statements assigned to some pid (probably these informations
>> should be stored inside shared memory and collected before any query
>> execution). Sometimes some slow function is slow due repeated
>> execution of relatively fast queries. In this case, the deeper
>> nested level is not too interesting. You need to see a stack of
>> calls and you are searching the first slow level in the stack.

Thanks. I didn't know auto_explain module.
I agreed when only requested, it copy the stack of statements.

>> 2. can be nice to have a status column in pg_stat_activity, and
>> status GUC for sending a custom information from deep levels to the
>> user. Now, users use application_name, but some special variables
>> can be better for this purpose. This value of status can be
>> refreshed periodically and can substitute some tags. So developer
>> can set
>>
>> BEGIN
>> -- before slow long query
>> SET status TO 'slow query calculation xxy %d';
>> ...
>>
>> It is a alternative to RAISE NOTICE, but with different format -
>> with format that is special for reading from pg_stat_activity
>>
>> For long (slow) queries usually you need to see the sum of all times
>> of all levels from the call stack to get valuable information.

In comparison to 1, user must implements logging statement to
their query but user can control what he/she wants to know.

I worry which solution is best.

There is no best solution - @1 doesn't need manual work, but @1 is not too useful when queries are similar (first n chars) and are long. In this case custom messages are much more practical.

I don't think so we can implement only one design - in this case we can support more tools with similar purpose but different behaviors in corner cases.

>> p.s. pg_stat_activity is maybe too wide table already, and probably
>> is not good to enhance this table too much

Thanks. I couldn't think from this point of view.

After I make some PoC patches, I will create a dedicated thread.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Re: Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

10 September 2020, 14:06:17

On 8/17/20 7:49 AM, Drouvot, Bertrand wrote:

Hi,
On 8/6/20 12:24 PM, Magnus Hagander wrote:

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi,
On 7/27/20 4:57 PM, Dave Page wrote:

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).

Idle though without having considered it too much -- you might reduce the memory overhead by just storing a start/end offset into the combined query string instead of a copy of the query.
Good point, thanks for the feedback.
The new attached patch is making use of stmt_len and stmt_location (instead of a copy of the query).
That way the cost would only be paid when doing the reading of pg_stat_activity (by extracting the piece of the string), which I'd argue is done orders of magnitude fewer times than the query changes at least on busy systems.
The individual query extraction (making use of stmt_len and stmt_location) has been moved to pg_stat_get_activity() in the new attached patch (as opposed to pgstat_report_activity() in the previous patch version).
Care would have to be taken for the case of the current executing query actually being entirely past the end of the query string buffer of course, but I don't think that's too hard to define a useful behaviour for. (The user interface would stay the same, showing the actual string and thus not requiring the user to do any parsing)

As a proposal the new attached patch does not display the individual query if length + location is greater than pgstat_track_activity_query_size (anyway it could not, as the query field that might contain multiple statements is already <= pgstat_track_activity_query_size in pg_stat_get_activity()).
Bertrand

Attaching a new version as the previous one was not passing the Patch Tester anymore.

Bertrand

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachment

v2-0003-pg_stat_activity_individual_query.patch

Re: Display individual query in pg_stat_activity

From

Michael Paquier

Date:

24 September 2020, 03:29:38

On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote:
> Attaching a new version as the previous one was not passing the Patch Tester
> anymore.

Ditto, the CF bot is complaining again.  Could you send a rebase?
--
Michael

Attachment

signature.asc

Re: Display individual query in pg_stat_activity

From

"Drouvot, Bertrand"

Date:

26 September 2020, 07:37:49

On 9/24/20 5:29 AM, Michael Paquier wrote:
> On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote:
>> Attaching a new version as the previous one was not passing the Patch Tester
>> anymore.
> Ditto, the CF bot is complaining again.  Could you send a rebase?

Thanks for letting me know.

Attached a new version.

Bertrand

Attachment

v2-0004-pg_stat_activity_individual_query.patch

Re: Display individual query in pg_stat_activity

From

Georgios Kokolatos

Date:

10 November 2020, 15:05:28

Hi,

I noticed that this patch is failing on the cfbot.
For this, I changed the status to: 'Waiting on Author'

Cheers,
Georgios

The new status of this patch is: Waiting on Author

Re: Display individual query in pg_stat_activity

From

Georgios Kokolatos

Date:

01 December 2020, 09:10:38

This patch fails in the cfbot for quite some time now.
I shall close it as Return With Feedback and not move it to the next CF.
Please feel free to register an updated version afresh for the next CF.

Cheers,
//Georgios