Thread: Display individual query in pg_stat_activity
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.
For example:
psql -c "select pg_sleep(10);select pg_sleep(20);" is currently displayed as:
postgres=# select backend_type,query from pg_stat_activity; backend_type | query ------------------------------+-------------------------------------------------- client backend | select pg_sleep(10);select pg_sleep(20);Showing which statement is currently being executed would be more helpful.
Technical context and proposal:
There is 2 points in this patch:
- modifying the current behavior in “exec_simple_query”
- modifying the current behavior in “ExecInitParallelPlan”
So that we could see for example:
backend_type | query ------------------------------+-------------------------------------------------- client backend | select pg_sleep(10);and then
backend_type | query ------------------------------+-------------------------------------------------- client backend | select pg_sleep(20);instead of the multiple sql statement described in the “motivation” section.
Another example: parallel worker being triggered while executing a function:
create or replace function test() returns void as $$select count() as "first" from foo;select pg_sleep(10);select count() as "second" from foo;select pg_sleep(11);select pg_sleep(10) $$ language sql;We currently see:
backend_type | query ------------------------------+-------------------------------------------------------------------------------------------------------------------------------------- client backend | select test(); parallel worker | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+ | parallel worker | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+ |while the attached patch would provide:
backend_type | query [217/1938] ------------------------------+-------------------------------------------------- client backend | select test(); parallel worker | select count(*) as "first" from foo; parallel worker | select count(*) as "first" from foo;and then:
backend_type | query ------------------------------+-------------------------------------------------- client backend | select test(); parallel worker | select count(*) as "second" from foo; parallel worker | select count(*) as "second" from foo;I will add this patch to the next commitfest. I look forward to your feedback about the idea and/or implementation.
Regards,
Bertrand
Attachment
Hi hackers,
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.
I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Might be helpful to give some specifics about circumstances where strings can appear in pg_stat_activity.query with multiple statements.
1) First of all, IIUC multiple statements are only supported in the first place by the simple protocol and PLs. Anyone using parameterized statements (bind variables) should be unaffected by this.
2) My read of the official pg JDBC driver is that even for batch operations it currently iterates and sends each statement individually. I don't think the JDBC driver has the capability to send multiple statements, so java apps using this driver should be unaffected.
3) psql -c will always send the string as a single "simple protocol" request. Scripts will be impacted.
4) PLs also seem to have a code path that can put multiple statements in pg_stat_activity when parallel slaves are launched. PL code will be impacted.
5) pgAdmin uses the simple protocol and when a user executes a block of statements, pgAdmin seems to send the whole block as a single "simple protocol" request. Tools like pgAdmin will be impacted.
At the application layer, it doesn't seem problematic to me if PostgreSQL reports each query one at a time. IMO most people will find this to be a more useful behavior and they will still find their queries in their app code or app logs.
However at the PostgreSQL logging layer this is a good call-out. I just did a quick test on 14devel to double-check my assumption and it does seem that PostgreSQL logs the entire combined query for psql -c. I think it would be better for PostgreSQL to report queries individually in the log too - for example pgBadger summaries will be even more useful if they report information for each individual query rather than a single big block of multiple queries.
Given how small this patch is, it seems worthwhile to at least investigate whether the logging component could be addressed just as easily.
-Jeremy
-- Jeremy Schneider Database Engineer Amazon Web Services
On 7/27/20 07:57, Dave Page wrote:I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Might be helpful to give some specifics about circumstances where strings can appear in pg_stat_activity.query with multiple statements.
1) First of all, IIUC multiple statements are only supported in the first place by the simple protocol and PLs. Anyone using parameterized statements (bind variables) should be unaffected by this.
2) My read of the official pg JDBC driver is that even for batch operations it currently iterates and sends each statement individually. I don't think the JDBC driver has the capability to send multiple statements, so java apps using this driver should be unaffected.
3) psql -c will always send the string as a single "simple protocol" request. Scripts will be impacted.
4) PLs also seem to have a code path that can put multiple statements in pg_stat_activity when parallel slaves are launched. PL code will be impacted.
5) pgAdmin uses the simple protocol and when a user executes a block of statements, pgAdmin seems to send the whole block as a single "simple protocol" request. Tools like pgAdmin will be impacted.
At the application layer, it doesn't seem problematic to me if PostgreSQL reports each query one at a time. IMO most people will find this to be a more useful behavior and they will still find their queries in their app code or app logs.
However at the PostgreSQL logging layer this is a good call-out. I just did a quick test on 14devel to double-check my assumption and it does seem that PostgreSQL logs the entire combined query for psql -c. I think it would be better for PostgreSQL to report queries individually in the log too - for example pgBadger summaries will be even more useful if they report information for each individual query rather than a single big block of multiple queries.
Given how small this patch is, it seems worthwhile to at least investigate whether the logging component could be addressed just as easily.
-Jeremy-- Jeremy Schneider Database Engineer Amazon Web Services
<snip>On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Perhaps turn query into text[]. That would make it easy to concatenate back together if desired.When multiple statements are displayed then we don’t know which one is currently running.I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
--
Hi,
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
HiOn Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi hackers,
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?
Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).
I attached a mock up v2 patch that adds this new field.
Outcome Examples:
backend_type | query | individual_query
----------------+---------------------------------------------------------------------------------------------+----------------------
client backend | select backend_type, query, individual_query from pg_stat_activity where length(query) > 0; |
client backend | select pg_sleep(10);select pg_sleep(20); | select pg_sleep(20);
or
backend_type | query | individual_query
-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
client backend | select backend_type, query, individual_query from pg_stat_activity where length(query) > 0; |
client backend | select test(); |
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);create index bdtidx on foo(generate_series);select count(*) as "second" from foo;select pg_sleep(11);select count(*) as "third" from foo | select count(*) as "second" from foo;
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);create index bdtidx on foo(generate_series);select count(*) as "second" from foo;select pg_sleep(11);select count(*) as "third" from foo | select count(*) as "second" from foo;
As you can see the individual_query field is populated only when the query field is a multiple statements one.
Regards,
Bertrand
Attachment
Hi,
On 7/27/20 4:57 PM, Dave Page wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
HiOn Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi hackers,
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).
Hi,
On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi,
On 7/27/20 4:57 PM, Dave Page wrote:HiOn Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi hackers,
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).
Idle though without having considered it too much -- you might reduce the memory overhead by just storing a start/end offset into the combined query string instead of a copy of the query.
Good point, thanks for the feedback.
The new attached patch is making use of stmt_len and stmt_location (instead of a copy of the query).
That way the cost would only be paid when doing the reading of pg_stat_activity (by extracting the piece of the string), which I'd argue is done orders of magnitude fewer times than the query changes at least on busy systems.
The individual query extraction (making use of stmt_len and stmt_location) has been moved to pg_stat_get_activity() in the new attached patch (as opposed to pgstat_report_activity() in the previous patch version).
Care would have to be taken for the case of the current executing query actually being entirely past the end of the query string buffer of course, but I don't think that's too hard to define a useful behaviour for. (The user interface would stay the same, showing the actual string and thus not requiring the user to do any parsing)
As a proposal the new attached patch does not display the individual query if length + location is greater than pgstat_track_activity_query_size (anyway it could not, as the query field that might contain multiple statements is already <= pgstat_track_activity_query_size in pg_stat_get_activity()).
Bertrand
--
Attachment
Hi, > I've attached a patch to display individual query in the > pg_stat_activity query field when multiple SQL statements are > currently displayed. > > Motivation: > > When multiple statements are displayed then we don’t know which > one is currently running. > > I'm not sure I'd want that to happen, as it could make it much > harder to track the activity back to a query in the application > layer or server logs. > > Perhaps a separate field could be added for the current statement, > or a value to indicate what the current statement number in the > query is? As a user, I think this feature is useful to users. It would be nice that pg_stat_activity also show currently running query in a user defined function(PL/pgSQL) . I understood that this patch is not for user defined functions. Please let me know if it's better to make another thread. In general, PL/pgSQL functions have multiple queries, and users want to know the progress of query execution, doesn't it? -- Masahiro Ikeda NTT DATA CORPORATION
Hi,
> I've attached a patch to display individual query in the
> pg_stat_activity query field when multiple SQL statements are
> currently displayed.
>
> Motivation:
>
> When multiple statements are displayed then we don’t know which
> one is currently running.
>
> I'm not sure I'd want that to happen, as it could make it much
> harder to track the activity back to a query in the application
> layer or server logs.
>
> Perhaps a separate field could be added for the current statement,
> or a value to indicate what the current statement number in the
> query is?
As a user, I think this feature is useful to users.
It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .
I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.
In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?
--
Masahiro Ikeda
NTT DATA CORPORATION
Hi,
Hiút 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com> napsal:Hi,
> I've attached a patch to display individual query in the
> pg_stat_activity query field when multiple SQL statements are
> currently displayed.
>
> Motivation:
>
> When multiple statements are displayed then we don’t know which
> one is currently running.
>
> I'm not sure I'd want that to happen, as it could make it much
> harder to track the activity back to a query in the application
> layer or server logs.
>
> Perhaps a separate field could be added for the current statement,
> or a value to indicate what the current statement number in the
> query is?
As a user, I think this feature is useful to users.
It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .
I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.
Yeah I think it would be nice to have.
I also think it would be better to create a dedicated thread (specially looking at Pavel's comment below)
In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?I am afraid of the significant performance impact of this feature. In this case you have to copy all nested queries to the stat collector process. Very common usage of PL is a glue of very fast queries. Sure, it is used like glue for very slow queries too.Just I thinking about two features:1. extra interface for auto_explain, that allows you to get a stack of statements assigned to some pid (probably these informations should be stored inside shared memory and collected before any query execution). Sometimes some slow function is slow due repeated execution of relatively fast queries. In this case, the deeper nested level is not too interesting. You need to see a stack of calls and you are searching the first slow level in the stack.2. can be nice to have a status column in pg_stat_activity, and status GUC for sending a custom information from deep levels to the user. Now, users use application_name, but some special variables can be better for this purpose. This value of status can be refreshed periodically and can substitute some tags. So developer can setBEGIN-- before slow long querySET status TO 'slow query calculation xxy %d';...It is a alternative to RAISE NOTICE, but with different format - with format that is special for reading from pg_stat_activityFor long (slow) queries usually you need to see the sum of all times of all levels from the call stack to get valuable information.RegardsPavelp.s. pg_stat_activity is maybe too wide table already, and probably is not good to enhance this table too much
Thanks
Bertrand
--
Masahiro Ikeda
NTT DATA CORPORATION
On 2020-08-19 14:48, Drouvot, Bertrand wrote: > Hi, > On 8/18/20 9:35 AM, Pavel Stehule wrote: > >> Hi >> >> út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda >> <ikedamsh@oss.nttdata.com> napsal: >> >>> Hi, >>> >>>> I've attached a patch to display individual query in the >>>> pg_stat_activity query field when multiple SQL statements are >>>> currently displayed. >>>> >>>> Motivation: >>>> >>>> When multiple statements are displayed then we don’t know >>> which >>>> one is currently running. >>>> >>>> I'm not sure I'd want that to happen, as it could make it much >>>> harder to track the activity back to a query in the application >>>> layer or server logs. >>>> >>>> Perhaps a separate field could be added for the current >>> statement, >>>> or a value to indicate what the current statement number in the >>>> query is? >>> >>> As a user, I think this feature is useful to users. >>> >>> It would be nice that pg_stat_activity also show currently running >>> query >>> in a user defined function(PL/pgSQL) . >>> >>> I understood that this patch is not for user defined functions. >>> Please let me know if it's better to make another thread. > > Yeah I think it would be nice to have. > > I also think it would be better to create a dedicated thread > (specially looking at Pavel's comment below) Thank you. I will. >>> In general, PL/pgSQL functions have multiple queries, >>> and users want to know the progress of query execution, doesn't >>> it? >> >> I am afraid of the significant performance impact of this feature. >> In this case you have to copy all nested queries to the stat >> collector process. Very common usage of PL is a glue of very fast >> queries. Sure, it is used like glue for very slow queries too. >> Just I thinking about two features: OK, thanks for much advice and show alternative solutions. >> 1. extra interface for auto_explain, that allows you to get a stack >> of statements assigned to some pid (probably these informations >> should be stored inside shared memory and collected before any query >> execution). Sometimes some slow function is slow due repeated >> execution of relatively fast queries. In this case, the deeper >> nested level is not too interesting. You need to see a stack of >> calls and you are searching the first slow level in the stack. Thanks. I didn't know auto_explain module. I agreed when only requested, it copy the stack of statements. >> 2. can be nice to have a status column in pg_stat_activity, and >> status GUC for sending a custom information from deep levels to the >> user. Now, users use application_name, but some special variables >> can be better for this purpose. This value of status can be >> refreshed periodically and can substitute some tags. So developer >> can set >> >> BEGIN >> -- before slow long query >> SET status TO 'slow query calculation xxy %d'; >> ... >> >> It is a alternative to RAISE NOTICE, but with different format - >> with format that is special for reading from pg_stat_activity >> >> For long (slow) queries usually you need to see the sum of all times >> of all levels from the call stack to get valuable information. In comparison to 1, user must implements logging statement to their query but user can control what he/she wants to know. I worry which solution is best. >> p.s. pg_stat_activity is maybe too wide table already, and probably >> is not good to enhance this table too much Thanks. I couldn't think from this point of view. After I make some PoC patches, I will create a dedicated thread. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On 2020-08-19 14:48, Drouvot, Bertrand wrote:
> Hi,
> On 8/18/20 9:35 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> napsal:
>>
>>> Hi,
>>>
>>>> I've attached a patch to display individual query in the
>>>> pg_stat_activity query field when multiple SQL statements are
>>>> currently displayed.
>>>>
>>>> Motivation:
>>>>
>>>> When multiple statements are displayed then we don’t know
>>> which
>>>> one is currently running.
>>>>
>>>> I'm not sure I'd want that to happen, as it could make it much
>>>> harder to track the activity back to a query in the application
>>>> layer or server logs.
>>>>
>>>> Perhaps a separate field could be added for the current
>>> statement,
>>>> or a value to indicate what the current statement number in the
>>>> query is?
>>>
>>> As a user, I think this feature is useful to users.
>>>
>>> It would be nice that pg_stat_activity also show currently running
>>> query
>>> in a user defined function(PL/pgSQL) .
>>>
>>> I understood that this patch is not for user defined functions.
>>> Please let me know if it's better to make another thread.
>
> Yeah I think it would be nice to have.
>
> I also think it would be better to create a dedicated thread
> (specially looking at Pavel's comment below)
Thank you. I will.
>>> In general, PL/pgSQL functions have multiple queries,
>>> and users want to know the progress of query execution, doesn't
>>> it?
>>
>> I am afraid of the significant performance impact of this feature.
>> In this case you have to copy all nested queries to the stat
>> collector process. Very common usage of PL is a glue of very fast
>> queries. Sure, it is used like glue for very slow queries too.
>> Just I thinking about two features:
OK, thanks for much advice and show alternative solutions.
>> 1. extra interface for auto_explain, that allows you to get a stack
>> of statements assigned to some pid (probably these informations
>> should be stored inside shared memory and collected before any query
>> execution). Sometimes some slow function is slow due repeated
>> execution of relatively fast queries. In this case, the deeper
>> nested level is not too interesting. You need to see a stack of
>> calls and you are searching the first slow level in the stack.
Thanks. I didn't know auto_explain module.
I agreed when only requested, it copy the stack of statements.
>> 2. can be nice to have a status column in pg_stat_activity, and
>> status GUC for sending a custom information from deep levels to the
>> user. Now, users use application_name, but some special variables
>> can be better for this purpose. This value of status can be
>> refreshed periodically and can substitute some tags. So developer
>> can set
>>
>> BEGIN
>> -- before slow long query
>> SET status TO 'slow query calculation xxy %d';
>> ...
>>
>> It is a alternative to RAISE NOTICE, but with different format -
>> with format that is special for reading from pg_stat_activity
>>
>> For long (slow) queries usually you need to see the sum of all times
>> of all levels from the call stack to get valuable information.
In comparison to 1, user must implements logging statement to
their query but user can control what he/she wants to know.
I worry which solution is best.
>> p.s. pg_stat_activity is maybe too wide table already, and probably
>> is not good to enhance this table too much
Thanks. I couldn't think from this point of view.
After I make some PoC patches, I will create a dedicated thread.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
Hi,
On 8/6/20 12:24 PM, Magnus Hagander wrote:On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi,
On 7/27/20 4:57 PM, Dave Page wrote:HiOn Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:Hi hackers,
I've attached a patch to display individual query in the pg_stat_activity query field when multiple SQL statements are currently displayed.
Motivation:
When multiple statements are displayed then we don’t know which one is currently running.I'm not sure I'd want that to happen, as it could make it much harder to track the activity back to a query in the application layer or server logs.Perhaps a separate field could be added for the current statement, or a value to indicate what the current statement number in the query is?Thanks for he feedback.
I like the idea of adding extra information without changing the current behavior.
A value to indicate what the current statement number is, would need parsing the query field by the user to get the individual statement.
I think the separate field makes sense (though it come with an extra memory price) as it will not change the existing behavior and would just provide extra information (without any extra parsing needed for the user).
Idle though without having considered it too much -- you might reduce the memory overhead by just storing a start/end offset into the combined query string instead of a copy of the query.Good point, thanks for the feedback.
The new attached patch is making use of stmt_len and stmt_location (instead of a copy of the query).
That way the cost would only be paid when doing the reading of pg_stat_activity (by extracting the piece of the string), which I'd argue is done orders of magnitude fewer times than the query changes at least on busy systems.The individual query extraction (making use of stmt_len and stmt_location) has been moved to pg_stat_get_activity() in the new attached patch (as opposed to pgstat_report_activity() in the previous patch version).
Care would have to be taken for the case of the current executing query actually being entirely past the end of the query string buffer of course, but I don't think that's too hard to define a useful behaviour for. (The user interface would stay the same, showing the actual string and thus not requiring the user to do any parsing)As a proposal the new attached patch does not display the individual query if length + location is greater than pgstat_track_activity_query_size (anyway it could not, as the query field that might contain multiple statements is already <= pgstat_track_activity_query_size in pg_stat_get_activity()).
Bertrand
Attaching a new version as the previous one was not passing the Patch Tester anymore.
Bertrand
--
Attachment
On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote: > Attaching a new version as the previous one was not passing the Patch Tester > anymore. Ditto, the CF bot is complaining again. Could you send a rebase? -- Michael
Attachment
On 9/24/20 5:29 AM, Michael Paquier wrote: > On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote: >> Attaching a new version as the previous one was not passing the Patch Tester >> anymore. > Ditto, the CF bot is complaining again. Could you send a rebase? Thanks for letting me know. Attached a new version. Bertrand
Attachment
Hi, I noticed that this patch is failing on the cfbot. For this, I changed the status to: 'Waiting on Author' Cheers, Georgios The new status of this patch is: Waiting on Author
This patch fails in the cfbot for quite some time now. I shall close it as Return With Feedback and not move it to the next CF. Please feel free to register an updated version afresh for the next CF. Cheers, //Georgios