Thread: [PATCH] Query Jumbling for CALL and SET utility statements
Hi hackers,
While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.
We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).
Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.
That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.
Please find attached a patch proposal for doing so.
With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1 | 2 | 0
CALL MINUS_TWO($1) | 2 | 0
set enable_seqscan=$1 | 2 | 0
CALL SUM_TWO($1, $2) | 2 | 0
Looking forward to your feedback,
Thanks,
Jeremy & Bertrand
-- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Attachment
Hi hackers,
While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.
We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).
Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.
That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.
Please find attached a patch proposal for doing so.
With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1 | 2 | 0
CALL MINUS_TWO($1) | 2 | 0
set enable_seqscan=$1 | 2 | 0
CALL SUM_TWO($1, $2) | 2 | 0Looking forward to your feedback,
Hi, On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote: > While query jumbling is provided for function calls that’s currently not the > case for procedures calls. > The reason behind this is that all utility statements are currently > discarded for jumbling. > [...] > That’s why we think it would make sense to allow jumbling for those 2 > utility statements: CALL and SET. Yes, I've seen this be an issue repeatedly. Although more heavily for PREPARE / EXECUTE than either of the two cases you handle here. IME not tracking PREPARE / EXECUTE can distort statistics substantially - there's appears to be a surprising number of applications / frameworks resorting to them. Basically requiring that track utility is turned on. I suspect we should carve out things like CALL, PREPARE, EXECUTE from track_utility - it's more or less an architectural accident that they're utility statements. It's a bit less clear that SET should be dealt with that way. > @@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node) > APP_JUMB(var->varlevelsup); > } > break; > + case T_CallStmt: > + { > + CallStmt *stmt = (CallStmt *) node; > + FuncExpr *expr = stmt->funcexpr; > + > + APP_JUMB(expr->funcid); > + JumbleExpr(jstate, (Node *) expr->args); > + } > + break; Why do we need to take the arguments into account? > + case T_VariableSetStmt: > + { > + VariableSetStmt *stmt = (VariableSetStmt *) node; > + > + APP_JUMB_STRING(stmt->name); > + JumbleExpr(jstate, (Node *) stmt->args); > + } > + break; Same? > + case T_A_Const: > + { > + int loc = ((const A_Const *) node)->location; > + > + RecordConstLocation(jstate, loc); > + } > + break; I suspect we only need this because of the jumbling of unparsed arguments I questioned above? If we do end up needing it, shouldn't we include the type in the jumbling? Greetings, Andres Freund
Hist 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand <bdrouvot@amazon.com> napsal:Hi hackers,
While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.
We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).
Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.
That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.
Please find attached a patch proposal for doing so.
With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1 | 2 | 0
CALL MINUS_TWO($1) | 2 | 0
set enable_seqscan=$1 | 2 | 0
CALL SUM_TWO($1, $2) | 2 | 0Looking forward to your feedback,
The idea is good, but I think you should use pg_stat_functions instead. Maybe it is supported already (I didn't test it). I am not sure so SET statement should be traced in pg_stat_statements - it is usually pretty fast, and without context it says nothing. It looks like just overhead.
RegardsPavel
On 8/31/22 8:33 AM, Drouvot, Bertrand wrote: > > We’ve recently seen performance impacts (LWLock contention) due to the > lack of jumbling on procedure calls with pg_stat_statements and > pg_stat_statements.track_utility enabled (think an application with a > high rate of procedure calls with unique parameters for each call). I ran some performance tests with the patch that Bertrand wrote to get numbers. From my perspective, this patch is scoped very minimally and is low risk; I don’t think it should need an enormous amount of validation. It does appear to address the issues with both SET and CALL statements that Nikolay and I respectively encountered. Honestly, this almost seems like it was just an minor oversight in the original patch that added support for CALL and procedures. I used an r5.large EC2 instance running Linux and tested Bertrand’s patch using the PostgreSQL 14.4 code base, compiled without and with Bertrand’s patch. The difference is a lot more extreme on big servers with lots of cores, but the difference is obvious even on a small instance like this one. As a side note: while I certainly don't want to build a database primarily based on benchmarks, it's nice when benchmarks showcase the database's strength. Without this patch, HammerDB completely falls over in stored procedure mode, since one of the procedure arguments is a time-based unique value on every call. Someone else at Amazon running HammerDB was how I originally became aware of this problem. -Jeremy ===== Setup: $ psql -c "create procedure test(x int) as 'begin return; end' language plpgsql;" CREATE PROCEDURE $ echo -e "\set x random(1,100000000) \n call test(:x)" >test-call.pgbench $ echo -e "\set x random(1,100000000) \n set application_name=':x'" >test-set.pgbench ===== CALL results without patch: [postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f test-set.pgbench pgbench (14.4) transaction type: test-set.pgbench scaling factor: 1 query mode: simple number of clients: 100 number of threads: 100 duration: 15 s number of transactions actually processed: 728748 latency average = 2.051 ms initial connection time = 91.844 ms tps = 48755.446492 (without initial connection time) statement latencies in milliseconds: 0.000 \set x random(1,100000000) 2.046 set application_name=':x' pg-14.4 rw postgres@postgres=# select wait_event, count(*) from pg_stat_activity group by wait_event; \watch 1 ... Tue 30 Aug 2022 08:26:35 PM UTC (every 1s) wait_event | count ---------------------+------- [NULL] | 6 pg_stat_statements | 95 BgWriterMain | 1 ArchiverMain | 1 WalWriterMain | 1 AutoVacuumMain | 1 CheckpointerMain | 1 LogicalLauncherMain | 1 (8 rows) ... ===== CALL results with patch: [postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f test-call.pgbench pgbench (14.4) transaction type: test-call.pgbench scaling factor: 1 query mode: simple number of clients: 100 number of threads: 100 duration: 15 s number of transactions actually processed: 1098776 latency average = 1.361 ms initial connection time = 89.002 ms tps = 73491.904878 (without initial connection time) statement latencies in milliseconds: 0.000 \set x random(1,100000000) 1.383 call test(:x) pg-14.4 rw postgres@postgres=# select wait_event, count(*) from pg_stat_activity group by wait_event; \watch 1 ... Tue 30 Aug 2022 08:42:51 PM UTC (every 1s) wait_event | count ---------------------+------- [NULL] | 99 BgWriterHibernate | 1 ArchiverMain | 1 WalWriterMain | 1 AutoVacuumMain | 1 CheckpointerMain | 1 ClientRead | 2 LogicalLauncherMain | 1 (8 rows) ... ===== SET results without patch: [postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f test-set.pgbench pgbench (14.4) transaction type: test-set.pgbench scaling factor: 1 query mode: simple number of clients: 100 number of threads: 100 duration: 15 s number of transactions actually processed: 728748 latency average = 2.051 ms initial connection time = 91.844 ms tps = 48755.446492 (without initial connection time) statement latencies in milliseconds: 0.000 \set x random(1,100000000) 2.046 set application_name=':x' pg-14.4 rw postgres@postgres=# select wait_event, count(*) from pg_stat_activity group by wait_event; \watch 1 ... Tue 30 Aug 2022 08:26:35 PM UTC (every 1s) wait_event | count ---------------------+------- [NULL] | 6 pg_stat_statements | 95 BgWriterMain | 1 ArchiverMain | 1 WalWriterMain | 1 AutoVacuumMain | 1 CheckpointerMain | 1 LogicalLauncherMain | 1 (8 rows) ... ===== SET results with patch: [postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f test-set.pgbench pgbench (14.4) transaction type: test-set.pgbench scaling factor: 1 query mode: simple number of clients: 100 number of threads: 100 duration: 15 s number of transactions actually processed: 1178844 latency average = 1.268 ms initial connection time = 89.159 ms tps = 78850.178814 (without initial connection time) statement latencies in milliseconds: 0.000 \set x random(1,100000000) 1.270 set application_name=':x' pg-14.4 rw postgres@postgres=# select wait_event, count(*) from pg_stat_activity group by wait_event; \watch 1 ... Tue 30 Aug 2022 08:44:30 PM UTC (every 1s) wait_event | count ---------------------+------- [NULL] | 101 BgWriterHibernate | 1 ArchiverMain | 1 WalWriterMain | 1 AutoVacuumMain | 1 CheckpointerMain | 1 LogicalLauncherMain | 1 (7 rows) ... -- Jeremy Schneider Database Engineer Amazon Web Services
On 8/31/22 9:08 AM, Andres Freund wrote: > > I suspect we should carve out things like CALL, PREPARE, EXECUTE from > track_utility - it's more or less an architectural accident that they're > utility statements. It's a bit less clear that SET should be dealt with that > way. Regarding SET, the compelling use case was around "application_name" whose purpose is to provide a label in pg_stat_activity and on log lines, which can be used to improve observability and connect queries to their source in application code. Nikolay's incident (on peak shopping day for an eCommerce corp) evidently involved an application which leveraged this, but as a result the contention on the pg_stat_statements LWLock in exclusive mode effectively caused an outage for the retailer? Or nearly did? My description here is based on Nikolay's public twitter comment. I've seen a lot of applications that make heavy use of temp tables, where DDL would be pretty important to track as part of the regular workload. So that probably should be added to the list alongside prepared statements. And I'd want to spend a little more time thinking about what other use cases might be missing. I'm hesitant about the general idea of carving out some utility statements away from this "track_utility" GUC. Personally, at this point, I think pg_stat_statements is critical infrastructure for anyone running PostgreSQL at scale. The information it provides is indispensable. I don't think it's really defensible to tell people that if they want to scale, then they need to fly blind on any utility statements. -Jeremy -- Jeremy Schneider Database Engineer Amazon Web Services
Hi, On 2022-08-31 11:00:05 -0700, Jeremy Schneider wrote: > On 8/31/22 9:08 AM, Andres Freund wrote: > > > > I suspect we should carve out things like CALL, PREPARE, EXECUTE from > > track_utility - it's more or less an architectural accident that they're > > utility statements. It's a bit less clear that SET should be dealt with that > > way. > > Regarding SET, the compelling use case was around "application_name" > whose purpose is to provide a label in pg_stat_activity and on log > lines, which can be used to improve observability and connect queries to > their source in application code. I wasn't saying that SET shouldn't be jumbled, just that it seems more reasonable to track it only when track_utility is enabled, rather than doing so even when that's disabled. Which I do think makes sense for executing a prepared statement and calling a procedure, since they're really only utility statements by accident. > Personally, at this point, I think pg_stat_statements is critical > infrastructure for anyone running PostgreSQL at scale. The information > it provides is indispensable. I don't think it's really defensible to > tell people that if they want to scale, then they need to fly blind on > any utility statements. I wasn't suggesting doing so at all. Greetings, Andres Freund
Regarding SET, the compelling use case was around "application_name" whose purpose is to provide a label in pg_stat_activity and on log lines, which can be used to improve observability and connect queries to their source in application code.I wasn't saying that SET shouldn't be jumbled, just that it seems more reasonable to track it only when track_utility is enabled, rather than doing so even when that's disabled. Which I do think makes sense for executing a prepared statement and calling a procedure, since they're really only utility statements by accident.
Hey Andres, sorry for misunderstanding your email!
Based on this quick test I just now ran (transcript below), I think that PREPARE/EXECUTE is already excluded from track_utility?
I get your point about CALL, maybe it does make sense to also exclude this. It might also be worth a small update to the doc for track_utility about how it behaves, in this regard.
https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9
Example updated sentence:
>
pg_stat_statements.track_utility
controls whether <<most>> utility commands are tracked by the module. Utility commands are all those other than SELECT
, INSERT
, UPDATE
and DELETE
<<, but this parameter does not disable tracking of PREPARE, EXECUTE or CALL>>. The default value is on
. Only superusers can change this setting.=====
pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=on;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------
(1 row)
pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
?column?
----------
1
(1 row)
pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
substr
-------------------------------------------
prepare test as select /* unique123 */ $1
select pg_stat_statements_reset()
set application_name=$1
(3 rows)
=====
pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=off;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------
(1 row)
pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
?column?
----------
1
(1 row)
pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
substr
-------------------------------------------
prepare test as select /* unique123 */ $1
select pg_stat_statements_reset()
(2 rows)
-- Jeremy Schneider Database Engineer Amazon Web Services
Hi,
Hi, On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:@@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node) APP_JUMB(var->varlevelsup); } break; + case T_CallStmt: + { + CallStmt *stmt = (CallStmt *) node; + FuncExpr *expr = stmt->funcexpr; + + APP_JUMB(expr->funcid); + JumbleExpr(jstate, (Node *) expr->args); + } + break;Why do we need to take the arguments into account?
Thanks for looking at it!
Agree that It's not needed to "solve" the Lock contention issue, but I think it's needed for the "render".
Without it we would get, things like:
postgres=# call MY_PROC(10);
CALL
postgres=# call MY_PROC(100000000);
CALL
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
select pg_stat_statements_reset() | 1 | 1
call MY_PROC(10) | 2 | 0
(2 rows)
instead of
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
select pg_stat_statements_reset() | 1 | 1
call MY_PROC($1) | 2 | 0
(2 rows)
+ case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); + } + break;Same?
yeah, same reason. Without it we would get things like:
postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
select pg_stat_statements_reset() | 1 | 1
set enable_seqscan=false | 2 | 0
(2 rows)
instead of
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set enable_seqscan=$1 | 2 | 0
select pg_stat_statements_reset() | 1 | 1
(2 rows)
+ case T_A_Const: + { + int loc = ((const A_Const *) node)->location; + + RecordConstLocation(jstate, loc); + } + break;I suspect we only need this because of the jumbling of unparsed arguments I questioned above?
Right but only for the T_VariableSetStmt case.
If we do end up needing it, shouldn't we include the type in the jumbling?
I don't think so as this is only for the T_VariableSetStmt case.
And looking closer I don't see such as thing as "consttype" (that we can find in the Const struct) in the A_Const struct.
Regards,
-- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Hi,
On 8/31/22 12:06 PM, Andres Freund wrote:Regarding SET, the compelling use case was around "application_name" whose purpose is to provide a label in pg_stat_activity and on log lines, which can be used to improve observability and connect queries to their source in application code.I wasn't saying that SET shouldn't be jumbled, just that it seems more reasonable to track it only when track_utility is enabled, rather than doing so even when that's disabled. Which I do think makes sense for executing a prepared statement and calling a procedure, since they're really only utility statements by accident.
I get your point about CALL, maybe it does make sense to also exclude this.
That's a good point and i think we should track CALL whatever the value of pgss_track_utility is.
I think so because we are tracking function calls in all the cases (because "linked" to select aka not a utility) and i don't see any reasons why not to do the same for procedure calls.
Please find attached v2 as an attempt to do so.
With v2 we get things like:
postgres=# set pg_stat_statements.track_utility=on;
SET
postgres=# call MY_PROC(20);
CALL
postgres=# call MY_PROC(10);
CALL
postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
queryid | query | calls
---------------------+-----------------------------------------+-------
4670878543381973400 | set pg_stat_statements.track_utility=$1 | 1
-640317129591544054 | set enable_seqscan=$1 | 2
492647827690744963 | select pg_stat_statements_reset() | 1
6541399678435597534 | call MY_PROC($1) | 2
and
postgres=# set pg_stat_statements.track_utility=off;
SET
postgres=# call MY_PROC(10);
CALL
postgres=# call MY_PROC(20);
CALL
postgres=# set enable_seqscan=true;
SET
postgres=# set enable_seqscan=false;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
queryid | query | calls
---------------------+-----------------------------------------+-------
4670878543381973400 | set pg_stat_statements.track_utility=$1 | 1
492647827690744963 | select pg_stat_statements_reset() | 1
6541399678435597534 | call MY_PROC($1) | 2
(3 rows)
It might also be worth a small update to the doc for track_utility about how it behaves, in this regard.
https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9
Example updated sentence:
>pg_stat_statements.track_utility
controls whether <<most>> utility commands are tracked by the module. Utility commands are all those other thanSELECT
,INSERT
,UPDATE
andDELETE
<<, but this parameter does not disable tracking of PREPARE, EXECUTE or CALL>>. The default value ison
. Only superusers can change this setting.
Agree, wording added to v2.
Regards,
-- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Attachment
> Please find attached v2 as an attempt to do so.
+1 to the idea.
I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:
if (query->utilityStmt && !jstate)
instead of
if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))
This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.
Thanks
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
@font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; font-size:11.0pt; font-family:"Calibri",sans-serif;}pre {mso-style-priority:99; mso-style-link:"HTML Preformatted Char"; margin:0in; font-size:10.0pt; font-family:"Courier New";}span.HTMLPreformattedChar {mso-style-name:"HTML Preformatted Char"; mso-style-priority:99; mso-style-link:"HTML Preformatted"; font-family:Consolas;}span.EmailStyle22 {mso-style-type:personal-reply; font-family:"Calibri",sans-serif; color:windowtext;}.MsoChpDefault {mso-style-type:export-only; font-size:10.0pt;}div.WordSection1 {page:WordSection1;} > Please find attached v2 as an attempt to do so.
+1 to the idea.
Thanks for looking at it!
I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:
if (query->utilityStmt && !jstate)
instead of
if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))
This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.
Fair point, thanks!
v3 including this change is attached.
Thanks,
--
Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
This is ready for committer but I suggest the following for the
doc changes:
1.
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are
combined into a single pg_stat_statements entry whenever they have
identical query structures according to an internal hash calculation.
Typically, two queries will be considered the same for this purpose
if they are semantically equivalent except for the values of literal
constants appearing in the query. Utility commands (that is, all other commands)
are compared strictly on the basis of their textual query strings, however.
-- to --
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as
well as CALL and SET commands are combined into a single
pg_stat_statements entry whenever they have identical query
structures according to an internal hash calculation.
Typically, two queries will be considered the same for this purpose
if they are semantically equivalent except for the values of literal
constants appearing in the command. All other commands are compared
strictly on the basis of their textual query strings, however.
2.
pg_stat_statements.track_utility controls whether utility
commands are tracked by the module. Utility commands
are all those other than SELECT, INSERT, UPDATE and DELETE.
The default value is on. Only superusers can change this setting.
-- to --
pg_stat_statements.track_utility controls whether utility commands
are tracked by the module. Tracked utility commands are all those
other than SELECT, INSERT, UPDATE, DELETE, CALL and SET.
The default value is on. Only superusers can change this setting.
--
Thanks,
Sami Imseih
Amazon Web Services (AWS)
From: "Drouvot, Bertrand" <bdrouvot@amazon.com>
Date: Friday, September 2, 2022 at 4:06 AM
To: "Imseih (AWS), Sami" <simseih@amazon.com>, "Schneider (AWS), Jeremy" <schnjere@amazon.com>, Andres Freund <andres@anarazel.de>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Pavel Stehule <pavel.stehule@gmail.com>, Nikolay Samokhvalov <samokhvalov@gmail.com>
Subject: Re: [PATCH] Query Jumbling for CALL and SET utility statements
Hi,
On 9/1/22 5:13 PM, Imseih (AWS), Sami wrote:
> Please find attached v2 as an attempt to do so.
+1 to the idea.
Thanks for looking at it!
I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:
if (query->utilityStmt && !jstate)
instead of
if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))
This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.
Fair point, thanks!
v3 including this change is attached.
Thanks,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
v3 including this change is attached.
FYI: "reset all" core dumps with v3
I didn't fully debug yet, but here's the backtrace on my 14.4 build with the patch
[postgres@ip-172-31-44-176 data]$ gdb /usr/local/pgsql-14.4/bin/postgres core.27217
...
Core was generated by `postgres: postgres postgres [local] RESET '.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
...
(gdb) bt
#0 0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1 0x00000000008e061c in JumbleExpr (jstate=0x1cf7f80, node=<optimized out>) at queryjumble.c:400
#2 0x00000000008dfdd8 in JumbleQueryInternal (jstate=0x1cf7f80, query=0x1cf7e70) at queryjumble.c:247
#3 0x00000000008e0b4b in JumbleQuery (query=query@entry=0x1cf7e70, querytext=querytext@entry=0x1cf72f8 "reset all;") at queryjumble.c:127
#4 0x000000000056ba4b in parse_analyze (parseTree=0x1cf7ce0, sourceText=0x1cf72f8 "reset all;", paramTypes=0x0, numParams=<optimized out>, queryEnv=0x0) at analyze.c:130
#5 0x000000000079df63 in pg_analyze_and_rewrite (parsetree=parsetree@entry=0x1cf7ce0, query_string=query_string@entry=0x1cf72f8 "reset all;",
paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0) at postgres.c:657
#6 0x000000000079e472 in exec_simple_query (query_string=0x1cf72f8 "reset all;") at postgres.c:1130
#7 0x000000000079f9d3 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffd0c341f80, dbname=0x1d44948 "postgres", username=<optimized out>) at postgres.c:4496
#8 0x000000000048c9f3 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4530
#9 BackendStartup (port=0x1d3bdd0) at postmaster.c:4252
#10 ServerLoop () at postmaster.c:1745
#11 0x0000000000721332 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x1cf1e10) at postmaster.c:1417
#12 0x000000000048da6e in main (argc=5, argv=0x1cf1e10) at main.c:209
-- Jeremy Schneider Database Engineer Amazon Web Services
On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote: > I didn't fully debug yet, but here's the backtrace on my 14.4 build with > the patch What happens on HEAD? That would be the target branch for a new feature. -- Michael
Attachment
Hi,
On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:I didn't fully debug yet, but here's the backtrace on my 14.4 build with the patch
Thanks Jeremy for reporting the issue!
What happens on HEAD? That would be the target branch for a new feature.
Just tested and i can see the same issue on HEAD.
Issue is on stmt->name being NULL here:
Breakpoint 2, JumbleExpr (jstate=0x55d60e769e30, node=0x55d60e769b60) at queryjumble.c:364
364 if (node == NULL)
(gdb) n
368 check_stack_depth();
(gdb)
374 APP_JUMB(node->type);
(gdb)
376 switch (nodeTag(node))
(gdb)
398 VariableSetStmt *stmt = (VariableSetStmt *) node;
(gdb) n
400 APP_JUMB_STRING(stmt->name);
(gdb) p stmt->name
$1 = 0x0
I'll have a closer look.
Regards,
--
Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote: > On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote: > > I didn't fully debug yet, but here's the backtrace on my 14.4 build with > > the patch > > What happens on HEAD? That would be the target branch for a new > feature. It would be the same AFAICS. From v3: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); + } For a RESET ALL command stmt->name is NULL.
Hi,
On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:I didn't fully debug yet, but here's the backtrace on my 14.4 build with the patchWhat happens on HEAD? That would be the target branch for a new feature.It would be the same AFAICS. From v3: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); + } For a RESET ALL command stmt->name is NULL.
Right, please find attached v4 addressing the issue and also Sami's comments [1].
[1]: https://www.postgresql.org/message-id/82A35172-BEB3-4DFA-B11C-AE5E50A0F932%40amazon.com
Regards,
-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote: > Hi, > > On 9/8/22 8:50 AM, Julien Rouhaud wrote: > > Thanks for looking at it! > > > On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote: > > > On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote: > > > > I didn't fully debug yet, but here's the backtrace on my 14.4 build with > > > > the patch > > > What happens on HEAD? That would be the target branch for a new > > > feature. > > It would be the same AFAICS. From v3: > > > > + case T_VariableSetStmt: > > + { > > + VariableSetStmt *stmt = (VariableSetStmt *) node; > > + > > + APP_JUMB_STRING(stmt->name); > > + JumbleExpr(jstate, (Node *) stmt->args); > > + } > > > > For a RESET ALL command stmt->name is NULL. > > Right, please find attached v4 addressing the issue and also Sami's comments > [1]. (Sorry I've not been following this thread until now) IME if your application relies on 2PC it's very likely that you will hit the exact same problems described in your original email. What do you think about normalizing those too while working on the subject?
Hi,
Hi, On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote:Hi, On 9/8/22 8:50 AM, Julien Rouhaud wrote: Thanks for looking at it!On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:I didn't fully debug yet, but here's the backtrace on my 14.4 build with the patchWhat happens on HEAD? That would be the target branch for a new feature.It would be the same AFAICS. From v3: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); + } For a RESET ALL command stmt->name is NULL.Right, please find attached v4 addressing the issue and also Sami's comments [1].(Sorry I've not been following this thread until now) IME if your application relies on 2PC it's very likely that you will hit the exact same problems described in your original email.
Agree
What do you think about normalizing those too while working on the subject?
That sounds reasonable, I'll have a look at those too while at it.
Regards,
-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi,
On 9/8/22 1:29 PM, Julien Rouhaud wrote:IME if your application relies on 2PC it's very likely that you will hit the exact same problems described in your original email.Agree
What do you think about normalizing those too while working on the subject?That sounds reasonable, I'll have a look at those too while at it.
Attached v5 to normalize 2PC commands too, so that we get things like:
create table test_tx (a int);
begin;
prepare transaction 'tx1';
insert into test_tx values (1);
commit prepared 'tx1';
begin;
prepare transaction 'tx2';
insert into test_tx values (2);
commit prepared 'tx2';
begin;
prepare transaction 'tx3';
insert into test_tx values (3);
rollback prepared 'tx3';
begin;
prepare transaction 'tx4';
insert into test_tx values (4);
rollback prepared 'tx4';
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls | rows
------------------------------------------------------------------------------+-------+------
SELECT pg_stat_statements_reset() | 1 | 1
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 | 0
begin | 4 | 0
commit prepared $1 | 2 | 0
create table test_tx (a int) | 1 | 0
insert into test_tx values ($1) | 4 | 4
prepare transaction $1 | 4 | 0
rollback prepared $1 | 2 | 0
(8 rows)
For those ones I also had to do some minor changes in gram.y and to the TransactionStmt struct to record the gid location.
Regards,
-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
> Attached v5 to normalize 2PC commands too, so that we get things like: > > > create table test_tx (a int); > begin; > prepare transaction 'tx1'; > insert into test_tx values (1); > commit prepared 'tx1'; > begin; > prepare transaction 'tx2'; > insert into test_tx values (2); > commit prepared 'tx2'; > begin; > prepare transaction 'tx3'; > insert into test_tx values (3); > rollback prepared 'tx3'; > begin; > prepare transaction 'tx4'; > insert into test_tx values (4); > rollback prepared 'tx4'; > SELECT query, calls, rows FROM pg_stat_statements ORDER BY query > COLLATE "C"; > query > | calls | rows > ------------------------------------------------------------------------------+-------+------ > SELECT pg_stat_statements_reset() > | 1 | 1 > SELECT query, calls, rows FROM pg_stat_statements ORDER BY query > COLLATE "C" | 0 | 0 > begin > | 4 | 0 > commit prepared $1 > | 2 | 0 > create table test_tx (a int) > | 1 | 0 > insert into test_tx values ($1) > | 4 | 4 > prepare transaction $1 > | 4 | 0 > rollback prepared $1 > | 2 | 0 > (8 rows) > > For those ones I also had to do some minor changes in gram.y and to > the TransactionStmt struct to record the gid location. Thanks Bertrand. I used your patch. It's looks very good. I found that utility statement is counted separately in upper and lower case. For example BEGIN and begin are counted separately. Is it difficult to fix this problem? Regards, Kotaro Kawamoto
Hi, On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote: > > I found that utility statement is counted separately in upper and lower > case. > For example BEGIN and begin are counted separately. > Is it difficult to fix this problem? This is a known behavior, utility command aren't normalized (apart from the few that will be with this patch) and the queryid is just a hash of the provided query string. It seems unrelated to this patch though. While it can be a bit annoying, it's unlikely that the application will have thousands of way to ask for a new transaction (mixing case, adding a random number of spaces between BEGIN and TRANSACTION and so on), so in real life it won't cause any problem. Fixing it would require to fully jumble all utility statements, which would require a separate discussion.
Hi, On 9/13/22 6:33 AM, Julien Rouhaud wrote: > Hi, > > On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote: >> I found that utility statement is counted separately in upper and lower >> case. >> For example BEGIN and begin are counted separately. >> Is it difficult to fix this problem? > This is a known behavior, utility command aren't normalized (apart from the few > that will be with this patch) and the queryid is just a hash of the provided > query string. > > It seems unrelated to this patch though. While it can be a bit annoying, it's > unlikely that the application will have thousands of way to ask for a new > transaction (mixing case, adding a random number of spaces between BEGIN and > TRANSACTION and so on), so in real life it won't cause any problem. Agree that it seems unlikely to cause any problem (as compare to the utility statements that are handled in this patch). > Fixing it > would require to fully jumble all utility statements, which would require a > separate discussion. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284 Amazon Web Services EMEA SARL, succursale francaise, 31 Place des Corolles, Tour Carpe Diem, F-92400 Courbevoie, SIREN 831001 334, RCS Nanterre, APE 6311Z, TVA FR30831001334
> Attached v5 to normalize 2PC commands too, so that we get things like:
A nit on the documentation for v5, otherwise lgtm.
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as well as CALL, SET
and two-phase commit commands PREPARE TRANSACTION, , COMMIT PREPARED
and ROLLBACK PREPARED are combined
---- to ----
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as well as CALL,
SET, PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED are combined
---
Regards,
Sami Imseih
Amazon Web Services (AWS)
On 2022/09/09 19:11, Drouvot, Bertrand wrote: >>> IME if your application relies on 2PC it's very likely that you will hit the >>> exact same problems described in your original email. The utility commands for cursor like DECLARE CURSOR seem to have the same issue and can cause lots of pgss entries. For example, when we use postgres_fdw and execute "SELECT * FROM <foreign table> WHERE id = 10" five times in the same transaction, the following commands are executed in the remote PostgreSQL server and recorded as pgss entries there. DECLARE c1 CURSOR FOR ... DECLARE c2 CURSOR FOR ... DECLARE c3 CURSOR FOR ... DECLARE c4 CURSOR FOR ... DECLARE c5 CURSOR FOR ... FETCH 100 FROM c1 FETCH 100 FROM c2 FETCH 100 FROM c3 FETCH 100 FROM c4 FETCH 100 FROM c5 CLOSE c1 CLOSE c2 CLOSE c3 CLOSE c4 CLOSE c5 Furthermore, if the different query on foreign table is executed in the next transaction, it may reuse the same cursor name previously used by another query. That is, different queries can cause the same FETCH command like "FETCH 100 FROM c1". This would be also an issue. I'm not sure if the patch should also handle cursor cases. We can implement that separately later if necessary. I don't think that the patch should include the fix for cursor cases. It can be implemented separately later if necessary. > Attached v5 to normalize 2PC commands too, so that we get things like: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + /* stmt->name is NULL for RESET ALL */ + if (stmt->name) + { + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the same query. Is this intentional? Which might be ok because their behavior is basically the same. But I'm afaid which may cause users to be confused. For example, they may fail to find the pgss entry for RESET command they ran and just wonder why the command was not recorded. To avoid such confusion, how about appending stmt->kind to the jumble? Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
> The utility commands for cursor like DECLARE CURSOR seem to have the same issue > and can cause lots of pgss entries. For example, when we use postgres_fdw and > execute "SELECT * FROM <foreign table> WHERE id = 10" five times in the same > transaction, the following commands are executed in the remote PostgreSQL server > and recorded as pgss entries there. > DECLARE c1 CURSOR FOR ... > DECLARE c2 CURSOR FOR ... > DECLARE c3 CURSOR FOR ... +1 I also made this observation recently and have a patch to suggest to improve tis situation. I will start a separate thread for this. Regards, -- Sami Imseih Amazon Web Services (AWS)
Hi, On 9/16/22 2:53 PM, Fujii Masao wrote: > > >> Attached v5 to normalize 2PC commands too, so that we get things like: > > + case T_VariableSetStmt: > + { > + VariableSetStmt *stmt = (VariableSetStmt *) node; > + > + /* stmt->name is NULL for RESET ALL */ > + if (stmt->name) > + { > + APP_JUMB_STRING(stmt->name); > + JumbleExpr(jstate, (Node *) stmt->args); > > With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the > same query. > Is this intentional? Thanks for looking at the patch! No, it is not intentional, good catch! > Which might be ok because their behavior is > basically the same. > But I'm afaid which may cause users to be confused. For example, they > may fail to > find the pgss entry for RESET command they ran and just wonder why the > command was > not recorded. To avoid such confusion, how about appending stmt->kind to > the jumble? > Thought? I think that's a good idea and will provide a new version taking care of it (and also Sami's comments up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 9/16/22 5:47 PM, Drouvot, Bertrand wrote: > Hi, > > On 9/16/22 2:53 PM, Fujii Masao wrote: >> >> >>> Attached v5 to normalize 2PC commands too, so that we get things like: >> >> + case T_VariableSetStmt: >> + { >> + VariableSetStmt *stmt = (VariableSetStmt *) node; >> + >> + /* stmt->name is NULL for RESET ALL */ >> + if (stmt->name) >> + { >> + APP_JUMB_STRING(stmt->name); >> + JumbleExpr(jstate, (Node *) stmt->args); >> >> With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as >> the same query. >> Is this intentional? > > Thanks for looking at the patch! > No, it is not intentional, good catch! > >> Which might be ok because their behavior is basically the same. >> But I'm afaid which may cause users to be confused. For example, they >> may fail to >> find the pgss entry for RESET command they ran and just wonder why the >> command was >> not recorded. To avoid such confusion, how about appending stmt->kind >> to the jumble? >> Thought? > > I think that's a good idea and will provide a new version taking care of > it (and also Sami's comments up-thread). Please find attached v6 taking care of the remarks mentioned above. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On 2022/09/19 15:29, Drouvot, Bertrand wrote: > Please find attached v6 taking care of the remarks mentioned above. Thanks for updating the patch! +SET pg_stat_statements.track_utility = TRUE; + +-- PL/pgSQL procedure and pg_stat_statements.track = all +-- we drop and recreate the procedures to avoid any caching funnies +SET pg_stat_statements.track_utility = FALSE; Could you tell me why track_utility is enabled just before it's disabled? Could you tell me what actually happens if we don't drop and recreate the procedures? I'd like to know what "any caching funnies" are. +SELECT pg_stat_statements_reset(); +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); + +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; This test set for the procedures is executed with the following four conditions, respectively. Do we really need all of these tests? track = top, track_utility = true track = top, track_utility = false track = all, track_utility = true track = all, track_utility = false +begin; +prepare transaction 'tx1'; +insert into test_tx values (1); +commit prepared 'tx1'; The test set of 2PC commands is also executed with track_utility = on and off, respectively. But why do we need to run that test when track_utility = off? - if (query->utilityStmt) + if (query->utilityStmt && !jstate) { if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt)) "pgss_track_utility" should be "pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hi, On 9/21/22 6:07 PM, Fujii Masao wrote: > > > On 2022/09/19 15:29, Drouvot, Bertrand wrote: >> Please find attached v6 taking care of the remarks mentioned above. > > Thanks for updating the patch! > > +SET pg_stat_statements.track_utility = TRUE; > + > +-- PL/pgSQL procedure and pg_stat_statements.track = all > +-- we drop and recreate the procedures to avoid any caching funnies > +SET pg_stat_statements.track_utility = FALSE; > > Could you tell me why track_utility is enabled just before it's disabled? Thanks for looking at the new version! No real reason, I removed those useless SET in the new V7 attached. > > Could you tell me what actually happens if we don't drop and > recreate the procedures? I'd like to know what "any caching funnies" are. Without the drop/recreate the procedure body does not appear normalized (while the CALL itself is) when switching from track = top to track = all. I copy-pasted this comment from the already existing "function" section in the pg_stat_statements.sql file. This comment has been introduced for the function section in commit 83f2061dd0. Note that the behavior would be the same for the function case (function body does not appear normalized without the drop/recreate). > > +SELECT pg_stat_statements_reset(); > +CALL MINUS_TWO(3); > +CALL MINUS_TWO(7); > +CALL SUM_TWO(3, 8); > +CALL SUM_TWO(7, 5); > + > +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query > COLLATE "C"; > > This test set for the procedures is executed with the following > four conditions, respectively. Do we really need all of these tests? > > track = top, track_utility = true > track = top, track_utility = false > track = all, track_utility = true > track = all, track_utility = false Oh right, the track_utility = false cases have been added when we decided up-thread to force track CALL. But now that's probably not needed to test with track_utility = true. So I'm just keeping track_utility = off with track = top or all in the new V7 attached (like this is the case for the "function" section). > > +begin; > +prepare transaction 'tx1'; > +insert into test_tx values (1); > +commit prepared 'tx1'; > > The test set of 2PC commands is also executed with track_utility = on > and off, respectively. But why do we need to run that test when > track_utility = off? That's useless, thanks for pointing out. Removed in V7 attached. > > - if (query->utilityStmt) > + if (query->utilityStmt && !jstate) > { > if (pgss_track_utility && > !PGSS_HANDLED_UTILITY(query->utilityStmt)) > > "pgss_track_utility" should be > "pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically? Good catch! That's not needed (in practice) with the current code but that is "theoretically" needed indeed, let's add it in V7 attached (that's safer should the code change later on). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote: > Please find attached v6 taking care of the remarks mentioned above. > + case T_VariableSetStmt: > + { > + VariableSetStmt *stmt = (VariableSetStmt *) node; > + > + /* stmt->name is NULL for RESET ALL */ > + if (stmt->name) > + { > + APP_JUMB(stmt->kind); > + APP_JUMB_STRING(stmt->name); > + JumbleExpr(jstate, (Node *) stmt->args); > + } > + } > + break; Hmm. If VariableSetStmt->is_local is not added to the jumble, then aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same query? I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT mentioned on this thread. Would these be worth considering in what gets compiled? That would cover the remaining bits of TransactionStmt. The ODBC driver abuses of savepoints, for example, so this could be useful for monitoring purposes in such cases. As of the code stands, it could be cleaner to check IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back to a default in JumbleQuery(). Not that what your patch does is incorrect, of course. -- Michael
Attachment
Hi, On 9/26/22 12:40 PM, Drouvot, Bertrand wrote: > let's add it in V7 attached > (that's safer should the code change later on). Attached a tiny rebase needed due to 249b0409b1. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 10/6/22 8:39 AM, Michael Paquier wrote: > On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote: >> Please find attached v6 taking care of the remarks mentioned above. >> + case T_VariableSetStmt: >> + { >> + VariableSetStmt *stmt = (VariableSetStmt *) node; >> + >> + /* stmt->name is NULL for RESET ALL */ >> + if (stmt->name) >> + { >> + APP_JUMB(stmt->kind); >> + APP_JUMB_STRING(stmt->name); >> + JumbleExpr(jstate, (Node *) stmt->args); >> + } >> + } >> + break; > > Hmm. If VariableSetStmt->is_local is not added to the jumble, then > aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same > query? > Good catch, thanks! While at it let's also jumble "SET SESSION foo =". For this one, we would need to record another bool in VariableSetStmt: I'll create a dedicated patch for that. > I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT > mentioned on this thread. Would these be worth considering in what > gets compiled? That would cover the remaining bits of > TransactionStmt. The ODBC driver abuses of savepoints, for example, > so this could be useful for monitoring purposes in such cases. Agree. I'll look at those too. > > As of the code stands, it could be cleaner to check > IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back > to a default in JumbleQuery(). Not that what your patch does is > incorrect, of course. Will look at it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Oct 06, 2022 at 10:43:57AM +0200, Drouvot, Bertrand wrote: > On 10/6/22 8:39 AM, Michael Paquier wrote: >> I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT >> mentioned on this thread. Would these be worth considering in what >> gets compiled? That would cover the remaining bits of >> TransactionStmt. The ODBC driver abuses of savepoints, for example, >> so this could be useful for monitoring purposes in such cases. > > Agree. I'll look at those too. Thanks. While studying a bit more this thread, I've been reminded of the fact that this would treat different flavors of BEGIN/COMMIT commands (mix of upper/lower characters, etc.) as different entries in pg_stat_statements, and it feels inconsistent to me that we'd begin jumbling the 2PC and savepoint commands with their nodes but not do that for the rest of the commands, even if, as mentioned upthread, applications may not mix grammars. If they do, one could finish by viewing incorrect reports, and I'd like to think that this would make the life of a lot of people easier. SET/RESET and CALL have a much lower presence frequency than the transaction commands, where it is fine by me to include both of these under the utility statement switch. For OLTP workloads (I've seen quite a bit of 2PC used across multiple nodes for short transactions with writes involving more than two remote nodes), with a lot of BEGIN/COMMIT or even 2PC commands issued, the performance could be noticeable? It may make sense to control these with a different GUC switch, where we drop completely the string-only approach under track_utility. In short, I don't have any objections about the business with SET and CALL, but the transaction part worries me a bit. As a first step, we could cut the cake in two parts, and just focus on SET/RESET and CALL, which was the main point of discussion of this thread to begin with. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > While studying a bit more this thread, I've been reminded of the fact > that this would treat different flavors of BEGIN/COMMIT commands (mix > of upper/lower characters, etc.) as different entries in > pg_stat_statements, and it feels inconsistent to me that we'd begin > jumbling the 2PC and savepoint commands with their nodes but not do > that for the rest of the commands, even if, as mentioned upthread, > applications may not mix grammars. I've been thinking since the beginning of this thread that there was no coherent, defensible rationale being offered for jumbling some utility statements and not others. I wonder if the answer is to jumble them all. We avoided that up to now because it would imply a ton of manual effort and future code maintenance ... but now that the backend/nodes/ infrastructure is largely auto-generated, could we auto-generate the jumbling code? regards, tom lane
On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote: > I've been thinking since the beginning of this thread that there > was no coherent, defensible rationale being offered for jumbling > some utility statements and not others. Yeah. The potential performance impact of all the TransactionStmts worries me a bit, though. > I wonder if the answer is to jumble them all. We avoided that > up to now because it would imply a ton of manual effort and > future code maintenance ... but now that the backend/nodes/ > infrastructure is largely auto-generated, could we auto-generate > the jumbling code? Probably. One part that may be tricky though is the location of the constants we'd like to make generic, but perhaps this could be handled by using a dedicated variable type that just maps to int? It does not seem like a mandatory requirement to add that everywhere as a first step, either. -- Michael
Attachment
On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > While studying a bit more this thread, I've been reminded of the fact > > that this would treat different flavors of BEGIN/COMMIT commands (mix > > of upper/lower characters, etc.) as different entries in > > pg_stat_statements, and it feels inconsistent to me that we'd begin > > jumbling the 2PC and savepoint commands with their nodes but not do > > that for the rest of the commands, even if, as mentioned upthread, > > applications may not mix grammars. > > I've been thinking since the beginning of this thread that there > was no coherent, defensible rationale being offered for jumbling > some utility statements and not others. Only a very small subset causes trouble in real life scenario, but I agree that cherry-picking some utility statements isn't a great approach. > I wonder if the answer is to jumble them all. We avoided that > up to now because it would imply a ton of manual effort and > future code maintenance ... but now that the backend/nodes/ > infrastructure is largely auto-generated, could we auto-generate > the jumbling code? That's a good idea. Naively, it seems doable as the infrastructure in gen_node_support.pl already supports everything that should be needed (like per-member annotation).
Hi, On 10/7/22 6:13 AM, Michael Paquier wrote: > On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote: >> I've been thinking since the beginning of this thread that there >> was no coherent, defensible rationale being offered for jumbling >> some utility statements and not others. > > Yeah. The potential performance impact of all the TransactionStmts > worries me a bit, though. > >> I wonder if the answer is to jumble them all. We avoided that >> up to now because it would imply a ton of manual effort and >> future code maintenance ... but now that the backend/nodes/ >> infrastructure is largely auto-generated, could we auto-generate >> the jumbling code? I think that's a good idea. > > Probably. One part that may be tricky though is the location of the > constants we'd like to make generic, but perhaps this could be handled > by using a dedicated variable type that just maps to int? It looks to me that we'd also need to teach the perl parser which fields per statements struct need to be jumbled (or more probably which ones to exclude from the jumbling, for example funccall in CallStmt). Not sure yet how to teach the perl parser, but I'll build this list first to help decide for a right approach, unless you already have some thoughts about it? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Oct 10, 2022 at 03:04:57PM +0200, Drouvot, Bertrand wrote: > > On 10/7/22 6:13 AM, Michael Paquier wrote: > > > > Probably. One part that may be tricky though is the location of the > > constants we'd like to make generic, but perhaps this could be handled > > by using a dedicated variable type that just maps to int? > > It looks to me that we'd also need to teach the perl parser which fields per > statements struct need to be jumbled (or more probably which ones to exclude > from the jumbling, for example funccall in CallStmt). Not sure yet how to > teach the perl parser, but I'll build this list first to help decide for a > right approach, unless you already have some thoughts about it? Unless I'm missing something both can be handled using pg_node_attr() annotations that already exists?
On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote: > Unless I'm missing something both can be handled using pg_node_attr() > annotations that already exists? Indeed, that should work for the locators. -- Michael
Attachment
> I've been thinking since the beginning of this thread that there > was no coherent, defensible rationale being offered for jumbling > some utility statements and not others. +1 to the idea, as there are other utility statement cases that should be Jumbled. Here is a recent thread for jumbling cursors [1]. The CF entry [2] has been withdrawn until consensus is reached on this topic. [1]: https://www.postgresql.org/message-id/203CFCF7-176E-4AFC-A48E-B2CECFECD6AA@amazon.com [2]: https://commitfest.postgresql.org/40/3901/ Regards, Sami Imseih AWS (Amazon Web Services)
On Tue, Oct 11, 2022 at 02:18:54PM +0000, Imseih (AWS), Sami wrote: > +1 to the idea, as there are other utility statement cases > that should be Jumbled. Here is a recent thread for jumbling > cursors [1]. Thanks for mentioning that. With an automated way to generate this code, cursors would be handled, at the advantage of making sure that no fields are missing in the jumbled structures (is_local was missed for example on SET). > The CF entry [2] has been withdrawn until consensus is reached > on this topic. It seems to me that the consensus is here, which is a good step forward ;) -- Michael
Attachment
On Wed, Oct 12, 2022 at 09:13:20AM +0900, Michael Paquier wrote: > Thanks for mentioning that. With an automated way to generate this > code, cursors would be handled, at the advantage of making sure that > no fields are missing in the jumbled structures (is_local was missed > for example on SET). So, this thread has stalled for the last few weeks and it seems to me that the conclusion is that we'd want an approach using a set of scripts that automate the generation of the code in charge of the DDL jumbling. And, depending on the portions of the queries that need to be silenced, we may need to extend a few nodes with a location. We are not there yet, so I have marked the patch as returned with feedback. -- Michael
Attachment
On Tue, Oct 11, 2022 at 11:54:51AM +0900, Michael Paquier wrote: > On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote: >> Unless I'm missing something both can be handled using pg_node_attr() >> annotations that already exists? > > Indeed, that should work for the locators. My mistake here. When it comes to the locations to silence in the normalization, we already rely on the variable name "location" of type int in gen_node_support.pl, so we can just do the same for the code generating the jumbling. This stuff still needs two more pg_node_attr() to be able to work: one to ignore a full Node in the jumbling and a second that can be used on a per-field basis. Once this is in place, automating the generation of the code is not that complicated, most of the work is to get around placing the pg_node_attr() so as the jumbling gets things right. The number of fields to mark as things to ignore depends on the Node type (heavy for Const, for example), but I'd like to think that a "ignore" approach is better than an "include" approach so as new fields would be considered by default in the jumble compilation. I have not looked at all the edge cases, so perhaps more attrs would be needed.. -- Michael