Thread: [PATCH] Query Jumbling for CALL and SET utility statements

[PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

31 August 2022, 15:33:44

Hi hackers,

While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.

That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1              |     2 |    0
CALL MINUS_TWO($1)                |     2 |    0
set enable_seqscan=$1             |     2 |    0
CALL SUM_TWO($1, $2)              |     2 |    0

Looking forward to your feedback,

Thanks,

Jeremy & Bertrand

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-JumbleQuery-on-Call-and-Set.patch

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Pavel Stehule

Date:

31 August 2022, 15:50:42

st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand <bdrouvot@amazon.com> napsal:

Hi hackers,

While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.

That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1              |     2 |    0
CALL MINUS_TWO($1)                |     2 |    0
set enable_seqscan=$1             |     2 |    0
CALL SUM_TWO($1, $2)              |     2 |    0
Looking forward to your feedback,

The idea is good, but I think you should use pg_stat_functions instead. Maybe it is supported already (I didn't test it). I am not sure so SET statement should be traced in pg_stat_statements - it is usually pretty fast, and without context it says nothing. It looks like just overhead.

Regards

Pavel

Thanks,

Jeremy & Bertrand

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Andres Freund

Date:

31 August 2022, 16:08:39

Hi,

On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:
> While query jumbling is provided for function calls that’s currently not the
> case for procedures calls.
> The reason behind this is that all utility statements are currently
> discarded for jumbling.
> [...]
> That’s why we think it would make sense to allow jumbling for those 2
> utility statements: CALL and SET.

Yes, I've seen this be an issue repeatedly. Although more heavily for PREPARE
/ EXECUTE than either of the two cases you handle here. IME not tracking
PREPARE / EXECUTE can distort statistics substantially - there's appears to be
a surprising number of applications / frameworks resorting to them. Basically
requiring that track utility is turned on.

I suspect we should carve out things like CALL, PREPARE, EXECUTE from
track_utility - it's more or less an architectural accident that they're
utility statements.  It's a bit less clear that SET should be dealt with that
way.

> @@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node)
>                  APP_JUMB(var->varlevelsup);
>              }
>              break;
> +        case T_CallStmt:
> +            {
> +                CallStmt   *stmt = (CallStmt *) node;
> +                FuncExpr   *expr = stmt->funcexpr;
> +
> +                APP_JUMB(expr->funcid);
> +                JumbleExpr(jstate, (Node *) expr->args);
> +            }
> +            break;

Why do we need to take the arguments into account?

> +        case T_VariableSetStmt:
> +            {
> +                VariableSetStmt *stmt = (VariableSetStmt *) node;
> +
> +                APP_JUMB_STRING(stmt->name);
> +                JumbleExpr(jstate, (Node *) stmt->args);
> +            }
> +            break;

Same?

> +        case T_A_Const:
> +            {
> +                int            loc = ((const A_Const *) node)->location;
> +
> +                RecordConstLocation(jstate, loc);
> +            }
> +            break;

I suspect we only need this because of the jumbling of unparsed arguments I
questioned above?  If we do end up needing it, shouldn't we include the type
in the jumbling?

Greetings,

Andres Freund

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Pavel Stehule

Date:

31 August 2022, 16:59:23

st 31. 8. 2022 v 17:50 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand <bdrouvot@amazon.com> napsal:
Hi hackers,

While query jumbling is provided for function calls that’s currently not the case for procedures calls.
The reason behind this is that all utility statements are currently discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the lack of jumbling on procedure calls with pg_stat_statements and pg_stat_statements.track_utility enabled (think an application with a high rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see https://twitter.com/jer_s/status/1560003560116342785) and Nikolay reported that he also had to work on a similar performance issue with SET being used.

That’s why we think it would make sense to allow jumbling for those 2 utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1              |     2 |    0
CALL MINUS_TWO($1)                |     2 |    0
set enable_seqscan=$1             |     2 |    0
CALL SUM_TWO($1, $2)              |     2 |    0
Looking forward to your feedback,
The idea is good, but I think you should use pg_stat_functions instead. Maybe it is supported already (I didn't test it). I am not sure so SET statement should be traced in pg_stat_statements - it is usually pretty fast, and without context it says nothing. It looks like just overhead.

I was wrong - there is an analogy with SELECT fx, and the statistics are in pg_stat_statements, and in pg_stat_function too.

Regards

Pavel

Regards

Pavel
Thanks,
Jeremy & Bertrand
-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Jeremy Schneider

Date:

31 August 2022, 17:58:14

On 8/31/22 8:33 AM, Drouvot, Bertrand wrote:
> 
> We’ve recently seen performance impacts (LWLock contention) due to the
> lack of jumbling on procedure calls with pg_stat_statements and
> pg_stat_statements.track_utility enabled (think an application with a
> high rate of procedure calls with unique parameters for each call).

I ran some performance tests with the patch that Bertrand wrote to get
numbers. From my perspective, this patch is scoped very minimally and is
low risk; I don’t think it should need an enormous amount of validation.
It does appear to address the issues with both SET and CALL statements
that Nikolay and I respectively encountered. Honestly, this almost seems
like it was just an minor oversight in the original patch that added
support for CALL and procedures.

I used an r5.large EC2 instance running Linux and tested Bertrand’s
patch using the PostgreSQL 14.4 code base, compiled without and with
Bertrand’s patch. The difference is a lot more extreme on big servers
with lots of cores, but the difference is obvious even on a small
instance like this one.

As a side note: while I certainly don't want to build a database
primarily based on benchmarks, it's nice when benchmarks showcase the
database's strength. Without this patch, HammerDB completely falls over
in stored procedure mode, since one of the procedure arguments is a
time-based unique value on every call. Someone else at Amazon running
HammerDB was how I originally became aware of this problem.

-Jeremy


===== Setup:
$ psql -c "create procedure test(x int) as 'begin return; end' language
plpgsql;"
CREATE PROCEDURE
$ echo -e "\set x random(1,100000000) \n call test(:x)" >test-call.pgbench
$ echo -e "\set x random(1,100000000) \n set application_name=':x'"
>test-set.pgbench



===== CALL results without patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 728748
latency average = 2.051 ms
initial connection time = 91.844 ms
tps = 48755.446492 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
2.046 set application_name=':x'


pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:26:35 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 6
pg_stat_statements | 95
BgWriterMain | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(8 rows)
...



===== CALL results with patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-call.pgbench
pgbench (14.4)
transaction type: test-call.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 1098776
latency average = 1.361 ms
initial connection time = 89.002 ms
tps = 73491.904878 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
1.383 call test(:x)


pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:42:51 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 99
BgWriterHibernate | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
ClientRead | 2
LogicalLauncherMain | 1
(8 rows)
...



===== SET results without patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 728748
latency average = 2.051 ms
initial connection time = 91.844 ms
tps = 48755.446492 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
2.046 set application_name=':x'


pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:26:35 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 6
pg_stat_statements | 95
BgWriterMain | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(8 rows)
...



===== SET results with patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 1178844
latency average = 1.268 ms
initial connection time = 89.159 ms
tps = 78850.178814 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
1.270 set application_name=':x'


pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:44:30 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 101
BgWriterHibernate | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(7 rows)
...





-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Jeremy Schneider

Date:

31 August 2022, 18:00:05

On 8/31/22 9:08 AM, Andres Freund wrote:
> 
> I suspect we should carve out things like CALL, PREPARE, EXECUTE from
> track_utility - it's more or less an architectural accident that they're
> utility statements.  It's a bit less clear that SET should be dealt with that
> way.

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code. Nikolay's incident (on peak shopping
day for an eCommerce corp) evidently involved an application which
leveraged this, but as a result the contention on the pg_stat_statements
LWLock in exclusive mode effectively caused an outage for the retailer?
Or nearly did? My description here is based on Nikolay's public twitter
comment.

I've seen a lot of applications that make heavy use of temp tables,
where DDL would be pretty important to track as part of the regular
workload. So that probably should be added to the list alongside
prepared statements. And I'd want to spend a little more time thinking
about what other use cases might be missing. I'm hesitant about the
general idea of carving out some utility statements away from this
"track_utility" GUC.

Personally, at this point, I think pg_stat_statements is critical
infrastructure for anyone running PostgreSQL at scale. The information
it provides is indispensable. I don't think it's really defensible to
tell people that if they want to scale, then they need to fly blind on
any utility statements.

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Andres Freund

Date:

31 August 2022, 19:06:04

Hi,

On 2022-08-31 11:00:05 -0700, Jeremy Schneider wrote:
> On 8/31/22 9:08 AM, Andres Freund wrote:
> > 
> > I suspect we should carve out things like CALL, PREPARE, EXECUTE from
> > track_utility - it's more or less an architectural accident that they're
> > utility statements.  It's a bit less clear that SET should be dealt with that
> > way.
> 
> Regarding SET, the compelling use case was around "application_name"
> whose purpose is to provide a label in pg_stat_activity and on log
> lines, which can be used to improve observability and connect queries to
> their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.


> Personally, at this point, I think pg_stat_statements is critical
> infrastructure for anyone running PostgreSQL at scale. The information
> it provides is indispensable. I don't think it's really defensible to
> tell people that if they want to scale, then they need to fly blind on
> any utility statements.

I wasn't suggesting doing so at all.

Greetings,

Andres Freund

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Jeremy Schneider

Date:

31 August 2022, 20:05:39

On 8/31/22 12:06 PM, Andres Freund wrote:

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.

Hey Andres, sorry for misunderstanding your email!

Based on this quick test I just now ran (transcript below), I think that PREPARE/EXECUTE is already excluded from track_utility?

I get your point about CALL, maybe it does make sense to also exclude this. It might also be worth a small update to the doc for track_utility about how it behaves, in this regard.

https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9

Example updated sentence:
> pg_stat_statements.track_utility controls whether <<most>> utility commands are tracked by the module. Utility commands are all those other than SELECT, INSERT, UPDATE and DELETE <<, but this parameter does not disable tracking of PREPARE, EXECUTE or CALL>>. The default value is on. Only superusers can change this setting.

=====

pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=on;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)
pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
?column?
----------
        1
(1 row)

pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
                  substr
-------------------------------------------
prepare test as select /* unique123 */ $1
select pg_stat_statements_reset()
set application_name=$1
(3 rows)

=====

pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=off;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
?column?
----------
        1
(1 row)

pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
                  substr
-------------------------------------------
prepare test as select /* unique123 */ $1
select pg_stat_statements_reset()
(2 rows)

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

01 September 2022, 07:16:35

Hi,

On 8/31/22 6:08 PM, Andres Freund wrote:

Hi,

On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:

@@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node)                              APP_JUMB(var->varlevelsup);                      }                      break;
+             case T_CallStmt:
+                     {
+                             CallStmt   *stmt = (CallStmt *) node;
+                             FuncExpr   *expr = stmt->funcexpr;
+
+                             APP_JUMB(expr->funcid);
+                             JumbleExpr(jstate, (Node *) expr->args);
+                     }
+                     break;

Why do we need to take the arguments into account?

Thanks for looking at it!

Agree that It's not needed to "solve" the Lock contention issue, but I think it's needed for the "render".

Without it we would get, things like:

instead of

+             case T_VariableSetStmt:
+                     {
+                             VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                             APP_JUMB_STRING(stmt->name);
+                             JumbleExpr(jstate, (Node *) stmt->args);
+                     }
+                     break;

Same?

yeah, same reason. Without it we would get things like:

postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
select pg_stat_statements_reset() |     1 |    1
set enable_seqscan=false          |     2 |    0
(2 rows)

instead of

+             case T_A_Const:
+                     {
+                             int                     loc = ((const A_Const *) node)->location;
+
+                             RecordConstLocation(jstate, loc);
+                     }
+                     break;

I suspect we only need this because of the jumbling of unparsed arguments I
questioned above?

Right but only for the T_VariableSetStmt case.

 If we do end up needing it, shouldn't we include the type
in the jumbling?

I don't think so as this is only for the T_VariableSetStmt case.

And looking closer I don't see such as thing as "consttype" (that we can find in the Const struct) in the A_Const struct.

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

01 September 2022, 10:55:11

Hi,

On 8/31/22 10:05 PM, Jeremy Schneider wrote:

On 8/31/22 12:06 PM, Andres Freund wrote:

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.

I get your point about CALL, maybe it does make sense to also exclude this.

That's a good point and i think we should track CALL whatever the value of pgss_track_utility is.

I think so because we are tracking function calls in all the cases (because "linked" to select aka not a utility) and i don't see any reasons why not to do the same for procedure calls.

Please find attached v2 as an attempt to do so.

With v2 we get things like:

postgres=# set pg_stat_statements.track_utility=on;
SET
postgres=# call MY_PROC(20);
CALL
postgres=# call MY_PROC(10);
CALL
postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
       queryid       |                  query                  | calls
---------------------+-----------------------------------------+-------
4670878543381973400 | set pg_stat_statements.track_utility=$1 |     1
-640317129591544054 | set enable_seqscan=$1                   |     2
492647827690744963 | select pg_stat_statements_reset()       |     1
6541399678435597534 | call MY_PROC($1)                        |     2

and

postgres=# set pg_stat_statements.track_utility=off;
SET
postgres=# call MY_PROC(10);
CALL
postgres=# call MY_PROC(20);
CALL
postgres=# set enable_seqscan=true;
SET
postgres=# set enable_seqscan=false;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
       queryid       |                  query                  | calls
---------------------+-----------------------------------------+-------
4670878543381973400 | set pg_stat_statements.track_utility=$1 |     1
492647827690744963 | select pg_stat_statements_reset()       |     1
6541399678435597534 | call MY_PROC($1)                        |     2
(3 rows)

It might also be worth a small update to the doc for track_utility about how it behaves, in this regard.

https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9

Example updated sentence:
> pg_stat_statements.track_utility controls whether <<most>> utility commands are tracked by the module. Utility commands are all those other than SELECT, INSERT, UPDATE and DELETE <<, but this parameter does not disable tracking of PREPARE, EXECUTE or CALL>>. The default value is on. Only superusers can change this setting.

Agree, wording added to v2.
Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment

v2-0001-JumbleQuery-on-Call-and-Set.patch

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Imseih (AWS), Sami"

Date:

01 September 2022, 15:13:42

> Please find attached v2 as an attempt to do so.

+1 to the idea.

I think it will be better to evaluate jstate instead of

JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements

In the future without having to teach pg_stat_statements about them.

If a jstate is set for the utility statements, pgss will do the right thing.

Thanks

Sami Imseih

Amazon Web Services (AWS)

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

02 September 2022, 09:06:50

Hi,

On 9/1/22 5:13 PM, Imseih (AWS), Sami wrote:

@font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; font-size:11.0pt; font-family:"Calibri",sans-serif;}pre {mso-style-priority:99; mso-style-link:"HTML Preformatted Char"; margin:0in; font-size:10.0pt; font-family:"Courier New";}span.HTMLPreformattedChar {mso-style-name:"HTML Preformatted Char"; mso-style-priority:99; mso-style-link:"HTML Preformatted"; font-family:Consolas;}span.EmailStyle22 {mso-style-type:personal-reply; font-family:"Calibri",sans-serif; color:windowtext;}.MsoChpDefault {mso-style-type:export-only; font-size:10.0pt;}div.WordSection1 {page:WordSection1;}
> Please find attached v2 as an attempt to do so.
+1 to the idea.

Thanks for looking at it!

I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.

Fair point, thanks!

v3 including this change is attached.

Thanks,
--

Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v3-0001-JumbleQuery-on-Call-and-Set.patch

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Imseih (AWS), Sami"

Date:

07 September 2022, 15:48:36

This is ready for committer but I suggest the following for the

doc changes:

Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are

combined into a single pg_stat_statements entry whenever they have

identical query structures according to an internal hash calculation.

Typically, two queries will be considered the same for this purpose

if they are semantically equivalent except for the values of literal

constants appearing in the query. Utility commands (that is, all other commands)

are compared strictly on the basis of their textual query strings, however.

-- to --

Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as

well as CALL and SET commands are combined into a single

pg_stat_statements entry whenever they have identical query

structures according to an internal hash calculation.

Typically, two queries will be considered the same for this purpose

if they are semantically equivalent except for the values of literal

constants appearing in the command. All other commands are compared

strictly on the basis of their textual query strings, however.

pg_stat_statements.track_utility controls whether utility

commands are tracked by the module. Utility commands

are all those other than SELECT, INSERT, UPDATE and DELETE.

The default value is on. Only superusers can change this setting.

-- to --

pg_stat_statements.track_utility controls whether utility commands

are tracked by the module. Tracked utility commands are all those

other than SELECT, INSERT, UPDATE, DELETE, CALL and SET.

The default value is on. Only superusers can change this setting.

Thanks,

Sami Imseih

Amazon Web Services (AWS)

From: "Drouvot, Bertrand" <bdrouvot@amazon.com>
Date: Friday, September 2, 2022 at 4:06 AM
To: "Imseih (AWS), Sami" <simseih@amazon.com>, "Schneider (AWS), Jeremy" <schnjere@amazon.com>, Andres Freund <andres@anarazel.de>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Pavel Stehule <pavel.stehule@gmail.com>, Nikolay Samokhvalov <samokhvalov@gmail.com>
Subject: Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 9/1/22 5:13 PM, Imseih (AWS), Sami wrote:

> Please find attached v2 as an attempt to do so.
+1 to the idea.

Thanks for looking at it!

I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.

Fair point, thanks!

v3 including this change is attached.

Thanks,

Bertrand Drouvot

PostgreSQL Contributors Team

RDS Open Source Databases

Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Jeremy Schneider

Date:

08 September 2022, 01:19:42

On 9/2/22 2:06 AM, Drouvot, Bertrand wrote:

v3 including this change is attached.

FYI: "reset all" core dumps with v3

I didn't fully debug yet, but here's the backtrace on my 14.4 build with the patch

[postgres@ip-172-31-44-176 data]$ gdb /usr/local/pgsql-14.4/bin/postgres core.27217
...
Core was generated by `postgres: postgres postgres [local] RESET '.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
...
(gdb) bt
#0 0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1 0x00000000008e061c in JumbleExpr (jstate=0x1cf7f80, node=<optimized out>) at queryjumble.c:400
#2 0x00000000008dfdd8 in JumbleQueryInternal (jstate=0x1cf7f80, query=0x1cf7e70) at queryjumble.c:247
#3 0x00000000008e0b4b in JumbleQuery (query=query@entry=0x1cf7e70, querytext=querytext@entry=0x1cf72f8 "reset all;") at queryjumble.c:127
#4 0x000000000056ba4b in parse_analyze (parseTree=0x1cf7ce0, sourceText=0x1cf72f8 "reset all;", paramTypes=0x0, numParams=<optimized out>, queryEnv=0x0) at analyze.c:130
#5 0x000000000079df63 in pg_analyze_and_rewrite (parsetree=parsetree@entry=0x1cf7ce0, query_string=query_string@entry=0x1cf72f8 "reset all;",
paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0) at postgres.c:657
#6 0x000000000079e472 in exec_simple_query (query_string=0x1cf72f8 "reset all;") at postgres.c:1130
#7 0x000000000079f9d3 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffd0c341f80, dbname=0x1d44948 "postgres", username=<optimized out>) at postgres.c:4496
#8 0x000000000048c9f3 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4530
#9 BackendStartup (port=0x1d3bdd0) at postmaster.c:4252
#10 ServerLoop () at postmaster.c:1745
#11 0x0000000000721332 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x1cf1e10) at postmaster.c:1417
#12 0x000000000048da6e in main (argc=5, argv=0x1cf1e10) at main.c:209

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

08 September 2022, 05:23:19

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:
> I didn't fully debug yet, but here's the backtrace on my 14.4 build with
> the patch

What happens on HEAD?  That would be the target branch for a new
feature.
--
Michael

Attachment

signature.asc

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

08 September 2022, 06:49:13

Hi,

On 9/8/22 7:23 AM, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

Thanks Jeremy for reporting the issue!

What happens on HEAD?  That would be the target branch for a new
feature.

Just tested and i can see the same issue on HEAD.

Issue is on stmt->name being NULL here:

Breakpoint 2, JumbleExpr (jstate=0x55d60e769e30, node=0x55d60e769b60) at queryjumble.c:364
364             if (node == NULL)
(gdb) n
368             check_stack_depth();
(gdb)
374             APP_JUMB(node->type);
(gdb)
376             switch (nodeTag(node))
(gdb)
398                                     VariableSetStmt *stmt = (VariableSetStmt *) node;
(gdb) n
400                                     APP_JUMB_STRING(stmt->name);
(gdb) p stmt->name
$1 = 0x0

I'll have a closer look.

Regards,
--

Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Julien Rouhaud

Date:

08 September 2022, 06:50:12

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:
> On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:
> > I didn't fully debug yet, but here's the backtrace on my 14.4 build with
> > the patch
> 
> What happens on HEAD?  That would be the target branch for a new
> feature.

It would be the same AFAICS.  From v3:

+        case T_VariableSetStmt:
+            {
+                VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                APP_JUMB_STRING(stmt->name);
+                JumbleExpr(jstate, (Node *) stmt->args);
+            }

For a RESET ALL command stmt->name is NULL.

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

08 September 2022, 09:06:51

Hi,

On 9/8/22 8:50 AM, Julien Rouhaud wrote:

Thanks for looking at it!

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD?  That would be the target branch for a new
feature.

It would be the same AFAICS.  From v3:

+               case T_VariableSetStmt:
+                       {
+                               VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                               APP_JUMB_STRING(stmt->name);
+                               JumbleExpr(jstate, (Node *) stmt->args);
+                       }

For a RESET ALL command stmt->name is NULL.

Right, please find attached v4 addressing the issue and also Sami's comments [1].

[1]: https://www.postgresql.org/message-id/82A35172-BEB3-4DFA-B11C-AE5E50A0F932%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v4-0001-JumbleQuery-on-Call-and-Set.patch

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

Julien Rouhaud

Date:

08 September 2022, 11:29:19

Hi,

On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote:
> Hi,
> 
> On 9/8/22 8:50 AM, Julien Rouhaud wrote:
> 
> Thanks for looking at it!
> 
> > On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:
> > > On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:
> > > > I didn't fully debug yet, but here's the backtrace on my 14.4 build with
> > > > the patch
> > > What happens on HEAD?  That would be the target branch for a new
> > > feature.
> > It would be the same AFAICS.  From v3:
> > 
> > +               case T_VariableSetStmt:
> > +                       {
> > +                               VariableSetStmt *stmt = (VariableSetStmt *) node;
> > +
> > +                               APP_JUMB_STRING(stmt->name);
> > +                               JumbleExpr(jstate, (Node *) stmt->args);
> > +                       }
> > 
> > For a RESET ALL command stmt->name is NULL.
> 
> Right, please find attached v4 addressing the issue and also Sami's comments
> [1].

(Sorry I've not been following this thread until now)

IME if your application relies on 2PC it's very likely that you will hit the
exact same problems described in your original email.  What do you think about
normalizing those too while working on the subject?

Re: [PATCH] Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

08 September 2022, 16:07:05

Hi,

On 9/8/22 1:29 PM, Julien Rouhaud wrote:

Hi,

On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote:

Hi,

On 9/8/22 8:50 AM, Julien Rouhaud wrote:

Thanks for looking at it!

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD?  That would be the target branch for a new
feature.

It would be the same AFAICS.  From v3:

+               case T_VariableSetStmt:
+                       {
+                               VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                               APP_JUMB_STRING(stmt->name);
+                               JumbleExpr(jstate, (Node *) stmt->args);
+                       }

For a RESET ALL command stmt->name is NULL.

Right, please find attached v4 addressing the issue and also Sami's comments
[1].

(Sorry I've not been following this thread until now)

IME if your application relies on 2PC it's very likely that you will hit the
exact same problems described in your original email.

Agree

 What do you think about
normalizing those too while working on the subject?

That sounds reasonable, I'll have a look at those too while at it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

09 September 2022, 10:11:50

Hi,

On 9/8/22 6:07 PM, Drouvot, Bertrand wrote:

On 9/8/22 1:29 PM, Julien Rouhaud wrote:
IME if your application relies on 2PC it's very likely that you will hit the
exact same problems described in your original email. 
Agree
 What do you think about
normalizing those too while working on the subject?
That sounds reasonable, I'll have a look at those too while at it.

Attached v5 to normalize 2PC commands too, so that we get things like:

create table test_tx (a int);
begin;
prepare transaction 'tx1';
insert into test_tx values (1);
commit prepared 'tx1';
begin;
prepare transaction 'tx2';
insert into test_tx values (2);
commit prepared 'tx2';
begin;
prepare transaction 'tx3';
insert into test_tx values (3);
rollback prepared 'tx3';
begin;
prepare transaction 'tx4';
insert into test_tx values (4);
rollback prepared 'tx4';
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
                                    query                                     | calls | rows
------------------------------------------------------------------------------+-------+------
SELECT pg_stat_statements_reset()                                            |     1 |    1
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" |     0 |    0
begin                                                                        |     4 |    0
commit prepared $1                                                           |     2 |    0
create table test_tx (a int)                                                 |     1 |    0
insert into test_tx values ($1)                                              |     4 |    4
prepare transaction $1                                                       |     4 |    0
rollback prepared $1                                                         |     2 |    0
(8 rows)

For those ones I also had to do some minor changes in gram.y and to the TransactionStmt struct to record the gid location.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v5-0001-JumbleQuery-on-Call-and-Set.patch

Re: Query Jumbling for CALL and SET utility statements

From

bt22kawamotok

Date:

13 September 2022, 02:43:52

> Attached v5 to normalize 2PC commands too, so that we get things like:
> 
> 
> create table test_tx (a int);
> begin;
> prepare transaction 'tx1';
> insert into test_tx values (1);
> commit prepared 'tx1';
> begin;
> prepare transaction 'tx2';
> insert into test_tx values (2);
> commit prepared 'tx2';
> begin;
> prepare transaction 'tx3';
> insert into test_tx values (3);
> rollback prepared 'tx3';
> begin;
> prepare transaction 'tx4';
> insert into test_tx values (4);
> rollback prepared 'tx4';
> SELECT query, calls, rows FROM pg_stat_statements ORDER BY query
> COLLATE "C";
>                                     query
>        | calls | rows
> ------------------------------------------------------------------------------+-------+------
>  SELECT pg_stat_statements_reset()
>        |     1 |    1
>  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query
> COLLATE "C" |     0 |    0
>  begin
>        |     4 |    0
>  commit prepared $1
>        |     2 |    0
>  create table test_tx (a int)
>        |     1 |    0
>  insert into test_tx values ($1)
>        |     4 |    4
>  prepare transaction $1
>        |     4 |    0
>  rollback prepared $1
>        |     2 |    0
> (8 rows)
> 
> For those ones I also had to do some minor changes in gram.y and to
> the TransactionStmt struct to record the gid location.

Thanks Bertrand.
I used your patch. It's looks very good.
I found that utility statement is counted separately in upper and lower 
case.
For example BEGIN and begin are counted separately.
Is it difficult to fix this problem?

Regards,

Kotaro Kawamoto

Re: Query Jumbling for CALL and SET utility statements

From

Julien Rouhaud

Date:

13 September 2022, 04:33:16

Hi,

On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote:
>
> I found that utility statement is counted separately in upper and lower
> case.
> For example BEGIN and begin are counted separately.
> Is it difficult to fix this problem?

This is a known behavior, utility command aren't normalized (apart from the few
that will be with this patch) and the queryid is just a hash of the provided
query string.

It seems unrelated to this patch though.  While it can be a bit annoying, it's
unlikely that the application will have thousands of way to ask for a new
transaction (mixing case, adding a random number of spaces between BEGIN and
TRANSACTION and so on), so in real life it won't cause any problem.  Fixing it
would require to fully jumble all utility statements, which would require a
separate discussion.

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

13 September 2022, 05:30:23

Hi,

On 9/13/22 6:33 AM, Julien Rouhaud wrote:
> Hi,
>
> On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote:
>> I found that utility statement is counted separately in upper and lower
>> case.
>> For example BEGIN and begin are counted separately.
>> Is it difficult to fix this problem?
> This is a known behavior, utility command aren't normalized (apart from the few
> that will be with this patch) and the queryid is just a hash of the provided
> query string.
>
> It seems unrelated to this patch though.  While it can be a bit annoying, it's
> unlikely that the application will have thousands of way to ask for a new
> transaction (mixing case, adding a random number of spaces between BEGIN and
> TRANSACTION and so on), so in real life it won't cause any problem.

Agree that it seems unlikely to cause any problem (as compare to the
utility statements that are handled in this patch).

>   Fixing it
> would require to fully jumble all utility statements, which would require a
> separate discussion.

Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284

Amazon Web Services EMEA SARL, succursale francaise, 31 Place des Corolles, Tour Carpe Diem, F-92400 Courbevoie, SIREN
831001 334, RCS Nanterre, APE 6311Z,  TVA FR30831001334

Re: Query Jumbling for CALL and SET utility statements

From

"Imseih (AWS), Sami"

Date:

14 September 2022, 13:20:09

> Attached v5 to normalize 2PC commands too, so that we get things like:

A nit on the documentation for v5, otherwise lgtm.

Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as well as CALL, SET

and two-phase commit commands PREPARE TRANSACTION, , COMMIT PREPARED

and ROLLBACK PREPARED are combined

---- to ----

Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as well as CALL,

SET, PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED are combined

---

Regards,

Sami Imseih

Amazon Web Services (AWS)

Re: Query Jumbling for CALL and SET utility statements

From

Fujii Masao

Date:

16 September 2022, 12:53:36

On 2022/09/09 19:11, Drouvot, Bertrand wrote:
>>> IME if your application relies on 2PC it's very likely that you will hit the
>>> exact same problems described in your original email.

The utility commands for cursor like DECLARE CURSOR seem to have the same issue
and can cause lots of pgss entries. For example, when we use postgres_fdw and
execute "SELECT * FROM <foreign table> WHERE id = 10" five times in the same
transaction, the following commands are executed in the remote PostgreSQL server
and recorded as pgss entries there.

DECLARE c1 CURSOR FOR ...
DECLARE c2 CURSOR FOR ...
DECLARE c3 CURSOR FOR ...
DECLARE c4 CURSOR FOR ...
DECLARE c5 CURSOR FOR ...
FETCH 100 FROM c1
FETCH 100 FROM c2
FETCH 100 FROM c3
FETCH 100 FROM c4
FETCH 100 FROM c5
CLOSE c1
CLOSE c2
CLOSE c3
CLOSE c4
CLOSE c5

Furthermore, if the different query on foreign table is executed in the next
transaction, it may reuse the same cursor name previously used by another query.
That is, different queries can cause the same FETCH command like
"FETCH 100 FROM c1". This would be also an issue.

I'm not sure if the patch should also handle cursor cases. We can implement
that separately later if necessary.

I don't think that the patch should include the fix for cursor cases. It can be implemented separately later if
necessary.

> Attached v5 to normalize 2PC commands too, so that we get things like:

+ case T_VariableSetStmt:
+ {
+ VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+ /* stmt->name is NULL for RESET ALL */
+ if (stmt->name)
+ {
+ APP_JUMB_STRING(stmt->name);
+ JumbleExpr(jstate, (Node *) stmt->args);

With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the same query.
Is this intentional? Which might be ok because their behavior is basically the same.
But I'm afaid which may cause users to be confused. For example, they may fail to
find the pgss entry for RESET command they ran and just wonder why the command was
not recorded. To avoid such confusion, how about appending stmt->kind to the jumble?
Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Query Jumbling for CALL and SET utility statements

From

"Imseih (AWS), Sami"

Date:

16 September 2022, 15:08:59

>    The utility commands for cursor like DECLARE CURSOR seem to have the same issue
>    and can cause lots of pgss entries. For example, when we use postgres_fdw and
>    execute "SELECT * FROM <foreign table> WHERE id = 10" five times in the same
>    transaction, the following commands are executed in the remote PostgreSQL server
>    and recorded as pgss entries there.

>    DECLARE c1 CURSOR FOR ...
>    DECLARE c2 CURSOR FOR ...
>    DECLARE c3 CURSOR FOR ...

+1

I also made this observation recently and have a patch to suggest
to improve tis situation. I will start a separate thread for this.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

16 September 2022, 15:47:40

Hi,

On 9/16/22 2:53 PM, Fujii Masao wrote:
> 
>  
>> Attached v5 to normalize 2PC commands too, so that we get things like:
> 
> +        case T_VariableSetStmt:
> +            {
> +                VariableSetStmt *stmt = (VariableSetStmt *) node;
> +
> +                /* stmt->name is NULL for RESET ALL */
> +                if (stmt->name)
> +                {
> +                    APP_JUMB_STRING(stmt->name);
> +                    JumbleExpr(jstate, (Node *) stmt->args);
> 
> With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the 
> same query.
> Is this intentional? 

Thanks for looking at the patch!
No, it is not intentional, good catch!

> Which might be ok because their behavior is 
> basically the same.
> But I'm afaid which may cause users to be confused. For example, they 
> may fail to
> find the pgss entry for RESET command they ran and just wonder why the 
> command was
> not recorded. To avoid such confusion, how about appending stmt->kind to 
> the jumble?
> Thought?

I think that's a good idea and will provide a new version taking care of 
it (and also Sami's comments up-thread).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

19 September 2022, 06:29:22

Hi,

On 9/16/22 5:47 PM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 9/16/22 2:53 PM, Fujii Masao wrote:
>>
>>
>>> Attached v5 to normalize 2PC commands too, so that we get things like:
>>
>> +        case T_VariableSetStmt:
>> +            {
>> +                VariableSetStmt *stmt = (VariableSetStmt *) node;
>> +
>> +                /* stmt->name is NULL for RESET ALL */
>> +                if (stmt->name)
>> +                {
>> +                    APP_JUMB_STRING(stmt->name);
>> +                    JumbleExpr(jstate, (Node *) stmt->args);
>>
>> With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as 
>> the same query.
>> Is this intentional? 
> 
> Thanks for looking at the patch!
> No, it is not intentional, good catch!
> 
>> Which might be ok because their behavior is basically the same.
>> But I'm afaid which may cause users to be confused. For example, they 
>> may fail to
>> find the pgss entry for RESET command they ran and just wonder why the 
>> command was
>> not recorded. To avoid such confusion, how about appending stmt->kind 
>> to the jumble?
>> Thought?
> 
> I think that's a good idea and will provide a new version taking care of 
> it (and also Sami's comments up-thread).

Please find attached v6 taking care of the remarks mentioned above.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v6-0001-JumbleQuery-on-Call-and-Set.patch

Re: Query Jumbling for CALL and SET utility statements

From

Fujii Masao

Date:

21 September 2022, 16:07:33


On 2022/09/19 15:29, Drouvot, Bertrand wrote:
> Please find attached v6 taking care of the remarks mentioned above.

Thanks for updating the patch!

+SET pg_stat_statements.track_utility = TRUE;
+
+-- PL/pgSQL procedure and pg_stat_statements.track = all
+-- we drop and recreate the procedures to avoid any caching funnies
+SET pg_stat_statements.track_utility = FALSE;

Could you tell me why track_utility is enabled just before it's disabled?

Could you tell me what actually happens if we don't drop and
recreate the procedures? I'd like to know what "any caching funnies" are.

+SELECT pg_stat_statements_reset();
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";

This test set for the procedures is executed with the following
four conditions, respectively. Do we really need all of these tests?

track = top, track_utility = true
track = top, track_utility = false
track = all, track_utility = true
track = all, track_utility = false

+begin;
+prepare transaction 'tx1';
+insert into test_tx values (1);
+commit prepared 'tx1';

The test set of 2PC commands is also executed with track_utility = on
and off, respectively. But why do we need to run that test when
track_utility = off?

-    if (query->utilityStmt)
+    if (query->utilityStmt && !jstate)
      {
          if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))

"pgss_track_utility" should be
"pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

26 September 2022, 10:40:34

Hi,

On 9/21/22 6:07 PM, Fujii Masao wrote:
> 
> 
> On 2022/09/19 15:29, Drouvot, Bertrand wrote:
>> Please find attached v6 taking care of the remarks mentioned above.
> 
> Thanks for updating the patch!
> 
> +SET pg_stat_statements.track_utility = TRUE;
> +
> +-- PL/pgSQL procedure and pg_stat_statements.track = all
> +-- we drop and recreate the procedures to avoid any caching funnies
> +SET pg_stat_statements.track_utility = FALSE;
> 
> Could you tell me why track_utility is enabled just before it's disabled?

Thanks for looking at the new version!

No real reason, I removed those useless SET in the new V7 attached.

> 
> Could you tell me what actually happens if we don't drop and
> recreate the procedures? I'd like to know what "any caching funnies" are.

Without the drop/recreate the procedure body does not appear normalized 
(while the CALL itself is) when switching from track = top to track = all.

I copy-pasted this comment from the already existing "function" section 
in the pg_stat_statements.sql file. This comment has been introduced for 
the function section in commit 83f2061dd0. Note that the behavior would 
be the same for the function case (function body does not appear 
normalized without the drop/recreate).

> 
> +SELECT pg_stat_statements_reset();
> +CALL MINUS_TWO(3);
> +CALL MINUS_TWO(7);
> +CALL SUM_TWO(3, 8);
> +CALL SUM_TWO(7, 5);
> +
> +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query 
> COLLATE "C";
> 
> This test set for the procedures is executed with the following
> four conditions, respectively. Do we really need all of these tests?
> 
> track = top, track_utility = true
> track = top, track_utility = false
> track = all, track_utility = true
> track = all, track_utility = false

Oh right, the track_utility = false cases have been added when we 
decided up-thread to force track CALL.

But now that's probably not needed to test with track_utility = true. So 
I'm just keeping track_utility = off with track = top or all in the new 
V7 attached (like this is the case for the "function" section).

> 
> +begin;
> +prepare transaction 'tx1';
> +insert into test_tx values (1);
> +commit prepared 'tx1';
> 
> The test set of 2PC commands is also executed with track_utility = on
> and off, respectively. But why do we need to run that test when
> track_utility = off?

That's useless, thanks for pointing out. Removed in V7 attached.

> 
> -    if (query->utilityStmt)
> +    if (query->utilityStmt && !jstate)
>       {
>           if (pgss_track_utility && 
> !PGSS_HANDLED_UTILITY(query->utilityStmt))
> 
> "pgss_track_utility" should be
> "pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically?

Good catch! That's not needed (in practice) with the current code but 
that is "theoretically" needed indeed, let's add it in V7 attached 
(that's safer should the code change later on).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v7-0001-JumbleQuery-on-Call-and-Set.patch

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

06 October 2022, 06:39:41

On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote:
> Please find attached v6 taking care of the remarks mentioned above.
> +        case T_VariableSetStmt:
> +            {
> +                VariableSetStmt *stmt = (VariableSetStmt *) node;
> +
> +                /* stmt->name is NULL for RESET ALL */
> +                if (stmt->name)
> +                {
> +                    APP_JUMB(stmt->kind);
> +                    APP_JUMB_STRING(stmt->name);
> +                    JumbleExpr(jstate, (Node *) stmt->args);
> +                }
> +            }
> +            break;

Hmm.  If VariableSetStmt->is_local is not added to the jumble, then
aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same
query?

I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT
mentioned on this thread.  Would these be worth considering in what
gets compiled?  That would cover the remaining bits of
TransactionStmt.  The ODBC driver abuses of savepoints, for example,
so this could be useful for monitoring purposes in such cases.

As of the code stands, it could be cleaner to check
IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back
to a default in JumbleQuery().  Not that what your patch does is
incorrect, of course.
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

06 October 2022, 08:36:01

Hi,

On 9/26/22 12:40 PM, Drouvot, Bertrand wrote:
> let's add it in V7 attached
> (that's safer should the code change later on).

Attached a tiny rebase needed due to 249b0409b1.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v8-0001-JumbleQuery-on-Call-and-Set.patch

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

06 October 2022, 08:43:57

Hi,

On 10/6/22 8:39 AM, Michael Paquier wrote:
> On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote:
>> Please find attached v6 taking care of the remarks mentioned above.
>> +        case T_VariableSetStmt:
>> +            {
>> +                VariableSetStmt *stmt = (VariableSetStmt *) node;
>> +
>> +                /* stmt->name is NULL for RESET ALL */
>> +                if (stmt->name)
>> +                {
>> +                    APP_JUMB(stmt->kind);
>> +                    APP_JUMB_STRING(stmt->name);
>> +                    JumbleExpr(jstate, (Node *) stmt->args);
>> +                }
>> +            }
>> +            break;
> 
> Hmm.  If VariableSetStmt->is_local is not added to the jumble, then
> aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same
> query?
> 

Good catch, thanks!
While at it let's also jumble "SET SESSION foo =". For this one, we 
would need to record another bool in VariableSetStmt: I'll create a 
dedicated patch for that.


> I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT
> mentioned on this thread.  Would these be worth considering in what
> gets compiled?  That would cover the remaining bits of
> TransactionStmt.  The ODBC driver abuses of savepoints, for example,
> so this could be useful for monitoring purposes in such cases.

Agree. I'll look at those too.

> 
> As of the code stands, it could be cleaner to check
> IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back
> to a default in JumbleQuery().  Not that what your patch does is
> incorrect, of course.

Will look at it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

07 October 2022, 03:41:13

On Thu, Oct 06, 2022 at 10:43:57AM +0200, Drouvot, Bertrand wrote:
> On 10/6/22 8:39 AM, Michael Paquier wrote:
>> I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT
>> mentioned on this thread.  Would these be worth considering in what
>> gets compiled?  That would cover the remaining bits of
>> TransactionStmt.  The ODBC driver abuses of savepoints, for example,
>> so this could be useful for monitoring purposes in such cases.
>
> Agree. I'll look at those too.

Thanks.

While studying a bit more this thread, I've been reminded of the fact
that this would treat different flavors of BEGIN/COMMIT commands (mix
of upper/lower characters, etc.) as different entries in
pg_stat_statements, and it feels inconsistent to me that we'd begin
jumbling the 2PC and savepoint commands with their nodes but not do
that for the rest of the commands, even if, as mentioned upthread,
applications may not mix grammars.  If they do, one could finish by
viewing incorrect reports, and I'd like to think that this would make
the life of a lot of people easier.

SET/RESET and CALL have a much lower presence frequency than the
transaction commands, where it is fine by me to include both of these
under the utility statement switch.  For OLTP workloads (I've seen
quite a bit of 2PC used across multiple nodes for short transactions
with writes involving more than two remote nodes), with a lot of
BEGIN/COMMIT or even 2PC commands issued, the performance could be
noticeable?  It may make sense to control these with a different GUC
switch, where we drop completely the string-only approach under
track_utility.  In short, I don't have any objections about the
business with SET and CALL, but the transaction part worries me a
bit.  As a first step, we could cut the cake in two parts, and just
focus on SET/RESET and CALL, which was the main point of discussion
of this thread to begin with.
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

Tom Lane

Date:

07 October 2022, 03:51:52

Michael Paquier <michael@paquier.xyz> writes:
> While studying a bit more this thread, I've been reminded of the fact
> that this would treat different flavors of BEGIN/COMMIT commands (mix
> of upper/lower characters, etc.) as different entries in
> pg_stat_statements, and it feels inconsistent to me that we'd begin 
> jumbling the 2PC and savepoint commands with their nodes but not do
> that for the rest of the commands, even if, as mentioned upthread,
> applications may not mix grammars.

I've been thinking since the beginning of this thread that there
was no coherent, defensible rationale being offered for jumbling
some utility statements and not others.

I wonder if the answer is to jumble them all.  We avoided that
up to now because it would imply a ton of manual effort and
future code maintenance ... but now that the backend/nodes/
infrastructure is largely auto-generated, could we auto-generate
the jumbling code?

            regards, tom lane

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

07 October 2022, 04:13:05

On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote:
> I've been thinking since the beginning of this thread that there
> was no coherent, defensible rationale being offered for jumbling
> some utility statements and not others.

Yeah.  The potential performance impact of all the TransactionStmts
worries me a bit, though.

> I wonder if the answer is to jumble them all.  We avoided that
> up to now because it would imply a ton of manual effort and
> future code maintenance ... but now that the backend/nodes/
> infrastructure is largely auto-generated, could we auto-generate
> the jumbling code?

Probably.  One part that may be tricky though is the location of the
constants we'd like to make generic, but perhaps this could be handled
by using a dedicated variable type that just maps to int?  It does not
seem like a mandatory requirement to add that everywhere as a first
step, either.
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

Julien Rouhaud

Date:

07 October 2022, 04:18:26

On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > While studying a bit more this thread, I've been reminded of the fact
> > that this would treat different flavors of BEGIN/COMMIT commands (mix
> > of upper/lower characters, etc.) as different entries in
> > pg_stat_statements, and it feels inconsistent to me that we'd begin
> > jumbling the 2PC and savepoint commands with their nodes but not do
> > that for the rest of the commands, even if, as mentioned upthread,
> > applications may not mix grammars.
>
> I've been thinking since the beginning of this thread that there
> was no coherent, defensible rationale being offered for jumbling
> some utility statements and not others.

Only a very small subset causes trouble in real life scenario, but I agree that
cherry-picking some utility statements isn't a great approach.

> I wonder if the answer is to jumble them all.  We avoided that
> up to now because it would imply a ton of manual effort and
> future code maintenance ... but now that the backend/nodes/
> infrastructure is largely auto-generated, could we auto-generate
> the jumbling code?

That's a good idea.  Naively, it seems doable as the infrastructure in
gen_node_support.pl already supports everything that should be needed (like
per-member annotation).

Re: Query Jumbling for CALL and SET utility statements

From

"Drouvot, Bertrand"

Date:

10 October 2022, 13:04:57

Hi,

On 10/7/22 6:13 AM, Michael Paquier wrote:
> On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote:
>> I've been thinking since the beginning of this thread that there
>> was no coherent, defensible rationale being offered for jumbling
>> some utility statements and not others.
> 
> Yeah.  The potential performance impact of all the TransactionStmts
> worries me a bit, though.
> 
>> I wonder if the answer is to jumble them all.  We avoided that
>> up to now because it would imply a ton of manual effort and
>> future code maintenance ... but now that the backend/nodes/
>> infrastructure is largely auto-generated, could we auto-generate
>> the jumbling code?

I think that's a good idea.

> 
> Probably.  One part that may be tricky though is the location of the
> constants we'd like to make generic, but perhaps this could be handled
> by using a dedicated variable type that just maps to int?  

It looks to me that we'd also need to teach the perl parser which fields 
per statements struct need to be jumbled (or more probably which ones to 
exclude from the jumbling, for example funccall in CallStmt). Not sure 
yet how to teach the perl parser, but I'll build this list first to help 
decide for a right approach, unless you already have some thoughts about it?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Query Jumbling for CALL and SET utility statements

From

Julien Rouhaud

Date:

10 October 2022, 13:16:47

Hi,

On Mon, Oct 10, 2022 at 03:04:57PM +0200, Drouvot, Bertrand wrote:
>
> On 10/7/22 6:13 AM, Michael Paquier wrote:
> >
> > Probably.  One part that may be tricky though is the location of the
> > constants we'd like to make generic, but perhaps this could be handled
> > by using a dedicated variable type that just maps to int?
>
> It looks to me that we'd also need to teach the perl parser which fields per
> statements struct need to be jumbled (or more probably which ones to exclude
> from the jumbling, for example funccall in CallStmt). Not sure yet how to
> teach the perl parser, but I'll build this list first to help decide for a
> right approach, unless you already have some thoughts about it?

Unless I'm missing something both can be handled using pg_node_attr()
annotations that already exists?

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

11 October 2022, 02:54:51

On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote:
> Unless I'm missing something both can be handled using pg_node_attr()
> annotations that already exists?

Indeed, that should work for the locators.
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

"Imseih (AWS), Sami"

Date:

11 October 2022, 14:18:54

> I've been thinking since the beginning of this thread that there
> was no coherent, defensible rationale being offered for jumbling
> some utility statements and not others.

+1 to the idea, as there are other utility statement cases
that should be Jumbled. Here is a recent thread for jumbling
cursors [1].

The CF entry [2] has been withdrawn until consensus is reached
on this topic.

[1]: https://www.postgresql.org/message-id/203CFCF7-176E-4AFC-A48E-B2CECFECD6AA@amazon.com
[2]: https://commitfest.postgresql.org/40/3901/


Regards,

Sami Imseih
AWS (Amazon Web Services)

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

12 October 2022, 00:13:20

On Tue, Oct 11, 2022 at 02:18:54PM +0000, Imseih (AWS), Sami wrote:
> +1 to the idea, as there are other utility statement cases
> that should be Jumbled. Here is a recent thread for jumbling
> cursors [1].

Thanks for mentioning that.  With an automated way to generate this
code, cursors would be handled, at the advantage of making sure that
no fields are missing in the jumbled structures (is_local was missed
for example on SET).

> The CF entry [2] has been withdrawn until consensus is reached
> on this topic.

It seems to me that the consensus is here, which is a good step
forward ;)
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

30 November 2022, 06:55:16

On Wed, Oct 12, 2022 at 09:13:20AM +0900, Michael Paquier wrote:
> Thanks for mentioning that.  With an automated way to generate this
> code, cursors would be handled, at the advantage of making sure that
> no fields are missing in the jumbled structures (is_local was missed
> for example on SET).

So, this thread has stalled for the last few weeks and it seems to me
that the conclusion is that we'd want an approach using a set of
scripts that automate the generation of the code in charge of the DDL
jumbling.  And, depending on the portions of the queries that need to
be silenced, we may need to extend a few nodes with a location.  We
are not there yet, so I have marked the patch as returned with
feedback.
--
Michael

Attachment

signature.asc

Re: Query Jumbling for CALL and SET utility statements

From

Michael Paquier

Date:

07 December 2022, 00:57:22

On Tue, Oct 11, 2022 at 11:54:51AM +0900, Michael Paquier wrote:
> On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote:
>> Unless I'm missing something both can be handled using pg_node_attr()
>> annotations that already exists?
>
> Indeed, that should work for the locators.

My mistake here.

When it comes to the locations to silence in the normalization, we
already rely on the variable name "location" of type int in
gen_node_support.pl, so we can just do the same for the code
generating the jumbling.

This stuff still needs two more pg_node_attr() to be able to work: one
to ignore a full Node in the jumbling and a second that can be used on
a  per-field basis.  Once this is in place, automating the generation
of the code is not that complicated, most of the work is to get around
placing the pg_node_attr() so as the jumbling gets things right.  The
number of fields to mark as things to ignore depends on the Node type
(heavy for Const, for example), but I'd like to think that a "ignore"
approach is better than an "include" approach so as new fields would
be considered by default in the jumble compilation.

I have not looked at all the edge cases, so perhaps more attrs would
be needed..
--
Michael

Attachment

signature.asc