Thread: Record SET session in VariableSetStmt

Record SET session in VariableSetStmt

From
"Drouvot, Bertrand"
Date:
Hi hackers,

"SET local" is currently recorded in VariableSetStmt (with the boolean 
is_local) but "SET session" is not.

Please find attached a patch proposal to also record "SET session" so 
that VariableSetStmt records all the cases.

Remark: Recording "SET session" will also help for the Jumbling work 
being done in [1].

[1]: 
https://www.postgresql.org/message-id/66be1104-164f-dcb8-6c43-f03a68a139a7%40gmail.com

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment

Re: Record SET session in VariableSetStmt

From
Julien Rouhaud
Date:
Hi,

On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote:
> Hi hackers,
>
> "SET local" is currently recorded in VariableSetStmt (with the boolean
> is_local) but "SET session" is not.
>
> Please find attached a patch proposal to also record "SET session" so that
> VariableSetStmt records all the cases.
>
> Remark: Recording "SET session" will also help for the Jumbling work being
> done in [1].

I don't think it's necessary.  SET and SET SESSION are semantically the same so
nothing should rely on how exactly someone spelled it.  This is also the case
for our core jumbling code, where we guarantee (or at least try to) that two
semantically identical statements will get the same queryid, and therefore
don't distinguish eg. LIKE vs ~~.



Re: Record SET session in VariableSetStmt

From
"Drouvot, Bertrand"
Date:
Hi,

On 10/6/22 1:18 PM, Julien Rouhaud wrote:
> Hi,
> 
> On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote:
>> Hi hackers,
>>
>> "SET local" is currently recorded in VariableSetStmt (with the boolean
>> is_local) but "SET session" is not.
>>
>> Please find attached a patch proposal to also record "SET session" so that
>> VariableSetStmt records all the cases.
>>
>> Remark: Recording "SET session" will also help for the Jumbling work being
>> done in [1].
> 
> I don't think it's necessary.  SET and SET SESSION are semantically the same

Thanks for your feedback!

> so
> nothing should rely on how exactly someone spelled it.  This is also the case
> for our core jumbling code, where we guarantee (or at least try to) that two
> semantically identical statements will get the same queryid, and therefore
> don't distinguish eg. LIKE vs ~~.

Agree, but on the other hand currently SET and SET SESSION are recorded 
with distinct queryid:

postgres=# select calls, query, queryid from pg_stat_statements;
  calls |                    query                    |       queryid
-------+---------------------------------------------+----------------------
      2 | select calls, query from pg_stat_statements | -6345508659980235519
      1 | set session enable_seqscan=1                | -3921418831612111986
      1 | create extension pg_stat_statements         | -1739183385080879393
      1 | set enable_seqscan=1                        |  7925920505912025406
(4 rows)

and this behavior would change with the Jumbling work in progress in [1] 
(mentioned up-thread) if we don't record "SET SESSION".

I think that would make sense to keep the same behavior, what do you think?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Record SET session in VariableSetStmt

From
Julien Rouhaud
Date:
On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote:
>
> On 10/6/22 1:18 PM, Julien Rouhaud wrote:
>
> > so
> > nothing should rely on how exactly someone spelled it.  This is also the case
> > for our core jumbling code, where we guarantee (or at least try to) that two
> > semantically identical statements will get the same queryid, and therefore
> > don't distinguish eg. LIKE vs ~~.
>
> Agree, but on the other hand currently SET and SET SESSION are recorded with
> distinct queryid:
>
> postgres=# select calls, query, queryid from pg_stat_statements;
>  calls |                    query                    |       queryid
> -------+---------------------------------------------+----------------------
>      2 | select calls, query from pg_stat_statements | -6345508659980235519
>      1 | set session enable_seqscan=1                | -3921418831612111986
>      1 | create extension pg_stat_statements         | -1739183385080879393
>      1 | set enable_seqscan=1                        |  7925920505912025406
> (4 rows)
>
> and this behavior would change with the Jumbling work in progress in [1]
> (mentioned up-thread) if we don't record "SET SESSION".
>
> I think that would make sense to keep the same behavior, what do you think?

It's because until now jumbling of utility statements was just hashing the
query text, which is quite terrible.  This was also implying getting different
queryids for things like this:

=# select query, queryid from pg_stat_statements where query like '%work_mem%';;
         query         |       queryid
-----------------------+----------------------
 SeT work_mem = 123465 | -1114638544275583196
 Set work_mem = 123465 | -1966597613643458788
 SET work_mem = 123465 |  4161441071081149574
 seT work_mem = 123465 |  8327271737593275474
(4 rows)

If we move to a real jumbling of VariableSetStmt, we should keep the rules
consistent with the rest of the jumble code and ignore an explicit "SESSION" in
the original command.



Re: Record SET session in VariableSetStmt

From
"Drouvot, Bertrand"
Date:

On 10/6/22 2:28 PM, Julien Rouhaud wrote:
> On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote:
>>
>> On 10/6/22 1:18 PM, Julien Rouhaud wrote:
>>
>>> so
>>> nothing should rely on how exactly someone spelled it.  This is also the case
>>> for our core jumbling code, where we guarantee (or at least try to) that two
>>> semantically identical statements will get the same queryid, and therefore
>>> don't distinguish eg. LIKE vs ~~.
>>
>> Agree, but on the other hand currently SET and SET SESSION are recorded with
>> distinct queryid:
>>
>> postgres=# select calls, query, queryid from pg_stat_statements;
>>   calls |                    query                    |       queryid
>> -------+---------------------------------------------+----------------------
>>       2 | select calls, query from pg_stat_statements | -6345508659980235519
>>       1 | set session enable_seqscan=1                | -3921418831612111986
>>       1 | create extension pg_stat_statements         | -1739183385080879393
>>       1 | set enable_seqscan=1                        |  7925920505912025406
>> (4 rows)
>>
>> and this behavior would change with the Jumbling work in progress in [1]
>> (mentioned up-thread) if we don't record "SET SESSION".
>>
>> I think that would make sense to keep the same behavior, what do you think?
> 
> It's because until now jumbling of utility statements was just hashing the
> query text, which is quite terrible.  This was also implying getting different
> queryids for things like this:
> 
> =# select query, queryid from pg_stat_statements where query like '%work_mem%';;
>           query         |       queryid
> -----------------------+----------------------
>   SeT work_mem = 123465 | -1114638544275583196
>   Set work_mem = 123465 | -1966597613643458788
>   SET work_mem = 123465 |  4161441071081149574
>   seT work_mem = 123465 |  8327271737593275474
> (4 rows)
> 
> If we move to a real jumbling of VariableSetStmt, we should keep the rules
> consistent with the rest of the jumble code and ignore an explicit "SESSION" in
> the original command.

Understood, so I agree that it makes sense to keep the jumbling behavior 
consistent and so keep the same queryid for statements that are 
semantically identical.

Thanks for your feedback!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Record SET session in VariableSetStmt

From
Michael Paquier
Date:
On Thu, Oct 06, 2022 at 08:28:27PM +0800, Julien Rouhaud wrote:
> If we move to a real jumbling of VariableSetStmt, we should keep the rules
> consistent with the rest of the jumble code and ignore an explicit "SESSION" in
> the original command.

Hm, interesting bit, I should study more this area.  So the query ID
calculation actually only cares about the contents of the Nodes
parsed, while the query string used is the one when the entry is
created for the first time.  It seems like the patch to add
TransactionStmt nodes into the jumbling misses something here, as we'd
still compile different query IDs depending on the query string itself
for simple commands like BEGIN or COMMIT.  I'll reply on the other
thread about all that..
--
Michael

Attachment

Re: Record SET session in VariableSetStmt

From
Julien Rouhaud
Date:
On Fri, Oct 07, 2022 at 10:30:28AM +0900, Michael Paquier wrote:
> On Thu, Oct 06, 2022 at 08:28:27PM +0800, Julien Rouhaud wrote:
> > If we move to a real jumbling of VariableSetStmt, we should keep the rules
> > consistent with the rest of the jumble code and ignore an explicit "SESSION" in
> > the original command.
> 
> Hm, interesting bit, I should study more this area.  So the query ID
> calculation actually only cares about the contents of the Nodes
> parsed, while the query string used is the one when the entry is
> created for the first time.  It seems like the patch to add
> TransactionStmt nodes into the jumbling misses something here, as we'd
> still compile different query IDs depending on the query string itself
> for simple commands like BEGIN or COMMIT.  I'll reply on the other
> thread about all that..

Ah, indeed we have different TransactionStmtKind for BEGIN and START!