Thread: Record SET session in VariableSetStmt
Hi hackers, "SET local" is currently recorded in VariableSetStmt (with the boolean is_local) but "SET session" is not. Please find attached a patch proposal to also record "SET session" so that VariableSetStmt records all the cases. Remark: Recording "SET session" will also help for the Jumbling work being done in [1]. [1]: https://www.postgresql.org/message-id/66be1104-164f-dcb8-6c43-f03a68a139a7%40gmail.com Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote: > Hi hackers, > > "SET local" is currently recorded in VariableSetStmt (with the boolean > is_local) but "SET session" is not. > > Please find attached a patch proposal to also record "SET session" so that > VariableSetStmt records all the cases. > > Remark: Recording "SET session" will also help for the Jumbling work being > done in [1]. I don't think it's necessary. SET and SET SESSION are semantically the same so nothing should rely on how exactly someone spelled it. This is also the case for our core jumbling code, where we guarantee (or at least try to) that two semantically identical statements will get the same queryid, and therefore don't distinguish eg. LIKE vs ~~.
Hi, On 10/6/22 1:18 PM, Julien Rouhaud wrote: > Hi, > > On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote: >> Hi hackers, >> >> "SET local" is currently recorded in VariableSetStmt (with the boolean >> is_local) but "SET session" is not. >> >> Please find attached a patch proposal to also record "SET session" so that >> VariableSetStmt records all the cases. >> >> Remark: Recording "SET session" will also help for the Jumbling work being >> done in [1]. > > I don't think it's necessary. SET and SET SESSION are semantically the same Thanks for your feedback! > so > nothing should rely on how exactly someone spelled it. This is also the case > for our core jumbling code, where we guarantee (or at least try to) that two > semantically identical statements will get the same queryid, and therefore > don't distinguish eg. LIKE vs ~~. Agree, but on the other hand currently SET and SET SESSION are recorded with distinct queryid: postgres=# select calls, query, queryid from pg_stat_statements; calls | query | queryid -------+---------------------------------------------+---------------------- 2 | select calls, query from pg_stat_statements | -6345508659980235519 1 | set session enable_seqscan=1 | -3921418831612111986 1 | create extension pg_stat_statements | -1739183385080879393 1 | set enable_seqscan=1 | 7925920505912025406 (4 rows) and this behavior would change with the Jumbling work in progress in [1] (mentioned up-thread) if we don't record "SET SESSION". I think that would make sense to keep the same behavior, what do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote: > > On 10/6/22 1:18 PM, Julien Rouhaud wrote: > > > so > > nothing should rely on how exactly someone spelled it. This is also the case > > for our core jumbling code, where we guarantee (or at least try to) that two > > semantically identical statements will get the same queryid, and therefore > > don't distinguish eg. LIKE vs ~~. > > Agree, but on the other hand currently SET and SET SESSION are recorded with > distinct queryid: > > postgres=# select calls, query, queryid from pg_stat_statements; > calls | query | queryid > -------+---------------------------------------------+---------------------- > 2 | select calls, query from pg_stat_statements | -6345508659980235519 > 1 | set session enable_seqscan=1 | -3921418831612111986 > 1 | create extension pg_stat_statements | -1739183385080879393 > 1 | set enable_seqscan=1 | 7925920505912025406 > (4 rows) > > and this behavior would change with the Jumbling work in progress in [1] > (mentioned up-thread) if we don't record "SET SESSION". > > I think that would make sense to keep the same behavior, what do you think? It's because until now jumbling of utility statements was just hashing the query text, which is quite terrible. This was also implying getting different queryids for things like this: =# select query, queryid from pg_stat_statements where query like '%work_mem%';; query | queryid -----------------------+---------------------- SeT work_mem = 123465 | -1114638544275583196 Set work_mem = 123465 | -1966597613643458788 SET work_mem = 123465 | 4161441071081149574 seT work_mem = 123465 | 8327271737593275474 (4 rows) If we move to a real jumbling of VariableSetStmt, we should keep the rules consistent with the rest of the jumble code and ignore an explicit "SESSION" in the original command.
On 10/6/22 2:28 PM, Julien Rouhaud wrote: > On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote: >> >> On 10/6/22 1:18 PM, Julien Rouhaud wrote: >> >>> so >>> nothing should rely on how exactly someone spelled it. This is also the case >>> for our core jumbling code, where we guarantee (or at least try to) that two >>> semantically identical statements will get the same queryid, and therefore >>> don't distinguish eg. LIKE vs ~~. >> >> Agree, but on the other hand currently SET and SET SESSION are recorded with >> distinct queryid: >> >> postgres=# select calls, query, queryid from pg_stat_statements; >> calls | query | queryid >> -------+---------------------------------------------+---------------------- >> 2 | select calls, query from pg_stat_statements | -6345508659980235519 >> 1 | set session enable_seqscan=1 | -3921418831612111986 >> 1 | create extension pg_stat_statements | -1739183385080879393 >> 1 | set enable_seqscan=1 | 7925920505912025406 >> (4 rows) >> >> and this behavior would change with the Jumbling work in progress in [1] >> (mentioned up-thread) if we don't record "SET SESSION". >> >> I think that would make sense to keep the same behavior, what do you think? > > It's because until now jumbling of utility statements was just hashing the > query text, which is quite terrible. This was also implying getting different > queryids for things like this: > > =# select query, queryid from pg_stat_statements where query like '%work_mem%';; > query | queryid > -----------------------+---------------------- > SeT work_mem = 123465 | -1114638544275583196 > Set work_mem = 123465 | -1966597613643458788 > SET work_mem = 123465 | 4161441071081149574 > seT work_mem = 123465 | 8327271737593275474 > (4 rows) > > If we move to a real jumbling of VariableSetStmt, we should keep the rules > consistent with the rest of the jumble code and ignore an explicit "SESSION" in > the original command. Understood, so I agree that it makes sense to keep the jumbling behavior consistent and so keep the same queryid for statements that are semantically identical. Thanks for your feedback! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Oct 06, 2022 at 08:28:27PM +0800, Julien Rouhaud wrote: > If we move to a real jumbling of VariableSetStmt, we should keep the rules > consistent with the rest of the jumble code and ignore an explicit "SESSION" in > the original command. Hm, interesting bit, I should study more this area. So the query ID calculation actually only cares about the contents of the Nodes parsed, while the query string used is the one when the entry is created for the first time. It seems like the patch to add TransactionStmt nodes into the jumbling misses something here, as we'd still compile different query IDs depending on the query string itself for simple commands like BEGIN or COMMIT. I'll reply on the other thread about all that.. -- Michael
Attachment
On Fri, Oct 07, 2022 at 10:30:28AM +0900, Michael Paquier wrote: > On Thu, Oct 06, 2022 at 08:28:27PM +0800, Julien Rouhaud wrote: > > If we move to a real jumbling of VariableSetStmt, we should keep the rules > > consistent with the rest of the jumble code and ignore an explicit "SESSION" in > > the original command. > > Hm, interesting bit, I should study more this area. So the query ID > calculation actually only cares about the contents of the Nodes > parsed, while the query string used is the one when the entry is > created for the first time. It seems like the patch to add > TransactionStmt nodes into the jumbling misses something here, as we'd > still compile different query IDs depending on the query string itself > for simple commands like BEGIN or COMMIT. I'll reply on the other > thread about all that.. Ah, indeed we have different TransactionStmtKind for BEGIN and START!