I saw a database recently where some app was inserting the source port into the application_name field, which meant that pg_stat_statements.max was quickly reached and queries were simply pouring in and out of pg_stat_statements, dominated by some "SET application_name = 'myapp
10.0.0.1:1234'" calls. Which got me thinking, is there really any value to having non-normalized 'SET application_name' queries inside of pg_stat_statements? Or any SET stuff, for that matter?
Attached please find a small proof-of-concept for normalizing/de-jumbling certain SET queries. Because we only want to cover the VAR_SET_VALUE parts of VariableSetStmt, a custom jumble func was needed. There are a lot of funky SET things inside of gram.y as well that don't do the standard SET X = Y formula (e.g. SET TIME ZONE, SET SCHEMA). I tried to handle those as best I could, and carved a couple of exceptions for time zones and xml.
I'm not sure where else to possibly draw lines. Obviously calls to time zone have a small and finite pool of possible values, so easy enough to exclude them, while things like application_name and work_mem are fairly infinite, so great candidates for normalizing. One could argue for simply normalizing everything, as SET is trivially fast for purposes of performance tracking via pg_stat_statements, so who cares if we don't have the exact string? That's what regular logging is for, after all. Most importantly, less unique queryids means less chance that errant SETs will crowd out the more important stuff.
In summary, we want to change this:
SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
1 | set application_name = 'alice'
1 | set application_name = 'bob'
1 | set application_name = 'eve'
1 | set application_name = 'mallory'
to this:
SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
4 | set application_name = $1
I haven't updated the regression tests yet, until we reach a consensus on how thorough the normalizing should be. But there is a new test to exercise the changes in gram.y.
Cheers,
Greg