Thread: limiting collected query text length in pg_stat_statements
Greetings,
I want to limit the query text that gets captured in pg_stat_statements. We have SQL statements with thousands of VALUES clauses (upwards of 10,000) that run at a 1 second interval. When just a handful are running, plus 2 or 3 loads using the same technique (10,000-entry VALUES clauses), querying the pg_stat_statements view gets bogged down (see below). With pg_stat_statements.max set to 1000 statements, just querying the stats view seems to impact the running statements! I have temporarily staved off the issue by reducing the max to 250 statements, and I have made recommendations to the development team to cut down the number of VALUES clauses. However, it seems to me that the ability to truncate the captured query would be a useful feature.
I've peeked at the source code and I don't see track_activity_query_size (which limits pg_stat_activity.query) being used, which would be one mechanism. I don't really know what the right way to do this would be, or even if it is a good idea, i.e. whether limiting that would have a larger impact on the statistics ecosystem...
Thoughts or suggestions?
Regards,
pg
# select length(query) from pg_stat_statements;
length
---------
876153
879385
171
44
3796
873527
<snip>
896454
864538
1869286
938
869891
<snip>
883526
877365
(969 rows)
Time: 9898.411 ms (00:09.898)
# select count(*) from pg_stat_statements;
count
-------
971
(1 row)
Time: 6457.985 ms (00:06.458)
Using showtext:=false shows the impact of the large query texts:
# select count(*) from pg_stat_statements(showtext:=false);
count
-------
970
(1 row)
Time: 10.644 ms
Phil Godfrin | Database Administration
NOV
NOV US | Engineering Data
9720 Beechnut St | Houston, Texas 77036
M 281.825.2311
Hi,

On Mon, May 02, 2022 at 12:45:28PM +0000, Godfrin, Philippe E wrote:
> Greetings,
>
> I want to limit the query text that gets captured in pg_stat_statements. We
> have SQL statements with thousands of VALUES clauses (upwards of 10,000)
> that run at a 1 second interval. When just a handful are running, plus 2 or
> 3 loads using the same technique (10,000-entry VALUES clauses), querying the
> pg_stat_statements view gets bogged down (see below). With
> pg_stat_statements.max set to 1000 statements, just querying the stats view
> seems to impact the running statements! I have temporarily staved off the
> issue by reducing the max to 250 statements, and I have made recommendations
> to the development team to cut down the number of VALUES clauses. However,
> it seems to me that the ability to truncate the captured query would be a
> useful feature.

The stored queries are normalized, so the values themselves won't be stored,
only a "?" per value. And as long as all the queries have the same number of
values there should be a single entry stored for the same role and database,
so all in all it should limit the size of the stored query texts.

On the other hand, with such a low pg_stat_statements.max you may have a lot
of entry evictions, which tends to bloat the external query file
($PGDATA/pg_stat_tmp/pgss_query_texts.stat). Did you check how big it is and,
if yes, how fast it grows? I once saw that file grow to more than 1GB without
any apparent reason, which was obviously slowing everything down. A simple
call to pg_stat_statements_reset() fixed the problem, at least as far as I
know, as I never had access to the server and never had any news after that.

> I've peeked at the source code and I don't see track_activity_query_size
> (which limits pg_stat_activity.query) being used, which would be one
> mechanism. I don't really know what the right way to do this would be, or
> even if it is a good idea, i.e. whether limiting that would have a larger
> impact on the statistics ecosystem...

pg_stat_statements used to truncate the query text to
track_activity_query_size, but that limitation was removed when the query
texts were moved to the external query file. It's quite convenient to have
the full normalized query text available, especially when the application is
using some ORM, as they tend to make SQL even more verbose than it already is.

Having a very high number of values (and, I'm assuming, queries with a varying
number of values) seems like a corner case, but truncating the query text
would only fix part of the problem. It will lead to a very high number of
different queryids, which is also problematic, as frequent entry evictions
also tend to have an insanely high overhead, and you can't have an infinite
number of entries stored.

> Thoughts or suggestions?

You didn't explain how you're using pg_stat_statements. Do you really need to
query pg_stat_statements with the query text each time? If you only need to
get some performance metrics you could adapt your system to only retrieve the
query text for the wanted queryid(s) once you find some problematic pattern,
and/or cache the query texts in a table or some other place. But with a very
low pg_stat_statements.max (especially if you can have a varying number of
values from 1 to 10k) it might be hard to do.
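For what it's worth, here is a minimal sketch of both suggestions. It assumes
PostgreSQL 13+ column names (total_exec_time), that the external text file is
at its usual location relative to $PGDATA, and that you have the privileges
needed for pg_stat_file() (restricted to superusers unless EXECUTE is
granted); the queryid in the last step is a made-up placeholder.

-- 1) Check the size of the external query text file (path relative to $PGDATA).
SELECT size, modification
FROM pg_stat_file('pg_stat_tmp/pgss_query_texts.stat');

-- 2) Gather metrics without loading any query text, which stays fast even
--    with huge stored texts (same trick as your showtext:=false test).
SELECT queryid, calls, total_exec_time, rows
FROM pg_stat_statements(showtext := false)
ORDER BY total_exec_time DESC
LIMIT 20;

-- 3) Only then fetch the text for the queryid(s) that look problematic,
--    truncated for display.
SELECT queryid, left(query, 200) AS query_start, length(query) AS full_length
FROM pg_stat_statements
WHERE queryid = -1234567890123456789;  -- hypothetical queryid from step 2

Note that, as far as I know, any call that returns query text still reads the
whole external file, so the win comes from doing step 3 rarely, and possibly
caching the texts in a table as mentioned above.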