Thread: limiting collected query text length in pg_stat_statements

limiting collected query text length in pg_stat_statements

From
"Godfrin, Philippe E"
Date:

Greetings,

 

I want to limit the query text that gets captured in pg_stat_statements. We have sql statements with thousands of values clauses (upwards of 10,000) that run at a 1 second interval. When just a handful are running plus 2 or 3 loads using the same technique (10,000 entry values clauses) querying the pg_stat_statements table gets bogged down (see below). With the pg_stat_statements.max is set to 1000 statements just querying the table stats table seems to impact the running statements! I have temporarily staved off the issue by reducing the max to 250 statements, and I have made recommendations to the development team to cut down the number of values clauses. However, it seems to me that the ability to truncate the captured query would be a useful feature.

 

I’ve peeked at the source code and I don’t see the track_activity_query_size used (pg_stat_activity.query) which would be one mechanism. I don’t really know what would be the right way to do this or even if it is a good idea, i.e. if limiting that would have a larger impact to the statistics ecosystem…

                                                         

Thoughts or suggestions?

Regards,

pg

 

 

# select length(query) from pg_stat_statements;

length 

---------

  876153

  879385

     171

      44

    3796

  873527

  <snip>

  896454

  864538

1869286

     938

  869891

  <snip>

  883526

  877365

(969 rows)

 

Time: 9898.411 ms (00:09.898)

 

# select count(*) from pg_stat_statements;

count

-------

   971

(1 row)

Time: 6457.985 ms (00:06.458)

 

Using showtext:=false shows the impact of the large columns:

 

# select count(*) from pg_stat_statements(showtext:=false);

count

-------

   970

(1 row)

Time: 10.644 ms

 

 

 

 

Phil Godfrin | Database Administration

NOV

NOV US | Engineering Data

9720 Beechnut St | Houston, Texas 77036

M  281.825.2311

E   Philippe.Godfrin@nov.com

 

Re: limiting collected query text length in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Mon, May 02, 2022 at 12:45:28PM +0000, Godfrin, Philippe E wrote:
> Greetings,
> 
> I want to limit the query text that gets captured in pg_stat_statements. We
> have sql statements with thousands of values clauses (upwards of 10,000) that
> run at a 1 second interval. When just a handful are running plus 2 or 3 loads
> using the same technique (10,000 entry values clauses) querying the
> pg_stat_statements table gets bogged down (see below). With the
> pg_stat_statements.max is set to 1000 statements just querying the table
> stats table seems to impact the running statements! I have temporarily staved
> off the issue by reducing the max to 250 statements, and I have made
> recommendations to the development team to cut down the number of values
> clauses. However, it seems to me that the ability to truncate the captured
> query would be a useful feature.

The store queries are normalized so the values themselves won't be stored, only
a "?" per value.  And as long as all the queries have the same number of values
there should be a single entry stored for the same role and database, so all in
all it should limit the size of the stored query texts.

On the other hand, with such a low pg_stat_statements.max, you may have a lot
of entry evictions, which tends to bloat the external query file
($PGDATA/pg_stat_tmp/pgss_query_texts.stat).  Did you check how big it is and
if yes how fast it grows?  I've once seen the file being more than 1GB without
any reason why, which was obviously slowing everything down.  A simple call to
pg_stat_statements_reset() fixed the problem, at least as far as I know as I
never had access to the server and never had any news after that.

> I've peeked at the source code and I don't see the track_activity_query_size
> used (pg_stat_activity.query) which would be one mechanism. I don't really
> know what would be the right way to do this or even if it is a good idea,
> i.e. if limiting that would have a larger impact to the statistics
> ecosystem...

pg_stat_statements used to truncate the query text to
track_activity_query_size, but that limitation was removed when the query texts
were moved to the external query file.  It's quite convenient to have the full
normalized query text available, especially with the application is using some
ORM, as they tend to make SQL even more verbose than it already is.  Having a
very high number of values (and I'm assuming queries with different number of
values) seems like a corner case, but truncating the query
text would only fix part of the problem.  It will lead to a very high number of
different queryid, which is also problematic as frequent entry evictions also
tends to have an insanely high overhead, and you can't and an infinite number
of entries stored.

> Thoughts or suggestions?

You didn't explain how you're using pg_stat_statements.  Do you really need to
query pg_stat_statements with the query text each time?  If you only need to
get some performance metrics you could adapt your system to only retrieve the
query text for the wanted queryid(s) once you find some problematic pattern,
and/or cache the query texts a table or some other place.  But with a very low
pg_stat_statements.max (especially if you can have a varying number of values
from 1 to 10k) it might be hard to do.