pg_stat_statements - Mailing list pgsql-hackers

From Godfrin, Philippe E
Subject pg_stat_statements
Date
Msg-id SA0PR15MB3933A83F18E82A2338A8EB2D82C09@SA0PR15MB3933.namprd15.prod.outlook.com
List pgsql-hackers

Hi,

On Mon, May 02, 2022 at 12:45:28PM +0000, Godfrin, Philippe E wrote:

>> Greetings,

>>

>> I want to limit the query text that gets captured in pg_stat_statements. We

>> have SQL statements with thousands of values clauses (upwards of 10,000) that

>> run at a 1 second interval. When just a handful are running plus 2 or 3 loads

>> using the same technique (10,000 entry values clauses) querying the

>> pg_stat_statements table gets bogged down (see below). With

>> pg_stat_statements.max set to 1000 statements, just querying the

>> stats table seems to impact the running statements! I have temporarily staved

>> off the issue by reducing the max to 250 statements, and I have made

>> recommendations to the development team to cut down the number of values

>> clauses. However, it seems to me that the ability to truncate the captured

>> query would be a useful feature.

 

>The stored queries are normalized so the values themselves won't be stored, only

>a "?" per value. And as long as all the queries have the same number of values

>there should be a single entry stored for the same role and database, so all in

>all it should limit the size of the stored query texts.

>On the other hand, with such a low pg_stat_statements.max, you may have a lot

>of entry evictions, which tends to bloat the external query file

>($PGDATA/pg_stat_tmp/pgss_query_texts.stat). Did you check how big it is and

>if yes, how fast it grows? I once saw the file grow to more than 1GB for no

>apparent reason, which was obviously slowing everything down. A simple call to

>pg_stat_statements_reset() fixed the problem, at least as far as I know, since I

>never had access to the server and never heard anything after that.

 

I wasn’t exactly clear about the queries. The values clauses themselves are not long –

We are using repeated values clauses:

 

INSERT INTO timeseries.dvc_104 (tag_id, event_ts, bool_val, float_val, int_val, string_val, last_updt_id)

VALUES ($1,$2,$3,$4,$5,$6,$7),($8,$9,$10,$11,$12,$13,$14),($15,$16,$17,$18,$19,$20,$21),

($22,$23,$24,$25,$26,$27,$28),($29,$30,$31,$32,$33,$34,$35),($36,$37,$38,$39,$40,$41,$42),

($43,$44,$45,$46,$47,$48,$49),($50,$51,$52,$53,$54,$55,$56),($57,$58,$59,$60,$61,$62,$63),

($64,$65,$66,$67,$68,$69,$70),($71,$72,$73,$74,$75,$76,$77),($78,$79,$80,$81,$82,$83,$84),

($85,$86,$87,$88,$89,$90,$91),($92,$93,$94,$95,$96,$97,$98)

 

This one’s not long, but some ‘load statements’ have 10,000 values clauses, others add up to 10,000 more

in an ON CONFLICT clause. I've checked the external query file and it's currently not large

at all. But I will keep an eye on that. When I had the setting at 1000 statements,

the file was indeed over 1GB. For the record, development is reducing those statement

lengths.
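
To keep an eye on that file, something like the following should work (a rough sketch,
assuming superuser or equivalent file-read privileges, and the pg_stat_tmp path mentioned
above, which is relative to the data directory and can vary by version):

-- size of the external query text file, in bytes
SELECT size FROM pg_stat_file('pg_stat_tmp/pgss_query_texts.stat');

-- which entries carry the largest normalized query texts
SELECT queryid, length(query) AS text_len
FROM pg_stat_statements
ORDER BY length(query) DESC
LIMIT 5;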

 

>> I've peeked at the source code and I don't see the track_activity_query_size

>> used (pg_stat_activity.query) which would be one mechanism. I don't really

>> know what would be the right way to do this or even if it is a good idea,

>> i.e. if limiting that would have a larger impact on the statistics

>> ecosystem...

 

>pg_stat_statements used to truncate the query text to

>track_activity_query_size, but that limitation was removed when the query texts

>were moved to the external query file. It's quite convenient to have the full

>normalized query text available, especially when the application is using some

>ORM, as they tend to make SQL even more verbose than it already is. Having a

>very high number of values (and I'm assuming queries with different numbers of

>values) seems like a corner case, but truncating the query

>text would only fix part of the problem. It would still lead to a very high number of

>different queryids, which is also problematic as frequent entry evictions

>tend to have an insanely high overhead, and you can't have an infinite number

>of entries stored.

 

>> Thoughts or suggestions?

 

>You didn't explain how you're using pg_stat_statements. Do you really need to

>query pg_stat_statements with the query text each time? If you only need to

>get some performance metrics you could adapt your system to only retrieve the

>query text for the wanted queryid(s) once you find some problematic pattern,

>and/or cache the query texts in a table or some other place. But with a very low

>pg_stat_statements.max (especially if you can have a varying number of values

>from 1 to 10k) it might be hard to do.
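
For reference, the kind of split being suggested might look something like this (a rough
sketch; total_exec_time is the column name on PG 13+, and 1234567890 is just a stand-in
for a queryid of interest):

-- cheap metrics pass, no query text retrieval
SELECT queryid, calls, total_exec_time, rows
FROM pg_stat_statements(showtext := false)
ORDER BY total_exec_time DESC
LIMIT 20;

-- fetch the full text only for the queryid(s) worth investigating
SELECT query FROM pg_stat_statements WHERE queryid = 1234567890;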

 

The first observation is how long a simple query took:

 

# select count(*) from pg_stat_statements;

count

-------

   971

Time: 6457.985 ms (00:06.458)

 

MORE than six seconds for a mere 971 rows! Furthermore, when the query text is excluded:

# select count(*) from pg_stat_statements(showtext:=false);

count

-------

   970

Time: 10.644 ms

 

Only 10ms…

 

Second, we have Datadog installed. Datadog queries the pg_stat_statements view

every 10 seconds. The real pain point is that querying pg_stat_statements seems

to have an impact on running queries, specifically inserts in my case.

 

I believe this is an actual impact that needs a solution.
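
One way I can try to confirm the impact is to watch what the inserts are waiting on while
the monitoring query runs (a sketch using pg_stat_activity; the filter on the query text
is only illustrative):

-- run from another session while pg_stat_statements is being queried
SELECT pid, state, wait_event_type, wait_event, left(query, 40) AS query_start
FROM pg_stat_activity
WHERE query ILIKE 'insert into timeseries.%'
  AND state <> 'idle';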

 

 

My apologies, for some reason these mails are not making it to me.

 

Phil Godfrin | Database Administration

NOV

NOV US | Engineering Data

9720 Beechnut St | Houston, Texas 77036

M  281.825.2311

E   Philippe.Godfrin@nov.com

 
