Thread: Re: Sample rate added to pg_stat_statements
> On 18 Nov 2024, at 23:33, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> Hi hackers,
>
> Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.
>
> A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.
>
> This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.

+1 for the idea. I heard a lot of complaints about that pgss is costly. Most of them were using it wrong though. But at least it could give an easy way to rule out performance impact of pgss.

> On 19 Nov 2024, at 15:09, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?

Best regards, Andrey Borodin.
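For readers following the thread, the trade-off described above (less bookkeeping in exchange for less exact counts) can be illustrated with a small simulation. This is only a sketch of rate-based sampling in general, not code from the patch: the function names `should_sample` and `simulate` are made up for illustration, and it assumes the rate is a fraction between 0.0 and 1.0 applied independently to each query execution.

```python
import random


def should_sample(rate: float, rng: random.Random) -> bool:
    """Decide whether one query execution is tracked.

    rng.random() returns a value in [0.0, 1.0), so rate=1.0 tracks
    every execution and rate=0.0 tracks none.
    """
    return rng.random() < rate


def simulate(total_queries: int, rate: float, seed: int = 0) -> int:
    """Return how many of total_queries executions would be tracked."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return sum(should_sample(rate, rng) for _ in range(total_queries))


if __name__ == "__main__":
    total = 10_000
    rate = 0.1
    tracked = simulate(total, rate)
    # Dividing by the rate gives a rough estimate of the real call count.
    print(f"tracked {tracked} of {total}, estimated total {tracked / rate:.0f}")
```

With a rate of 0.1 only about one execution in ten pays the cost of being recorded, and any per-query call count then has to be scaled by 1/rate to approximate the real figure, which is where the loss of accuracy comes from.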
On Tue, Nov 19, 2024 at 7:12 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> +1 for the idea. I heard a lot of complaints about that pgss is costly. Most of them were using it wrong though.
I'm curious what "using it wrong" means exactly?
Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.
Cheers,
Greg
On Tue, Nov 19, 2024 at 5:07 PM Michael Paquier <michael@paquier.xyz> wrote:
> One piece of it would be to see how much of such "bottlenecks" we
> would be able to get rid of by integrating pg_stat_statements into
> the central pgstats with the custom APIs, without pushing the module
> into core.
Any particular reason these days we cannot push this into core and allow disabling on startup? To say this extension is widely used would be an understatement.
Cheers,
Greg