Thread: Re: Sample rate added to pg_stat_statements

Re: Sample rate added to pg_stat_statements

From
"Andrey M. Borodin"
Date:

> On 18 Nov 2024, at 23:33, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> Hi hackers,
>
> Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces
> substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable
> pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One
> potential solution to this issue could be query sampling in pg_stat_statements.
>
> A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very
> useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for
> improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate, for the
> pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in
> pg_stat_statements.
>
> This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is
> viable and applicable.

+1 for the idea. I have heard a lot of complaints that pgss is costly. Most of them were using it wrong, though. But at
least it could give an easy way to rule out the performance impact of pgss.

> On 19 Nov 2024, at 15:09, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd()
> ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the
> pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured
> while maintaining efficient sampling.
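For illustration, here is a minimal sketch of the kind of gate being described, with hypothetical names (this is not the actual patch) and assuming the pg_prng API available since PostgreSQL 15:

    #include "common/pg_prng.h"   /* pg_prng_double(), pg_global_prng_state */

    /* GUC: fraction of queries whose statistics are collected (1.0 = all) */
    static double pgss_sample_rate = 1.0;

    /* Hypothetical helper shared by the hooks */
    static bool
    pgss_query_sampled(void)
    {
        return pgss_sample_rate >= 1.0 ||
               pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
    }

pgss_ExecutorStart() would consult this before setting up its instrumentation, and, as described above, pgss_ExecutorEnd() could roll the dice again so that a query skipped at start still has a chance of being stored.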

Is there a reason why pgss_ProcessUtility is excluded?


Best regards, Andrey Borodin.


Re: Sample rate added to pg_stat_statements

From
Greg Sabino Mullane
Date:
On Tue, Nov 19, 2024 at 7:12 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
+1 for the idea. I have heard a lot of complaints that pgss is costly. Most of them were using it wrong, though.

I'm curious what "using it wrong" means exactly?

Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.

Cheers,
Greg

Re: Sample rate added to pg_stat_statements

From
Greg Sabino Mullane
Date:
On Tue, Nov 19, 2024 at 5:07 PM Michael Paquier <michael@paquier.xyz> wrote:
One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core.

Any particular reason these days we cannot push this into core and allow disabling on startup? To say this extension is widely used would be an understatement.

Cheers,
Greg

Re: Sample rate added to pg_stat_statements

From
"Andrey M. Borodin"
Date:

> On 19 Nov 2024, at 17:39, Greg Sabino Mullane <htamfids@gmail.com> wrote:
>
> I'm curious what "using it wrong" means exactly?

Here's an example. pgSCV is querying pgss for every database separately every minute. It makes sense for the project.
But when you have ~1000 databases, you get a lot of traffic to pgss alone, even if only one database is active,
because all pgss results must be put into a tuplestore before filtering. See [0] for details.
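For context, this is roughly what the pg_stat_statements() set-returning function does: every entry in the shared hash table is copied into a tuplestore, and any per-database filter in the caller's WHERE clause is applied by the executor only afterwards (simplified sketch, not the exact source):

    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        Datum   values[PG_STAT_STATEMENTS_COLS];
        bool    nulls[PG_STAT_STATEMENTS_COLS];

        /* fill values[]/nulls[] from entry->key and entry->counters */

        /* every entry is materialized, regardless of which database the
         * caller is actually interested in */
        tuplestore_putvalues(tupstore, tupdesc, values, nulls);
    }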


Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/1AEEB240-9B68-44D5-8A29-8F9FDB22C801%40yandex-team.ru


Re: Sample rate added to pg_stat_statements

From
Alexander Korotkov
Date:
On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:
> > Oh, and a +1 in general to the patch, OP, although it would also be nice to
> > start finding the bottlenecks that cause such performance issues.
>
> FWIW, I'm not eager to integrate this proposal without looking at this
> exact argument in depth.
>
> One piece of it would be to see how much of such "bottlenecks" we
> would be able to get rid of by integrating pg_stat_statements into
> the central pgstats with the custom APIs, without pushing the module
> into core.  This means that we would combine the existing hash of pgss
> to shrink to 8 bytes for objid rather than 13 bytes now as the current
> code relies on (toplevel, userid, queryid) for the entry lookup (entry
> removal is sniped with these three values as well, or dshash seq
> scans).  The odds of conflicts still play in our favor even if
> we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a pretty large machine with
high concurrency, then the spin lock in pgss_store() can become quite
a bottleneck.  And I'm not sure switching all counters to atomics
would improve the situation much, given that we already have quite a
few counters.

I'm generally +1 for the approach taken in this thread.  But I would
suggest introducing a threshold value for query execution time and
sampling only the queries below that threshold.  Slower queries
shouldn't be sampled, because they can't be too frequent, and it could
also be more valuable to count them individually (while very fast
queries probably only matter "on average").
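A sketch of the decision rule being suggested, with hypothetical GUC names (queries at or above the threshold are always stored; only faster queries are subject to the sample_rate check):

    static bool
    pgss_should_store(double total_time_ms)
    {
        /* slow queries are always counted individually */
        if (total_time_ms >= pgss_sample_exectime_threshold_ms)
            return true;

        /* fast queries only matter in aggregate, so they are sampled */
        return pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
    }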

------
Regards,
Alexander Korotkov
Supabase



Re: Sample rate added to pg_stat_statements

From
Ilia Evdokimov
Date:


On 26.11.2024 01:15, Ilia Evdokimov wrote:


On 22.11.2024 09:08, Alexander Korotkov wrote:

> I'm generally +1 for the approach taken in this thread.  But I would
> suggest introducing a threshold value for query execution time and
> sampling only the queries below that threshold.  Slower queries
> shouldn't be sampled, because they can't be too frequent, and it could
> also be more valuable to count them individually (while very fast
> queries probably only matter "on average").


I really liked your idea, and I’d like to propose an enhancement that I believe improves it further.

Yes, if a query’s execution time exceeds the threshold, it should always be tracked without sampling. However, for queries with execution times below the threshold, the sampling logic should prioritize shorter queries over those closer to the threshold. In my view, the ideal approach is for shorter queries to have the highest probability of being sampled, while queries closer to the threshold are less likely to be sampled.

This behavior can be achieved with the following logic:

pg_stat_statements.sample_exectime_threshold * random(0, 1) < total_time

Here’s how it works:

  • As a query’s execution time approaches zero, the probability of it being sampled approaches one.
  • Conversely, as a query’s execution time approaches the threshold, the probability of it being sampled approaches zero.

In other words, the sampling probability decreases linearly from 1 to 0 as the execution time gets closer to the threshold.
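In code, that condition could look roughly like this (hypothetical names; for a query below the threshold the inequality holds with probability total_time / threshold, and it always holds once total_time reaches the threshold):

    static bool
    pgss_exectime_condition(double total_time_ms)
    {
        /* pg_prng_double() returns a uniform value in [0, 1) */
        return pgss_sample_exectime_threshold_ms *
               pg_prng_double(&pg_global_prng_state) < total_time_ms;
    }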

I believe this approach offers an ideal user experience. I have attached a new patch implementing this logic. Please let me know if you have any feedback regarding the comments in the code, the naming of variables or documentation. I’m always open to discussion.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.


I’ve been closely reviewing my latest patch (v5-*.patch) implementing time-based sampling, and I’ve realized that it might not be the best approach. Let me explain the reasons.

  • We can only perform sampling before the 'pgss_planner()' function. However, at that point we don’t yet know the query's execution time, since it only becomes available during 'pgss_ExecutorEnd()' or 'pgss_ProcessUtility()';
  • If we wait to sample until execution completes and we have the actual execution time, this introduces a problem. By that point, we might have already recorded the query's statistics into shared memory from 'pgss_planner()', making it too late to decide whether to sample the query;
  • Delaying sampling until execution finishes would mean waiting for the execution time, which could itself introduce performance overhead. This defeats the purpose of sampling, which aims to reduce the cost of tracking queries.

I believe we should reconsider the approach and revert to sampling based on v4-*.patch. If I’ve missed anything, or if there’s an alternative way to implement time-threshold-based sampling efficiently, I’d be grateful to hear your insights.
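To spell out the ordering problem in code terms, here is a simplified fragment of what an ExecutorEnd-time check would look like; pgss_store_if_sampled() is a hypothetical helper, and the surrounding structure follows the usual hook pattern:

    static void
    pgss_ExecutorEnd(QueryDesc *queryDesc)
    {
        if (queryDesc->totaltime)
        {
            /* only at this point is the execution time known ... */
            InstrEndLoop(queryDesc->totaltime);
            double  total_time_ms = queryDesc->totaltime->total * 1000.0;

            /*
             * ... but with pg_stat_statements.track_planning enabled,
             * pgss_planner() has already written planning counters for this
             * query, so a decision taken here cannot undo that earlier store.
             */
            pgss_store_if_sampled(queryDesc, total_time_ms);   /* hypothetical */
        }

        if (prev_ExecutorEnd)
            prev_ExecutorEnd(queryDesc);
        else
            standard_ExecutorEnd(queryDesc);
    }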

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.