> On 18 Nov 2024, at 23:33, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> Hi hackers,
>
> Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces
> substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable
> pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One
> potential solution to this issue could be query sampling in pg_stat_statements.
>
> A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very
> useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for
> improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate, for the
> pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in
> pg_stat_statements.
>
> This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is
> viable and applicable.
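
For readers following the thread, here is a minimal standalone sketch of the kind of per-query sampling decision the
proposal describes. It is not the code from the patch. The names sample_rate, prng_double(), query_is_sampled() and
count_sampled() are hypothetical, and a small xorshift generator stands in for whatever random source the extension
would actually use, so that the example compiles without PostgreSQL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed pg_stat_statements.sample_rate (0.0 .. 1.0). */
static double sample_rate = 1.0;

/* Tiny xorshift64 PRNG so the sketch has no PostgreSQL dependencies. */
static uint64_t prng_state = 0x9E3779B97F4A7C15ULL;

/* Return a pseudo-random double in [0, 1). */
static double
prng_double(void)
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 7;
    prng_state ^= prng_state << 17;
    /* Keep the top 53 bits and scale by 2^53. */
    return (double) (prng_state >> 11) / 9007199254740992.0;
}

/* Decide once per query whether its statistics should be recorded. */
static bool
query_is_sampled(void)
{
    if (sample_rate >= 1.0)
        return true;            /* track everything, no random draw needed */
    if (sample_rate <= 0.0)
        return false;           /* track nothing */
    return prng_double() < sample_rate;
}

/* Count how many of n simulated queries would be tracked. */
static long
count_sampled(long n)
{
    long tracked = 0;

    for (long i = 0; i < n; i++)
    {
        if (query_is_sampled())
            tracked++;
    }
    return tracked;
}
```

With sample_rate set to 0.1, count_sampled(100000) should land near 10000. That is the trade-off mentioned above:
roughly 90% of the bookkeeping is skipped, and the recorded counters become estimates rather than exact totals.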

+1 for the idea. I have heard a lot of complaints that pgss is costly. Most of those users were using it wrong, though.
But at least this could give an easy way to rule out the performance impact of pgss.

> On 19 Nov 2024, at 15:09, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:
>
> I believe we should also include this check in the pgss_ExecutorEnd() function, because sampling in
> pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it
> meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be
> captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?

Best regards, Andrey Borodin.