> But instead of blindly reducing the frequency via PRNG, we can take a more thoughtful approach with a threshold by execution time:
> Find the most frequent query by column 'calls' in pg_stat_statements;
> In this query look at info about execution time: min_exec_time, max_exec_time, etc;
> Gradually increase the threshold from min_exec_time to max_exec_time, limiting the tracking of this query.
> Monitor performance: once the bottleneck is resolved, stop at the current threshold value.
> This approach allows us to:
> Eliminate the spin-lock bottleneck;
> Preserve data about slow queries, which may be critical for performance analysis;
> Reduce the load on the most frequent queries causing contention, instead of uniformly reducing the frequency for all queries.
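
For reference, the first two steps of the quoted procedure amount to something like the following (a sketch only; column names are those of pg_stat_statements in PostgreSQL 13 and later):

```sql
-- Find the most frequently executed statement and look at the
-- spread of its execution times, which the quoted proposal would
-- use as the range for a gradually increasing threshold.
SELECT queryid,
       calls,
       min_exec_time,
       max_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 1;
```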
In my opinion, sample rate is a better fit for pg_stat_statements,
since the queries that you care about the most are usually the most
frequently executed. Sampling them will still provide good enough
data without the risk of not capturing statistics about them at all.
Longer-running queries will also likely be the least frequent, so
they are probably not contributing to the spinlock contention in the
first place. Also, the least frequent queries will likely be aged out
faster, so pg_stat_statements was never really a good candidate to
track those anyway; slow-query logging with
log_min_duration_statement is a better way to ensure you capture
that data.
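
For example, a postgresql.conf fragment along these lines would capture every slow statement in the server log regardless of how pg_stat_statements is sampled (the 500 ms value is an arbitrary illustration, not a recommendation):

```
# Log any statement that runs for 500 ms or longer.
# Tune the threshold for your workload; -1 disables, 0 logs everything.
log_min_duration_statement = 500ms
```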
Maybe others have a different opinion?
--
Sami