On 09.01.2025 05:29, Sami Imseih wrote:
>> Unfortunately, these changes do not achieve the intended sampling goal.
>> I looked into this more deeply: while the sampled-out queries do not
>> appear in pg_stat_statements, an entry is still allocated in the hash
>> table after normalization, which, in my view, should not happen when
>> sampling is in effect. Therefore, patch v9 is unlikely to meet our needs.
> pg_stat_statements creates entries as "sticky" initially to give them
> more time to stay in the hash before the first execution completes.
> It's not perfect, but it works for the majority of cases. So, what you
> are observing is how pg_stat_statements currently works.
>
> If an entry is popular enough, we will need it anyway (even
> with the proposed sampling). An entry that's not popular will
> eventually be aged out.
>
> From my understanding, what the proposed sampling will do is
> to reduce the overhead of incrementing counters of popular entries,
> because of the spinlock to update the counters. This is particularly
> the case with high concurrency on large machines (high CPU count),
> and especially when there is a small set of popular entries.
> IMO, this patch should also have a benchmark that proves
> that a user can benefit from sampling in those types of
> workloads.
Ah, so patch v9 might be the best fit for that goal, i.e. reducing
spinlock contention on counter updates for popular entries. I’ll need
to benchmark it on a large, high-concurrency machine then.
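
To make sure we mean the same thing, here is a rough standalone sketch
of the idea as I understand it. It is not taken from the patch: the
DemoEntry struct, the demo_* functions and the sample_rate parameter
are hypothetical names, and a plain counter stands in for the real
spinlock acquire/release around the counters update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a pg_stat_statements entry, not the real struct. */
typedef struct DemoEntry
{
    int64_t calls;              /* counter normally updated under the spinlock */
    int64_t lock_acquisitions;  /* how often we would have taken the spinlock */
} DemoEntry;

/* Small deterministic PRNG (xorshift32) so the sketch is reproducible. */
static uint32_t demo_rng_state = 2463534242u;

static double
demo_random(void)
{
    uint32_t x = demo_rng_state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    demo_rng_state = x;
    return (double) x / 4294967296.0;   /* always below 1.0 */
}

/* Decide whether this execution is sampled; sample_rate is in [0, 1]. */
static bool
demo_is_sampled(double sample_rate)
{
    assert(sample_rate >= 0.0 && sample_rate <= 1.0);
    return demo_random() < sample_rate;
}

/* Record one execution: sampled-out executions never touch the lock. */
static void
demo_record_execution(DemoEntry *entry, double sample_rate)
{
    if (!demo_is_sampled(sample_rate))
        return;                     /* skipped: no lock, no counter update */

    entry->lock_acquisitions++;     /* stands in for acquiring the spinlock */
    entry->calls++;
}
```

With a rate of 1.0 every execution takes the lock, as today; with a
lower rate, only the sampled fraction does, which is where the savings
on a small set of hot entries would come from. Note that the sketch
says nothing about entry allocation, which stays as it is now.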
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.