> Unfortunately, these changes do not achieve the intended sampling goal.
> I looked into this more deeply: while the sampled-out queries do not
> appear in pg_stat_statements, an entry is still allocated in the hash
> table after normalization, which, in my view, should not happen when
> sampling is in effect. Therefore, patch v9 is unlikely to meet our needs.
pg_stat_statements creates entries as "sticky" initially to give them
more time to stay in the hash before the first execution completes.
It's not perfect, but it works for the majority of cases. So, what you
are observing is how pg_stat_statements currently works.
If an entry is popular enough, we will need it anyway (even
with the proposed sampling). An entry that's not popular will
eventually be aged out.
From my understanding, what the proposed sampling will do is
reduce the overhead of incrementing the counters of popular entries,
which comes from the spinlock taken to update those counters. This
matters most with high concurrency on large machines (high CPU count),
and especially when there is a small set of popular entries.
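
To make the contention argument concrete, here is a minimal Python sketch. It is not the patch's actual C code. `EntryCounters`, `record_execution`, `run_workload` and `sample_rate` are hypothetical names, and a `threading.Lock` stands in for the per-entry spinlock. The sketch only counts how often the lock is taken. Because of Python's GIL it cannot show real contention, so it does not replace a benchmark.

```python
import random
import threading


class EntryCounters:
    """Hypothetical stand-in for one hash entry: counters guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of the per-entry spinlock
        self.calls = 0                # number of executions recorded (= lock acquisitions)
        self.total_time = 0.0


def record_execution(entry, exec_time, sample_rate, rng):
    """Record one execution, taking the lock only if it is sampled in."""
    # The sampling decision is made before touching the lock, so
    # sampled-out executions never contend for it.
    if sample_rate < 1.0 and rng.random() >= sample_rate:
        return False
    with entry.lock:
        entry.calls += 1
        entry.total_time += exec_time
    return True


def run_workload(n_threads, execs_per_thread, sample_rate, seed=0):
    """Many threads all executing the same popular statement."""
    entry = EntryCounters()

    def worker(tid):
        rng = random.Random(seed + tid)  # per-thread RNG, no shared state
        for _ in range(execs_per_thread):
            record_execution(entry, 1.0, sample_rate, rng)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entry
```

With `sample_rate=1.0` every one of the 8 x 10000 executions takes the lock. With `sample_rate=0.1` only about one in ten does, so lock traffic on the hot entry drops roughly tenfold. The trade-off is that the counters then reflect only the sampled executions.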
IMO, this patch should also include a benchmark that shows
a user can benefit from sampling in those types of
workloads.
Regards,
Sami