Hi,
On Tue, Jun 21, 2022 at 08:04:01PM +0000, Imseih (AWS), Sami wrote:
>
> I separated the pg_stat_statements patch. The patch
> introduces a secondary hash that tracks locations of
> a query (by queryid) in the external file.
I still think that's wrong.
> The hash
> remains in lockstep with the pgss_hash using a
> synchronization routine.
Can you describe how it's kept in sync, and how it makes sure that the property
is maintained across restarts and gc? I don't see any change in the code for
the second part, so I don't see how it could work (and after a quick test it
indeed doesn't).
I also don't see any change in the heuristics for need_gc_qtexts(). Isn't that
going to lead to too-frequent gc?
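For reference, my understanding of the current gc trigger is roughly the
following. This is a simplified standalone model written from memory, with
illustrative names (model_need_gc and its parameters are not identifiers from
the extension), so treat it as a sketch rather than the actual code:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the gc trigger (illustrative names, not the real code).
 *   extent         - current size of the external query text file, in bytes
 *   max_entries    - maximum number of tracked entries
 *   mean_query_len - running estimate of the mean query text length, in bytes
 */
static bool
model_need_gc(size_t extent, size_t max_entries, size_t mean_query_len)
{
    /* Small file: fewer than 512 bytes per possible entry, skip the gc. */
    if (extent < 512 * max_entries)
        return false;

    /* File is less than about twice the expected size: not bloated enough. */
    if (extent < mean_query_len * max_entries * 2)
        return false;

    return true;
}
```

Both thresholds scale with the maximum number of entries, so it seems worth
checking whether they still make sense once several entries can point at the
same text.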
> My testing does not show any regression for workloads
> in which statements are not issued by multiple users/databases.
>
> However, it shows good improvement, 10-15%, when there
> are similar statements that are issued by multiple
> users/databases/tracking levels.
"no regression" and "10-15% improvement" on what?
Can you share more details on the benchmarks you did? Did you also run
benchmarks on workloads that induce entry eviction, with and without the need
for gc? Eviction in pgss is already very expensive, and making things worse
just to save a bit of disk space doesn't seem like a good compromise.