Re: pg_stat_statements oddity with track = all - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: pg_stat_statements oddity with track = all |
Date | |
Msg-id | CABUevEycR4_3d8s2dU4zJFGs692WVFm0HK5YzFw=86A8J_FkvQ@mail.gmail.com |
In response to | Re: pg_stat_statements oddity with track = all (Julien Rouhaud <rjuju123@gmail.com>) |
Responses | Re: pg_stat_statements oddity with track = all |
List | pgsql-hackers |
On Tue, Mar 16, 2021 at 4:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 12:55:45PM +0100, Magnus Hagander wrote:
> > On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > I think that we might be able to handle that without a flag. The only
> > > thing that would need to be done is, when creating an entry, to look
> > > for an existing entry with the opposite flag, and if there is one,
> > > simply use the same (query_offset, query_len) info. This doesn't sound
> > > that expensive.
> >
> > That's basically what I was trying to say :)
>
> Oh ok, sorry :)
>
> > > The real pain point will be that the garbage collection phase will
> > > become way more expensive, as it will now have to somehow maintain that
> > > knowledge, which will require additional lookups for each entry. I'm a
> > > bit concerned about that, especially with the current heuristic to
> > > schedule garbage collection. For now, need_gc_qtexts says that we have
> > > to do it if the extent is more than 512 bytes * pgss_max. This probably
> > > doesn't work well for people using ORMs, as they tend to generate
> > > gigantic SQL queries.
> >
> > Right, the cost would be mostly on the GC side. I've never done any
> > profiling to see how big of a thing that is in systems today -- have
> > you?
>
> I didn't, but I don't see how it could be anything but hugely impactful:
> it basically prevents any query from being planned or executed on the
> whole instance for the time needed to read the previous qtext file and
> write back all entries still needed.
>
> > > I don't think that anyone really had a strong argument, mostly gut
> > > feeling. Note that pg_stat_kcache already implemented that toplevel
> > > flag, so if people are using that extension in a recent version they
> > > might have some figures to show. I'll ping some people that I know are
> > > using it.
> >
> > Great -- data always wins over gut feelings :)
>
> So I asked some friends who have the latest pg_stat_kcache installed on a
> preproduction environment configured to track nested queries. There isn't
> a high throughput, but the activity should still be representative of the
> production queries. There are a lot of applications plugged in there,
> around 20 databases and quite a lot of PL code.
>
> After a few days, here are the statistics:
>
> - total of ~ 9500 entries
> - ~ 900 entries for nested statements
> - ~ 35 entries existing for both top level and nested statements
>
> So the duplicates account for less than 4% of the nested statements, and
> less than 0.5% of the whole set of entries.
>
> I wish I had more reports, but if this one is representative enough, then
> it seems that trying to avoid storing duplicated queries wouldn't be
> worth it.

I agree. If those numbers are indeed representative, it seems better to pay
that overhead than to pay the overhead of trying to de-duplicate. Let's hope
they are :)

Looking through it again, my feeling is that the toplevel column should go
after the queryid and not before it, but I'm not going to open up a bikeshed
over that.

I've added back a comment to cover the one that you removed (if you did send
an updated patch as you said, then I missed it -- sorry), and applied the
rest. Thanks!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
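For illustration, here is a minimal C sketch of the de-duplication idea
discussed upthread: when creating an entry, probe the shared hash table for
the same query registered with the opposite toplevel flag and, if found,
reuse its query-text slot. It borrows the pgssHashKey / pgssEntry /
pgss_hash names from pg_stat_statements.c, but the helper itself
(find_twin_qtext) is hypothetical and the exact field layout may differ:

```c
/*
 * Hypothetical helper: before appending a new query text, look for the
 * same query registered at the opposite nesting level and reuse its
 * (query_offset, query_len) slot in the qtext file.  Names follow
 * pg_stat_statements.c conventions; this is a sketch, not the real code.
 */
static bool
find_twin_qtext(const pgssHashKey *key, Size *query_offset, int *query_len)
{
	pgssHashKey twin_key = *key;	/* same userid, dbid, queryid */
	pgssEntry  *twin;

	twin_key.toplevel = !key->toplevel; /* flip the nesting-level flag */

	twin = (pgssEntry *) hash_search(pgss_hash, &twin_key, HASH_FIND, NULL);
	if (twin == NULL)
		return false;			/* no twin: append the text as usual */

	/* Share the twin's text instead of storing a duplicate copy. */
	*query_offset = twin->query_offset;
	*query_len = twin->query_len;
	return true;
}
```

As the thread notes, the hard part is not this lookup but garbage
collection: once two entries share one text slot, each GC pass must keep
the slot alive while either entry survives, which means an extra lookup per
entry.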
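The GC-scheduling heuristic Julien refers to lives in need_gc_qtexts() in
contrib/pg_stat_statements/pg_stat_statements.c; a simplified paraphrase of
its trigger conditions (not the verbatim source) looks like this:

```c
/*
 * Simplified paraphrase of the trigger logic in need_gc_qtexts();
 * not the verbatim source.
 */
static bool
need_gc_qtexts_sketch(void)
{
	Size		extent = pgss->extent;	/* current qtext file size */

	/* Don't GC until the file exceeds 512 bytes per possible entry... */
	if (extent < 512 * pgss_max)
		return false;

	/* ...and is roughly 50% bloat relative to the mean query length. */
	if (extent < pgss->mean_query_len * pgss_max * 2)
		return false;

	return true;
}
```

With ORM-generated statements the mean query length is large, so both
thresholds are crossed late and each collection pass has to rewrite a
correspondingly large file while blocking other backends, which is the
pause Julien describes.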