Re: pg_stat_statements oddity with track = all - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: pg_stat_statements oddity with track = all
Msg-id CABUevEycR4_3d8s2dU4zJFGs692WVFm0HK5YzFw=86A8J_FkvQ@mail.gmail.com
In response to Re: pg_stat_statements oddity with track = all  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: pg_stat_statements oddity with track = all  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Tue, Mar 16, 2021 at 4:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 12:55:45PM +0100, Magnus Hagander wrote:
> > On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > I think that we might be able to handle that without a flag.  The only thing
> > > that would need to be done is when creating an entry, look for an existing
> > > entry with the opposite flag, and if there is one, simply use the same
> > > (query_offset, query_len) info.  This doesn't sound that expensive.
> >
> > That's basically what I was trying to say :)
>
> Oh ok sorry :)
>
> > > The real pain point will be that the garbage collection phase
> > > will become way more expensive as it will now have to somehow maintain that
> > > knowledge, which will require additional lookups for each entry.  I'm a bit
> > > concerned about that, especially with the current heuristic to schedule garbage
> collection.  For now, need_gc_qtexts says that we have to do it if the extent is
> > > more than 512 (B) * pgss_max.  This probably doesn't work well for people using
> > > ORM as they tend to generate gigantic SQL queries.
> >
> > Right, the cost would be mostly on the GC  side. I've never done any
> > profiling to see how big of a thing that is in systems today -- have
> > you?
>
> I didn't, but I don't see how it could be anything but hugely impactful.
> It basically prevents any query from being planned or executed on the whole
> instance for the time needed to read the previous qtext file and write back
> all the entries still needed.
>
> > > I don't think that anyone really had a strong argument, mostly gut
> > > feeling.  Note that pg_stat_kcache already implemented that toplevel flag, so
> > > if people are using that extension in a recent version they might have some
> > > figures to show.  I'll ping some people that I know are using it.
> >
> > Great -- data always wins over gut feelings :)
>
> So I asked some friends who have the latest pg_stat_kcache installed in a
> preproduction environment configured to track nested queries.  There isn't
> high throughput, but the activity should still be representative of the
> production queries.  There are a lot of applications plugged there, around 20
> databases and quite a lot of PL code.
>
> After a few days, here are the statistics:
>
> - total of ~ 9500 entries
> - ~ 900 entries for nested statements
> - ~ 35 entries existing for both top level and nested statements
>
> So the duplicates account for less than 4% of the nested statements, and less
> than 0.5% of the whole entries.
>
> I wish I had more reports, but if this one is representative enough then it
> seems that trying to avoid storing duplicated queries wouldn't be worth it.

I agree. If those numbers are indeed representative, it seems better to
pay that overhead than to pay the overhead of trying to de-dupe it.

Let's hope they are :)


Looking through it again, my feeling is that the toplevel column should
go after the queryid rather than before, but I'm not going to open up a
bikeshed over that.

I've added in a comment to cover the one that you removed (if you did
send an updated patch as you said, then I missed it -- sorry), and
applied the rest.

Thanks!

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
