On 2013-12-10 17:46:56 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
> >> Did you really find pg_stat_statements to be almost useless in such
> >> situations? That seems worse than I thought.
> >
> > It's very hard to see where you should spend effort when every "logical
> > query" is split into hundreds of pg_stat_statements entries. Suddenly
> > it matters whether certain parameter counts are more frequent than
> > others, because in the evenly-distributed case the entries fall out of
> > p_s_s again pretty soon. I think that's probably a worse-than-average
> > case, but certainly not something only I could have the bad fortune of
> > looking at.
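(To make that concrete, a minimal illustration against a hypothetical
"users" table: queries that differ only in the number of IN-list
elements get distinct fingerprints, so every list length becomes its
own entry:

    SELECT * FROM users WHERE user_id IN (1, 2);
    SELECT * FROM users WHERE user_id IN (1, 2, 3);

    -- pg_stat_statements reports those as two separate rows:
    --   SELECT * FROM users WHERE user_id IN (?, ?)
    --   SELECT * FROM users WHERE user_id IN (?, ?, ?)

An application generating lists of, say, 1-200 elements thus burns up to
200 pg_stat_statements.max slots on a single logical query.)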
>
> Right, but the flip side is that you could collapse things that people
> don't want collapsed. If you've got lots of queries that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3), and the constants vary a lot, then
> of course this seems attractive.
Yea, completely agreed. It might also lead to users missing the fact
that their precious prepared-statement cache is just using up loads of
backend memory while individual prepared statements are seldom
re-executed because there are so many...
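(If you want to check whether a session is hoarding statements, the
pg_prepared_statements view shows what the current backend holds,
protocol-level prepares included; there's no cluster-wide equivalent,
so you have to look per connection:

    SELECT count(*) AS cached, min(prepare_time) AS oldest
    FROM pg_prepared_statements;
)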
> On the other hand, if you have two
> queries and one of them looks like this:
>
> WHERE status IN ('active') AND user_id = ?
>
> and the other looks like this:
>
> WHERE status IN ('inactive', 'deleted') AND user_id = ?
That too.
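Collapsing both into a single "status IN (...) AND user_id = ?" entry
would also hide that the two forms can plan completely differently.
A sketch against a hypothetical "orders" table:

    EXPLAIN SELECT * FROM orders
      WHERE status IN ('active') AND user_id = 42;
    -- might hit a small partial index over the active rows

    EXPLAIN SELECT * FROM orders
      WHERE status IN ('inactive', 'deleted') AND user_id = 42;
    -- might seqscan the much larger archived portion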
> Part of me wonders if the real solution here is to invent a way to
> support an arbitrarily large hash table of entries efficiently, and
> then let people do further rollups of the data in userland if they
> don't like our rollups. Part of the pain here is that when you
> overflow the hash table, you start losing information that can't be
> recaptured after the fact. If said hash table were by chance also
> suitable for use as part of the stats infrastructure, in place of the
> whole-file-rewrite technique we use today, that would be a massive win.
>
> Of course, even if we had all this, it wouldn't necessarily make doing
> additional rollups *easy*; it's easy to construct cases that can be
> handled much better given access to the underlying parse tree
> representation than they can be with sed and awk. But it's a thought.
That would obviously be neat, but I have roughly no clue how to achieve
it. Granular control over how such rollups work sounds very hard to
achieve unless that granular control simply means being passed a tree
and returning another.
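For the simple IN-list case above, the userland rollup admittedly works
today with exactly the sed-and-awk approach you mention, e.g. a crude
regex over the stored query text (purely illustrative, and it obviously
breaks on anything fancier than flat IN lists):

    SELECT regexp_replace(query, 'IN \([^)]*\)', 'IN (...)', 'g')
             AS query_group,
           sum(calls)      AS calls,
           sum(total_time) AS total_time
    FROM pg_stat_statements
    GROUP BY 1
    ORDER BY 3 DESC;

But as soon as the interesting structure isn't visible in the query
text anymore, you'd really want the tree.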
Greetings,
Andres Freund
--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services