Re: pg_stat_statements fingerprinting logic and ArrayExpr - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date
Msg-id 20131210225538.GC7730@awork2.anarazel.de
Whole thread Raw
In response to Re: pg_stat_statements fingerprinting logic and ArrayExpr  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2013-12-10 17:46:56 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
> >> Did you really find pg_stat_statements to be almost useless in such
> >> situations? That seems worse than I thought.
> >
> > It's very hard to see where you should spend efforts when every "logical
> > query" is split into hundreds of pg_stat_statement entries. Suddenly
> > it's important whether a certain counts of parameters are more frequent
> > than others because in the equally distributed cases they fall out of
> > p_s_s again pretty soon. I think that's probably a worse than average
> > case, but certainly not something only I could have the bad fortune of
> > looking at.
> 
> Right, but the flip side is that you could collapse things that people
> don't want collapsed.  If you've got lots of query that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3) and the constants vary a lot, then
> of course this seems attractive.

Yea, completely agreed. It might also lead to users missing the fact
that their precious prepared-statement cache is just using up loads of
backend memory and individual prepared statements are seldomly
re-executed because there are so many...

>  On the other hand if you have two
> queries and one of them looks like this:
> 
> WHERE status IN ('active') AND user_id = ?
> 
> and the other looks like this:
> 
> WHERE status IN ('inactive', 'deleted') AND user_id = ?

That too.

> Part of me wonders if the real solution here is to invent a way to
> support an arbitrarily large hash table of entries efficiently, and
> then let people do further roll-ups of the data in userland if they
> don't like our rollups.  Part of the pain here is that when you
> overflow the hash table, you start losing information that can't be
> recaptured after the fact.  If said hash table were by chance also
> suitable for use as part of the stats infrastructure, in place of the
> whole-file-rewrite technique we use today, massive win.
> 
> Of course, even if we had all this, it necessarily make doing
> additional rollups *easy*; it's easy to construct cases that can be
> handled much better given access to the underlying parse tree
> representation than they can be with sed and awk.  But it's a thought.

That would obviously be neat, but I have roughly no clue how to achieve
that. Granular control over how such rollups would work sounds very hard
to achieve unless that granular control just is getting passed a tree
and returning another.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr
Next
From: Peter Geoghegan
Date:
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr