Re: pg_stat_statements fingerprinting logic and ArrayExpr - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date
Msg-id CA+TgmoaC-FS45m_X+Gn4QvR4waRuex5x0C8+iFzyQOYP_kVrFw@mail.gmail.com
Whole thread Raw
In response to pg_stat_statements fingerprinting logic and ArrayExpr  (Peter Geoghegan <pg@heroku.com>)
Responses Re: pg_stat_statements fingerprinting logic and ArrayExpr
Re: pg_stat_statements fingerprinting logic and ArrayExpr
Re: pg_stat_statements fingerprinting logic and ArrayExpr
List pgsql-hackers
On Tue, Dec 10, 2013 at 4:30 AM, Peter Geoghegan <pg@heroku.com> wrote:
> pg_stat_statements' fingerprinting logic considers the following two
> statements as distinct:
>
> select 1 in (1, 2, 3);
> select 1 in (1, 2, 3, 4);
>
> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list
> of elements recursively. In this case it's a list of Const nodes, and
> the fingerprinting logic jumbles those nodes indifferently.
>
> Somebody told me that they think that pg_stat_statements should not do
> that. This person felt that it would be preferable for such
> expressions to be normalized without regard to the number of distinct
> Const elements. I suppose that that would work by determing if the
> ArrayExpr elements list was a list of Const nodes and only const
> nodes. Iff that turned out to be the case, something else would be
> jumbled (something other than the list) that would essentially be a
> representation of "some list of zero or more (or maybe one or more)
> Const nodes with consttype of, in this example, 23". I think that this
> would make at least one person happy, because of course the two
> statements above would have their costs aggregated within a single
> pg_stat_statements entry.
>
> I'm not sure that I agree, but there is anecdata that suggests that it
> isn't uncommon for these sorts of queries to be broken out when
> they're all traceable back to a single point in the application
> (apparently it's common for Django apps to do so, perhaps
> questionably). If we assume that doing what I've described has no real
> downside, then it would probably be worth implementing. Plus I'm
> pretty sure that tools that do regex normalization are already doing
> something analogous. Thoughts?

Sounds like this:
   // What's the worst thing that could happen?   pandoras_box.open();

I am very wary of implementing special-case logic here even though I
know it could be useful to some people, simply because I fear that
there could be a near-infinite variety of situations where, in a
particular environment, a particular distinction isn't important.  We
won't be serving anyone well if we ignore all of those distinctions,
because not everyone will want to ignore all of them.  And adding a
configuration option for each one doesn't sound like a good idea,
either.  *Maybe* this particular case is narrow enough that it's OK to
just ignore it unconditionally and maybe that'd be OK ... but I fear
this is a rathole.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: tracking commit timestamps
Next
From: Robert Haas
Date:
Subject: Re: tracking commit timestamps