Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Date
Msg-id CAEYLb_W9vWe0csaaqLedzxxPZPAS7erUfgTF7hLWvfbZ=dsn3w@mail.gmail.com
Whole thread Raw
In response to Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
List pgsql-hackers
On 2 March 2012 20:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do intend to take this one up in due course

I probably should have exposed the query_id directly in the
pg_stat_statements view, perhaps as "query_hash". The idea of that
would be to advertise the potential non-uniqueness of the value - a
collision is *extremely* unlikely (as I've previously calculated), but
we cannot preclude the possibility, and as such it isn't *really*
usable as a primary key. BTW, even if there is a collision, we at
least know that there can't be a situation where one user's query
entry gets spurious statistics from the execution of some other
user's, or one database gets statistics from another, since their
corresponding oid values separately form part of the dynahash key,
alongside query_id.

The other reason why I'd like to do this is that I'd like to build on
this work for 9.3, and add a new column - plan_hash. When a new mode,
pg_stat_statements.plan_hash (or somesuch) is disabled (as it is by
default), this is always null, and we get the same 9.2 behaviour. When
it is enabled, however, all existing entries are invalidated, for a
clean slate. We then start hashing both the query tree *and* the query
plan. It's a whole lot less useful if we only hash the latter. Now,
entries within the view use the plan_hash as their key (or maybe a
composite of query_hash and plan_hash). This often results in entries
with duplicate query_hash values, as the planner generates different
plans for equivalent queries, but that doesn't matter; you can easily
write an aggregate query with a "GROUP BY query_hash" clause if that's
what you happen to want to see.

When this optional mode is enabled, at that point we'd probably also
separately instrument planning time, as recently proposed by Fujii.

Does that seem like an interesting idea?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: LIST OWNED BY...
Next
From: "Kevin Grittner"
Date:
Subject: Re: pg_upgrade and statistics