Home > mailing lists

Re: Hash id in pg_stat_statements - Mailing list pgsql-hackers

From	Daniel Farina
Subject	Re: Hash id in pg_stat_statements
Date	October 3, 2012 05:03:45
Msg-id	CAAZKuFYMos+X6+D0TCjBRyRFyZ6ECJTkxTWiPWgMdndvkZ9a+g@mail.gmail.com Whole thread
In response to	Hash id in pg_stat_statements (Magnus Hagander <magnus@hagander.net>)
Responses	Re: Hash id in pg_stat_statements
List	pgsql-hackers

Tree view

On Mon, Oct 1, 2012 at 12:57 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Can we please expose the internal hash id of the statements in
> pg_stat_statements?
>
> I know there was discussions about it earlier, and it wasn't done with
> an argument of it not being stable between releases (IIRC). I think we
> can live with that drawback, assuming of course that we document this
> properly.
>
> I've now run into multiple customer installations where it would be
> very useful to have. The usecase is mainly storing snapshots of the
> pg_stat_statements output over time and analyzing those. Weird things
> happen for example when the query text is the same, but the hash is
> different (which can happen for example when a table is dropped and
> recreated). And even without that, in order to do anything useful with
> it, you end up hashing the query text anyway - so using the already
> existing hash would be easier and more useful.

I have a similar problem, however, I am not sure if the hash generated
is ideal.  Putting aside the number of mechanical, versioning,
shut-down/stats files issues, etc reasons given in the main branch of
the thread, I also have this feeling that it is not what I want.
Consider the following case:

SELECT * FROM users WHERE id = ?

<this query isn't seen for a while>

SELECT * FROM users WHERE id = ?

In the intervening time, an equivalent hash could still be evicted and
reintroduced and the statistics silently reset, and that'll befuddle
principled tools.  This is worse than merely less-useful, because it
can lead to drastic underestimations that otherwise pass inspection.

Instead, I think it makes sense to assign a number -- arbitrarily, but
uniquely -- to the generation of a new row in pg_stat_statements, and,
on the flip side, whenever a row is retired its number should be
eliminated, practically, for-ever.  This way re-introductions between
two samplings of pg_stat_statements cannot be confused for a
contiguously maintained statistic on a query.

-- 
fdr

pgsql-hackers by date:

From: Phil Sorber
Date: 03 October 2012, 03:01:44
Subject: PQping command line tool

From: Simon Riggs
Date: 03 October 2012, 07:48:51
Subject: Re: Support for REINDEX CONCURRENTLY

Re: Hash id in pg_stat_statements - Mailing list pgsql-hackers

Previous

Next