Re: Hash id in pg_stat_statements - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Hash id in pg_stat_statements
Date
Msg-id 20121002165815.GF1267@tamriel.snowman.net
In response to Re: Hash id in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Hash id in pg_stat_statements  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Hash id in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Re: Hash id in pg_stat_statements  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
* Peter Geoghegan (peter@2ndquadrant.com) wrote:
> On 1 October 2012 18:05, Stephen Frost <sfrost@snowman.net> wrote:
> > You're going to have to help me here, 'cause I don't see how there can
> > be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
> > unless we're planning to keep PGSS_FILE_HEADER constant while we change
> > what the hash value is for a given query, yet that goes against the
> > assumptions that were laid out, aiui.
>
> Well, they wouldn't be duplicates if you think that the fact that one
> query was executed before some point release and another after ought
> to differentiate queries. I do not.

This would only be if we happened to change what hash was generated for
a given query during such a point release, where I share your feeling
that it ought to be quite rare.  I'm not suggesting we do this for every
point release...

> By invalidate, I mean that when we go to open the saved file, if the
> header doesn't match, the file is considered corrupt, and we simply
> log that the file could not be read, before unlinking it. This would
> be necessary in the unlikely event of there being some substantive
> change in the representation of query trees in a point release. I am
> not aware of any precedent for this, though Tom said that there was
> one.

Right, and that's all I'm trying to address here: how do we provide a
value for a given query which can be relied upon by outside sources,
even in the face of a point release which changes what our internal hash
value for a given query is?
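
To make the idea concrete, here is a rough sketch (Python rather than C,
purely for illustration) of an externally visible id that mixes a
format-version header into the internal hash.  The constant values, the
function name, and the choice of SHA-1 are all assumptions made up for
this example; none of it is taken from pg_stat_statements.c.

```python
import hashlib
import struct

# Hypothetical stand-ins for PGSS_FILE_HEADER before and after a point
# release that changes how queries are hashed. The real constant lives
# in pg_stat_statements.c; these values are invented for the example.
FILE_HEADER_OLD = 0x00000001
FILE_HEADER_NEW = 0x00000002


def external_query_id(file_header: int, internal_hash: int) -> str:
    """Return an id that changes whenever the header or the hash changes.

    Both inputs are treated as unsigned 32-bit values (an assumption).
    """
    payload = struct.pack(
        "<II", file_header & 0xFFFFFFFF, internal_hash & 0xFFFFFFFF
    )
    # Truncate the digest so the id stays short enough to display.
    return hashlib.sha1(payload).hexdigest()[:16]


# The same internal hash seen under two header versions yields two
# distinct external ids, so a tool would not lump the entries together.
id_before = external_query_id(FILE_HEADER_OLD, 0xDEADBEEF)
id_after = external_query_id(FILE_HEADER_NEW, 0xDEADBEEF)
```

The cost is the one Peter points out: a query that did not actually
change also gets a new id after the header is bumped.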

> I don't want to get too hung up on what we'd do if this problem
> actually occurred, because that isn't what this thread is about.

[...]

> I simply do not understand objections to the proposal. Have I missed something?

It was my impression that the concern is the stability of the hash value
and ensuring that tools which operate on it don't mistakenly lump two
different queries into one because they had the same hash value (caused
by a change in our hashing algorithm or input into it over time, e.g. a
point release).  I was hoping to address that to allow this proposal to
move forward.
Thanks,
    Stephen
