Re: Hash id in pg_stat_statements - Mailing list pgsql-hackers

From	Stephen Frost
Subject	Re: Hash id in pg_stat_statements
Date	October 2, 2012 16:58:26
Msg-id	20121002165815.GF1267@tamriel.snowman.net
In response to	Re: Hash id in pg_stat_statements (Peter Geoghegan <peter@2ndquadrant.com>)
Responses	Re: Hash id in pg_stat_statements Re: Hash id in pg_stat_statements Re: Hash id in pg_stat_statements
List	pgsql-hackers

* Peter Geoghegan (peter@2ndquadrant.com) wrote:
> On 1 October 2012 18:05, Stephen Frost <sfrost@snowman.net> wrote:
> > You're going to have to help me here, 'cause I don't see how there can
> > be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
> > unless we're planning to keep PGSS_FILE_HEADER constant while we change
> > what the hash value is for a given query, yet that goes against the
> > assumptions that were laid out, aiui.
>
> Well, they wouldn't be duplicates if you think that the fact that one
> query was executed before some point release and another after ought
> to differentiate queries. I do not.

This would only be if we happened to change what hash was generated for
a given query during such a point release, where I share your feeling
that it ought to be quite rare.  I'm not suggesting we do this for every
point release...
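
As an illustration of the idea discussed above (folding a format-version
constant such as PGSS_FILE_HEADER into the exposed value), here is a
minimal Python sketch. It is not pg_stat_statements code: the constants
`HEADER_OLD` and `HEADER_NEW`, the function `external_query_id`, and the
use of SHA-1 are all assumptions made for the example.

```python
import hashlib
import struct

# Hypothetical placeholders standing in for PGSS_FILE_HEADER before and
# after a release that changes how queries are hashed. Not real values.
HEADER_OLD = 0x00000001
HEADER_NEW = 0x00000002


def external_query_id(file_header: int, internal_hash: int) -> int:
    """Combine a format-version header with the internal 32-bit hash.

    The same internal hash yields a different external id whenever the
    header changes, so ids produced under two hashing schemes are not
    silently treated as the same query.
    """
    payload = struct.pack("<II",
                          file_header & 0xFFFFFFFF,
                          internal_hash & 0xFFFFFFFF)
    digest = hashlib.sha1(payload).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit id.
    return struct.unpack("<Q", digest[:8])[0]
```

With this shape, an outside tool sees a stable id for as long as the
header is unchanged, and a fresh id space once the header is bumped.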

> By invalidate, I mean that when we go to open the saved file, if the
> header doesn't match, the file is considered corrupt, and we simply
> log that the file could not be read, before unlinking it. This would
> be necessary in the unlikely event of there being some substantive
> change in the representation of query trees in a point release. I am
> not aware of any precedent for this, though Tom said that there was
> one.

Right, and that's all I'm trying to address here: how do we provide a
value for a given query which can be relied upon by outside sources,
even in the face of a point release which changes what our internal hash
value for a given query is?
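
The invalidation behaviour quoted above (header mismatch means the file
is treated as corrupt, a message is logged, and the file is unlinked)
could be sketched roughly as follows. This is a Python illustration
under assumptions, not the actual implementation: `EXPECTED_HEADER`,
`load_saved_stats`, and the 4-byte little-endian header layout are
hypothetical.

```python
import os
import struct

# Hypothetical stand-in for the current PGSS_FILE_HEADER value.
EXPECTED_HEADER = 0x00000002


def load_saved_stats(path, expected_header=EXPECTED_HEADER):
    """Return the saved payload, or None if the file is absent or stale.

    A file whose leading 4-byte header does not match is considered
    unreadable: a message is printed and the file is removed.
    """
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None

    if len(raw) < 4 or struct.unpack("<I", raw[:4])[0] != expected_header:
        print(f"could not read saved statistics file {path!r}")
        os.unlink(path)
        return None

    # Everything after the header is the saved payload.
    return raw[4:]
```

Note that discarding the file only resets the statistics; it does not by
itself tell an outside tool that ids gathered earlier came from a
different hashing scheme.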

> I don't want to get too hung up on what we'd do if this problem
> actually occurred, because that isn't what this thread is about.

[...]

> I simply do not understand objections to the proposal. Have I missed something?

It was my impression that the concern is the stability of the hash value
and ensuring that tools which operate on it don't mistakenly lump two
different queries into one because they had the same hash value (caused
by a change in our hashing algorithm or input into it over time, e.g. a
point release).  I was hoping to address that to allow this proposal to
move forward.

Thanks,
    Stephen
