Re: Patch: add timing of buffer I/O requests - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Patch: add timing of buffer I/O requests
Date
Msg-id CAEYLb_WkKZsumQkWhyyBj7Sxq4b2pBhWKrdVE3digAHpdAVPKA@mail.gmail.com
Whole thread Raw
In response to Re: Patch: add timing of buffer I/O requests  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Patch: add timing of buffer I/O requests  ("ktm@rice.edu" <ktm@rice.edu>)
List pgsql-hackers
On 11 April 2012 01:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> On 11 April 2012 00:35, Robert Haas <robertmhaas@gmail.com> wrote:
>>> If people need something like that, couldn't they create it by hashing
>>> the normalized query text with an arbitrary algorithm?
>
>> That supposes that the normalised query text is perfectly stable. It
>> may well not be, particularly for things like ad-hoc queries or
>> queries generated by ORMs, across database clusters and over long
>> periods of time -
>
> Indeed, but the hash value isn't stable either given those sorts of
> assumptions, so I'm not convinced that there's any advantage there.

Isn't it? The hash captures the true meaning of the query, while
having the database server's platform as a usually irrelevant
artefact. Another thing that I forgot to mention is client encoding -
it may well be fairly inconvenient to have to use the same algorithm
to hash the query string across applications. You also have to hash
the query string yourself again and again, which is expensive to do
from Python or something, and is often inconvenient - differences
beyond track_activity_query_size bytes (default:1024) are not
recognised. Using an SQL code beautifier where a single byte varies
now breaks everything, which developers don't expect at all (we've
trained them not to), so in many ways you're back to the same
limitations as classic pg_stat_statements if you attempt to aggregate
queries over time and across machines, which is a very real use case.

It's probably pretty annoying to have to get your Python app to use
the same hash function as your Java app or whatever I, unless you want
to use something heavyweight like a cryptographic hash function. By
doing it within Postgres, you avoid those headaches.

I'm not asking you to very loudly proclaim that it should be used like
this - just expose it, accurately document it, and I'm quite confident
that it will be widely used and relied upon by those that are
reasonably well informed, and understand its limitations, which are
really quite straightforward.

> What I think people would actually like to know, if they're in a
> situation where distinct query texts are getting hashed to the same
> thing, is *which* different texts got hashed to the same thing.
> But there's no good way to expose that given the pg_stat_statements
> infrastructure, and exposing the hash value doesn't help.

Apart from detecting the case where we get a straightforward
collision, I don't expect that that would be useful. The whole point
is that the user doesn't care about the difference, and I think we've
specified a practical, widely useful standard for when queries should
be considered equivalent.
--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pgsql-hackers by date:

Previous
From: Josh Kupershmidt
Date:
Subject: Re: psql: tab completions for 'WITH'
Next
From: Robert Haas
Date:
Subject: Re: pg_tablespace_location() error message