Re: Patch: add timing of buffer I/O requests - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Patch: add timing of buffer I/O requests
Date
Msg-id CAEYLb_WC1=iNmNWH=cQ8TZjj99Wk7YkjzTFRxncFrDw0AB0==A@mail.gmail.com
In response to Re: Patch: add timing of buffer I/O requests  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On 14 April 2012 02:42, Greg Stark <stark@mit.edu> wrote:
> Is this not subject to the birthday paradox? If you have a given hash
> you're worried about a collision with then you have a
> one-in-four-billion chance. But if you have a collection of hashes and
> you're worried about any collisions then it only takes about 64k
> before there's likely a collision.

Just for the sake of the archives, assuming a uniform distribution of
hash values, by my calculations that puts the probability of at least
one collision at:

pg_stat_statements.max of 1,000 =   0.00011562303995116263

and perhaps more representatively, if we follow the example in the docs:

pg_stat_statements.max of 10,000 = 0.011496378237656368

It's enough of a problem that I'd expect to hear one or two complaints
about it in the next few years, maybe. This is the probability of
there being a collision, not the probability of someone caring about
it.

You probably wouldn't want to push your luck too far:

pg_stat_statements.max of 100,000 = 0.6853509059051395

Even if you did, that would only mean that there was usually one
collision, or perhaps two or three, out of an entire 100,000 entries.
To labour the point, you'd have to have a lot of bad luck for those to
be the entries that a human actually ended up caring about.

Jim Nasby said upthread that selecting from the stats view isn't
performance critical, and he's right. However, maintaining the stats
themselves certainly is, since each and every query will have to
update them, adding latency. pg_stat_statements is far from being a
tool of minority interest, particularly now. People are going to want
to add additional bells and whistles to it, which is fine by me, but
I'm very much opposed to making everyone pay for additional features
that imply performance overhead for all queries, particularly if the
feature is of minority interest.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
