Re: Patch: add timing of buffer I/O requests - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Patch: add timing of buffer I/O requests
Date
Msg-id 4ED43C66.6020108@2ndQuadrant.com
In response to Re: Patch: add timing of buffer I/O requests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Patch: add timing of buffer I/O requests  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 11/28/2011 05:51 AM, Robert Haas wrote:
> On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>> The real problem with this whole area is that we know there are
>> systems floating around where the amount of time taken to grab timestamps
>> like this is just terrible.
> Assuming the feature is off by default (and I can't imagine we'd
> consider anything else), I don't see why that should be cause for
> concern.  If the instrumentation creates too much system load, then
> don't use it: simple as that.

It's not quite that simple though.  Releasing a performance measurement 
feature that itself can perform terribly under undocumented conditions 
has a wider downside than that.

Consider that people aren't going to turn it on until they are already
overloaded.  If that has the potential to completely tank performance,
we had better make sure that area is at least explored usefully first; the
minimum diligence should be to document that fact and make suggestions
for avoiding or testing it.
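
As one illustration of what "testing it" could look like, here is a minimal sketch that estimates the average cost of reading the clock in a tight loop. It is not part of the patch under discussion; the function name `timing_overhead_ns` and the default call count are assumptions made for this example. Because it is written in Python, the figure includes interpreter loop overhead, so it is only a rough upper bound on what a C backend would pay per timestamp.

```python
import time


def timing_overhead_ns(calls=200_000):
    """Return the average cost, in nanoseconds, of one clock read.

    Hypothetical helper for illustration only. The result includes
    Python loop and call overhead, so treat it as a rough upper bound.
    """
    read = time.perf_counter_ns  # monotonic, high-resolution clock
    start = read()
    for _ in range(calls):
        read()  # the operation whose cost we are estimating
    elapsed = read() - start
    return elapsed / calls


if __name__ == "__main__":
    per_call = timing_overhead_ns()
    print(f"about {per_call:.0f} ns per clock read")
```

Running this on the machine in question before enabling timing instrumentation gives a first hint: a result in the tens of nanoseconds suggests a cheap clock source, while a result in the microseconds suggests that per-I/O timestamps could be expensive there.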

Instrumentation that can itself become a performance problem is an 
advocacy problem waiting to happen.  As I write this I'm picturing such 
an encounter resulting in an angry blog post, about how this proves 
PostgreSQL isn't usable for serious systems because someone sees massive 
overhead turning this on.  Right now the primary exposure to this class 
of issue is EXPLAIN ANALYZE.  When I was working on my book, I went out 
of my way to find a worst case for that[1], and that turned out to be a 
query that went from 7.994ms to 69.837ms when instrumented.  I've been 
meaning to investigate what was up there since finding that one.  The
fact that we already have one such problem exposed worries me; I'd
really prefer not to have two.

[1] (Dell Store 2 schema, query was "SELECT count(*) FROM customers;")

