Re: Wait events monitoring future development - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Wait events monitoring future development
Date
Msg-id 9eda4c7a-6149-7493-5339-099a787e8cfd@BlueTreble.com
In response to Re: Wait events monitoring future development  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses Re: Wait events monitoring future development  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Re: Wait events monitoring future development  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 8/8/16 11:07 PM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-owner@postgresql.org
>> > If you want to know why people are against enabling this monitoring by
>> > default, above is the reason.  What percentage of people do you think would
>> > be willing to take a 10% performance penalty for monitoring like this?  I
>> > would bet very few, but the argument above doesn't seem to address the fact
>> > it is a small percentage.
>> >
>> > In fact, the argument above goes even farther, saying that we should enable
>> > it all the time because people will be unwilling to enable it on their own.
>> > I have to question the value of the information if users are not willing
>> > to enable it.  And the solution proposed is to force the 10% default overhead
>> > on everyone, whether they are currently doing debugging, whether they will
>> > ever do this level of debugging, because people will be too scared to enable
>> > it.  (Yes, I think Oracle took this approach.)


Let's put this in perspective: there are tons of companies that spend 
thousands of dollars per month extra by running un-tuned systems in 
cloud environments. I almost called that "waste" but in reality it 
should be a simple business question: is it worth more to the company to 
spend resources on reducing the AWS bill or rolling out new features? 
It's something that can be estimated and a rational business decision made.

Where things become completely *irrational* is when a developer reads 
something like "plpgsql blocks with an EXCEPTION handler are more 
expensive" and they freak out and spend a bunch of time trying to avoid 
them, without even the faintest idea of what that overhead actually is. 
More importantly, they haven't the faintest idea of what that overhead 
costs the company, vs what it costs the company for them to spend an 
extra hour trying to avoid the EXCEPTION (and probably introducing code 
that's far more bug-prone in the process).

So in reality, the only people likely to notice even something as large 
as a 10% hit are those that were already close to maxing out their 
hardware anyway.
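As a back-of-the-envelope illustration of the headroom point, here is a minimal sketch that assumes the monitoring overhead scales resource demand proportionally (a simplifying assumption, not a measured property of the wait-event patch). Under that model a 10% overhead only pushes a system past 100% utilization if it was already above roughly 91%. The function names are hypothetical.

```python
def utilization_with_overhead(base_utilization: float, overhead: float) -> float:
    """Utilization after adding a proportional overhead (hypothetical linear model)."""
    return base_utilization * (1.0 + overhead)


def max_safe_utilization(overhead: float) -> float:
    """Highest base utilization that still fits under 100% once the overhead is added."""
    return 1.0 / (1.0 + overhead)


if __name__ == "__main__":
    overhead = 0.10  # the 10% figure debated in this thread, used only as an input
    print(f"saturation threshold: {max_safe_utilization(overhead):.1%}")
    for base in (0.40, 0.70, 0.95):
        after = utilization_with_overhead(base, overhead)
        status = "saturated" if after > 1.0 else "ok"
        print(f"{base:.0%} busy -> {after:.1%} busy ({status})")
```

Real workloads are rarely this linear (latency tends to degrade well before 100%), so treat the threshold as a rough bound rather than a prediction.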

The downside to leaving stuff like this off by default is users won't 
remember it's there when they need it. At best, that means they spend 
more time debugging something than they need to. At worst, it means they 
suffer a production outage for longer than they need to, and that can 
easily exceed many months' or years' worth of the extra cost from the 
monitoring overhead.
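The trade-off above can be estimated in the "rational business decision" spirit described earlier. This is a minimal sketch, assuming the overhead translates linearly into extra infrastructure spend and that always-on monitoring shortens a single outage by some number of hours. Every dollar figure and parameter name here is a hypothetical illustrative input, not data from this thread.

```python
def break_even_months(
    monthly_infra_cost: float,    # hypothetical monthly hardware/cloud spend
    overhead_fraction: float,     # e.g. 0.10 for a 10% monitoring overhead
    outage_cost_per_hour: float,  # hypothetical business cost of downtime
    outage_hours_saved: float,    # hours of debugging avoided in one incident
) -> float:
    """Months of monitoring overhead that one shortened outage pays for."""
    monthly_overhead_cost = monthly_infra_cost * overhead_fraction
    savings_per_incident = outage_cost_per_hour * outage_hours_saved
    return savings_per_incident / monthly_overhead_cost


if __name__ == "__main__":
    # Illustrative inputs only: $5,000/month spend, 10% overhead,
    # $10,000/hour outage cost, 2 hours saved in a single incident.
    months = break_even_months(5_000, 0.10, 10_000, 2)
    print(f"one incident covers about {months:.0f} months of overhead")
```

With these made-up inputs one incident covers the overhead for several years; plugging in a site's own numbers is the point of the exercise.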

>> > We can talk about this feature all we want, but if we are not willing to
>> > be realistic in how much performance penalty the _average_ user is willing
>> > to lose to have this monitoring, I fear we will make little progress on
>> > this feature.
> OK, 10% was an overstatement.  Anyway, as Amit said, we can discuss the default value based on the performance evaluation before release.
>
> As another idea, we can take a middle ground.  Interestingly, MySQL also enables its event monitoring (Performance Schema) by default, but not all events are collected.  I guess frequently encountered events are not collected by default to minimize the overhead.

That's what we currently do with several track_* and log_*_stats GUCs, 
several of which I forgot even existed until just now. Since there's 
some question about the actual overhead, maybe that's a prudent approach for 
now, but I think we should be striving to enable these things ASAP.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


