Re: Wait events monitoring future development - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject Re: Wait events monitoring future development
Msg-id 0A3221C70F24FB45833433255569204D1F5C0889@G01JPEXMBYT05
In response to Re: Wait events monitoring future development  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
From: pgsql-hackers-owner@postgresql.org
> Let's put this in perspective: there's tons of companies that spend thousands
> of dollars per month extra by running un-tuned systems in cloud environments.
> I almost called that "waste" but in reality it should be a simple business
> question: is it worth more to the company to spend resources on reducing
> the AWS bill or rolling out new features?
> It's something that can be estimated and a rational business decision made.
> 
> Where things become completely *irrational* is when a developer reads
> something like "plpgsql blocks with an EXCEPTION handler are more expensive"
> and they freak out and spend a bunch of time trying to avoid them, without
> even the faintest idea of what that overhead actually is.
> More important, they haven't the faintest idea of what that overhead costs
> the company, vs what it costs the company for them to spend an extra hour
> trying to avoid the EXCEPTION (and probably introducing code that's far
> more bug-prone in the process).
> 
> So in reality, the only people likely to notice even something as large
> as a 10% hit are those that were already close to maxing out their hardware
> anyway.
> 
> The downside to leaving stuff like this off by default is users won't
> remember it's there when they need it. At best, that means they spend more
> time debugging something than they need to. At worst, it means they suffer
> a production outage for longer than they need to, and that can easily exceed
> many months/years worth of the extra cost from the monitoring overhead.

I rather like this positive way of thinking.  It would be better to think of event monitoring as a positive
feature for (daily) proactive improvement, not only as a debugging feature, which carries a negative image.  For
example, pgAdmin4 could display the 10 most time-consuming events and their solutions.  Suppose the DBA initially places
the database and WAL on the same volume.  As the system grows and the write workload increases, the DBA could get a
suggestion from pgAdmin4 to prepare for the growth by placing WAL on another volume to reduce WALWriteLock wait events.
This is not debugging, but proactive monitoring.
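To make the "10 most time-consuming events and their solutions" idea concrete, here is a minimal sketch, assuming a tool periodically samples each backend's (wait_event_type, wait_event) pair, for example from pg_stat_activity.  With a fixed sampling interval, the number of samples per event roughly tracks the time spent waiting on it.  The function top_wait_events, the HINTS table, and the sample data are hypothetical illustrations, not part of pgAdmin4 or PostgreSQL, and the event names are only examples.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# One sample = (wait_event_type, wait_event) seen for one backend at one
# sampling instant; (None, None) means the backend was not waiting.
Sample = Tuple[Optional[str], Optional[str]]

# Hypothetical advice table keyed like the ranking output ("type:event").
HINTS = {
    "LWLock:WALWriteLock": "heavy WAL writing; consider placing WAL on a separate volume",
}


def top_wait_events(samples: Iterable[Sample], n: int = 10) -> List[Tuple[str, int]]:
    """Rank wait events by how many samples observed them (most frequent first)."""
    counts = Counter(
        f"{etype}:{event}"
        for etype, event in samples
        if etype is not None and event is not None  # skip non-waiting backends
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up samples for illustration only, not real measurements.
    samples = [
        ("LWLock", "WALWriteLock"),
        ("LWLock", "WALWriteLock"),
        ("IO", "DataFileRead"),
        (None, None),
        ("LWLock", "WALWriteLock"),
        ("Lock", "transactionid"),
        ("IO", "DataFileRead"),
    ]
    for name, count in top_wait_events(samples, n=10):
        print(name, count, HINTS.get(name, ""))
```

Counting samples rather than timing every wait keeps the collection side cheap, which matters for the overhead concerns discussed in this thread; the advice lookup stays entirely on the client.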
 


> > As another idea, we can stand on the middle ground.  Interestingly, MySQL
> > also enables their event monitoring (Performance Schema) by default, but
> > not all events are collected.  I guess highly encountered events are not
> > collected by default to minimize the overhead.
> 
> That's what we currently do with several track_* and log_*_stats GUCs,
> several of which I forgot even existed until just now. Since there's question
> over the actual overhead maybe that's a prudent approach for now, but I
> think we should be striving to enable these things ASAP.

Agreed.  And as Bruce said, it may be better to be able to disable collection of the events that have a visible
impact on performance.
 

Regards
Takayuki Tsunakawa

