Re: Generic Monitoring Framework Proposal - Mailing list pgsql-hackers

From Theo Schlossnagle
Subject Re: Generic Monitoring Framework Proposal
Date
Msg-id 3630FC19-345C-4C75-8114-FC8A48E09D71@omniti.com
In response to Re: Generic Monitoring Framework Proposal  (Robert Lor <Robert.Lor@Sun.COM>)
Responses Re: Generic Monitoring Framework Proposal  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Jun 19, 2006, at 6:41 PM, Robert Lor wrote:

> Theo Schlossnagle wrote:
>
>>
>> Heh.  Syscall probes and FBT probes in Dtrace have zero
>> overhead.  User-space probes do have overhead, but it is only a
>> few instructions (two I think).  Basically, the probe points are
>> replaced by illegal instructions and the kernel infrastructure
>> for Dtrace will fasttrap the ops and then act.  So, it is tiny
>> tiny overhead.  Little enough that it isn't unreasonable to
>> instrument things like s_lock which are tiny.
>
> Theo, you're a genius. FBT (function boundary tracing) probes have
> zero overhead (section 4.1) and user-space probes have two
> instructions of overhead (section 4.2). I was incorrect about making
> a general zero overhead statement.  But it's so close to zero :-)
>
> http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
>
>>
>> The reason that Robert proposes user-space probes (I assume) is
>> that tracing C functions can be too granular and not conveniently
>> expose the "right" information to make tracing useful.
>
> Yes, I'm proposing user-space probes (aka User Statically-Defined  
> Tracing - USDT). USDT provides a high-level abstraction so the  
> application can expose well defined probes without the user having  
> to know the detailed implementation.  For example, instead of  
> having to know the function LWLockAcquire(), a well documented  
> probe called lwlock_acquire with the appropriate args is much more  
> usable.

I am giving a talk at OSCON this year about PostgreSQL on "big  
systems".  Big is all relative, but I will be talking about dtrace a  
bit and the advantages of running PostgreSQL on Solaris which is what  
we ended up doing after some extremely disturbing experiences on  
Linux.  I was able to track a very acute memory "leak" in pl/perl  
(which Neil so kindly fixed) within a few moments -- and this is  
without explicit user-space trace points.  If there were good
user-space points, I likely wouldn't have had to dig in the source as a
precursor to my dtrace efforts.

The things you might be able to do with user-specific trace points:

  o better understand the block scatter (distance of block-level
    reads) for a specific query.
  o understand lock contention in vastly multiprocessor systems
    using plockstat (my hunch is that heavy-weight locks might be
    better).
      o our current box is 4 way opteron, but we have a 16-way T2000
        as well.
  o report on queries including turn-around time, block-accesses,
    lock acquisitions grouped by query for specific time windows.
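
To make the per-query report idea a bit more concrete, here is a rough sketch of the kind of post-processing one could run over probe output. Everything in it is an assumption for illustration: the event tuple layout, the event kind names ("block_read", "lock_acquire"), the function name summarize, and the definition of "block scatter" as the mean absolute distance between consecutive block numbers read by a query. None of this is an existing Postgres or DTrace interface, and turn-around time is left out to keep it short.

```python
from collections import defaultdict


def summarize(events, window_start, window_end):
    """Group hypothetical trace events by query within a time window.

    Each event is an assumed tuple (timestamp, query_id, kind, value),
    where kind is "block_read" (value = block number) or
    "lock_acquire" (value is ignored).
    Only events with window_start <= timestamp < window_end are counted.
    """
    per_query = defaultdict(lambda: {"blocks": [], "locks": 0})
    for ts, query_id, kind, value in events:
        if not (window_start <= ts < window_end):
            continue
        if kind == "block_read":
            per_query[query_id]["blocks"].append(value)
        elif kind == "lock_acquire":
            per_query[query_id]["locks"] += 1

    report = {}
    for query_id, data in per_query.items():
        blocks = data["blocks"]
        # Assumed definition of block scatter: mean absolute distance
        # between consecutive block numbers read by the query.
        gaps = [abs(b - a) for a, b in zip(blocks, blocks[1:])]
        scatter = sum(gaps) / len(gaps) if gaps else 0.0
        report[query_id] = {
            "block_accesses": len(blocks),
            "block_scatter": scatter,
            "lock_acquisitions": data["locks"],
        }
    return report
```

A query that reads mostly adjacent blocks would show a scatter close to 1, while one that jumps around the relation would show a much larger value, which is the sort of thing that is hard to see without probe points carrying the block number and the query.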

The nice thing about dtrace is that it requires no "prep" to look at  
a problem.  When something is acting odd in production, you don't  
want to attempt to repeat it in a test environment first.  You want  
to observe it.  Dtrace allows you to dig in "really deep" in  
production with an acceptable performance penalty and ask questions  
that couldn't be asked before.  It is exceptionally clever stuff.  Of  
all the new "neat stuff" in Solaris 10, it has my vote for coolest  
and most useful.  I've nailed several production problems (outside
of Postgres) using dtrace with accuracy and efficiency.  When Solaris  
10u2 is released, we'll be trying Postgres on ZFS, so my rankings may  
change :-)

The idea of having intelligently placed dtrace probes in Postgres
would allow us to deal with postgres as a "first class" app on  
Solaris 10 with respect to troubleshooting obtuse production  
problems.  That, to me, is exciting stuff.

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.



