Re: procpid? - Mailing list pgsql-hackers

From Greg Smith
Subject Re: procpid?
Date
Msg-id 4DF8608A.70708@2ndQuadrant.com
Whole thread Raw
In response to Re: procpid?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: procpid?
List pgsql-hackers
Here's the sort of thing every person who writes a monitoring tool 
involving pg_stat_activity goes through:

1) Hurray!  I know how to see what the database is doing now!  Let me 
try counting all the connections so I can finally figure out what to set 
[max_connections | work_mem | other] to.
2) Wait, some of these can be "<IDLE>".  That's not documented.  I'll 
have to special case them because they don't really matter for my 
computation.
3) Seriously, there's another state for idle in a transaction?  Just how 
many of these special values are there?  [There's actually one more 
surprise after this]

The whole thing is enormously frustrating, and it's an advocacy 
problem--it contributes to people just starting to become serious about 
using PostgreSQL lowering their opinion of its suitability for their 
business.  If this is what's included for activity monitoring, and it's 
this terrible, it suggest people must not have very high requirements 
for that.

And what you end up with to make it better is not just another few 
keystrokes.  Here, as a common example I re-use a lot, is a decoder 
inspired by Munin's connection count monitoring graph:

SELECT    waiting,    CASE WHEN current_query='<IDLE>' THEN true ELSE false END AS idle,    CASE WHEN
current_query='<IDLE>in transaction' THEN true ELSE 
 
false END AS idletransaction,    CASE WHEN current_query='<insufficient privilege>' THEN false ELSE 
true END as visible,    CASE WHEN NOT waiting AND current_query NOT IN ('<IDLE>', '<IDLE> 
in transaction', '<insufficient privilege>') THEN true ELSE false END AS 
active,    procpid,current_query
FROM pg_stat_activity WHERE procpid != pg_backend_pid();

What percentage of people do you think get this right?  Now, what does 
that number go to if these states were all obviously exposed booleans?  
As I'm concerned, this design is fundamentally flawed as currently 
delivered, so the concept of "breaking" it doesn't really make sense.

The fact that you can only figure all this decoding magic out through 
extensive trial and error, or reading the source code to [the database | 
another monitoring tool], is crazy.  It's a much bigger problem than the 
fact that the pid column is misnamed, and way up on my list of things 
I'm just really tired of doing.  Yes, we could just document all these 
mystery states to help, but they'd still be terrible.

This is a database; let's expose the data in a way that it's easy to 
slice yourself using a database query.  And if we're going to fix 
that--which unfortunately will be breaking it relative to those already 
using the current format--I figure why not bundle the procpid fix into 
that while we're at it.  It's even possible to argue that breaking that 
small thing will draw useful attention to the improvements in other 
parts of the view.  Having your monitoring query break after a version 
upgrade is no fun.  But if investigating why reveals new stuff you 
didn't notice in the release notes, the changes become more 
discoverable, albeit in a somewhat perverse way.

Putting on my stability hat instead of my "make it right" one, maybe 
this really makes sense to expose as a view with a whole new name.  Make 
this new one pg_activity (there's no stats here anyway), keep the old 
one around as pg_stat_activity for a few releases until everyone has 
converted to the new one.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: creating CHECK constraints as NOT VALID
Next
From: Robert Creager
Date:
Subject: Re: Polecat "quit unexpectdly"