Re: RFC: Timing Events - Mailing list pgsql-hackers

From Greg Smith
Subject Re: RFC: Timing Events
Msg-id 50931A43.7090805@2ndQuadrant.com
In response to Re: RFC: Timing Events  (Josh Berkus <josh@agliodbs.com>)
Responses Re: RFC: Timing Events  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On 11/1/12 11:54 PM, Josh Berkus wrote:
> For example, it would be really useful to be able to
> see, for example, pg_stat_user_tables from 2 days ago to estimate table
> growth and activity, or pg_stat_replication from 10 minutes ago to
> average replication lag.

I don't see all that going into core without a much bigger push than I 
think people will buy.  What people really want for all these is a 
proper trending system, and that means graphs and dashboards and 
bling--not a history table.  I have almost all of my customers using 
Munin or Cacti or Zabbix or something, and none using pg_statsinfo. 
Shoot, static graphs are barely good enough anymore--people really want 
dynamic ones driven by client-side Javascript.  "Why can't I zoom in on 
this Munin graph, this is lame" they tell me.  I blame Google Maps for 
being the first thing that made all the users so demanding in this area.

But the main weakness of these tools isn't display; it's that it has 
seemed impractical to get them to collect per-table data, whether for 
configuration, speed, or display reasons.  I'm trying to find a good web 
application toolchain to recommend that does that and dynamic graphs, 
too.  I would never take up the fight to try and build in that direction 
in core though.  I think most people aren't even fully consuming, in 
userland, the pg_stat_user_tables data that's already provided.
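
To make the userland angle concrete, here is a minimal sketch of what 
consuming that data could look like: take two saved snapshots of 
pg_stat_user_tables and subtract the counters.  The column names 
(n_tup_ins, n_tup_upd, n_tup_del) are real ones from that view; the 
table_deltas helper, the snapshot layout, and all the numbers are 
made up for illustration, and how the snapshots get collected and 
stored is left out entirely.

```python
from typing import Dict

# One snapshot maps table name -> {counter column -> cumulative value}.
Counters = Dict[str, int]
Snapshot = Dict[str, Counters]


def table_deltas(old: Snapshot, new: Snapshot) -> Snapshot:
    """Return the per-table change in each counter between two snapshots.

    Tables missing from the older snapshot are skipped, since there is
    no baseline to subtract from.
    """
    deltas: Snapshot = {}
    for table, now in new.items():
        before = old.get(table)
        if before is None:
            continue
        deltas[table] = {col: now[col] - before.get(col, 0) for col in now}
    return deltas


# Hypothetical sample data, not real measurements.
two_days_ago = {
    "orders": {"n_tup_ins": 1000, "n_tup_upd": 200, "n_tup_del": 50},
}
today = {
    "orders": {"n_tup_ins": 1600, "n_tup_upd": 260, "n_tup_del": 55},
    "new_table": {"n_tup_ins": 10, "n_tup_upd": 0, "n_tup_del": 0},
}

growth = table_deltas(two_days_ago, today)
print(growth["orders"])
```

The counters are cumulative, so a statistics reset between the two 
snapshots would show up as negative deltas; a real collector would 
need to notice that and discard the interval.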

[I fear this topic will turn into a more appropriate one for 
pgsql-advocacy in a hurry if it keeps going]

> So, the problem with joining against pg_stat_statements is that a
> special-purpose incident you're looking at (like a lock_wait) might have
> been pushed "off the bottom" of pg_stat_statements even though it is
> still visible in pg_stat_lock_waits.  No?

This whole approach has the assumption that things are going to fall off 
sometimes.  To expand on that theme for a second, right now I'm more 
worried about the "99%" class of problems.  Neither pg_stat_statements 
nor this idea is very good for tracking the rare rogue problem down. 
They're both aimed at making things that happen a lot statistically more 
likely to be seen, by giving an easier UI to glare at them frequently. 
That's not ideal, but I suspect really fleshing out the whole queue 
consumer -> table idea needs to happen to do much better.
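
A toy model of the "falls off the bottom" effect, as a sketch only: a 
tracker with a fixed number of slots that drops its least-called 
entry to make room for a new one.  This is not pg_stat_statements' 
actual eviction code; the BoundedTracker class, the slot count, and 
the event names are all assumptions made up for the example.

```python
from collections import Counter


class BoundedTracker:
    """Keep call counts for at most max_entries distinct keys."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.calls: Counter = Counter()

    def record(self, key: str) -> None:
        # When a new key arrives and every slot is taken, evict the
        # entry with the fewest recorded calls to make room.
        if key not in self.calls and len(self.calls) >= self.max_entries:
            victim = min(self.calls, key=self.calls.get)
            del self.calls[victim]
        self.calls[key] += 1


tracker = BoundedTracker(max_entries=2)

# A query that runs all the time builds up a large count.
for _ in range(50):
    tracker.record("frequent_query")

# A one-off incident is recorded once...
tracker.record("rare_lock_wait")

# ...and is then pushed out by other one-off entries.
tracker.record("other_query_a")
tracker.record("other_query_b")

print(sorted(tracker.calls))
```

In this model the frequent entry survives every eviction while the 
single rare event is gone after the next new key shows up, which is 
the bias toward common problems described above.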

Thanks for the quick feedback; there are a lot of ideas in there that I 
need to chew on and should incorporate.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


