Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
Date
Msg-id 4F18F38B.7080108@2ndQuadrant.com
In response to Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 01/19/2012 10:52 AM, Robert Haas wrote:
> It's not quite clear from your email, but I gather that the way that
> this is intended to work is that these values increment every time we
> checkpoint?

Right--they get updated in the same atomic bump that moves up things
like buffers_checkpoint.

> Also, forgive me for asking this possibly-stupid question, but of what
> use is this information? I can't imagine why I'd care about a running
> total of the number of files fsync'd to disk.  I also can't really
> imagine why I'd care about the length of the write phase, which surely
> will almost always be a function of checkpoint_completion_target and
> checkpoint_timeout unless I manage to overrun the number of
> checkpoint_segments I've allocated.  The only number that really seems
> useful to me is the time spent syncing.  I have a clear idea what to
> look for there: smaller numbers are better than bigger ones.  For the
> rest I'm mystified.

Priority #1 here is to reduce (but, admittedly, not always eliminate)
the need for log file parsing in this particular area.  That means
publishing all the major bits from the existing log message that can be
exposed this way, including the write phase time.  You mentioned one reason why
the write phase time might be interesting; there could be others.  One 
of the things expected here is that Munin will expand its graphing of 
values from pg_stat_bgwriter to include all these fields.  Most of the 
time the graph of time spent in the write phase will be boring and 
useless.  Making the rare times when it isn't easy to spot at a glance
on a graph is one motivation for including it.

As for why to include the number of files being sync'd, one reason is 
again simply wanting to include everything that can easily be 
published.  A second is that it helps support ideas like my "Checkpoint 
sync pause" one; that's untunable in any reasonable way without some 
easy way of monitoring the number of files typically sync'd.  Sometimes 
when I'm investigating checkpoint spikes during sync, I wonder whether 
they were because more files than usual were synced, or if it's instead 
just because of more churn on a smaller number.  Making this easy to 
graph pulls that data out to where I can compare it with disk I/O 
trends.  And there's precedent now proving that an always incrementing 
number in pg_stat_bgwriter can be turned into such a graph easily by 
monitoring tools.
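
As a rough illustration of how a monitoring tool might turn always
incrementing counters into graphable per-interval values, here is a
minimal Python sketch.  It assumes two samples of the view have already
been fetched into dicts.  checkpoints_timed and checkpoints_req are
existing pg_stat_bgwriter columns; sync_files is a placeholder name for
the proposed files-synced counter, not a confirmed column name.

```python
def counter_deltas(prev, curr):
    """Return per-interval increases between two samples of cumulative counters.

    prev and curr map counter name -> cumulative value.  Returns None if any
    counter went backwards (for example after a statistics reset), because
    the interval cannot be trusted in that case.
    """
    deltas = {}
    for name, value in curr.items():
        if name not in prev:
            continue  # counter only present in the newer sample; skip it
        delta = value - prev[name]
        if delta < 0:
            return None
        deltas[name] = delta
    return deltas


def files_per_checkpoint(deltas):
    """Average files synced per checkpoint over the interval, or None if
    no checkpoint completed during it."""
    checkpoints = deltas["checkpoints_timed"] + deltas["checkpoints_req"]
    if checkpoints == 0:
        return None
    # "sync_files" is a hypothetical name for the proposed counter.
    return deltas["sync_files"] / checkpoints


# Two made-up samples, purely for illustration.
prev = {"checkpoints_timed": 100, "checkpoints_req": 4, "sync_files": 5200}
curr = {"checkpoints_timed": 102, "checkpoints_req": 5, "sync_files": 5500}

deltas = counter_deltas(prev, curr)
print(deltas)
print(files_per_checkpoint(deltas))
```

Plotting the per-interval file count next to disk I/O graphs is what
would let you tell "more files than usual" apart from "more churn on a
smaller number".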

> And, it doesn't seem like it's necessarily going to save me a whole
> lot either, because if it turns out that my sync phases are long, the
> first question out of my mouth is going to be "what percentage of my
> total sync time is accounted for by the longest sync?".  And so right
> there I'm back to the logs.  It's not clear how such information could
> be usefully exposed in pg_stat_bgwriter either, since you probably
> want to know only the last few values, not a total over all time.

This isn't ideal yet.  I mentioned how some future "performance event 
logging history collector" was really needed as a place to push longest 
sync times into, and we don't have it yet.  This is the best thing to
instrument that I'm sure is useful and that I can attach to the
existing infrastructure.

The idea is that this change makes it possible to trigger a "sync times 
are too long" alert out of a tool that's based solely on database 
queries.  When that goes off, yes, you're possibly back to the logs again
for more details about the longest individual sync time.  But the rest 
of the time, what's hopefully the normal state of things, you can ignore 
the logs and just track the pg_stat_bgwriter numbers.
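
A query-only alert of that kind could look roughly like the Python
sketch below.  It is only a sketch under stated assumptions: the two
samples are assumed to have been read from pg_stat_bgwriter already,
sync_time_ms is a placeholder name for the proposed cumulative sync
time counter (assumed here to be in milliseconds), and the threshold is
an arbitrary example value.

```python
def avg_sync_ms(prev, curr):
    """Average sync-phase time per checkpoint between two samples.

    Returns None when no checkpoint completed in the interval or when a
    counter went backwards (for example after a statistics reset).
    """
    checkpoints = (
        (curr["checkpoints_timed"] - prev["checkpoints_timed"])
        + (curr["checkpoints_req"] - prev["checkpoints_req"])
    )
    # "sync_time_ms" is a hypothetical name for the proposed counter.
    sync_ms = curr["sync_time_ms"] - prev["sync_time_ms"]
    if checkpoints <= 0 or sync_ms < 0:
        return None
    return sync_ms / checkpoints


def sync_too_long(prev, curr, max_avg_sync_ms):
    """True if the average sync time per checkpoint exceeds the threshold."""
    avg = avg_sync_ms(prev, curr)
    return avg is not None and avg > max_avg_sync_ms


# Made-up samples: 3 checkpoints and 36 seconds of sync time in between.
prev = {"checkpoints_timed": 100, "checkpoints_req": 4, "sync_time_ms": 40000}
curr = {"checkpoints_timed": 102, "checkpoints_req": 5, "sync_time_ms": 76000}

print(avg_sync_ms(prev, curr))
print(sync_too_long(prev, curr, max_avg_sync_ms=10000))
```

An average like this cannot show how much of the time went to the
single longest sync, so the logs remain the place to look once the
alert fires.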

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


