Re: Log levels for checkpoint/bgwriter monitoring - Mailing list pgsql-hackers

From Robert Treat
Subject Re: Log levels for checkpoint/bgwriter monitoring
Date
Msg-id 200702212322.34447.xzilla@users.sourceforge.net
Whole thread Raw
In response to Log levels for checkpoint/bgwriter monitoring  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Log levels for checkpoint/bgwriter monitoring
List pgsql-hackers
On Monday 19 February 2007 22:59, Greg Smith wrote:
> I have a WIP patch that adds the main detail I have found I need to
> properly tune checkpoint and background writer activity.  I think it's
> almost ready to submit (you can see the current patch against 8.2 at
> http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt )
> after making it a bit more human-readable.  But I've realized that along
> with that, I need some guidance in regards to what log level is
> appropriate for this information.
>
> An example works better than explaining what the patch does:
>
> 2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote
> checkpoint_segments)
> 2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
> 2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
> 2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
> 2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
> 2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034
> write=279956 us sync=17739974 us
>
> Remember that "Load distributed checkpoint" discussion back in December?
> You can see exactly how bad the problem is on your system with this log
> style (this is from a pgbench run where it's postively awful--it really
> does take over 17 seconds for the fsync to execute, and there are clients
> that are hung the whole time waiting for it).
>
> I also instrumented the background writer.  You get messages like this:
>
> 2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5
> Unscanned = 23/54
>
> This shows that we wrote (5) the maximum pages we were allowed to write
> (5) while failing to scan almost half (23) of the buffers we meant to look
> at (54).  By taking a look at this output while the system is under load,
> I found I was able to do bgwriter optimization that used to take me days
> of frustrating testing in hours.  I've been waiting for a good guide to
> bgwriter tuning since 8.1 came out.  Once you have this, combined with
> knowing how many buffers were dirty at checkpoint time because the
> bgwriter didn't get to them in time (the number you want to minimize), the
> tuning guide practically writes itself.
>
> So my question is...what log level should all this go at?  Right now, I
> have the background writer stuff adjusting its level dynamically based on
> what happened; it logs at DEBUG2 if it hits the write limit (which should
> be a rare event once you're tuned properly), DEBUG3 for writes that
> scanned everything they were supposed to, and DEBUG4 if it scanned but
> didn't find anything to write.  The source of checkpoint information logs
> at DEBUG1, the fsync/write info at DEBUG2.
>
> I'd like to move some of these up.  On my system, I even have many of the
> checkpoint messages logged at INFO (the source of the checkpoint and the
> total write time line).  It's a bit chatty, but when you get some weird
> system pause issue it makes it easy to figure out if checkpoints were to
> blame.  Given how useful I feel some of these messages are to system
> tuning, and to explaining what currently appears as inexplicable pauses, I
> don't want them to be buried at DEBUG levels where people are unlikely to
> ever see them (I think some people may be concerned about turning on
> things labeled DEBUG at all).  I am aware that I am too deep into this to
> have an unbiased opinion at this point though, which is why I ask for
> feedback on how to proceed here.
>

My impression of this is that DBA's would typically want to run this for a 
short period of time to get thier systems tuned and then it pretty much 
becomes chatter.  Can you come up with an idea of what information DBA's need 
to know?  Would it be better to hide all of this logging behind a guc that 
can be turned on or off (log_bgwriter_activity)? Maybe you could just 
reinsterment it as dtrace probes so a seperate stand-alone process could pull 
the information as needed? :-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


pgsql-hackers by date:

Previous
From: Warren Turkal
Date:
Subject: SCMS question
Next
From: Tom Lane
Date:
Subject: Re: Column storage positions