Re: Log levels for checkpoint/bgwriter monitoring - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Log levels for checkpoint/bgwriter monitoring
Date
Msg-id 200703271740.l2RHebN07393@momjian.us
Whole thread Raw
In response to Re: Log levels for checkpoint/bgwriter monitoring  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
Magnus Hagander wrote:
> Would not at least some of these numbers be better presented through the
> stats collector, so they can be easily monitored?
> 
> That goes along the line of my way way way away from finished attempt
> earlier, perhaps a combination of these two patches?

Yes.

---------------------------------------------------------------------------


> 
> //Magnus
> 
> 
> Bruce Momjian wrote:
> > Is this patch ready?
> > 
> > ---------------------------------------------------------------------------
> > 
> > Greg Smith wrote:
> >> I have a WIP patch that adds the main detail I have found I need to 
> >> properly tune checkpoint and background writer activity.  I think it's 
> >> almost ready to submit (you can see the current patch against 8.2 at 
> >> http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt ) 
> >> after making it a bit more human-readable.  But I've realized that along 
> >> with that, I need some guidance in regards to what log level is 
> >> appropriate for this information.
> >>
> >> An example works better than explaining what the patch does:
> >>
> >> 2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote 
> >> checkpoint_segments)
> >> 2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
> >> 2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
> >> 2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
> >> 2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
> >> 2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034 
> >> write=279956 us sync=17739974 us
> >>
> >> Remember that "Load distributed checkpoint" discussion back in December? 
> >> You can see exactly how bad the problem is on your system with this log 
> >> style (this is from a pgbench run where it's postively awful--it really 
> >> does take over 17 seconds for the fsync to execute, and there are clients 
> >> that are hung the whole time waiting for it).
> >>
> >> I also instrumented the background writer.  You get messages like this:
> >>
> >> 2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5 
> >> Unscanned = 23/54
> >>
> >> This shows that we wrote (5) the maximum pages we were allowed to write 
> >> (5) while failing to scan almost half (23) of the buffers we meant to look 
> >> at (54).  By taking a look at this output while the system is under load, 
> >> I found I was able to do bgwriter optimization that used to take me days 
> >> of frustrating testing in hours.  I've been waiting for a good guide to 
> >> bgwriter tuning since 8.1 came out.  Once you have this, combined with 
> >> knowing how many buffers were dirty at checkpoint time because the 
> >> bgwriter didn't get to them in time (the number you want to minimize), the 
> >> tuning guide practically writes itself.
> >>
> >> So my question is...what log level should all this go at?  Right now, I 
> >> have the background writer stuff adjusting its level dynamically based on 
> >> what happened; it logs at DEBUG2 if it hits the write limit (which should 
> >> be a rare event once you're tuned properly), DEBUG3 for writes that 
> >> scanned everything they were supposed to, and DEBUG4 if it scanned but 
> >> didn't find anything to write.  The source of checkpoint information logs 
> >> at DEBUG1, the fsync/write info at DEBUG2.
> >>
> >> I'd like to move some of these up.  On my system, I even have many of the 
> >> checkpoint messages logged at INFO (the source of the checkpoint and the 
> >> total write time line).  It's a bit chatty, but when you get some weird 
> >> system pause issue it makes it easy to figure out if checkpoints were to 
> >> blame.  Given how useful I feel some of these messages are to system 
> >> tuning, and to explaining what currently appears as inexplicable pauses, I 
> >> don't want them to be buried at DEBUG levels where people are unlikely to 
> >> ever see them (I think some people may be concerned about turning on 
> >> things labeled DEBUG at all).  I am aware that I am too deep into this to 
> >> have an unbiased opinion at this point though, which is why I ask for 
> >> feedback on how to proceed here.
> >>
> >> --
> >> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 7: You can help support the PostgreSQL project by donating at
> >>
> >>                 http://www.postgresql.org/about/donate
> > 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Log levels for checkpoint/bgwriter monitoring
Next
From: David Fetter
Date:
Subject: Re: Arrays of Complex Types