Re: Log levels for checkpoint/bgwriter monitoring - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Log levels for checkpoint/bgwriter monitoring
Date
Msg-id 460954A4.90803@hagander.net
Whole thread Raw
In response to Re: Log levels for checkpoint/bgwriter monitoring  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Log levels for checkpoint/bgwriter monitoring
Re: Log levels for checkpoint/bgwriter monitoring
List pgsql-hackers
Would not at least some of these numbers be better presented through the
stats collector, so they can be easily monitored?

That goes along the line of my way way way away from finished attempt
earlier, perhaps a combination of these two patches?

//Magnus


Bruce Momjian wrote:
> Is this patch ready?
> 
> ---------------------------------------------------------------------------
> 
> Greg Smith wrote:
>> I have a WIP patch that adds the main detail I have found I need to 
>> properly tune checkpoint and background writer activity.  I think it's 
>> almost ready to submit (you can see the current patch against 8.2 at 
>> http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt ) 
>> after making it a bit more human-readable.  But I've realized that along 
>> with that, I need some guidance in regards to what log level is 
>> appropriate for this information.
>>
>> An example works better than explaining what the patch does:
>>
>> 2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote 
>> checkpoint_segments)
>> 2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
>> 2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
>> 2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
>> 2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
>> 2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034 
>> write=279956 us sync=17739974 us
>>
>> Remember that "Load distributed checkpoint" discussion back in December? 
>> You can see exactly how bad the problem is on your system with this log 
>> style (this is from a pgbench run where it's postively awful--it really 
>> does take over 17 seconds for the fsync to execute, and there are clients 
>> that are hung the whole time waiting for it).
>>
>> I also instrumented the background writer.  You get messages like this:
>>
>> 2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5 
>> Unscanned = 23/54
>>
>> This shows that we wrote (5) the maximum pages we were allowed to write 
>> (5) while failing to scan almost half (23) of the buffers we meant to look 
>> at (54).  By taking a look at this output while the system is under load, 
>> I found I was able to do bgwriter optimization that used to take me days 
>> of frustrating testing in hours.  I've been waiting for a good guide to 
>> bgwriter tuning since 8.1 came out.  Once you have this, combined with 
>> knowing how many buffers were dirty at checkpoint time because the 
>> bgwriter didn't get to them in time (the number you want to minimize), the 
>> tuning guide practically writes itself.
>>
>> So my question is...what log level should all this go at?  Right now, I 
>> have the background writer stuff adjusting its level dynamically based on 
>> what happened; it logs at DEBUG2 if it hits the write limit (which should 
>> be a rare event once you're tuned properly), DEBUG3 for writes that 
>> scanned everything they were supposed to, and DEBUG4 if it scanned but 
>> didn't find anything to write.  The source of checkpoint information logs 
>> at DEBUG1, the fsync/write info at DEBUG2.
>>
>> I'd like to move some of these up.  On my system, I even have many of the 
>> checkpoint messages logged at INFO (the source of the checkpoint and the 
>> total write time line).  It's a bit chatty, but when you get some weird 
>> system pause issue it makes it easy to figure out if checkpoints were to 
>> blame.  Given how useful I feel some of these messages are to system 
>> tuning, and to explaining what currently appears as inexplicable pauses, I 
>> don't want them to be buried at DEBUG levels where people are unlikely to 
>> ever see them (I think some people may be concerned about turning on 
>> things labeled DEBUG at all).  I am aware that I am too deep into this to 
>> have an unbiased opinion at this point though, which is why I ask for 
>> feedback on how to proceed here.
>>
>> --
>> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 7: You can help support the PostgreSQL project by donating at
>>
>>                 http://www.postgresql.org/about/donate
> 



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Next
From: Bruce Momjian
Date:
Subject: Re: Log levels for checkpoint/bgwriter monitoring