Re: PATCH: regular logging of checkpoint progress - Mailing list pgsql-hackers

From Greg Smith
Subject Re: PATCH: regular logging of checkpoint progress
Date
Msg-id 4E61C6E2.7050503@2ndQuadrant.com
Whole thread Raw
In response to Re: PATCH: regular logging of checkpoint progress  ("Tomas Vondra" <tv@fuzzy.cz>)
Responses Re: PATCH: regular logging of checkpoint progress
List pgsql-hackers
On 09/02/2011 11:10 AM, Tomas Vondra wrote:
> My 'ideal' solution would be either to add another GUC (to turn this
> on/off) or allow log_checkpoints to have three values
>
> log_checkpoints = {off, normal, detailed}
>
> where 'normal' provides the current output and 'detail' produces this much
> verbose output.
>    

If this is going to be acceptable, that's likely the only path it could 
happen by and still meet what you're looking for.  I will just again 
stress that the part you're working on instrumenting better right now is 
not actually where larger systems really run into the most problems 
here, based on what I've seen.  I added a series of log messages to 9.1 
at DEBUG1, aimed at tracking the sync phase.  That's where I see many 
more checkpoint issues than in the write one.  On Linux in particular, 
it's almost impossible for the write phase to be more of a problem than 
the sync one.

So the logging you're adding here I don't ever expect to turn on.  But I 
wouldn't argue against an option to handle the logging use-case you're 
concerned about.  Letting people observe for themselves and decide which 
of the phases is more interesting to their workload seems appropriate.  
Then users have options for what to log, no matter which type of problem 
they run into.

If you're expanding log_checkpoints to an enum, for that to handle what 
I think everybody might ever want (for what checkpoints do now at 
least), I'd find that more useful if it happened like this instead:

log_checkpoints = {off, on, write, sync, verbose}

I don't think you should change the semantics of off/on, which will 
avoid breaking existing postgresql.conf files and resources that suggest 
tuning advice.  "write" can toggle on what you're adding; "sync" should 
control whether the DEBUG1 messages showing the individual file names in 
the sync phase appear; and "verbose" can include both.

As far as a heuristic for making this less chatty when there's nothing 
exciting happening goes, I think something based on how much time has 
passed would be the best one.  In your use case, I would guess you don't 
really care whether a message appears every n%.  If I understand you 
correctly now, you would mainly care about getting enough log detail to 
know 1) when things are running really slow, or b) often enough that the 
margin of error in your benchmark results from unaccounted checkpoint 
writes is acceptable.  In both of those cases, I'd think a time-based 
threshold would be appropriate, and that also deals with the time-based 
checkpoints, too.

If your logging criteria for the write phase was "display a message any 
time more than 30 seconds have passed since last seeing one", that would 
give you only a few lines of output in a boring, normal 
checkpoint--certainly less than the 9 in-progress samples you're 
outputting now, at 10% intervals.  But in the pathological situations 
where writes are super slow, your log data would become correspondingly 
denser, which is exactly what you want in that situation.

I think combining the two makes the most sense:  "log when >=30 seconds 
have passed since the last message, and there's been >=10% more progress 
made".  (Maybe do the progress check before the time one, to cut down on 
gettimeofday() calls)  That would give you 4 in-progress reports during 
a standard 2.5 minute write phase, and in cases where the checkpoints 
are taking a long time you'd get as many as 9.  That's pretty close to 
auto-tuning the amount of log output, so the amount of it is roughly 
proportional to how likely the information it's logging will be 
interesting.

We certainly don't want to add yet another GUC just to control the 
frequency.  I don't think it will be too hard to put two hard-coded 
thresholds in and do good enough for just about everyone though.  I 
would probably prefer setting those thresholds to 60 seconds/20% 
instead.  That might not be detailed enough for you though.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: pg_upgrade automatic testing
Next
From: Peter Eisentraut
Date:
Subject: Re: pg_upgrade automatic testing