Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Spread checkpoint sync
Msg-id 4D32263E.8000100@2ndquadrant.com
In response to Re: Spread checkpoint sync  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Spread checkpoint sync  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:
> That seems like a bad idea - don't we routinely recommend that people
> crank this up to 0.9?  You'd be effectively bounding the upper range
> of this setting to a value less than the lowest value we
> recommend anyone use today.
>   

I was just giving an example of how I might do an initial split.  
There's a checkpoint happening now at time T; we have a rough idea that 
it needs to be finished before some upcoming time T+D.  Currently with 
default parameters this becomes:

Write:  0.5 * D; Sync:  0

Even though Sync obviously doesn't take zero.  The slop here is enough 
that it usually works anyway.

I was suggesting that a quick reshuffling to:

Write:  0.4 * D; Sync:  0.4 * D

Might be a good first candidate for how to split the time up better.  
The fact that this gives less writing time than the current biggest 
spread possible:

Write:  0.9 * D; Sync: 0

Is true.  It's also true that in the case where sync time really is 
zero, this new default would spread writes less than the current 
default.  I think that's optimistic, but it could happen if checkpoints 
are small and you have a good write cache.
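To make the three splits above concrete, here is a minimal sketch of the arithmetic, assuming D is the 5 minute interval discussed in this thread. The names (CheckpointBudget, split_checkpoint) are hypothetical and are not anything in the PostgreSQL source; the fractions are just the ones from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointBudget:
    """Time allotted to each checkpoint phase, in seconds (illustrative only)."""
    write_seconds: float
    sync_seconds: float
    slack_seconds: float  # whatever is left of D after write and sync


def split_checkpoint(interval_seconds: float,
                     write_fraction: float,
                     sync_fraction: float) -> CheckpointBudget:
    """Divide the interval D into write, sync, and leftover slack."""
    if write_fraction < 0 or sync_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if write_fraction + sync_fraction > 1:
        raise ValueError("write and sync fractions cannot exceed the interval")
    write = interval_seconds * write_fraction
    sync = interval_seconds * sync_fraction
    return CheckpointBudget(write, sync, interval_seconds - write - sync)


D = 300.0  # assumption: a 5 minute checkpoint interval

current_default = split_checkpoint(D, 0.5, 0.0)  # Write: 0.5 * D; Sync: 0
proposed = split_checkpoint(D, 0.4, 0.4)         # Write: 0.4 * D; Sync: 0.4 * D
max_spread = split_checkpoint(D, 0.9, 0.0)       # Write: 0.9 * D; Sync: 0

for name, budget in [("current default", current_default),
                     ("proposed", proposed),
                     ("max spread today", max_spread)]:
    print(f"{name}: write={budget.write_seconds:.0f}s "
          f"sync={budget.sync_seconds:.0f}s slack={budget.slack_seconds:.0f}s")
```

The point of the comparison is only that the proposed split trades some write time for an explicit sync budget; it says nothing about how long sync actually takes on a given system.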

Step back from that a second though.  Ultimately, the person who is 
getting checkpoints at a 5 minute interval, and is being nailed by 
spikes, should have the option of just increasing the parameters to make 
that interval bigger.  First you increase the measly default 
checkpoint_segments to a reasonable range, then 
checkpoint_completion_target is the second one you can try.  But from 
there, you quickly move on to making checkpoint_timeout longer.  At some 
point, there is no option but to give up on checkpoints every 5 minutes 
being practical, and make the average interval longer.

Whether or not a refactoring here makes things slightly worse for cases 
closer to the default doesn't bother me too much.  What bothers me is 
the way trying to stretch checkpoints out further fails to deliver as 
well as it should.  I'd be OK with saying "to get the exact same spread 
situation as in older versions, you may need to retarget for checkpoints 
every 6 minutes" *if* in the process I get a much better sync latency 
distribution in most cases.
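As a rough check on that "6 minutes" figure, here is a small sketch of the arithmetic, assuming the 0.5 and 0.4 write fractions from the earlier examples and a 5 minute starting interval. The function name is hypothetical, and the result is only a back-of-envelope number, not a tuning recommendation.

```python
def interval_for_same_write_time(old_interval_seconds: float,
                                 old_write_fraction: float,
                                 new_write_fraction: float) -> float:
    """Interval needed so the write phase lasts as long as it did before."""
    if new_write_fraction <= 0:
        raise ValueError("new_write_fraction must be positive")
    old_write_time = old_interval_seconds * old_write_fraction
    return old_write_time / new_write_fraction


# Assumption: 5 minute interval, write fraction going from 0.5 to 0.4.
new_interval = interval_for_same_write_time(300.0, 0.5, 0.4)
print(f"retarget to about {new_interval / 60:.2f} minutes")
```

Under those assumptions the answer comes out a little over 6 minutes, which is in line with the figure quoted above.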

Here's an interesting data point from the customer site this all started 
at, one I don't think they'll mind sharing since it helps make the 
situation more clear to the community.  After applying this code to 
spread sync out, in order to get their server back to functional we had 
to move all the parameters involved up to where checkpoints were spaced 
35 minutes apart.  It just wasn't possible to write any faster than that 
without disrupting foreground activity. 

The whole current model where people think of this stuff in terms of 
segments and completion targets is a UI disaster.  The direction I want 
to go in is where users can say "make sure checkpoints happen every N 
minutes", and something reasonable happens without additional parameter 
fiddling.  And if the resulting checkpoint I/O spike is too big, they 
just increase the timeout to N+1 or N*2 to spread the checkpoint 
further.  Getting too bogged down thinking in terms of the current, 
really terrible interface is something I'm trying to break myself of.  
Long-term, I want there to be checkpoint_timeout, and all the other 
parameters are gone, replaced by an internal implementation of the best 
practices proven to work even on busy systems.  I don't have as much 
clarity on exactly what that best practice is as I had when, say, I 
recently suggested exactly how to eliminate wal_buffers as an important 
thing to manually set.  But that's the dream UI:  you shoot for a 
checkpoint interval, and something reasonable happens; if that's too 
intense, you just increase the interval to spread further.  There will 
probably be small performance regressions possible vs. the current code 
with parameter combinations that happen to work well on it.  Preserving 
every one of those is not as important to me as making the tuning 
interface simple and clear.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


