Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: Load Distributed Checkpoints, take 3 |
Date | |
Msg-id | 467BEEF2.1090004@enterprisedb.com |
In response to | Re: Load Distributed Checkpoints, take 3 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-patches |
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> (BTW, the patch seems
>>> a bit schizoid about whether checkpoint_rate is int or float.)
>
>> Yeah, I've gone back and forth on the data type. I wanted it to be a
>> float, but guc code doesn't let you specify a float in KB, so I switched
>> it to int.
>
> I seriously question trying to claim that it's blocks at all, seeing
> that the *actual* units are pages per unit time. Pretending that it's
> a memory unit does more to obscure the truth than document it.

Hmm. What I wanted to avoid is that the I/O rate you get then depends on
your bgwriter_delay, so if you change that you need to change
checkpoint_min_rate as well.

Now we already have that issue with bgwriter_all/lru_maxpages, and I
don't like it there either. If you think it's better to let the user
define it directly as pages/bgwriter_delay, fine.

>>> And checkpoint_rate really needs to be named checkpoint_min_rate, if
>>> it's going to be a minimum. However, I question whether we need it at
>>> all,
>
>> Hmm. With bgwriter_delay of 200 ms, and checkpoint_min_rate of 512 KB/s,
>> using the non-broken formula above, we get:
>
>> (512*1024/8192) * 200 / 1000 = 12.8, truncated to 12.
>
>> So I think that's fine.
>
> "Fine?" That's 12x the value you have actually tested. That's enough
> of a change to invalidate all your performance testing IMHO.

I'll reschedule the tests to be sure, after we settle on how we want to
control this feature.

> I still think you've not demonstrated a need to expose this parameter.

Greg Smith wanted to explicitly control the I/O rate, and let the
checkpoint duration vary. I personally think that fixing the checkpoint
duration is better because it's easier to tune.

But if we only do that, you might end up with ridiculously long
checkpoints when there aren't many dirty pages.
If we want to avoid that, we need some way of telling what's a safe
minimum rate to write at, because that can vary greatly depending on
your hardware.

But maybe we don't care about prolonging checkpoints, and don't really
need any GUCs at all. We could then just hardcode writes_per_nap to some
low value, and target duration close to 1.0. You would have a checkpoint
running practically all the time, and you would use
checkpoint_timeout/checkpoint_segments to control how long it takes. I'm
a bit worried about jumping to such a completely different regime,
though. For example, pg_start_backup needs to create a new checkpoint,
so it would need to wait on average 1.5 * checkpoint_timeout/segments,
and recovery would need to process on average 1.5 times as much WAL as
before. Though with LDC, you should get away with shorter checkpoint
intervals than before, because the checkpoints aren't as invasive.

If we do that, we should remove bgwriter_all_* settings. They wouldn't
do much because we would have checkpoint running all the time, writing
out dirty pages.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
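
For reference, the writes-per-nap arithmetic quoted above, and the
bgwriter_delay dependency it was meant to avoid, can be sketched roughly
as follows. This is only an illustration, not code from the patch: the
function names are hypothetical, the 8192-byte page size is taken from
the quoted formula, and the time spent doing the writes themselves is
ignored.

```python
BLCKSZ = 8192  # bytes per page, as in the quoted formula


def writes_per_nap(min_rate_kb_per_s, bgwriter_delay_ms):
    """Pages to write per bgwriter nap for a minimum rate given in KB/s.

    Hypothetical helper mirroring (rate*1024/8192) * delay / 1000,
    truncated to an integer.
    """
    pages_per_second = min_rate_kb_per_s * 1024 / BLCKSZ
    return int(pages_per_second * bgwriter_delay_ms / 1000)


def rate_kb_per_s(pages_per_nap, bgwriter_delay_ms):
    """Effective I/O rate if the setting were pages per nap instead.

    Hypothetical helper; assumes one nap per bgwriter_delay and no
    time spent writing.
    """
    return pages_per_nap * BLCKSZ / 1024 * 1000 / bgwriter_delay_ms


# 512 KB/s with a 200 ms delay gives 12.8, truncated to 12 pages per nap.
print(writes_per_nap(512, 200))

# With a fixed pages-per-nap setting, halving bgwriter_delay doubles
# the effective rate, which is the coupling described above.
print(rate_kb_per_s(12, 200), rate_kb_per_s(12, 100))
```

With the setting expressed in KB/s, changing bgwriter_delay only changes
the number of pages written per nap. With it expressed in pages per nap,
the same change alters the I/O rate itself.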