Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Load Distributed Checkpoints, take 3
Msg-id 467BEEF2.1090004@enterprisedb.com
In response to Re: Load Distributed Checkpoints, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Load Distributed Checkpoints, take 3
List pgsql-patches
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> (BTW, the patch seems
>>> a bit schizoid about whether checkpoint_rate is int or float.)
>
>> Yeah, I've gone back and forth on the data type. I wanted it to be a
>> float, but guc code doesn't let you specify a float in KB, so I switched
>> it to int.
>
> I seriously question trying to claim that it's blocks at all, seeing
> that the *actual* units are pages per unit time.  Pretending that it's
> a memory unit does more to obscure the truth than document it.

Hmm. What I wanted to avoid is that the I/O rate you get then depends on
your bgwriter_delay, so if you change that you need to change
checkpoint_min_rate as well.

Now we already have that issue with bgwriter_all/lru_maxpages, and I
don't like it there either. If you think it's better to let the user
define it directly as pages/bgwriter_delay, fine.
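To make that coupling concrete, here is a minimal sketch of the unit arithmetic, assuming 8 KB pages (BLCKSZ = 8192) and that a per-round page limit is written in full once every bgwriter_delay milliseconds. The helper name is hypothetical; it is not anything the bgwriter actually calls.

```python
BLCKSZ = 8192  # assumed default page size, in bytes


def effective_rate_kb_per_s(pages_per_round: int, bgwriter_delay_ms: int) -> float:
    """Hypothetical helper: KB/s written if pages_per_round pages go out every round."""
    rounds_per_second = 1000 / bgwriter_delay_ms
    return pages_per_round * (BLCKSZ / 1024) * rounds_per_second


# Same per-round setting, different bgwriter_delay: the I/O rate changes with it.
for delay in (100, 200, 400):
    print(delay, effective_rate_kb_per_s(12, delay))
# → 100 960.0
# → 200 480.0
# → 400 240.0
```

Halving bgwriter_delay doubles the write rate unless the per-round value is halved too, which is exactly the dependency a per-second setting would avoid.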

>>> And checkpoint_rate really needs to be named checkpoint_min_rate, if
>>> it's going to be a minimum.  However, I question whether we need it at
>>> all,
>
>> Hmm. With bgwriter_delay of 200 ms, and checkpoint_min_rate of 512 KB/s,
>>   using the non-broken formula above, we get:
>
>> (512*1024/8192) * 200 / 1000 = 12.8, truncated to 12.
>
>> So I think that's fine.
>
> "Fine?"  That's 12x the value you have actually tested.  That's enough
> of a change to invalidate all your performance testing IMHO.

I'll reschedule the tests to be sure, after we settle on how we want to
control this feature.
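The conversion quoted above can be checked with a short sketch, again assuming 8 KB pages. It also shows what truncating to whole pages costs: 12 pages per 200 ms round is 60 pages/s, i.e. 480 KB/s rather than the requested 512 KB/s. The helper name is hypothetical.

```python
import math

BLCKSZ = 8192  # assumed page size, in bytes


def pages_per_round(rate_kb_per_s: float, bgwriter_delay_ms: int) -> int:
    """Hypothetical helper: convert a KB/s rate into whole pages per bgwriter round."""
    pages_per_second = rate_kb_per_s * 1024 / BLCKSZ   # 512 KB/s -> 64 pages/s
    exact = pages_per_second * bgwriter_delay_ms / 1000  # 64 * 0.2 = 12.8
    return math.floor(exact)  # truncate, as in the quoted formula


n = pages_per_round(512, 200)
actual_kb_per_s = n * (BLCKSZ / 1024) * 1000 / 200
print(n, actual_kb_per_s)  # → 12 480.0
```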

> I still think you've not demonstrated a need to expose this parameter.

Greg Smith wanted to explicitly control the I/O rate, and let the
checkpoint duration vary. I personally think that fixing the checkpoint
duration is better because it's easier to tune.

But if we only do that, you might end up with ridiculously long
checkpoints when there aren't many dirty pages. If we want to avoid that,
we need some way of telling what's a safe minimum rate to write at,
because that can vary greatly depending on your hardware.
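The trade-off between a fixed duration and a minimum rate can be sketched as follows. This is a simplified model, not the patch's code: it assumes writes are spread evenly over a target duration and that a minimum rate, in pages per second, acts as a floor. The function and parameter names are hypothetical.

```python
def checkpoint_duration_s(dirty_pages: int, target_duration_s: float,
                          min_pages_per_s: float = 0.0) -> float:
    """Hypothetical model: seconds a checkpoint takes when its writes are spread
    over target_duration_s, but never written slower than min_pages_per_s."""
    if dirty_pages == 0:
        return 0.0
    # Rate needed to finish exactly on target.
    rate = dirty_pages / target_duration_s
    # The floor only takes effect when few pages are dirty.
    rate = max(rate, min_pages_per_s)
    return dirty_pages / rate


# Few dirty pages: without a floor, 100 pages trickle out over the whole target.
print(checkpoint_duration_s(100, 150))                      # ~150 s
# With a floor of 64 pages/s (512 KB/s at 8 KB pages) it finishes quickly.
print(checkpoint_duration_s(100, 150, min_pages_per_s=64))  # → 1.5625
```

With many dirty pages the required rate is far above the floor, so the checkpoint still takes the target duration; the floor only shortens the nearly idle case.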

But maybe we don't care about prolonging checkpoints, and don't really
need any GUCs at all. We could then just hardcode writes_per_nap to some
low value, and target duration close to 1.0. You would have a checkpoint
running practically all the time, and you would use
checkpoint_timeout/checkpoint_segments to control how long it takes. I'm
a bit worried about jumping to such a completely different regime,
though. For example, pg_start_backup needs to create a new checkpoint,
so it would need to wait on average 1.5 * checkpoint_timeout/segments,
and recovery would need to process on average 1.5 times as much WAL as before.
Though with LDC, you should get away with shorter checkpoint intervals
than before, because the checkpoints aren't as invasive.
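The 1.5 factor for pg_start_backup follows from a simple model, sketched here under stated assumptions: a checkpoint is always in progress, each one takes a full interval (target duration 1.0), the request arrives at a uniformly random point in the running checkpoint, and it must wait for that one to finish and then for a complete new one. Analytically that is 0.5 + 1.0 = 1.5 intervals; the simulation below, with a hypothetical function name, just confirms it.

```python
import random


def mean_backup_wait(interval_s: float, trials: int = 100_000, seed: int = 1) -> float:
    """Hypothetical model: average wait for a requested checkpoint when one is always running."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        elapsed = rng.uniform(0, interval_s)  # how far the running checkpoint has got
        remaining = interval_s - elapsed      # wait for it to finish...
        total += remaining + interval_s       # ...then for a full new checkpoint
    return total / trials


print(mean_backup_wait(300.0) / 300.0)  # close to 1.5
```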

If we do that, we should remove bgwriter_all_* settings. They wouldn't
do much because we would have a checkpoint running all the time, writing
out dirty pages.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
