Hello Robert,
>> I've looked at the maths.
>>
>> I think that the load is distributed as the derivative of this
>> function, that is (1.5 * x ** 0.5): it starts at 0 but very quickly
>> reaches 0.5, passes 1.0 (the average load) at around 40% progress, and
>> ends at 1.5, that is, the finishing load is 1.5 times the average load,
>> just before fsyncing files. This looks like a recipe for a bad time: I
>> would say this is too large an overload. I would suggest a much lower
>> value, say around 1.1...
>> The other issue with this function is that it should only degrade
>> performance, by disrupting the write distribution, if someone has WAL on
>> a different disk: as I understand it, this thing only makes sense if
>> the WAL & the data are on the same disk. This really suggests a GUC.
>
> I am a bit skeptical about this. We need test scenarios that clearly
> show the benefit of having, and of not having, this behavior. It might
> be that always doing this is fine for everyone.

Do you mean that I have to prove that there is an actual problem induced by
this patch?

The logic fails me: I thought the patch submitter would have to show that
his/her patch did not harm performance in various reasonable cases. At
least this is what I'm told in another thread :-)

Currently this patch heavily changes the checkpoint write load
distribution in many cases, with a proof which consists of showing that it
may improve tps *briefly* on *one* example, as far as I understood the
issue and the tests. If this is enough proof to apply the patch, then the
minimum is that it should be possible to deactivate it, hence a GUC.

Having a GUC would also help to test the feature with values other than
1.5, which really seems harmful from a math point of view. I'm not sure at
all that a power formula is the right approach.
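
As a rough illustration of the shape (just a sketch of the derivative given
above, not code from the patch; "alpha" and "relative_load" are names I am
making up here): if the progress is remapped through f(x) = x**alpha, the
instantaneous write load relative to the average is the derivative
f'(x) = alpha * x**(alpha - 1), which gives:

  # Sketch only: relative write load when checkpoint progress is remapped
  # through f(x) = x**alpha; the relative load is the derivative of f.
  def relative_load(x, alpha):
      return alpha * x ** (alpha - 1)

  for alpha in (1.5, 1.1):
      samples = [0.1, 0.25, 0.5, 0.75, 1.0]
      print(alpha, [round(relative_load(x, alpha), 2) for x in samples])
  # 1.5 [0.47, 0.75, 1.06, 1.3, 1.5]   -- ends at 1.5 times the average load
  # 1.1 [0.87, 0.96, 1.03, 1.07, 1.1]  -- ends at 1.1 times the average load

With 1.5 the last writes arrive at 50% above the average rate, right before
the fsyncs, whereas something like 1.1 stays within 10% of flat throughout.
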
The potential impact I see would be to significantly aggravate the write
stall issues I'm working on, but the measurements provided with these tests
do not even look at or measure that.
--
Fabien.