Hello Heikki,
> The reason I didn't commit this back then was lack of performance testing.
> I'm fairly confident that this would be a significant improvement for some
> workloads, and shouldn't hurt much even in the worst case. But I did only a
> little testing on my laptop. I think Simon was in favor of just committing it
> immediately, and Fabien wanted to see more performance testing before
> committing.
I confirm. To summarize my opinion:
I think that the 1.5 factor used in the patch is much too high for the
purpose, because it shifts the checkpoint load quite a lot (50% more load
at the end of the checkpoint) just to avoid a spike which lasts a few
seconds (I think) at the beginning. A much smaller value should be used
(1.0 <= factor < 1.1), as it would be much less disruptive and would
probably avoid the issue just the same. I recommend not committing with a
1.5 factor in any case.
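To make the "50% more load at the end" figure concrete, here is a rough sketch. It assumes the factor is applied as an exponent to the fraction of checkpoint progress x in [0, 1], so the write rate relative to an even schedule is the slope factor * x ** (factor - 1). That reading is my assumption for illustration, and the function name is hypothetical, not something taken from the patch.

```python
# Sketch only. ASSUMPTION: the patch maps elapsed progress x in [0, 1]
# to a target of x ** factor; this is not confirmed by the patch itself.

def relative_write_rate(x: float, factor: float) -> float:
    """Slope of x ** factor: write rate relative to an even schedule (1.0)."""
    return factor * x ** (factor - 1.0)

if __name__ == "__main__":
    # Compare an even schedule, a small factor, and the patch's 1.5.
    for factor in (1.0, 1.05, 1.5):
        early = relative_write_rate(0.01, factor)  # 1% into the checkpoint
        late = relative_write_rate(1.0, factor)    # end of the checkpoint
        print(f"factor={factor}: early rate={early:.2f}, end rate={late:.2f}")
```

Under that assumption the end-of-checkpoint rate equals the factor itself, so 1.5 means 50% more load at the end, while a factor below 1.1 keeps the extra load under 10%.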
Another issue I raised is that the load change occurs both with xlog and
time triggered checkpoints, and I'm sure it should be applied in both
cases.
Another issue is that the patch makes sense when the WAL & relations are
on the same disk, but might degrade performance otherwise.
Another point is that it potentially interacts with a patch I submitted,
which has a large impact on performance (an order of magnitude better in
some cases, by sorting & flushing blocks on checkpoints), so it would make
sense to check that.
So more testing is definitely needed. A GUC would be nice for this
purpose, especially to try out different factors.
> I was hoping that Digoal would re-run his original test case, and report
> back on whether it helps. Fabien had a performance test setup, for
> testing another patch, but he didn't want to run it to test this patch.
Indeed, I have one, but I'm quite behind at the moment, so I cannot promise
anything. Moreover, I'm not sure I see this "spike" issue in my setup,
AFAICR.
--
Fabien.