>>>> In summary, the X^1.5 correction seems to work pretty well. It doesn't
>>>> completely eliminate the problem, but it makes it a lot better.
I've looked at the maths.
I think that the load is distributed as the derivative of this function,
that is (1.5 * x ** 0.5): it starts at 0 but very quickly reaches 0.5,
passes 1.0 (the average load) at around 40% progress, and ends up at 1.5,
that is the finishing load is 1.5 times the average load, just before
fsyncing files.
This looks like a recipe for a bad time: I would say this is too large an
overload. I would suggest a much lower exponent, say around 1.1...
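For what it is worth, here is a quick check of these figures (plain
Python, just the arithmetic above, not code from the patch; the 1.1
exponent is only the alternative suggested above):

    # relative instantaneous write load if checkpoint progress is
    # remapped through x ** e: the load is the derivative e * x**(e-1)
    def load(x, e):
        return e * x ** (e - 1)

    for e in (1.5, 1.1):
        print(e, [round(load(x, e), 2) for x in (0.01, 0.1, 0.44, 1.0)])
    # e=1.5: ~0.15 at 1%, ~0.47 at 10%, ~1.0 around 44%, 1.5 at the end
    # e=1.1: stays between ~0.7 and 1.1 over the whole checkpoint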
The other issue with this function is that it would only degrade
performance, by disrupting the write distribution, if someone has the WAL
on a different disk. As I understand it, this correction only makes sense
if the WAL and the data are on the same disk. This really suggests a GUC.
> I have run some tests with this patch and the detailed results of the
> runs are attached with this mail.
I do not really understand the aggregated figures in the attached files.
I guess that maybe between "end" markers there is a summary of the figures
collected for 28 backends over 300-second runs (?), but I do not know what
the min/max/avg/sum/count figures refer to.
> I thought the patch should show a difference if I keep max_wal_size to a
> somewhat lower or moderate value so that the checkpoint gets triggered
> due to WAL size, but I am not seeing any major difference in the writes
> spreading.
I'm not sure I understand your point. I would say that with pgbench at
full speed the disk is always busy writing as much as possible, either
checkpoint writes or WAL writes, so the write load as such should not be
that different anyway?
I understood that the point of testing this patch is to check whether
there is a tps dip or not when the checkpoint begins, but I'm not sure how
this can be inferred from the aggregated data you sent, and from my recent
tests the tps is very variable on HDD anyway.
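If it helps, the tps dip can be looked at directly through per-second
transaction counts derived from the per-transaction logs (pgbench -l).
A possible sketch, assuming the epoch-second timestamp is the fifth
field of each log line (to be checked against the pgbench version used):

    # per-second tps from a pgbench per-transaction log read on stdin
    import sys
    from collections import Counter

    counts = Counter()
    for line in sys.stdin:
        counts[int(line.split()[4])] += 1

    if counts:
        start = min(counts)
        for sec in range(start, max(counts) + 1):
            print(sec - start, counts.get(sec, 0))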
--
Fabien.