>> I'm getting increasingly unhappy about the checkpoint flush control.
>> I saw major regressions on my parallel COPY test, too:
>
> Yes, I'm concerned too.
A few thoughts:
- focussing on raw tps is not a good idea, because it may be a lot of tps followed by a sync panic, with an
unresponsive database. I wish the performance reports would include some indication of the distribution (eg
min/q1/median/q3/max tps per second seen, standard deviation), not just the final "tps" figure.
- checkpoint flush control (checkpoint_flush_after) should almost always be beneficial because it flushes sorted data.
I would be surprised to see significant regressions with this on. A lot of tests showed maybe improved tps, but
mostly greatly improved performance stability, where a database that was unresponsive 60% of the time (60% of the
seconds show very low or zero tps) becomes always responsive.
- other flush controls ({backend,bgwriter}_flush_after) may just increase random writes, so are more risky in nature
because the data is not sorted, and it may or may not be a good idea depending on detailed conditions. A "parallel
copy" would be just such a special IO load which degrades performance under these settings.
Maybe these two should be disabled by default because they lead to possibly surprising regressions?
- for any particular load, the admin can decide to disable these if they think it is better not to flush. Also, as
suggested by Andres, with 128 parallel queries the default value may not be appropriate at all.
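
To make the first point concrete, here is a rough sketch of the kind of summary I have in mind. It assumes the
per-second tps samples have already been collected into a list (for instance from pgbench progress reports with
-P 1). The function name summarize_tps and the low_threshold cutoff are made up for illustration, and the
threshold for "very low tps" is arbitrary:

```python
import statistics


def summarize_tps(per_second_tps, low_threshold=10.0):
    """Summarize per-second tps samples instead of reporting one average.

    per_second_tps: list of tps values, one per second of the run.
    low_threshold: hypothetical cutoff below which a second counts as
                   "unresponsive"; pick a value that suits the workload.
    """
    if len(per_second_tps) < 2:
        raise ValueError("need at least two samples")

    # Quartiles over the observed samples (Python 3.8+).
    q1, median, q3 = statistics.quantiles(
        per_second_tps, n=4, method="inclusive"
    )

    # Count the seconds where throughput was very low or zero.
    low_seconds = sum(1 for tps in per_second_tps if tps < low_threshold)

    return {
        "min": min(per_second_tps),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(per_second_tps),
        "mean": statistics.mean(per_second_tps),
        "stddev": statistics.pstdev(per_second_tps),
        "low_fraction": low_seconds / len(per_second_tps),
    }
```

A run that alternates between bursts and stalls can have a decent mean while its median and low_fraction show
that the database was unresponsive for most of the time, which is exactly what the final "tps" figure hides.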
--
Fabien.