Hello,
>> I read your patch and I know what I want to try in order to get a small
>> and simple fix. I must admit that I have not really understood under which
>> conditions the checkpointer would decide to close a file, but that does not
>> mean that the potential issue should not be addressed.
>
> There's a trivial example: Consider three tablespaces and
> max_files_per_process = 2. The balancing can easily cause three files
> to be flushed at the same time.
Indeed. Thanks for this explanation!
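To make the example concrete, here is a minimal toy model of it. This is not PostgreSQL code: the function name, the LRU eviction policy and the one-file-per-tablespace layout are all assumptions made for illustration. It balances writes round-robin over the tablespaces and records every file that gets closed while it still has writes waiting to be flushed.

```python
from collections import OrderedDict


def simulate(tablespaces, max_open_files, writes_per_tablespace):
    """Toy model: return the files closed while they still had pending writes.

    Assumptions (hypothetical, for illustration only): one file per
    tablespace, writes balanced round-robin, least recently used file
    closed first when the open-file limit is reached.
    """
    open_files = OrderedDict()  # file name -> pending (unflushed) writes, LRU order
    closed_with_pending = []
    for _ in range(writes_per_tablespace):
        for ts in tablespaces:
            name = f"{ts}/segment"
            if name in open_files:
                # Already open: mark it as most recently used.
                open_files.move_to_end(name)
            else:
                if len(open_files) >= max_open_files:
                    # Limit reached: close the least recently used file.
                    victim, pending = open_files.popitem(last=False)
                    if pending > 0:
                        closed_with_pending.append(victim)
                open_files[name] = 0
            open_files[name] += 1  # one more write waiting for a flush
    return closed_with_pending


if __name__ == "__main__":
    print(simulate(["ts1", "ts2", "ts3"], max_open_files=2, writes_per_tablespace=2))
    print(simulate(["ts1", "ts2", "ts3"], max_open_files=3, writes_per_tablespace=2))
```

With a limit of 2 and three tablespaces, every newly opened file pushes out one that still has pending writes; with a limit of 3 nothing is closed early.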
> But more importantly: You designed the API to be generic because you
> wanted it to be usable for other purposes as well. And for that it
> certainly needs to deal with that.
Yes, I'm planning to try to do the minimum possible damage to the current
API to fix the issue.
>> Also, I gave some thought to what should be done for bgwriter random
>> IOs. The idea is to implement some per-file sorting there and then do some
>> LRU/LFU combing. It would not interact much with the checkpointer, so for me
>> the two issues should be kept separate and this should not preclude changing
>> the checkpointer, esp. given the significant performance benefit of the
>> patch.
>
> Well, the problem is that the patch significantly regresses some cases
> right now. So keeping them separate isn't particularly feasible.
I have not seen significant regressions in my many test runs. In
particular, I would not consider a tps dip in cases where PostgreSQL is
doing 0 tps most of the time anyway (i.e. pg is offline) because of
random I/O issues to be a blocker.
As I understand it, the regressions occur when the checkpointer is less
used, i.e. when bgwriter is doing most of the writes. In these cases it
does not matter much whether the checkpointer sorts buffers or not, and
the overall behavior of pg is very bad anyway.
Also, I think that coupling the two issues is a recipe for never getting
anything done in the end and keeping the current awful behavior :-(
The solution on the bgwriter front is somewhat similar to the checkpointer
one, but from a code point of view there is minimal interaction, so I would
really keep them separate, esp. as the bgwriter part will require extensive
testing and discussion as well.
--
Fabien.