Hello Takashi-san,
I wanted to do some tests with this POC patch. For this purpose, it would
be nice to have a GUC to enable or disable this feature. Could you provide
a patch with such a GUC? I would suggest making the number of partitions
the GUC, so that setting it to 1 would basically reproduce the current
behavior.
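
For illustration, a minimal sketch of what the guc.c entry could look
like; the name "checkpoint_partitions" and the 1..16 range are just
suggestions on my part, not names from your patch:

    /* hypothetical variable, defaulting to the current behavior */
    int checkpoint_partitions = 1;

    /* entry to add to ConfigureNamesInt[] in src/backend/utils/misc/guc.c */
    {
        {"checkpoint_partitions", PGC_SIGHUP, WAL_CHECKPOINTS,
            gettext_noop("Number of partitions a checkpoint is split into."),
            NULL
        },
        &checkpoint_partitions,
        1, 1, 16,
        NULL, NULL, NULL
    },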
Some general comments:
I understand that this patch cuts the checkpoint of buffers into 16
partitions, each covering 1/16 of the buffers, and each with its own
WAL record, pacing, fsync and so on.
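
Just so we are talking about the same thing, here is a rough sketch of
the structure as I understand it (the helper names below are invented
for the purpose of illustration, they are not the actual code):

    /* one complete mini-checkpoint per partition of the buffer pool */
    for (int part = 0; part < 16; part++)
    {
        int     first = (part * NBuffers) / 16;
        int     last  = ((part + 1) * NBuffers) / 16;

        LogPartitionCheckpoint(part);           /* per-partition WAL entry */
        WriteDirtyBuffersInRange(first, last);  /* paced buffer writes */
        FsyncAllTouchedFiles();                 /* fsync before the next slice */
    }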
I'm not sure why this would be much better: I agree that it may have some
small positive influence on performance, but I'm afraid it may also
degrade performance under some conditions. So I think that a better
understanding of where the performance gain comes from, and then focusing
on that, could help obtain a more systematic gain.
This method interacts with the current proposal to improve checkpointer
behavior by avoiding random I/Os, but the two could be combined.
I'm wondering whether the benefit you see is linked to the file flushing
behavior induced by fsyncing more often, in which case it is quite close
to the "flushing" part of the current "checkpoint continuous flushing"
patch, and could be redundant with, or less efficient than, what is done
there, especially as tests have shown that the effect of flushing is
*much* better on sorted buffers.
Another proposal around, suggested by Andres Freund I think, is that the
checkpointer could fsync files while checkpointing rather than waiting for
the end of the checkpoint. I think that this may also be one of the
reasons why your patch brings a benefit, but Andres' approach would be
more systematic, because there would be no need to fsync files several
times (basically your patch issues 16 fsyncs per file). This suggests that
the "partitioning" should be done at a lower level, from within
CheckPointBuffers, which would take care of fsyncing files some time after
writing buffers to them has finished.
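
Something along these lines, very roughly (helper names and the FileGroup
type are invented, only CheckPointBuffers itself is real), where each file
is written and then synced exactly once:

    void
    CheckPointBuffers(int flags)
    {
        int        nfiles;
        FileGroup *files = GroupDirtyBuffersByFile(&nfiles); /* invented */

        for (int i = 0; i < nfiles; i++)
        {
            WriteDirtyBuffersOfFile(&files[i]);  /* paced writes, as today */
            FsyncOneFile(&files[i]);             /* sync now, not at the end */
        }
    }

Writing and syncing per file like this would also fit naturally with
sorting buffers, as in the flushing patch.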
--
Fabien.