> From the postgresql.conf, I can see that shared_buffers is
> set to 48GB, which is not small. If much of that large buffer
> cache is "dirty" when a checkpoint starts, it could cause a
> checkpoint I/O spike.
>
> I would suggest using pgtune to get a recommended
> configuration for PostgreSQL.
I have seen symptoms like those described, which turned out to be caused by too many dirty pages accumulating inside PostgreSQL's shared_buffers. It might be something else entirely in this case, but it would at least be worth trying a smaller shared_buffers setting combined with more aggressive bgwriter settings. As an experiment, I might try something like the following changes: