Hello Andres,
>> checkpoint when the segments are full... the server is unresponsive about
>> 10% of the time (one in ten transactions is late by more than 200 ms).
>
> That's ext4 I guess?
Yes!
> Did you check whether xfs yields a, err, more predictable performance?
No. I cannot test that easily without reinstalling the box. I did some
quick tests with ZFS/FreeBSD, which seemed to freeze in the same way,
although not under exactly the same conditions. Maybe I could try again.
> [...] Note that it would *not* be a good idea to make the bgwriter write
> out everything, as much as possible - that'd turn sequential write io
> into random write io.
Hmmm. I'm not sure that would necessarily be the case; it depends on how
the bgwriter would choose the pages to write. If they are chosen randomly,
then indeed that could be bad. If there is a big sequential write,
shouldn't the backend do the write directly anyway? ISTM that checkpoint
writes are currently mostly random anyway, at least with the OLTP write
load of pgbench. I'm just trying to start them earlier so that they can
be completed quickly.
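To make the point about page selection concrete, here is a toy sketch. It is not PostgreSQL code: the (relation, block) page model, the function names and the sample data are all hypothetical. It counts how many separate sequential runs the disk would see if the same set of dirty pages were written in buffer-pool order versus sorted by relation and block number.

```python
from typing import List, Tuple

# Hypothetical model: a dirty page is identified by (relation_id, block_number).
Page = Tuple[int, int]


def count_sequential_runs(pages: List[Page]) -> int:
    """Count maximal runs of consecutive blocks within the same relation.

    Fewer runs means the write stream looks more sequential to the disk.
    """
    if not pages:
        return 0
    runs = 1
    for (prev_rel, prev_blk), (rel, blk) in zip(pages, pages[1:]):
        # A new run starts whenever the next page is not the block
        # immediately following the previous one in the same relation.
        if rel != prev_rel or blk != prev_blk + 1:
            runs += 1
    return runs


def write_order(pages: List[Page], sort: bool) -> List[Page]:
    """Return the order in which a (hypothetical) writer issues the writes."""
    return sorted(pages) if sort else list(pages)


if __name__ == "__main__":
    # Dirty pages in the order they happen to sit in the buffer pool,
    # scattered as an OLTP load might leave them.
    dirty = [(1, 7), (2, 3), (1, 5), (2, 4), (1, 6), (2, 2), (1, 8)]
    print(count_sequential_runs(write_order(dirty, sort=False)))  # 7
    print(count_sequential_runs(write_order(dirty, sort=True)))   # 2
```

In this toy case, writing the same seven pages in sorted order yields 2 runs instead of 7. That is the sense in which writing dirty pages early need not turn into random IO: what matters is the order in which the writer picks the pages, not whether it starts early.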
So although the bgwriter is not the solution, ISTM that pg has no reason
to wait for minutes before starting to write dirty pages if it has
nothing else to do. If the OS later holds the writes back and cannot
spread the load, as Josh suggests, this could also be a problem, but
currently the OS seems not to have much to write (apart from WAL) until
the checkpoint.
--
Fabien.