Hello Heikki,
> For the kicks, I wrote a quick & dirty patch for interleaving the fsyncs, see
> attached. It works by repeatedly scanning the buffer pool, writing buffers
> belonging to a single relation segment at a time.
I tried this patch on the same host as before, with the same "-R 25 -L 200
-T 5000" options, alas without significant positive effect: about 6% of
transactions were "lost", including stretches of several seconds of
unresponsiveness.
Maybe this is because "pgbench -N" basically touches only 2 tables
(accounts & history), so there are few file syncs involved, and the patch
does not help much with spreading out the writes.
2014-08-30 23:52:24.167 CEST: LOG: checkpoint starting: xlog
2014-08-30 23:54:09.239 CEST: LOG: checkpoint sync: number=1 file=base/16384/11902 time=24.529 msec
2014-08-30 23:54:23.812 CEST: LOG: checkpoint sync: number=1 file=base/16384/16397_vm time=32.547 msec
2014-08-30 23:54:24.771 CEST: LOG: checkpoint sync: number=1 file=base/16384/11873 time=557.470 msec
2014-08-30 23:54:37.419 CEST: LOG: checkpoint complete: wrote 4931 buffers (30.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=122.946 s, sync=10.129 s, total=133.252 s; sync files=6, longest=9.854 s, average=1.790 s
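For reading the summary line above, here is a minimal Python sketch that pulls the timing fields out of the "checkpoint complete" message and compares them. It is only an illustration: the helper name parse_checkpoint is made up for this example and is not part of PostgreSQL or of the patch, and the regular expression assumes the "name=value s" layout seen in this particular log line.

```python
import re

# The "checkpoint complete" message from the log above, copied verbatim.
line = ("checkpoint complete: wrote 4931 buffers (30.1%); "
        "0 transaction log file(s) added, 0 removed, 3 recycled; "
        "write=122.946 s, sync=10.129 s, total=133.252 s; "
        "sync files=6, longest=9.854 s, average=1.790 s")


def parse_checkpoint(message):
    """Return the 'name=value s' timing fields of the message as floats.

    Hypothetical helper: only fields followed by ' s' (seconds) are kept,
    so 'files=6' is skipped.
    """
    return {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+) s", message)}


stats = parse_checkpoint(line)

# Share of the whole checkpoint spent in the sync phase.
sync_share = stats["sync"] / stats["total"]
# Share of the sync phase spent in the single slowest fsync.
longest_share = stats["longest"] / stats["sync"]

print(f"sync phase: {sync_share:.1%} of the checkpoint")
print(f"longest fsync: {longest_share:.1%} of the sync phase")
```

With these numbers, the sync phase is a small part of the checkpoint, but nearly all of it is spent in one fsync call.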
Note that given the small load and table size, pgbench implies random
I/Os and basically nothing else.
--
Fabien.