On Fri, 2012-03-16 at 11:25 +0100, Andres Freund wrote:
> > I take that back. There was something wrong with my test -- fadvise
> > helps, but it only takes it from ~10s to ~6.5s. Not quite as good as I
> > hoped.
> Thats surprising. I wouldn't expect such a big difference between fadvise +
> sync_file_range. Rather strange.
I discussed this with my colleague who knows linux internals, and he
pointed me directly at the problem. fadvise and sync_file_range in this
case are both trying to put the data in the io scheduler queue (still in
the kernel, not on the device). The difference is that fadvise doesn't
wait, and sync_file_range does (keep in mind, this is waiting to get in
a queue to go to the device, not waiting for the device to write it or
even receive it).
He indicated that 4096 is a normal number that one might use for the
queue size. But on my workstation at home (ubuntu 11.10), the queue is
only 128. I bumped it up to 256 and now fadvise is just as fast!
This won't be a problem on production systems, but that doesn't help us
a lot. People setting up a production system don't care about 6.5
seconds of set-up time anyway. Casual users and developers do (the
latter problem can be solved with the --nosync switch, but the former
problem is the one we're discussing).
So, it looks like fadvise is the "right" thing to do, but I expect we'll
get some widely varying results from actual users. Then again, maybe
casual users don't care much about ~10s for initdb anyway? It's a fairly
rare operation for everyone except developers.
Regards,Jeff Davis