On Mon, May 13, 2013 at 2:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Sure, the initial file creation will be faster. But are the actual
> individual wal writes (small, frequently fdatasync()ed) still faster?
> That's the critical path currently.
> Whether it is pretty much depends on how the filesystem manages
> allocated but not initialized blocks...
In ext4, AIUI, it doesn't actually pick target blocks. It just adjusts
the accounting so it knows that many blocks will be needed for this
file and guarantees they'll be available. If you read from them it
knows to provide zeros. So in theory performance in the critical path
would be slightly worse, but I think by an insignificant amount.
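
A minimal sketch of that behaviour, assuming a Unix platform where
Python exposes os.posix_fallocate (Linux does; the helper name
fallocate_and_read is made up for illustration). It reserves space for
a fresh file without writing any data, then reads the range back. It
shows only what the application sees, not how any particular
filesystem tracks the blocks internally.

```python
import os
import tempfile


def fallocate_and_read(path, size):
    """Reserve `size` bytes for `path`, then read the range back.

    Hypothetical helper: returns (apparent file size, bytes read).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the filesystem to guarantee the space; we write no data.
        os.posix_fallocate(fd, 0, size)
        apparent_size = os.fstat(fd).st_size
        # Reading the reserved, never-written range yields zero bytes.
        data = os.pread(fd, size, 0)
    finally:
        os.close(fd)
    return apparent_size, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        size, data = fallocate_and_read(os.path.join(tmp, "seg"), 64 * 1024)
        print(size, data == bytes(size))
```

The file reports its full length right away and the reserved range
reads back as zeros, even though the program never wrote to it.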
The reason Postgres pre-allocates the blocks is not performance. It's
safety: to guarantee -- as best as possible -- that it won't get a
write error when the time comes to write to them. In particular, to
guarantee that the disk won't suddenly turn out to be full.
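
To make the "fail early" idea concrete, here is a rough sketch in
Python rather than the actual Postgres code. The names
preallocate_segment and SEGMENT_SIZE are hypothetical, and it assumes
a platform with os.posix_fallocate. The point is that a full disk
shows up as ENOSPC at allocation time instead of as a write error
later on.

```python
import errno
import os
import tempfile

# Assumption for illustration: 16 MB, the default WAL segment size.
SEGMENT_SIZE = 16 * 1024 * 1024


def preallocate_segment(path, size=SEGMENT_SIZE):
    """Create `path` with `size` bytes reserved up front.

    Returns True on success and False if the filesystem has no room.
    Any other error is re-raised. The partial file is removed on failure.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # A full disk is reported here, before any real data exists.
        os.posix_fallocate(fd, 0, size)
        os.fsync(fd)
    except OSError as exc:
        os.close(fd)
        os.unlink(path)
        if exc.errno == errno.ENOSPC:
            return False
        raise
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "segment")
        print(preallocate_segment(target, 64 * 1024))
        print(os.path.getsize(target))
```

A caller that gets False back can report the problem or retry
elsewhere before committing to use the file.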
It seems possible that some file systems would not protect you against
media errors nearly as well when posix_fallocate is used. It might
take time to respond to a media error, and in a poorly written
filesystem the error might even be reported to the application when
there's no need. But media errors can occur at any time, even after
the initial write, so I don't think this should be a blocker. I think
posix_fallocate is good enough for us and I would support using it.
--
greg