On Thu, Aug 8, 2013 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> At this point I'm convinced that the issue is a pathological case in
>> ext4. The performance impact disappears as soon as the unwritten
>> extent(s) are written to with real data. Thus, even though allocating
>> files with posix_fallocate is - frequently - orders of magnitude
>> quicker than doing it with write(2), the subsequent re-write can be
>> more expensive. At least, that's what I'm gathering from the various
>> threads. Why this issue didn't crop up in earlier testing and why I
>> can't seem to make test_fallocate do it (even when I modify
>> test_fallocate to write to the newly-allocated file in a mostly-random
>> fashion) has me baffled.
>
> Does your test program use all the same writing options that the real
> WAL writes do (like O_DIRECT)?

I believe so.

From xlog.c:

    /* do not use get_sync_bit() here --- want to fsync only at end of fill */
    fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
                       S_IRUSR | S_IWUSR);

and from the test program:

    fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600);

PG_BINARY expands to 0 on non-Windows platforms. I also tried using
O_WRONLY instead of O_RDWR in xlog.c, and it made no difference.
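
For reference, here is a minimal sketch of the pattern under discussion: allocate with posix_fallocate, then overwrite the unwritten extents with real data, using the same open(2) flags as the test program above. The function name, the 8 kB buffer size, and the sequential write order are my own assumptions for illustration. This is not the code from xlog.c or test_fallocate.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from xlog.c or test_fallocate).
 * Creates 'path', reserves 'size' bytes with posix_fallocate, then
 * rewrites the whole range with zeros and fsyncs once at the end.
 * Returns 0 on success, -1 on failure.
 */
int
fallocate_then_rewrite(const char *path, off_t size)
{
    char    buf[8192];      /* assumed block size, for illustration only */
    off_t   done = 0;
    int     fd;

    /* same flags as the test program's open() call */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate returns an error number directly, not -1/errno */
    if (posix_fallocate(fd, 0, size) != 0)
        goto fail;

    /* the "subsequent re-write": fill the unwritten extents with data */
    memset(buf, 0, sizeof(buf));
    while (done < size)
    {
        size_t  want = sizeof(buf);
        ssize_t n;

        if (size - done < (off_t) sizeof(buf))
            want = (size_t) (size - done);
        n = pwrite(fd, buf, want, done);
        if (n <= 0)
            goto fail;
        done += n;
    }

    /* fsync only at the end of the fill */
    if (fsync(fd) != 0)
        goto fail;
    return close(fd);

fail:
    close(fd);
    unlink(path);
    return -1;
}
```

Timing the pwrite loop with and without the posix_fallocate call should be one way to isolate the cost of converting unwritten extents, though as noted above I have not been able to reproduce the slowdown outside of the real WAL code path.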
--
Jon