> > Even with true fdatasync it's not obviously good for
> > performance - it takes too long to write 16 MB files
> > and fills the OS buffer cache with trash -:(
>
> True. But at least the write is (hopefully) being done at a
> non-performance-critical time.
There is no such hope: XLogWrite may be called from XLogFlush
(at commit time and from bufmgr on buffer replacement) *and* from
XLogInsert, i.e. a new log file may be required at any time.
> > We probably need a separate process like LGWR (log
> > writer) in Oracle.
>
> I think the create-ahead feature in the checkpoint maker should be
> on by default.
I'm not sure - it increases disk requirements.
> > I considered this mostly as a hint to the OS about how the log file
> > should be allocated (to decrease fragmentation). Not sure how OSes
> > use such hints, but seek+write costs nothing.
>
> AFAIK, extant Unixes will not regard this as a hint at all; they'll
> think it is a great opportunity to not store zeroes :-(.
Yes, but if I were writing a file system, I wouldn't allocate space
for a file block by block - I would try to pre-allocate more than required
by write(). So I hoped that seek+write would be a hint to the OS: "Hey, I
need a 16 MB file - try to make it as contiguous as possible". I don't know
whether it works, though -:)
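
For anyone who wants to check what their OS really does with seek+write,
here is a minimal sketch in C. It is not PostgreSQL code: the function
names, BLOCK_SIZE and the file paths are made up for illustration. It
creates a 16 MB file both ways so the space actually allocated can be
compared via st_blocks. I'm assuming st_blocks is counted in 512-byte
units, which is common but not guaranteed everywhere.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SEG_SIZE   (16 * 1024 * 1024)  /* 16 MB log segment, as discussed */
#define BLOCK_SIZE 8192                /* illustrative write size (assumption) */

/* seek+write: set the file length by writing a single byte at the end.
 * Returns 0 on success, -1 on error. */
int create_by_seek(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, size - 1, SEEK_SET) == (off_t) -1 ||
        write(fd, "", 1) != 1)          /* "" supplies one NUL byte */
    {
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}

/* zero fill: write every block, so the space must really exist.
 * Running out of disk (ENOSPC) shows up here, not later.
 * Returns 0 on success, -1 on error. */
int create_by_zero_fill(const char *path, off_t size)
{
    char  zeros[BLOCK_SIZE];
    off_t written = 0;
    int   fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(zeros, 0, sizeof(zeros));
    while (written < size)
    {
        size_t  chunk = (size - written < BLOCK_SIZE)
                        ? (size_t) (size - written) : BLOCK_SIZE;
        ssize_t n = write(fd, zeros, chunk);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;               /* interrupted, just retry */
            close(fd);
            unlink(path);               /* don't leave a partial file */
            return -1;
        }
        written += n;
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}

/* Bytes actually allocated on disk, or -1 on error.
 * Assumes st_blocks is in 512-byte units. */
long long allocated_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_blocks * 512;
}
```

Calling allocated_bytes() on both files shows how much space each one
really occupies. On file systems that support holes I'd expect the
seek+write file to occupy far less than the zero-filled one, but that is
exactly the thing to measure, not assume.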
> One reason that I like logfile fill to be done separately is that it's
> easier to convince ourselves that failure (due to out of disk space)
> need not require elog(STOP) than if we have the same failure during
> XLogWrite. You are right that we don't have time to consider
> each STOP in the WAL code, but I think we should at least look at
> that case...
What is the problem with elog(STOP) in the absence of disk space?
I think running out of disk space is bad enough to stop DB operations.
Vadim