Thread: RE: WAL does not recover gracefully from out-of-disk-space

RE: WAL does not recover gracefully from out-of-disk-space

From: "Mikheev, Vadim"
Date:
> > Even with true fdatasync it's not obviously good for
> > performance: it takes too long to write 16MB files
> > and fills the OS buffer cache with trash -:(
> 
> True.  But at least the write is (hopefully) being done at a
> non-performance-critical time.

There is no such hope: XLogWrite may be called from XLogFlush
(at commit time and from bufmgr on buffer replacement) *and* from XLogInsert;
i.e. a new log file may be required at any time.

> > Probably we need a separate process, like LGWR (log
> > writer) in Oracle.
> 
> I think the create-ahead feature in the checkpoint maker should be
> on by default.

I'm not sure: it increases disk space requirements.

> > I considered this mostly as a hint to the OS about how the log file
> > should be allocated (to decrease fragmentation). I'm not sure how OSes
> > use such hints, but seek+write costs nothing.
> 
> AFAIK, extant Unixes will not regard this as a hint at all; they'll
> think it is a great opportunity to not store zeroes :-(.

Yes, but if I were writing a file system, I wouldn't allocate space
for a file block by block; I would try to pre-allocate more than write()
asks for. So I hoped that seek+write would be a hint to the OS: "Hey, I need
a 16MB file, try to make it as contiguous as possible". I don't know whether
it works, though -:)

> One reason that I like logfile fill to be done separately is that it's
> easier to convince ourselves that failure (due to out of disk space)
> need not require elog(STOP) than if we have the same failure during
> XLogWrite.  You are right that we don't have time to consider 
> each STOP in the WAL code, but I think we should at least look at
> that case...

What is the problem with elog(STOP) when we run out of disk space?
I think running out of disk space is bad enough to stop DB operations.
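Tom's argument that a separately performed fill can fail without forcing elog(STOP) can be sketched as follows. prefill_segment() is a hypothetical helper, not PostgreSQL code: it builds the segment under a temporary name, zero-fills and fsyncs it, and only then renames it into place. On any failure, ENOSPC included, it removes the partial file and returns -1 with errno set, so the caller (say, a create-ahead step in the checkpoint process) can choose between stopping and merely warning. The temp-then-rename scheme, the paths, and the sizes are assumptions for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>     /* rename() */
#include <unistd.h>

#define SEG_SIZE (16L * 1024 * 1024)  /* 16MB segment, as in the thread */
#define BLCKSZ   8192                 /* assumed page size */

/* Hypothetical helper, not PostgreSQL's actual code. Returns 0 when
 * finalpath holds a complete, fsync'ed, zero-filled segment; otherwise
 * returns -1 with errno set and no partial file left behind. */
static int prefill_segment(const char *tmppath, const char *finalpath)
{
    static char zbuf[BLCKSZ];   /* zero-initialized */
    int saved_errno = 0;
    int fd = open(tmppath, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (long done = 0; done < SEG_SIZE; done += BLCKSZ) {
        ssize_t n = write(fd, zbuf, BLCKSZ);
        if (n != BLCKSZ) {
            /* A short write leaves errno unset; assume the disk is full. */
            saved_errno = (n < 0) ? errno : ENOSPC;
            close(fd);
            goto fail;
        }
    }
    if (fsync(fd) != 0) {
        saved_errno = errno;
        close(fd);
        goto fail;
    }
    if (close(fd) != 0 || rename(tmppath, finalpath) != 0) {
        saved_errno = errno;
        goto fail;
    }
    return 0;

fail:
    unlink(tmppath);            /* never leave a half-built segment */
    errno = saved_errno;
    return -1;
}
```

Because the final name appears only after the whole file has been written and synced, a reader of the log directory never sees a truncated segment, and an out-of-space failure leaves the system in the same state as before the attempt.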

Vadim


Re: WAL does not recover gracefully from out-of-disk-space

From: Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> True.  But at least the write is (hopefully) being done at a
>> non-performance-critical time.

> There is no such hope: XLogWrite may be called from XLogFlush
> (at commit time and from bufmgr on buffer replacement) *and* from XLogInsert;
> i.e. a new log file may be required at any time.

Sure, but if we have create-ahead enabled then there's a good chance of
the log files being made by the checkpoint process, rather than by
working backends.  In that case the prefill is not time critical.

In any case, my tests so far show that prefilling and then writing with
O_SYNC, or better still O_DSYNC, is in fact faster than not prefilling; this
matches the hand-waving argument I gave Andreas this morning pretty well.
(With fsync() or fdatasync() it seems we're at the mercy of inefficient
kernel algorithms, a factor I didn't consider before.)
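A rough way to run the kind of comparison described above is sketched below. It is a hypothetical micro-benchmark, not Tom's actual test: prefill() zero-fills a file with ordinary writes and a single fsync(), and timed_dsync_writes() then writes 8KB pages one at a time through an O_DSYNC descriptor, either into that prefilled file or into a freshly created one that must grow with every write. NBLOCKS, BLCKSZ, and the O_SYNC fallback are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef O_DSYNC
#define O_DSYNC O_SYNC   /* fall back where O_DSYNC is unavailable */
#endif

#define BLCKSZ  8192     /* assumed WAL page size */
#define NBLOCKS 128      /* 1MB of simulated WAL pages; assumed */

/* Zero-fill the first nblocks pages with buffered writes plus one
 * fsync(), so the space is allocated before any synchronous write. */
static int prefill(const char *path, int nblocks)
{
    static char zbuf[BLCKSZ];   /* zero-initialized */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (int i = 0; i < nblocks; i++)
        if (write(fd, zbuf, BLCKSZ) != BLCKSZ) {
            close(fd);
            return -1;
        }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Write nblocks pages one at a time through an O_DSYNC descriptor and
 * return the elapsed seconds, or -1.0 on error. With create != 0 the file
 * starts empty, so each write also extends it and the kernel must
 * allocate blocks and update the file size as part of the sync write. */
static double timed_dsync_writes(const char *path, int nblocks, int create)
{
    char page[BLCKSZ];
    struct timespec t0, t1;
    int flags = O_WRONLY | O_DSYNC | (create ? (O_CREAT | O_TRUNC) : 0);
    int fd = open(path, flags, 0600);
    if (fd < 0)
        return -1.0;
    memset(page, 'x', sizeof page);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nblocks; i++)
        if (pwrite(fd, page, BLCKSZ, (off_t) i * BLCKSZ) != BLCKSZ) {
            close(fd);
            return -1.0;
        }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (double) (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling timed_dsync_writes(path, NBLOCKS, 0) after prefill(path, NBLOCKS), and timed_dsync_writes(other_path, NBLOCKS, 1) on a fresh file, gives two timings to compare. Which one wins, and by how much, depends on the kernel and file system, so no particular result is implied here.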
        regards, tom lane