Thread: RE: WAL does not recover gracefully from out-of-disk-space

RE: WAL does not recover gracefully from out-of-disk-space

From: "Mikheev, Vadim"

> > I see that seek+write was changed to writes in XLogFileInit
> > (that was prompted by the subject of this thread, right?), but what
> > about the problem itself?
> 
> > BTW, were performance tests run after the seek+write --> writes
> > change?
> 
> That change was for safety, not for performance.  It might be a
> performance win on systems that support fdatasync properly (because it
> lets us use fdatasync), otherwise it's probably not a performance win.

Even with true fdatasync it's not obviously good for performance - it takes
too long to write 16 MB files and fills the OS buffer cache with trash :-(
We probably need a separate process like LGWR (log writer) in Oracle.
I also like Andreas's idea of re-using log files.

> But we need it regardless --- if you didn't want a fully-allocated WAL
> file, why'd you bother with the original seek-and-write-1-byte code?

I considered this mostly as a hint to the OS about how the log file should
be allocated (to decrease fragmentation). I'm not sure how OSes use such
hints, but seek+write costs nothing.

Vadim
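
The thread does not spell out the log-file re-use idea, so the following is only one plausible reading of it, sketched in Python: instead of deleting a segment that is no longer needed and zero-filling a fresh 16 MB file later, rename the old file so it becomes a future segment. Its blocks are already allocated, so no bulk write is needed. The function name and paths are hypothetical, not taken from the PostgreSQL source.

```python
import os


def recycle_segment(old_path, new_path):
    """Turn an obsolete log segment into a future one by renaming it.

    Hypothetical helper: returns False (and changes nothing) if a file
    already exists under new_path, True once the rename has succeeded.
    The file keeps its size and its allocated disk blocks.
    """
    if os.path.exists(new_path):
        return False
    os.rename(old_path, new_path)
    return True
```

A real implementation would also have to make sure stale records left in the recycled file are never mistaken for valid log data; that part is outside this sketch.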


Re: WAL does not recover gracefully from out-of-disk-space

From: Ian Lance Taylor

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:

> > But we need it regardless --- if you didn't want a fully-allocated WAL
> > file, why'd you bother with the original seek-and-write-1-byte code?
> 
> I considered this mostly as a hint to the OS about how the log file should
> be allocated (to decrease fragmentation). I'm not sure how OSes use such
> hints, but seek+write costs nothing.

Doing a seek to a large value and doing a write is not a hint to a
Unix system that you are going to write a large sequential file.  If
anything, it's a hint that you are going to write a sparse file.  A
Unix kernel will optimize by not allocating blocks you aren't going to
write to.

Ian
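
To illustrate Ian's point, here is a minimal Python sketch that creates a file of the same apparent size in two ways: by seeking to the end and writing one byte, and by writing zeros over the whole file. The function names are made up for this example, and `st_blocks` is assumed to be available and counted in 512-byte units, which holds on typical Unix systems but not on Windows.

```python
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB, the log file size mentioned in the thread


def create_by_seek_write(path, size=SEGMENT_SIZE):
    """Seek to the last byte and write a single zero byte.

    On most Unix filesystems this leaves a hole: the file has the
    requested length, but almost no blocks are allocated for it.
    """
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def create_by_zero_fill(path, size=SEGMENT_SIZE, chunk=8192):
    """Write zeros over the whole file, then force them to disk."""
    zeros = b"\0" * chunk
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def allocated_bytes(path):
    """Disk space actually allocated to the file (Unix-only assumption)."""
    return os.stat(path).st_blocks * 512
```

Calling both functions and comparing `os.path.getsize()` with `allocated_bytes()` shows identical lengths. On a filesystem that supports holes, the seek+write file typically has only a few kilobytes allocated, so a later write into the middle of it can still fail for lack of disk space.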



Re: WAL does not recover gracefully from out-of-disk-space

From: Tom Lane

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> Even with true fdatasync it's not obviously good for performance - it takes
> too long to write 16 MB files and fills the OS buffer cache with trash :-(

True.  But at least the write is (hopefully) being done at a
non-performance-critical time.

> We probably need a separate process like LGWR (log writer) in Oracle.

I think the create-ahead feature in the checkpoint maker should be on
by default.

>> But we need it regardless --- if you didn't want a fully-allocated WAL
>> file, why'd you bother with the original seek-and-write-1-byte code?

> I considered this mostly as a hint to the OS about how the log file should
> be allocated (to decrease fragmentation). I'm not sure how OSes use such
> hints, but seek+write costs nothing.

AFAIK, extant Unixes will not regard this as a hint at all; they'll
think it is a great opportunity to not store zeroes :-(.

One reason that I like logfile fill to be done separately is that it's
easier to convince ourselves that failure (due to out of disk space)
need not require elog(STOP) than if we have the same failure during
XLogWrite.  You are right that we don't have time to consider each STOP
in the WAL code, but I think we should at least look at that case...
        regards, tom lane
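
The argument for filling the log file as a separate step can be sketched as follows. This is an illustration in Python, not the actual XLogFileInit code: the function name, the `.tmp` suffix, and the return convention are all assumptions made for the example. The file is zero-filled under a temporary name, and if the disk fills up the partial file is removed and the caller gets an ordinary failure result instead of a forced shutdown.

```python
import errno
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB, the log file size mentioned in the thread


def preallocate_segment(final_path, size=SEGMENT_SIZE, chunk=8192):
    """Zero-fill a new segment under a temporary name, then rename it.

    Hypothetical helper: returns True on success and False if the disk
    is full (ENOSPC). In the failure case the partial file is removed,
    so nothing half-written is left behind. Other errors propagate.
    """
    tmp_path = final_path + ".tmp"
    zeros = b"\0" * chunk
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so count them.
            written = os.write(fd, zeros[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)
    except OSError as exc:
        os.close(fd)
        os.unlink(tmp_path)
        if exc.errno == errno.ENOSPC:
            return False  # recoverable: caller can retry later or report it
        raise
    os.close(fd)
    os.rename(tmp_path, final_path)
    return True
```

Because every block has been written before the rename, later writes into the segment no longer need new disk space, so out-of-disk-space can only surface here, where it is easy to handle.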