Re: Fwd: Is the fsync() fake on FreeBSD6.1? - Mailing list pgsql-hackers

From Andrew - Supernews
Subject Re: Fwd: Is the fsync() fake on FreeBSD6.1?
Date
Msg-id slrnehb51e.2ea3.andrew+nonews@atlantis.supernews.net
In response to Fwd: Is the fsync() fake on FreeBSD6.1?  (Jim Nasby <jim@nasby.net>)
List pgsql-hackers
On 2006-09-23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> Whether the underlying device lies about the write completion is another
>> matter. All current SCSI disks have WCE enabled by default, which means
>> that they will lie about write completion if FUA was not set in the
>> request, which FreeBSD never sets.
>
> Huh?  The entire point of the SCSI command set is that it's not
> necessary to lie about write completion for performance reasons, because
> the architecture has always supported the concept of multiple requests
> in-flight concurrently.

I seem to recall we've had this conversation previously.

> Has the disk drive industry gotten a whole lot
> stupider in the fifteen years since I last wrote a SCSI driver?

Quite possibly, yes.

I certainly would never claim that WCE is a good idea, or that having it
enabled by default is a good idea; I merely report the _fact_ that it is
indeed enabled by default on every SCSI drive that I have recently
encountered (from several different vendors).

On my database machines I am careful to disable it (and check that this
does indeed take effect). I would recommend that others do likewise. The
performance impact of disabling WCE is not serious (other than removing
the unsafe speed gains of course).
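
For anyone who wants to script the "check that this does indeed take
effect" step, here is a minimal sketch. It assumes the raw caching mode
page (page code 0x08) has already been fetched by some means, such as a
MODE SENSE pass-through, which is not shown. The function name and the
sample bytes are made up for illustration; the one fact it relies on is
that WCE is bit 2 of byte 2 of that page. On FreeBSD the same page can
be inspected by hand with `camcontrol modepage <device> -m 8`.

```python
WCE_MASK = 0x04  # bit 2 of byte 2 in the SCSI caching mode page (0x08)


def wce_enabled(page: bytes) -> bool:
    """Return True if the write cache enable bit is set in a raw caching mode page."""
    if len(page) < 3:
        raise ValueError("caching mode page too short")
    # Low six bits of byte 0 hold the page code; the top two are PS and SPF.
    if page[0] & 0x3F != 0x08:
        raise ValueError("not a caching mode page")
    return bool(page[2] & WCE_MASK)


# Hypothetical sample pages: page code, page length, flags byte, zero padding.
sample_on = bytes([0x08, 0x12, 0x04]) + bytes(17)
sample_off = bytes([0x08, 0x12, 0x00]) + bytes(17)

if __name__ == "__main__":
    print("WCE on sample_on: ", wce_enabled(sample_on))
    print("WCE on sample_off:", wce_enabled(sample_off))
```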

Since posting the previous response I've been directed to a document that
seems to imply that Linux drivers now attempt to handle write-order
guarantees by introducing the concept of a "write barrier", i.e. a write
request which must complete after all previous writes and before all
subsequent ones.  Achieving this requires different strategies depending
on whether the underlying device allows command queueing and/or exposes a
useful cache flush command. The implication is that for SCSI disks with
WCE enabled, the Linux driver will actually send SYNCHRONIZE CACHE when
doing a write barrier (which could be expensive, of course).

If (and I have no idea if this is true) fsync() is implemented by means of
such a barrier, then an fsync()-heavy workload will perform much worse on
Linux when WCE is enabled than when it is disabled, since in the latter
case the driver will not issue SYNCHRONIZE CACHE and will simply ensure
that the relevant writes have all completed.

It would be interesting to see benchmarks of this.
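
A rough way to get such numbers is sketched below. It is not a rigorous
benchmark: the function name, block size, and iteration count are
arbitrary choices for illustration, and the temporary directory must
live on the disk under test for the result to mean anything. It simply
times a loop of small writes, each followed by fsync(), so the same
script can be run once with WCE enabled and once with it disabled.

```python
import os
import tempfile
import time


def time_fsyncs(directory: str, count: int = 200, size: int = 8192) -> float:
    """Write `count` blocks of `size` bytes, calling fsync() after each.

    Returns the mean wall-clock seconds per write+fsync pair.
    """
    block = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / count


if __name__ == "__main__":
    # Assumption: the default temp directory is on the disk being tested.
    per_op = time_fsyncs(tempfile.gettempdir())
    print(f"{per_op * 1e3:.3f} ms per write+fsync")
```

As a sanity check on the output: one rotation of a 7200 rpm disk takes
about 8.3 ms, so a per-fsync figure far below that on a single spinning
disk suggests that something in the path is acknowledging writes from
cache.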

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

