Ron Mayer wrote:
> Marco Colombo wrote:
>> Ron Mayer wrote:
>>> Greg Smith wrote:
>>>> There are some known limitations to Linux fsync that I remain somewhat
>>>> concerned about, independently of LVM, like "ext3 fsync() only does a
>>>> journal commit when the inode has changed" (see
>>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 )....
>>> I wonder if there should be an optional fsync mode
>>> in postgres that would turn fsync() into
>>> fchmod (fd, 0644); fchmod (fd, 0664);
> 'course I meant: "fchmod (fd, 0644); fchmod (fd, 0664); fsync(fd);"
>>> to work around this issue.
>> Question is... why do you care if the journal is not flushed on fsync?
>> Only the file data blocks need to be, if the inode is unchanged.
>
> You don't - but ext3 fsync won't even push the file data blocks
> through a disk cache unless the inode was changed.
>
> The point is that ext3 only does the "write barrier" processing
> that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE (SCSI)
> commands on inode changes, not data changes. And with no FLUSH
> CACHE or SYNCHRONIZE CACHE the data blocks may sit in the disk's
> cache after the fsync() as well.
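
For reference, the workaround proposed above can be sketched in C
roughly as follows. This is only a sketch: fsync_force_inode() is a
made-up name, not anything in PostgreSQL, and unlike the literal
"fchmod (fd, 0644); fchmod (fd, 0664);" it toggles one permission bit
and restores the original mode, so the file's permissions end up
unchanged. Whether dirtying the inode this way really forces the
journal commit (and the cache flush) depends on the kernel and the
filesystem, so take it as an assumption.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a PostgreSQL function): dirty the inode,
 * then fsync, so the fsync has an inode change to commit. */
int fsync_force_inode(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    /* Keep only the permission bits of the current mode. */
    mode_t mode = st.st_mode & 07777;

    /* Flip the group-write bit, then put the original mode back.
     * Both calls modify inode metadata. */
    if (fchmod(fd, mode ^ S_IWGRP) != 0)
        return -1;
    if (fchmod(fd, mode) != 0)
        return -1;

    /* Now ask for the data (and the changed inode) to be synced. */
    return fsync(fd);
}
```

A caller would use fsync_force_inode(fd) wherever it calls fsync(fd)
today; it returns 0 on success and -1 with errno set on failure.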
Yes, but we knew it already, didn't we? It's always been like
that: with IDE disks and write-back cache enabled, fsync just
waits for the disk to report completion, and disks lie about
that. Write barriers enforce ordering: WHEN writes are
committed to disk, they will be in order, but that doesn't mean
NOW. Ordering is enough for a FS journal; the only requirement
is consistency.
Anyway, it's the block device's job to control disk caches. A
filesystem is just a client of the block device: it posts a
flush request, and what happens depends on the block device code.
The FS doesn't talk to disks directly. And a write barrier is
not a flush request, it is a "please do not reorder" request.
On fsync(), ext3 issues a flush request to the block device;
that's all it's expected to do.
Now, some block devices may implement write barriers by issuing
FLUSH commands to the disk, but that's another matter. A FS
shouldn't rely on that.
You can replace a barrier with a flush (not as efficiently),
but not the other way around.
If a block device driver issues FLUSH for a barrier, and
doesn't issue a FLUSH for a flush, well, it's a buggy driver,
IMHO.
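
To make the barrier vs. flush distinction concrete, here is a toy
model in C. It is not kernel code and models no real driver: the
toy_disk structure and the toy_* functions are invented for
illustration only. A barrier just opens a new ordering epoch and
commits nothing; a flush makes everything in the cache durable now.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 16

/* Toy model of a disk with a volatile write-back cache. */
struct toy_disk {
    int    epoch[MAX_WRITES];    /* barrier epoch of each cached write */
    bool   on_media[MAX_WRITES]; /* true once the write is durable */
    size_t count;                /* number of writes queued so far */
    int    cur_epoch;            /* incremented by every barrier */
};

void toy_init(struct toy_disk *d)
{
    d->count = 0;
    d->cur_epoch = 0;
}

/* Queue a write in the cache; returns its index, or -1 if full. */
int toy_write(struct toy_disk *d)
{
    if (d->count == MAX_WRITES)
        return -1;
    d->epoch[d->count] = d->cur_epoch;
    d->on_media[d->count] = false;
    return (int)d->count++;
}

/* Barrier: "please do not reorder". Nothing becomes durable. */
void toy_barrier(struct toy_disk *d)
{
    d->cur_epoch++;
}

/* Flush: every cached write becomes durable NOW. */
void toy_flush(struct toy_disk *d)
{
    for (size_t i = 0; i < d->count; i++)
        d->on_media[i] = true;
}

/* The disk commits write i whenever it likes, but is refused while
 * a write from an earlier epoch is still only in the cache. */
bool toy_commit_one(struct toy_disk *d, size_t i)
{
    if (i >= d->count)
        return false;
    for (size_t j = 0; j < d->count; j++)
        if (!d->on_media[j] && d->epoch[j] < d->epoch[i])
            return false;
    d->on_media[i] = true;
    return true;
}
```

In this model, write A, toy_barrier(), write B leaves both writes in
the cache, and the only guarantee is that B cannot be committed before
A. Calling toy_flush() instead leaves nothing pending, which trivially
satisfies the ordering too. That is why a flush can stand in for a
barrier, while a barrier can never stand in for a flush.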
.TM.