Re: New Linux xfs/reiser file systems - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: New Linux xfs/reiser file systems
Date
Msg-id 200105041749.f44HnsJ29002@candle.pha.pa.us
In response to Re: New Linux xfs/reiser file systems  (teg@redhat.com (Trond Eivind Glomsrød))
Responses Re: New Linux xfs/reiser file systems  ("Stephen C. Tweedie" <sct@redhat.com>)
List pgsql-hackers
> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
> 
> ************************************************************************
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad.  He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
> 
> 
> Performance doing what?  XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO.  The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.
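
For readers following along, here is a minimal sketch of the "use
fdatasync() where the platform has it" idea.  It assumes a POSIX system
that advertises the call through _POSIX_SYNCHRONIZED_IO; the helper name
flush_data() is hypothetical, and this is not the actual PostgreSQL code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declarations */
#endif
#include <unistd.h>

/*
 * Hypothetical helper: flush a file's data to disk, preferring
 * fdatasync() and falling back to fsync() where it is not advertised.
 * Returns 0 on success, -1 on error (errno is set by the system call).
 */
int flush_data(int fd)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);   /* skips metadata not needed to read the data back */
#else
    return fsync(fd);       /* always flushes data and all metadata */
#endif
}
```

The choice is made at compile time here, so callers never need to know
which of the two calls the platform ended up with.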


> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
> are journaling filesystems.  They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure.  Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync.  Is that true of log-based file systems?
> 
> Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
> inode, but not always.  For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away.  For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
> 
> > I know xfs and reiser are both log based.  Do we need to be concerned
> > about PostgreSQL performance on these file systems?  I use BSD FFS with
> > soft updates here, so it doesn't affect me.
> 
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place.  In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write.  fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data.  If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.
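
A rough sketch of that pattern, for illustration only: zero-fill a chunk
once, do one full fsync() because the size and block map changed, then
overwrite in place and call fdatasync() per write.  The names
prealloc_log() and log_write() and the sizes are assumptions made up for
this example, not the real WAL code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pwrite() and fdatasync() */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK_BYTES = 1 << 20, BLOCK_BYTES = 8192 };   /* illustrative sizes */

/* Write the whole buffer at a fixed offset, retrying short writes. */
static int write_all_at(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t) n;
        off += n;
    }
    return 0;
}

/* Create path zero-filled to CHUNK_BYTES.  Returns an open fd, or -1. */
int prealloc_log(const char *path)
{
    char zeros[BLOCK_BYTES];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(zeros, 0, sizeof zeros);
    for (off_t off = 0; off < CHUNK_BYTES; off += BLOCK_BYTES) {
        if (write_all_at(fd, zeros, sizeof zeros, off) != 0) {
            close(fd);
            return -1;
        }
    }
    /* One full fsync(): the file size and block allocation changed. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Overwrite a record inside the preallocated chunk, then flush data only. */
int log_write(int fd, const void *rec, size_t len, off_t off)
{
    if (off < 0 || (size_t) off + len > (size_t) CHUNK_BYTES)
        return -1;                  /* would grow the file; refuse */
    if (write_all_at(fd, rec, len, off) != 0)
        return -1;
    return fdatasync(fd);           /* only timestamps changed in the inode */
}
```

log_write() refuses anything that would extend the file, since growing
it would change the inode again and bring back the full flush.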

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks.  That is a big help.  I wonder if people reporting performance
problems were using 7.0.3.  We only added fdatasync() in 7.1.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

