Thread: New Linux xfs/reiser file systems

New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
I was talking to a Linux user yesterday, and he said that performance
using the xfs file system is pretty bad.  He believes it has to do with
the fact that fsync() on log-based file systems requires more writes.

With a standard BSD/ext2 file system, WAL writes can stay on the same
cylinder to perform fsync.  Is that true of log-based file systems?

I know xfs and reiser are both log based.  Do we need to be concerned
about PostgreSQL performance on these file systems?  I use BSD FFS with
soft updates here, so it doesn't affect me.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> The "problem" with log based filesystems is that they most likely
> do not know the consequences of a write so an fsync on a file may
> require double writing to both the log and the "real" portion of
> the disk.  They can also exhibit the problem that an fsync may
> cause all pending writes to require scheduling unless the log is
> constructed on the fly rather than incrementally.

Yes, this double-writing is a problem.  Suppose you have your WAL on a
separate drive.  You can fsync() WAL with zero head movement.  With a
log based file system, you need two head movements, so you have gone
from zero movements to two.



Re: New Linux xfs/reiser file systems

From
Alfred Perlstein
Date:
* Bruce Momjian <pgman@candle.pha.pa.us> [010502 14:01] wrote:
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
> 
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?
> 
> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

The "problem" with log based filesystems is that they most likely
do not know the consequences of a write so an fsync on a file may
require double writing to both the log and the "real" portion of
the disk.  They can also exhibit the problem that an fsync may
cause all pending writes to require scheduling unless the log is
constructed on the fly rather than incrementally.

There was also the problem that was brought up recently that
certain versions (maybe all?) of Linux perform fsync() in a very
non-optimal manner; if the user is able to use the O_FSYNC option
rather than fsync(), he may see a performance increase.

But his guess is probably nearly as good as mine. :)
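
For anyone who wants to compare the two approaches mentioned above, here is a minimal Python sketch. It assumes a POSIX system: O_FSYNC is the BSD spelling, and Python exposes the equivalent flag as os.O_SYNC. The file names and the 8 KB record size are made up for illustration, and this is not how PostgreSQL itself writes WAL.

```python
import os
import tempfile

RECORD = b"x" * 8192  # illustrative record size, not PostgreSQL's WAL format


def write_then_fsync(path, records):
    """Buffered writes, each followed by an explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(records):
            os.write(fd, RECORD)
            os.fsync(fd)  # ask the kernel to flush this file to stable storage
    finally:
        os.close(fd)
    return os.path.getsize(path)


def write_with_o_sync(path, records):
    """Open with O_SYNC so each write() returns only after the flush."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(records):
            os.write(fd, RECORD)  # no separate fsync() call
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_then_fsync(os.path.join(d, "a.log"), 4))
        print(write_with_o_sync(os.path.join(d, "b.log"), 4))
```

Both functions leave the same bytes on disk; which one is faster depends on the kernel and file system, so timing them on the mount in question is the only way to know.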


-- 
-Alfred Perlstein - [alfred@freebsd.org]
http://www.egr.unlv.edu/~slumos/on-netbsd.html


Re: New Linux xfs/reiser file systems

From
Alfred Perlstein
Date:
* Bruce Momjian <pgman@candle.pha.pa.us> [010502 15:20] wrote:
> > The "problem" with log based filesystems is that they most likely
> > do not know the consequences of a write so an fsync on a file may
> > require double writing to both the log and the "real" portion of
> > the disk.  They can also exhibit the problem that an fsync may
> > cause all pending writes to require scheduling unless the log is
> > constructed on the fly rather than incrementally.
> 
> Yes, this double-writing is a problem.  Suppose you have your WAL on a
> separate drive.  You can fsync() WAL with zero head movement.  With a
> log based file system, you need two head movements, so you have gone
> from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling.  I wonder if an fsync() may cause ALL pending
meta-data to be updated (even metadata not related to the 
postgresql files).

Do you know if reiser or xfs have this problem?

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> > Yes, this double-writing is a problem.  Suppose you have your WAL on a
> > separate drive.  You can fsync() WAL with zero head movement.  With a
> > log based file system, you need two head movements, so you have gone
> > from zero movements to two.
> 
> It may be worse depending on how the filesystem actually does
> journalling.  I wonder if an fsync() may cause ALL pending
> meta-data to be updated (even metadata not related to the 
> postgresql files).
> 
> Do you know if reiser or xfs have this problem?

I don't know, but the Linux user reported xfs was really slow.



Re: New Linux xfs/reiser file systems

From
mlw
Date:
Bruce Momjian wrote:
> 
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
> 
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?
> 
> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

I did see poor performance on reiserfs; I have not as yet ventured into using
xfs.

It occurs to me that journalizing file systems will almost always be slower on
an application such as postgres. The journalizing file system is trying to
maintain data integrity for an application which is also trying to maintain
data integrity. There will always be extra work involved.

This behavior raises the question about file system usage in Postgres. Many
databases, such as Oracle, create table space files and operate directly on the
raw blocks, bypassing the file system altogether.

On one hand, Postgres is easy to use and maintain because it cooperates with
the native file system; on the other hand it incurs the overhead of whatever
silliness the file system wants to do. 

I would bet it is a huge amount of work to use a "table space" system and no
one wants that. lol. However, it should be noted that a bit more control over
database layout would make some great performance improvements.

The ability to put indexes on a separate volume from data.
The ability to put different tables on different volumes.
And so on.

In the short term, I think poor performance on a journalizing file system is to
be expected, unless there is an IOCTL to tell the FS to leave the files alone
(and postgres calls it). A Linux HOWTO which informs people that certain file
systems will have performance issues and why should handle the problem.

Perhaps we can convince the Linux community to create a "dbfs" which is a
stripped down simple no nonsense file system designed for applications like
databases?

-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: Re: New Linux xfs/reiser file systems

From
Matthew Kirkwood
Date:
On Thu, 3 May 2001, mlw wrote:

> I would bet it is a huge amount of work to use a "table space" system
> and no one wants that.

From some stracing of 7.1, the most common syscall issued by
postgres is an lseek() to the end of the file, presumably to
find its length, which seems to happen up to about a dozen
times per (pgbench) transaction.

Tablespaces would solve this (not that lseek is a particularly
expensive operation, of course).

> Perhaps we can convince the Linux community to create a "dbfs" which
> is a stripped down simple no nonsense file system designed for
> applications like databases?

Sync-metadata ext2 should be fine.  Filesystems fsck pretty
quick when they contain only a few large files.

Otherwise, something like "smugfs" (now obsolete) might do.

Matthew.



Re: Re: New Linux xfs/reiser file systems

From
Tom Lane
Date:
Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> From some stracing of 7.1, the most common syscall issued by
> postgres is an lseek() to the end of the file, presumably to
> find its length, which seems to happen up to about a dozen
> times per (pgbench) transaction.

> Tablespaces would solve this (not that lseek is a particularly
> expensive operation, of course).

No, they wouldn't; or at least they'd just create a different problem.
The reason for the lseek is that the file length may have changed since
the current backend last checked it.  To avoid lseek we'd need some
shared data structure that maintains the current length of every active
table, which would be a nuisance to maintain and probably a source of
contention delays.
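
To make the lseek behaviour concrete, here is a small Python sketch (not PostgreSQL code). Two independent file descriptors stand in for two backends, the 8192-byte block size is an assumption for illustration, and the file name is invented.

```python
import os
import tempfile

BLOCK = 8192  # assumed block size for illustration


def file_length(fd):
    """Return the current length of the file by seeking to its end."""
    return os.lseek(fd, 0, os.SEEK_END)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relation")
        # Two independent descriptors stand in for two backends.
        writer = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        reader = os.open(path, os.O_RDONLY)
        try:
            before = file_length(reader)
            os.write(writer, b"\0" * BLOCK)  # the other backend extends the file
            after = file_length(reader)      # a fresh lseek sees the new length
            return before, after
        finally:
            os.close(writer)
            os.close(reader)


if __name__ == "__main__":
    print(demo())
```

The reader never caches the length; it asks the kernel each time, which is why no shared data structure is needed.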

(Of course, such a data structure would just be the tip of the iceberg
of what we'd have to maintain for ourselves if we couldn't depend on the
kernel to do it for us.  Reimplementing a filesystem doesn't strike me
as a profitable use of our time.)
        regards, tom lane


Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> > I know xfs and reiser are both log based.  Do we need to be concerned
> > about PostgreSQL performance on these file systems?  I use BSD FFS with
> > soft updates here, so it doesn't affect me.
> 
> I did see poor performance on reiserfs, I have not as yet ventured into using
> xfs.
> 
> It occurs to me that journalizing file systems will almost always be slower on
> an application such as postgres. The journalizing file system is trying to
> maintain data integrity for an application which is also trying to maintain
> data integrity. There will always be extra work involved.

Yes, the problem is that extra work is required on PostgreSQL's part. 
Log-based file systems make sure all the changes get onto the disk in an
orderly way, but I believe it can delay what gets written to the drive. 
PostgreSQL wants to be sure all the data is on the disk, period. 
Unfortunately, the _orderly_ part makes the _fsync_ part do more work. 
By going from ext2 to a log-based file system, we are getting _farther_
from a raw device than if we just stayed with ext2.

ext2 has serious problems with corrupt file systems after a crash, so I
understand the need to move to another file system type.  I have been
waiting for Linux to get a more modern file system. Unfortunately, the
new ones seem to be worse for PostgreSQL.

> This behavior raises the question about file system usage in Postgres. Many
> databases, such as Oracle, create table space files and operate directly on the
> raw blocks, bypassing the file system altogether.

OK, we have considered this, but frankly, the new, modern file systems
like FFS/softupdates have i/o rates near raw speed, with all the
advantages a file system gives us.  I believe most commercial dbs are
moving away from raw devices and toward file systems.  In the old days
the SysV file system was pretty bad at i/o & fragmentation, so they used
raw devices.

> The ability to put indexes on a separate volume from data.
> The ability to put different tables on different volumes.
> And so on.

We certainly need that, but raw devices would not make this any easier,
I think.

> In the short term, I think poor performance on a journalizing file system is to
> be expected, unless there is an IOCTL to tell the FS to leave the files alone
> (and postgres calls it). A Linux HOWTO which informs people that certain file
> systems will have performance issues and why should handle the problem.
> 
> Perhaps we can convince the Linux community to create a "dbfs" which is a
> stripped down simple no nonsense file system designed for applications like
> databases?

It could become a serious problem as people start using reiser/xfs for
their file systems and don't understand the performance problems.  Even
more likely is that they will turn off fsync, thinking reiser doesn't
need it, when in fact, I think it does.



Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> > From some stracing of 7.1, the most common syscall issued by
> > postgres is an lseek() to the end of the file, presumably to
> > find its length, which seems to happen up to about a dozen
> > times per (pgbench) transaction.
> 
> > Tablespaces would solve this (not that lseek is a particularly
> > expensive operation, of course).
> 
> No, they wouldn't; or at least they'd just create a different problem.
> The reason for the lseek is that the file length may have changed since
> the current backend last checked it.  To avoid lseek we'd need some
> shared data structure that maintains the current length of every active
> table, which would be a nuisance to maintain and probably a source of
> contention delays.

Seems we should cache the file lengths somehow.  Not sure how to do it
because our file system cache is local to each backend.


> (Of course, such a data structure would just be the tip of the iceberg
> of what we'd have to maintain for ourselves if we couldn't depend on the
> kernel to do it for us.  Reimplementing a filesystem doesn't strike me
> as a profitable use of our time.)

Ditto.  The database is complicated enough.



Re: Re: New Linux xfs/reiser file systems

From
Kaare Rasmussen
Date:
> > kernel to do it for us.  Reimplementing a filesystem doesn't strike me
> > as a profitable use of our time.)
> Ditto.  The database is complicated enough.

Maybe some kind of recommendation would be a good thing. That is, if the 
PostgreSQL community has enough knowledge.

A section in the docs that discusses various file systems, so people can make 
an intelligent choice.

-- 
Kaare Rasmussen            --Linux, spil,--        Tlf:        3816 2582
Kaki Data                tshirts, merchandize      Fax:        3816 2501
Howitzvej 75               Åben 14.00-18.00        Web:      www.suse.dk
2000 Frederiksberg        Lørdag 11.00-17.00       Email: kar@webline.dk


Re: Re: New Linux xfs/reiser file systems

From
bpalmer
Date:
> > This behavior raises the question about file system usage in Postgres. Many
> > databases, such as Oracle, create table space files and operate directly on the
> > raw blocks, bypassing the file system altogether.
>
> OK, we have considered this, but frankly, the new, modern file systems
> like FFS/softupdates have i/o rates near raw speed, with all the
> advantages a file system gives us.  I believe most commercial dbs are
> moving away from raw devices and toward file systems.  In the old days
> the SysV file system was pretty bad at i/o & fragmentation, so they used
> raw devices.

I'm starting to like the idea of raw FS for a few reasons:

1)  Considering that postgresql now does WAL,  the need for a logging FS
for the database doesn't seem as needed (is it needed at all?).

2)  Given the fact that postgresql is trying to support many OSs,
depending on,  for example,  XFS on a linux system will cause many
problems.  What about solaris?  How about BSD?  Etc..  Using raw db MAY be
easier than dealing with the problems that will arise from supporting
multiple filesystems.

That said,  the ability to use the system's FS does have its advantages
(backup,  moving files,  etc).

Just some thoughts..

- Brandon

b. palmer,  bpalmer@crimelabs.net
pgp:  www.crimelabs.net/bpalmer.pgp5




Re: Re: New Linux xfs/reiser file systems

From
Gavin Sherry
Date:
On Thu, 3 May 2001, mlw wrote:

> This behavior raises the question about file system usage in Postgres. Many
> databases, such as Oracle, create table space files and operate directly on the
> raw blocks, bypassing the file system altogether.
> 
> On one hand, Postgres is easy to use and maintain because it cooperates with
> the native file system, on the other hand it incurs the overhead of whatever
> silliness the file system wants to do. 

It is not *that* hard to write a 'postgresfs' but you have to look at
the problems it creates. One of the biggest problems facing sys admins of
large sites is that the Oracle/DB2/etc DBA, having created the
purpose-built database filesystem, has not allowed enough room for
growth. Like I said, a basic file system is not difficult, but volume
management tools and the maintenance of the whole thing is. Currently,
postgres administrators are not faced with such a problem.

There is, of course, the argument that pgfs need not be enforced. The
problem is that many people would probably use it so as to have a
'superior' installation. This then entails the problems above, creating
more work for core developers.

Gavin



RE: Re: New Linux xfs/reiser file systems

From
"Christopher Kings-Lynne"
Date:
Just put a note in the installation docs that the place where the database
is initialised to should be on a non-Reiser, non-XFS mount...

Chris




Reiser and XFS -- tell the maintainers

From
Date:
There might be a problem, but if no one mentions it to the maintainers of
those
fs's, it will not get fixed...

Regards
John



RE: Re: New Linux xfs/reiser file systems

From
"Christopher Kings-Lynne"
Date:
Well, arguably if you're setting up a database server then a reasonable DBA
should think about such things...

(My 2c)

Chris

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Friday, 4 May 2001 9:42 AM
To: Christopher Kings-Lynne
Cc: mlw; Hackers List
Subject: Re: [HACKERS] Re: New Linux xfs/reiser file systems


> Just put a note in the installation docs that the place where the database
> is initialised to should be on a non-Reiser, non-XFS mount...

Sure, we can do that now.  What do we do when these are the default file
systems for Linux?  We can tell them to create other types of file
systems, but that is a pretty big hurdle.  I wonder if it would be
easier to get reiser/xfs to make some modifications.




Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> Just put a note in the installation docs that the place where the database
> is initialised to should be on a non-Reiser, non-XFS mount...

Sure, we can do that now.  What do we do when these are the default file
systems for Linux?  We can tell them to create other types of file
systems, but that is a pretty big hurdle.  I wonder if it would be
easier to get reiser/xfs to make some modifications.



Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> Well, arguably if you're setting up a database server then a reasonable DBA
> should think about such things...

Yes, but people have trouble installing PostgreSQL.  I can't imagine
walking them through a newfs.





Re: New Linux xfs/reiser file systems

From
mlw
Date:
Bruce Momjian wrote:
> 
> > Just put a note in the installation docs that the place where the database
> > is initialised to should be on a non-Reiser, non-XFS mount...
> 
> Sure, we can do that now.  What do we do when these are the default file
> systems for Linux?  We can tell them to create other types of file
> systems, but that is a pretty big hurdle.  I wonder if it would be
> easier to get reiser/xfs to make some modifications.


I have looked at Reiser, and I don't think it is a file system suited for very
large files, or applications such as postgres. The Linux crowd should lobby
against any such trend. It is ok for many moderately small files. ReiserFS
would be great for a cddb server, but poor for a database box.

XFS is a real big file system project, I'd bet that there are file properties
or management tools to tell it to leave directories and files alone. They
should have addressed that years ago.

One last mention..

Having better control over WHERE various files in a database are located can
make it easier to deal with these things.

Just a thought. ;-)

-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: Re: New Linux xfs/reiser file systems

From
"carl garland"
Date:
>
> > Just put a note in the installation docs that the place where the 
>database
> > is initialised to should be on a non-Reiser, non-XFS mount...
>
>Sure, we can do that now.

I still think this is not necessarily the right approach either. One
major purpose of using a journaling fs is for fast boot up time after
crash.  If you have a 100 GB database you may wish to have the data
on XFS.  I do think that the WAL log should be on a separate disk and
on a non-journaling fs for performance.

Best Regards,
Carl Garland




Re: New Linux xfs/reiser file systems

From
mlw
Date:
Here is a radical idea...

What is it that is causing Postgres trouble? It is the file system's attempts
to maintain some integrity. So I proposed a simple "dbfs" sort of thing which
was the most basic sort of file system possible.

I'm not sure, but I think we can test this hypothesis on the FAT32 file system
on Linux. As far as I know, FAT32 (FAT in general) is a very simple file system
and does very little during operation, except read and write the files and
manage what's been allocated. Plus, the allocation table is very simple in
comparison to all the other file systems.

Would pgbench run on a system using ext2, Reiser, then FAT32 be sufficient to
get a feeling for the type of performance Postgres would get, or am I just off
the wall?

If this idea has some merit, what would be the best way to test it? Move the
pg_xlog directory first, then try base? What's the best methodology to try?
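
One way the "move pg_xlog first" step could be done is to relocate the directory and leave a symlink at the old path. The Python sketch below shows only that mechanism; the function name and directory layout are hypothetical, and it assumes the server is stopped while the move happens. Whether a given file system (FAT32 included) is actually usable for this is exactly what the experiment would have to find out.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(data_dir, subdir, target_parent):
    """Move data_dir/subdir under target_parent and leave a symlink behind."""
    src = os.path.join(data_dir, subdir)
    dst = os.path.join(target_parent, subdir)
    shutil.move(src, dst)  # falls back to copy + delete across file systems
    os.symlink(dst, src)   # the old path now points at the new location
    return dst
```

Running pgbench before and after the move, with nothing else changed, would isolate the effect of the file system holding the log.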


carl garland wrote:
> 
> >
> > > Just put a note in the installation docs that the place where the
> >database
> > > is initialised to should be on a non-Reiser, non-XFS mount...
> >
> >Sure, we can do that now.
> 
> I still think this is not necessarily the right approach either. One
> major purpose of using a journaling fs is for fast boot up time after
> crash.  If you have a 100 GB database you may wish to have the data
> on XFS.  I do think that the WAL log should be on a separate disk and
> on a non-journaling fs for performance.
> 
> Best Regards,
> Carl Garland
> 

-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: Re: New Linux xfs/reiser file systems

From
Michael Samuel
Date:
On Thu, May 03, 2001 at 11:41:24AM -0400, Bruce Momjian wrote:
> ext2 has serious problems with corrupt file systems after a crash, so I
> understand the need to move to another file system type.  I have been
> waiting for Linux to get a more modern file system. Unfortunately, the
> new ones seem to be worse for PostgreSQL.

If you fsync() a directory in Linux, all the metadata within that directory
will be written out to disk.

As for filesystem corruption, I can say the e2fsck is among the best fsck
programs out there, and I've only ever had 1 occasion where I've lost any
data on an ext2 filesystem, and that was due to bad sectors causing me to
lose the root directory. (Well, apart from human errors, but that doesn't
count)

> OK, we have considered this, but frankly, the new, modern file systems
> like FFS/softupdates have i/o rates near raw speed, with all the
> advantages a file system gives us.  I believe most commercial dbs are
> moving away from raw devices and toward file systems.  In the old days
> the SysV file system was pretty bad at i/o & fragmentation, so they used
> raw devices.

And Solaris' 1/01 media has better support for O_DIRECT (?), which they claim
gives you 93% of the speed of a raw device. (Or something like that; I read
this in marketing material a couple of months ago)

Raw devices are designed to have filesystems on them.  The only excuses for
userland tools accessing them, are fs-specific tools (eg. dump, fsck, etc),
or for non-unix filesystem tools, where the unix VFS doesn't handle things
properly (hfstools).

> > The ability to put indexes on a separate volume from data.
> > The ability to put different tables on different volumes.
> > And so on.
> 
> We certainly need that, but raw devices would not make this any easier,
> I think.

It would be cool if either at compile time or at database creation time, we
could specify a printf-like format for placing tables, indexes, etc.
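
As a rough illustration of what such a printf-like placement format might look like, here is a Python sketch. Everything in it is hypothetical: the field names (volume, database, kind, relname) and the template are invented for this example and do not correspond to any existing PostgreSQL option.

```python
# Hypothetical placement template; the field names are invented for this sketch.
DEFAULT_FORMAT = "%(volume)s/%(database)s/%(kind)s/%(relname)s"

# Hypothetical mapping from object kind to the volume it should live on.
VOLUMES = {"table": "/vol1", "index": "/vol2"}


def placement_path(fmt, database, kind, relname, volumes=VOLUMES):
    """Expand a printf-style template into the path for one relation."""
    fields = {
        "volume": volumes[kind],
        "database": database,
        "kind": kind,
        "relname": relname,
    }
    return fmt % fields


if __name__ == "__main__":
    print(placement_path(DEFAULT_FORMAT, "bench", "index", "accounts_pkey"))
```

With a template like this, putting indexes on a separate volume from data becomes a matter of changing one mapping rather than moving files by hand.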

> It could become a serious problem as people start using reiser/xfs for
> their file systems and don't understand the performance problems.  Even
> more likely is that they will turn off fsync, thinking reiser doesn't
> need it, when in fact, I think it does.

ReiserFS only supports metadata logging.  The performance slowdown must be
due to logging things like mtime or atime, because otherwise ReiserFS is a
very high performance FS. (Although, I admittedly haven't used it since it
was early in its development)

-- 
Michael Samuel <michael@miknet.net>


Re: New Linux xfs/reiser file systems

From
mlw
Date:
Michael Samuel wrote:
> 
> ReiserFS only supports metadata logging.  The performance slowdown must be
> due to logging things like mtime or atime, because otherwise ReiserFS is a
> very high performance FS. (Although, I admittedly haven't used it since it
> was early in its development)

The way I understand it is that ReiserFS does not attempt to separate files at
the block level. Multiple files can live in the same disk block. This is cool
if you have many small files, but the extra overhead for large files such as
those used by a database, is a bit much.

I read some stuff about a year ago, and my impressions forced me to conclude
that ReiserFS was geared toward applications. Which is a pretty good thing for
applications, but not for databases. 

I really think a simple low down dirty file system is just what the doctor
ordered for postgres.

Remember, general purpose file systems must do for files what Postgres is
already doing for records. You will always have extra work. I am seriously
thinking of trying a FAT32 as pg_xlog. I wonder if it will improve performance,
or if there is just something fundamentally stupid about FAT32 that will make
it worse?

-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: New Linux xfs/reiser file systems

From
"Ken Hirsch"
Date:
Before we get too involved in speculating, shouldn't we actually measure the
performance of 7.1 on XFS and Reiserfs?  Since it's easy to disable fsync,
we can test whether that's the problem.  I don't think that logging file
systems must intrinsically give bad performance on fsync since they only log
metadata changes.
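
A rough sketch of the kind of measurement proposed here, not a real PostgreSQL benchmark: it appends small records to a file with and without fsync() after each write and reports writes per second. The function names, the 512-byte record size and the `directory` parameter are assumptions made for illustration; point `directory` at a mount of the file system under test.

```python
import os
import tempfile
import time


def time_appends(path, count, sync):
    """Append `count` small records, optionally calling fsync() after each one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(count):
            os.write(fd, b"x" * 512)
            if sync:
                os.fsync(fd)
    finally:
        os.close(fd)
    # Guard against a zero interval on very fast storage.
    return max(time.perf_counter() - start, 1e-9)


def run(directory=None, count=200):
    """Return writes per second with and without fsync() in `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        with_sync = time_appends(os.path.join(d, "sync"), count, True)
        without = time_appends(os.path.join(d, "nosync"), count, False)
    return {"fsync": count / with_sync, "nosync": count / without}
```

If the two rates come out close on a given file system, fsync is probably not what makes that file system slow.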

I don't have a machine with XFS installed and it will be at least a week
before I could get around to a build.  Any volunteers?

Ken Hirsch




Re: Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
mlw <markw@mohawksoft.com> writes:

> I have looked at Reiser, and I don't think it is a file system suited for very
> large files, or applications such as postgres.

What's the problem with big files? ReiserFS v2 doesn't seem to support
it, while v3 seems just fine (of the ondisk format)

That said, I'm certainly looking forward to xfs - I believe it will be
the most widely used of the current batch of journaling file systems
(reiserfs, jfs, XFS and ext3, the latter mainly focusing on an easy
migration path for existing system)

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: Re: New Linux xfs/reiser file systems

From
Michael Samuel
Date:
On Fri, May 04, 2001 at 08:02:17AM -0400, mlw wrote:
> The way I understand it is that ReiserFS does not attempt to separate files at
> the block level. Multiple files can live in the same disk block. This is cool
> if you have many small files, but the extra overhead for large files such as
> those used by a database, is a bit much.

It should be at least as fast as other filesystems for large files. I suspect
that it would be faster in fact.  The only catch is that the performance of
reiserfs sucks when it gets past 85% or so full. (ext2 has similar problems)

You can read about all this stuff at http://www.namesys.com/

> I really think a simple low down dirty file system is just what the doctor
> ordered for postgres.

Traditional BSD FFS or Solaris UFS is probably the best bet for postgres.

> Remember, general purpose file systems must do for files what Postgres is
> already doing for records. You will always have extra work. I am seriously
> thinking of trying a FAT32 as pg_xlog. I wonder if it will improve performance,
> or if there is just something fundamentally stupid about FAT32 that will make
> it worse?

Well, for starters, file permissions...

Ext2 would kick arse over FAT32 for performance.

-- 
Michael Samuel <michael@miknet.net>


Re: Re: New Linux xfs/reiser file systems

From
Roland Roberts
Date:
>>>>> "Bruce" == Bruce Momjian <pgman@candle.pha.pa.us> writes:

   >> Well, arguably if you're setting up a database server then a
   >> reasonable DBA should think about such things...

   Bruce> Yes, but people have trouble installing PostgreSQL.  I
   Bruce> can't imagine walking them through a newfs.

In most of linux-land, the DBA is probably also the sysadmin.  In
bigger shops, and those which currently run, say Oracle or Sybase, the
two roles are separate.  When they are separate, you don't have to
walk the DBA through it; he just walks over to the sysadmin and says
"I need X megabytes of space on a new Y filesystem."

roland
--            PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                     76-15 113th Street, Apt 3B
rbroberts@acm.org                          Forest Hills, NY 11375


Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
I got some information from Stephen Tweedie on this - please keep him
"Cc:" as he's not on this list

************************************************************************
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.


Performance doing what?  XFS has known performance problems doing
unlinks and truncates, but not synchronous IO.  The user should be
using fdatasync() for databases, btw, not fsync().

First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
are journaling filesystems.  They have a log, but they are not
log-based because they do not store data permanently in a log
structure.  Berkeley LFS, Sprite and Spiralog are log-based
filesystems.

> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?

Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
inode, but not always.  For true log-based filesystems, writes are
always completely sequential, so the issue just goes away.  For
journaling filesystems, depending on the setup there may be a seek to
the journal involved, but some journaling filesystems can use a
separate disk for the journal so no seek is required.

> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

A database normally preallocates its data files and then performs most
of its writes using update-in-place.  In such cases, fsync() is almost
always the wrong thing to be doing --- the data writes have changed
nothing in the inode except for the timestamps, and there's no need to
flush the timestamps to disk for every write.  fdatasync() is
designed for this --- if the only inode change is timestamps,
fdatasync() will skip the seek to the inode and will only update the
data.  If any significant inode fields have been changed, then a full
flush is done.

Using fdatasync, most filesystems will incur no seeks for data flush,
regardless of whether the filesystem is journaling or not.
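
A minimal sketch of the pattern described above: preallocate a file once, then overwrite blocks in place and flush with fdatasync(). The 8192-byte block size and the function names are assumptions for illustration, and the code falls back to fsync() where the platform has no fdatasync().

```python
import os
import tempfile

# Use fdatasync() where the platform provides it, otherwise fall back to fsync().
data_sync = getattr(os, "fdatasync", os.fsync)

BLOCK = 8192  # assumed block size for this sketch


def preallocate(path, nblocks):
    """Write zero-filled blocks once so later writes never change the file size."""
    with open(path, "wb") as f:
        for _ in range(nblocks):
            f.write(b"\0" * BLOCK)
        f.flush()
        os.fsync(f.fileno())  # full flush: the size and block map have changed


def update_in_place(path, blockno, payload):
    """Overwrite part of an existing block, then flush only the data."""
    assert len(payload) <= BLOCK
    fd = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd, blockno * BLOCK, os.SEEK_SET)
        os.write(fd, payload)
        data_sync(fd)  # only the inode timestamps changed, so no full flush is needed
    finally:
        os.close(fd)


def demo():
    """Preallocate four blocks, update block 2, and read the result back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment")
        preallocate(path, 4)
        update_in_place(path, 2, b"record")
        with open(path, "rb") as f:
            data = f.read()
        return len(data), data[2 * BLOCK:2 * BLOCK + 6]
```

The file size is the same before and after the update, which is the case where fdatasync() can skip the inode write.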

Cheers,
 Stephen
************************************************************************

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: Re: New Linux xfs/reiser file systems

From
Kaare Rasmussen
Date:
> Sure, we can do that now.  What do we do when these are the default file
> systems for Linux?  We can tell them to create other types of file

What is a 'default file system'? I know that until now, everybody has been using
ext2. But that's only because there hasn't been anything comparable. Now we
see ReiserFS, and my SuSE installation offers the choice. In the future, I
believe that people can choose from ext2, ReiserFS, xfs, ext3 and maybe more.

> systems, but that is a pretty big hurdle.  I wonder if it would be
> easier to get reiser/xfs to make some modifications.

No, I don't think it's a big hurdle. If you just want to play with 
PostgreSQL, you won't care. If you're serious, you'll repartition.

-- 
Kaare Rasmussen            --Linux, spil,--        Tlf:        3816 2582
Kaki Data                tshirts, merchandize      Fax:        3816 2501
Howitzvej 75               Åben 14.00-18.00        Web:      www.suse.dk
2000 Frederiksberg        Lørdag 11.00-17.00       Email: kar@webline.dk


Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> Before we get too involved in speculating, shouldn't we actually measure the
> performance of 7.1 on XFS and Reiserfs?  Since it's easy to disable fsync,
> we can test whether that's the problem.  I don't think that logging file
> systems must intrinsically give bad performance on fsync since they only log
> metadata changes.
> 
> I don't have a machine with XFS installed and it will be at least a week
> before I could get around to a build.  Any volunteers?

There have been multiple reports of poor PostgreSQL performance on
Reiser and xfs.  I don't have numbers, though.  Frankly, I think we need
xfs and reiser experts involved to figure out our options here.



Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> > Sure, we can do that now.  What do we do when these are the default file
> > systems for Linux?  We can tell them to create other types of file
> 
> What is a 'default file system'? I know that until now, everybody has been using
> ext2. But that's only because there hasn't been anything comparable. Now we
> see ReiserFS, and my SuSE installation offers the choice. In the future, I
> believe that people can choose from ext2, ReiserFS, xfs, ext3 and maybe more.

But some day the default will be a log-based file system, and people
will have to hunt around to create a non-log based one.

> > systems, but that is a pretty big hurdle.  I wonder if it would be
> > easier to get reiser/xfs to make some modifications.
> 
> No, I don't think it's a big hurdle. If you just want to play with 
> PostgreSQL, you won't care. If you're serious, you'll repartition.

Yes, but we could get a reputation for slowness on these log-based file
systems.



Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> On Fri, May 04, 2001 at 08:02:17AM -0400, mlw wrote:
> > The way I understand it is that ReiserFS does not attempt to separate files at
> > the block level. Multiple files can live in the same disk block. This is cool
> > if you have many small files, but the extra overhead for large files such as
> > those used by a database, is a bit much.
> 
> It should be at least as fast as other filesystems for large files. I suspect
> that it would be faster in fact.  The only catch is that the performance of
> reiserfs sucks when it gets past 85% or so full. (ext2 has similar problems)

That is pretty standard for most modern file systems.  They need that
free space to optimize.


> 
> You can read about all this stuff at http://www.namesys.com/
> 
> > I really think a simple low down dirty file system is just what the doctor
> > ordered for postgres.
> 
> Traditional BSD FFS or Solaris UFS is probably the best bet for postgres.

That is my opinion.  BSD FFS seems to be general enough to give good
performance for a large scale of application needs.  It is not as fast
as XFS for streaming large files (media), and it doesn't optimize small
files below the 1k size (fragments), and it does require fsck on reboot.

However, looking at all those for PostgreSQL, the costs of the new Linux
file systems seem pretty high, especially considering our need for
fsync().

What I am really concerned about is when xfs/reiser become the default
file systems for Linux, and people complain about PostgreSQL
performance.  And if we require special file systems, we lose some of
our ability to easily grow.  Because of ext2's problems with crash
recovery, who is going to want to put other data on that file system
when they have xfs/reiser available?  And boots are going to have to
fsck that ext2 file system.



Re: New Linux xfs/reiser file systems

From
Thomas Swan
Date:
mlw wrote:
> Bruce Momjian wrote:
> > > Just put a note in the installation docs that the place where the database
> > > is initialised to should be on a non-Reiser, non-XFS mount...
> >
> > Sure, we can do that now.  What do we do when these are the default file
> > systems for Linux?  We can tell them to create other types of file
> > systems, but that is a pretty big hurdle.  I wonder if it would be
> > easier to get reiser/xfs to make some modifications.
>
> I have looked at Reiser, and I don't think it is a file system suited for very
> large files, or applications such as postgres. The Linux crowd should lobby
> against any such trend. It is ok for many moderately small files. ReiserFS
> would be great for a cddb server, but poor for a database box.
>
> XFS is a real big file system project, I'd bet that there are file properties
> or management tools to tell it to leave directories and files alone. They
> should have addressed that years ago.
>
> One last mention..
>
> Having better control over WHERE various files in a database are located can
> make it easier to deal with these things.

I think it's worth noting that Oracle has been petitioning the kernel
developers for better raw device support: in other words, the ability to
write directly to the hard disk and bypassing the filesystem all together.

If the db is going to assume the responsibility of disk write verification
it seems reasonable to assume you might want to investigate the raw disk
i/o options.

Telling your installers that a major performance gain is attainable by
doing so might be a start in the opposite direction.  I've monitored a lot
of discussions and from what I can gather, postgresql does its own set of
journaling operations.  I don't think that it's necessary for writes to be
double journalled anyway.

Again, just my two cents worth...


Re: Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
"Ken Hirsch" <kenhirsch@myself.com> writes:

> I don't have a machine with XFS installed and it will be at least a week
> before I could get around to a build.  Any volunteers?

I think I could do that... any useful benchmarks to run?

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> Hi,
> 
> On Fri, May 04, 2001 at 01:49:54PM -0400, Bruce Momjian wrote:
> > > 
> > > Performance doing what?  XFS has known performance problems doing
> > > unlinks and truncates, but not synchronous IO.  The user should be
> > > using fdatasync() for databases, btw, not fsync().
> > 
> > This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
> > default if it is available on a platform.
> 
> Good --- fdatasync is defined in SingleUnix, so it's probably safe to
> probe for it and use it by default if it is there.
> 
> The 2.2 Linux kernel does not have fdatasync implemented, but glibc
> will fall back to fsync if that's all that the kernel supports.  2.4
> implements both with the required semantics.

OK, that is something we found too, that fdatasync() was there on some
platforms, but was really just an fsync().  I believe some HPUX
platforms had that.

OK, so they need a 2.4 kernel to properly test performance of Reiser/xfs
with fdatasync().



Re: New Linux xfs/reiser file systems

From
mlw
Date:
Michael Samuel wrote:

>
> > Remember, general purpose file systems must do for files what Postgres is
> > already doing for records. You will always have extra work. I am seriously
> > thinking of trying a FAT32 as pg_xlog. I wonder if it will improve performance,
> > or if there is just something fundamentally stupid about FAT32 that will make
> > it worse?
>
> Well, for starters, file permissions...
>
> Ext2 would kick arse over FAT32 for performance.

OK, I'll bite.

In a database environment where file creation is not such an issue, why would ext2
be faster?

The FAT file system has, AFAIK, very little overhead for file writes. It simply
writes the two FAT tables on file extension, and data. Depending on cluster size,
there is probably even less happening there.

I don't think that anyone is saying that FAT is the answer in a production
environment, but maybe we can do a comparison of various file systems and see if any
performance issues show up.

I mentioned FAT only because I was thinking about how postgres would perform on a
very simple file system, one which bypasses most of the normal stuff a "good"
general purpose file system would do. While I was thinking this, it occurred to me
that FAT was about the cheesiest simple file system one could find, short of a ram
disk, and maybe we could use it to test the assumptions about performance impact of
the file system on postgres.

Just a thought. If you know of some reason why ext2 would perform better in the
postgres environment, I would love to hear why, I'm very curious.



Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
> 
> ************************************************************************
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad.  He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
> 
> 
> Performance doing what?  XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO.  The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.


> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
> are journaling filesystems.  They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure.  Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync.  Is that true of log-based file systems?
> 
> Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
> inode, but not always.  For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away.  For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
> 
> > I know xfs and reiser are both log based.  Do we need to be concerned
> > about PostgreSQL performance on these file systems?  I use BSD FFS with
> > soft updates here, so it doesn't affect me.
> 
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place.  In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write.  fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data.  If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks.  That is a big help.  I wonder if people reporting performance
problems were using 7.0.3.  We only added fdatasync() in 7.1.



Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> > There have been multiple reports of poor PostgreSQL performance on
> > Reiser and xfs.  I don't have numbers, though.  Frankly, I think we need
> > xfs and reiser experts involved to figure out our options here.
> 
> I've done some testing to see how Reiserfs performs
> vs ext2, and also for various values of wal_sync_method while on a
> reiserfs partition. The attached graph shows the results. The y axis is
> transactions per second and the x axis is the transaction number. It was
> clear that, at least for my specific app, ext2 was significantly faster.
> 
> The hardware I tested on has an Athlon 1 GHz cpu and 512 MB ram. The
> harddrive is a 2 year old IDE drive. I'm running Red Hat 7 with all the
> latest updates, and a freshly compiled 2.4.2 kernel with the latest Reiserfs
> patch, and of course PostgreSQL 7.1. The transactions were run in a loop,
> 700 times per test, to insert sample data into 4 tables. I used a PHP script
> running on the same machine to do the inserts.
> 
> I'd be happy to provide more detail or try a different variation if anyone
> is interested.

This is hugely helpful.

Yikes, look at those lines.  It shows a few things.  

First, under Reiser, nosync, fsync, and fdatasync are pretty much the
same.  The big surprise here is that fsync doesn't seem to have any
effect.

Second surprise is that open fsync, which syncs on every write rather
than on end of transaction, was slower.  I believe this should be slower
if multiple WAL writes are being made in one transaction.  fdatasync
would sync just at end of transaction, while each WAL write would be
synced by open fsync.
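
To illustrate the difference, here is a small sketch, not PostgreSQL's actual WAL code: one function opens the file with O_SYNC so that every write is synchronous, the other writes all records and calls fdatasync() once at the end of the "transaction". The function names and record contents are made up for the example, and `flushes` only counts how many synchronous operations each strategy requests.

```python
import os
import tempfile

# Fall back to fsync() on platforms without fdatasync().
data_sync = getattr(os, "fdatasync", os.fsync)


def commit_with_open_sync(path, records):
    """O_SYNC makes each write() synchronous: one flush per WAL record."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    flushes = 0
    try:
        for rec in records:
            os.write(fd, rec)
            flushes += 1
    finally:
        os.close(fd)
    return flushes


def commit_with_fdatasync(path, records):
    """Buffered writes followed by a single flush at end of transaction."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    flushes = 0
    try:
        for rec in records:
            os.write(fd, rec)
        data_sync(fd)
        flushes += 1
    finally:
        os.close(fd)
    return flushes


def compare(nrecords=3):
    """Run both strategies on the same records and compare the results."""
    records = [b"wal-record-%d\n" % i for i in range(nrecords)]
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "open_sync")
        b = os.path.join(d, "fdatasync")
        n_open = commit_with_open_sync(a, records)
        n_data = commit_with_fdatasync(b, records)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            same = fa.read() == fb.read()
    return n_open, n_data, same
```

With a single WAL write per transaction the two approaches request the same number of flushes; the gap only opens up when a transaction produces several writes.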

And the largest surprise is that ext2 is faster, almost doubly so, but
not because of fsync.  Keep in mind that WAL writes are not the only
writes happening.  Though in 7.1 we don't flush the data blocks to
disk, we do write to disk as the buffer cache fills up with dirty
buffers.



Re: Re: New Linux xfs/reiser file systems

From
"Ken Hirsch"
Date:
Joe Conway <joe@conway-family.com> wrote:
>
> I've done some testing to see how Reiserfs performs
> vs ext2, and also for various values of wal_sync_method while on a
> reiserfs partition. The attached graph shows the results. The y axis is
> transactions per second and the x axis is the transaction number. It was
> clear that, at least for my specific app, ext2 was significantly faster.

This is great, thanks a lot!  Among other things it tells us, it appears
that fsync() is not the problem on Reiserfs.  I don't know the details of
Reiserfs, but I think a lot of work has gone into optimizing it for very
small files, so you can use the file system as a simple database for
strings, a la Windows registry.  I don't remember hearing about optimizing
for large files and large block reads and writes.

XFS, on the other hand, is used for very large files on SGI systems.

I think the XFS and Reiserfs folks will be happy to look at the performance
problem, but it would be very helpful for them to have a prepackaged
benchmark (or two or three) to use.   We should set up an FTP area to share
them.  Joe, can you contribute yours?   Does anybody else have anything?

Already, Trond Eivind Glomsrød teg@redhat.com has volunteered to test on
XFS.  The easier we make it, the more help we'll get.

Ken Hirsch







Re: Re: New Linux xfs/reiser file systems

From
"Joe Conway"
Date:
> I think the XFS and Reiserfs folks will be happy to look at the performance
> problem, but it would be very helpful for them to have a prepackaged
> benchmark (or two or three) to use.   We should set up an FTP area to share
> them.  Joe, can you contribute yours?   Does anybody else have anything?
>

I don't mind contributing the script and schema that I used, but one thing I
failed to mention in my first post is that the first thing the script does
is open connections to 256 databases (all on this same machine), and the
transactions are relatively evenly dispersed among the 256 connections. The
test was originally written to try out an idea to allow scalability by
partitioning the data into separate databases (which could eventually each
live on its own server). If you are interested I can modify the test to use
only one database and rerun the same tests this weekend.

Joe



Re: New Linux xfs/reiser file systems

From
Lincoln Yeoh
Date:
At 02:09 AM 5/4/01 -0500, Thomas Swan wrote:
> I think it's worth noting that Oracle has been petitioning the
> kernel developers for better raw device support: in other words,
> the ability to write directly to the hard disk and bypassing the
> filesystem all together.   

But there could be other reasons why Oracle would want to do raw stuff.

1) They have more things to sell - management modules/software. More
training courses. Certified blahblahblah. More features in brochure.
2) It just helps make things more proprietary. Think lock in.

All that for maybe 10% performance increase?

I think it's more advantageous for Postgresql to keep the filesystem layer
of abstraction, than to do away with it, and later reinvent certain parts
of it along with new bugs.

What would be useful is if one can specify where the tables, indexes, WAL
and other files go. That feature would probably help improve performance
far more. 

For example: you could then stick the WAL on a battery backed up RAM disk.
How much total space does a WAL log need?

A battery backed RAM disk might even be cheaper than Brand X RDBMS
Proprietary Feature #5.

Cheerio,
Link.



Re: New Linux xfs/reiser file systems

From
mlw
Date:
Lincoln Yeoh wrote:
> 
> At 02:09 AM 5/4/01 -0500, Thomas Swan wrote:
> > I think it's worth noting that Oracle has been petitioning the
> > kernel developers for better raw device support: in other words,
> > the ability to write directly to the hard disk and bypassing the
> > filesystem all together.
> 
> But there could be other reasons why Oracle would want to do raw stuff.
> 
> 1) They have more things to sell - management modules/software. More
> training courses. Certified blahblahblah. More features in brochure.
> 2) It just helps make things more proprietary. Think lock in.
> 
> All that for maybe 10% performance increase?
> 
> I think it's more advantageous for Postgresql to keep the filesystem layer
> of abstraction, than to do away with it, and later reinvent certain parts
> of it along with new bugs.

I just did a test of putting pg_xlog on a FAT file system, and my first rough
tests (pgbench) show an approximate 20% performance increase over ext2 with
fsync enabled.


-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: New Linux xfs/reiser file systems

From
thomas graichen
Date:
Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>> > Yes, this double-writing is a problem.  Suppose you have your WAL on a
>> > separate drive.  You can fsync() WAL with zero head movement.  With a
>> > log based file system, you need two head movements, so you have gone
>> > from zero movements to two.
>> 
>> It may be worse depending on how the filesystem actually does
>> journalling.  I wonder if an fsync() may cause ALL pending
>> meta-data to be updated (even metadata not related to the 
>> postgresql files).
>> 
>> Do you know if reiser or xfs have this problem?

> I don't know, but the Linux user reported xfs was really slow.

i think this should be tested in more detail: i once tried this
lightly (running pgbench against postgresql 7.1beta4) with
different filesystems: ext2, reiserfs and XFS, and i reproducibly
got about 15% better results running on XFS ... ok - it's
not a very big test, but i think it might be worth really
doing an a/b test before taking it as a fact that postgresql is
slow on XFS (and maybe reiserfs too ... but reiserfs has had
performance problems in certain situations anyway)

XFS is a journaling fs, but it does all its work in a very
clever way (delayed allocation etc.) - so under normal
conditions you should usually get decent performance out of it -
otherwise it might be worth sending a mail to the XFS
mailing list (reiserfs maybe ditto)

t

-- 
thomas graichen <tgr@spoiled.org> ... perfection is reached, not
when there is no longer anything to add, but when there is no
longer anything to take away. --- antoine de saint-exupery


Re: New Linux xfs/reiser file systems

From
Lincoln Yeoh
Date:
At 01:16 PM 5/5/01 -0400, mlw wrote:
>Lincoln Yeoh wrote:
>> 
>> All that for maybe 10% performance increase?
>> 
>> I think it's more advantageous for Postgresql to keep the filesystem layer
>> of abstraction, than to do away with it, and later reinvent certain parts
>> of it along with new bugs.
>
>I just did a test of putting pg_xlog on a FAT file system, and my first rough
>tests (pgbench) show an approximate 20% performance increase over ext2 with
>fsync enabled.

OK. I slouch corrected :). It's more than 10%.

However in the same message I did also say:
>What would be useful is if one can specify where the tables, indexes, WAL
>and other files go. That feature would probably help improve performance
>far more. 
>
>For example: you could then stick the WAL on a battery backed up RAM disk.
>How much total space does a WAL log need?
>
>A battery backed RAM disk might even be cheaper than Brand X RDBMS
>Proprietary Feature #5.

And your experiments do help show that it is useful to be able to specify
where things go, that putting just the WAL somewhere else makes things 20%
faster. So you don't have to put everything on a pgfs. Just the WAL on some
other FS (even FAT32, ick ;) ).

---
OK we can do that with symlinks, but is there a PGSQL Recommended or
Standard way to do it, so as to reduce administrative errors, and at least
help improve consistency with multiadmin pgsql installations?
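
For reference, the symlink approach mentioned here can be sketched as follows. This is only an illustration run against a scratch directory, not a recommended or standard procedure: the helper name is made up, the paths are placeholders, and on a real installation the server would have to be stopped before moving anything.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(data_dir, subdir, new_parent):
    """Move data_dir/subdir under new_parent and leave a symlink behind."""
    old = os.path.join(data_dir, subdir)
    new = os.path.join(new_parent, subdir)
    if os.path.islink(old):
        raise RuntimeError(f"{old} is already a symlink")
    shutil.move(old, new)  # assumes the server is not running
    os.symlink(new, old)
    return new


def demo():
    """Relocate a fake pg_xlog directory and read a file through the link."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        other_fs = os.path.join(root, "waldisk")  # stands in for another mount
        os.makedirs(os.path.join(data_dir, "pg_xlog"))
        os.makedirs(other_fs)
        segment = os.path.join(data_dir, "pg_xlog", "0000000000000001")
        with open(segment, "wb") as f:
            f.write(b"segment")
        relocate_with_symlink(data_dir, "pg_xlog", other_fs)
        with open(segment, "rb") as f:  # same path, now resolved through the link
            content = f.read()
        return os.path.islink(os.path.join(data_dir, "pg_xlog")), content
```

The old path keeps working after the move, so nothing that refers to the data directory needs to change.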

The WAL and DBs are in separate directories, so this makes things easy. But
the object names are now all numbers so that makes things a bit harder -
and what to do with temp tables?

Would it be good to have tables in one directory and indexes in another? Or
most people optimize on a specific table/index basis? Where does PGSQL do
the on-disk sorts?

How about naming the DB objects <object ID>.<object name>?
e.g

121575.testtable
125575.testtableindex

(or the other way round - name.OID - harder for DB, easier for admin?)

They'll still be unique, but now they're admin readable. Slower? e.g. at
that code point, pgsql no longer knows the object's name, and wants to
refer to everything by just numbers? 
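
The naming idea is simple enough to sketch. The two helpers below are hypothetical and only show that an <object ID>.<object name> file name can be built and split again without ambiguity, because the OID part never contains a dot.

```python
def object_filename(oid, name):
    """Build the proposed on-disk name, e.g. 121575.testtable."""
    return f"{oid}.{name}"


def parse_object_filename(filename):
    """Split on the first dot only, so object names may themselves contain dots."""
    oid_text, _, name = filename.partition(".")
    return int(oid_text), name
```

Code that only knows the OID could still find the file by matching on the numeric prefix, at the cost of a directory scan.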

I apologize if there was already a long discussion on this. I seem to
recall Bruce saying that the developers agonized over this.

Cheerio,
Link.




Re: Re: New Linux xfs/reiser file systems

From
Hannu Krosing
Date:
Lincoln Yeoh wrote:
> 
> At 01:16 PM 5/5/01 -0400, mlw wrote:
> >Lincoln Yeoh wrote:
> >>
> >> All that for maybe 10% performance increase?
> >>
> >> I think it's more advantageous for Postgresql to keep the filesystem layer
> >> of abstraction, than to do away with it, and later reinvent certain parts
> >> of it along with new bugs.
> >
> >I just did a test of putting pg_xlog on a FAT file system, and my first rough
> >tests (pgbench) show an approximate 20% performance increase over ext2 with
> >fsync enabled.
> 
> OK. I slouch corrected :). It's more than 10%.
> 
> However in the same message I did also say:
> >What would be useful is if one can specify where the tables, indexes, WAL
> >and other files go. That feature would probably help improve performance
> >far more.
> >
> >For example: you could then stick the WAL on a battery backed up RAM disk.
> >How much total space does a WAL log need?
> >
> >A battery backed RAM disk might even be cheaper than Brand X RDBMS
> >Proprietary Feature #5.
> 
> And your experiments do help show that it is useful to be able to specify
> where things go, that putting just the WAL somewhere else makes things 20%
> faster. So you don't have to put everything on a pgfs. Just the WAL on some
> other FS (even FAT32, ick ;) ).

So you propose pgwalfs ? ;)

It may be much easier to implement than a full fs.

How hard would it be to let wal reside on a (raw) device ?

If we already pre-allocate a required number of fixed-size files, would
it be too hard to replace them with plain (raw) devices and test for
possible performance gains?
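
To make the "pre-allocated fixed-size file" idea concrete, here is a small Python sketch. It is not how the PostgreSQL WAL code is written. SEGMENT_SIZE, preallocate and overwrite_in_place are hypothetical names, and the size is chosen only to keep the demo small. The point is that once the file exists at its full size, later writes stay inside it and never grow it, which is the property a raw device would also have.

```python
import os
import tempfile

SEGMENT_SIZE = 64 * 1024   # hypothetical size, chosen only to keep the demo small


def preallocate(path, size=SEGMENT_SIZE):
    """Create a zero-filled file of exactly `size` bytes and force it to disk."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def overwrite_in_place(path, offset, payload):
    """Write payload inside the existing extent, so the file never grows."""
    with open(path, "r+b") as f:
        if offset + len(payload) > os.fstat(f.fileno()).st_size:
            raise ValueError("record would extend the preallocated file")
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def demo():
    """Preallocate a segment, write one record, and report size and contents."""
    with tempfile.TemporaryDirectory() as d:
        seg = os.path.join(d, "segment0")
        preallocate(seg)
        overwrite_in_place(seg, 4096, b"record")
        with open(seg, "rb") as f:
            f.seek(4096)
            return os.path.getsize(seg), f.read(6)
```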

> 
> How about naming the DB objects <object ID>.<object name>?
> e.g
> 
> 121575.testtable
> 125575.testtableindex
> 

This sure seems to be an elegant solution for the problem that seems to
be impossible to solve with symlinks and such. Even the IMHO hardest to
solve problem - RENAME - can probably be done in a transaction-safe
manner by doing a link(oid.<newname>) in the beginning and selective
unlink(oid.<newname/oldname>) at commit time.

--------------------
Hannu


Re: New Linux xfs/reiser file systems

From
mlw
Date:
Hannu Krosing wrote:
> 
> Lincoln Yeoh wrote:
> >
> > At 01:16 PM 5/5/01 -0400, mlw wrote:
> > >Lincoln Yeoh wrote:
> > >>
> > >> All that for maybe 10% performance increase?
> > >>
> > >> I think it's more advantageous for Postgresql to keep the filesystem layer
> > >> of abstraction, than to do away with it, and later reinvent certain parts
> > >> of it along with new bugs.
> > >
> > >I just did a test of putting pg_xlog on a FAT file system, and my first rough
> > >tests (pgbench) show an approximate 20% performance increase over ext2 with
> > >fsync enabled.
> >
> > OK. I slouch corrected :). It's more than 10%.
> >
> > However in the same message I did also say:
> > >What would be useful is if one can specify where the tables, indexes, WAL
> > >and other files go. That feature would probably help improve performance
> > >far more.
> > >
> > >For example: you could then stick the WAL on a battery backed up RAM disk.
> > >How much total space does a WAL log need?
> > >
> > >A battery backed RAM disk might even be cheaper than Brand X RDBMS
> > >Proprietary Feature #5.
> >
> > And your experiments do help show that it is useful to be able to specify
> > where things go, that putting just the WAL somewhere else makes things 20%
> > faster. So you don't have to put everything on a pgfs. Just the WAL on some
> > other FS (even FAT32, ick ;) ).
> 
> So you propose pgwalfs ? ;)

I don't know about a "pgwalfs"; too much work. I have had some time to grapple
with my feelings about FAT, and you know what? I don't hate the idea. I would,
of course, like to look through the driver code and see if there are any
technical reasons why it should be excluded.

FAT is almost perfect for WAL, and if I can figure out how to get the "base"
directory to get the same performance, I'd think about putting it there as
well.

The ReiserFS issues touched on some vague suspicions I had about fsync. Maybe
I'm overreacting, but there are reasons why the Oracles manage their own table
spaces.

Back to FAT. FAT is probably the simplest file system I can think of. As
long as it writes to disk when it gets synced, and doesn't lose things, it's
perfect. Postgres handles most of the coherency issues itself, and there is no
real problem with permissions because everything will be owned by the postgres
super user, etc. I would never suggest FAT as a general purpose file system,
but, geez, as a special purpose single user (postgres) file system it seems an
ideal answer to what will be an increasingly hard problem of advanced file
systems.

Aside from a general, and well deserved, disdain for FAT, what are the
technical "cons" of such a proposal? If we can get the Linux kernel (and other
unices) to accept IOCTLs to direct space allocation, and/or write up a white
paper on how to use this for postgres, why wouldn't it be a reasonable
strategy?



-- 
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com


Re: Re: New Linux xfs/reiser file systems

From
Lincoln Yeoh
Date:
>Lincoln Yeoh wrote:
>> 
>> >Lincoln Yeoh wrote:
>> >For example: you could then stick the WAL on a battery backed up RAM disk.
>> >How much total space does a WAL log need?
>> >
>> >A battery backed RAM disk might even be cheaper than Brand X RDBMS
>> >Proprietary Feature #5.
>> 
>> And your experiments do help show that it is useful to be able to specify
>> where things go, that putting just the WAL somewhere else makes things 20%
>> faster. So you don't have to put everything on a pgfs. Just the WAL on some
>> other FS (even FAT32, ick ;) ).

At 02:04 PM 5/6/01 +0200, Hannu Krosing wrote:
>So you propose pgwalfs ? ;)

Nah. I'm proposing the opposite in fact.

I'm saying so far there appears to be no real need to come up with a
special filesystem. Stick to using existing/future filesystems. Just make
it easy and safe enough for DBA's to put the objects on whatever filesystem
they choose. So long as the O/S kernel/driver people support the hardware
or filesystem, postgresql will take advantage of it with little if any
extra work.

In fact as mlw's experiments show, you can put the WAL on FAT (FAT16?) for
a 20% performance increase. How much better would a raw device be? Would it
really be worth all that hassle? For instance if you need to resize the FAT
partition, you could probably use fips, Partition Magic or some other cost
effective solution - no need for pgsql developers or anybody to reinvent
anything.

My proposed but untested idea is that you could get a significant
performance increase by putting the WAL on popular filesystems running on
battery backed RAM drives (or other special hardware). 128MB RAM should be
enough for small setups? 

Don't know how much these things cost, but I believe that when you need the
speed, they'll be more worthwhile than a special proprietary filesystem.

Ok, just found:
http://www.expressdata.com.au/Products/ProductsList.asp?SUPPLIER_NAME=PLATYPUS+TECHNOLOGY&SUBCATEGORY_NAME=QikDrive2#PRODUCTTITLE

AUD$1,624.70 = USD843.06. Not cheap but not way out of reach. Haven't found
other competing products yet. Must be somewhere.

Cheerio,
Link.



Re: Re: New Linux xfs/reiser file systems

From
Tom Lane
Date:
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> OK we can do that with symlinks, but is there a PGSQL Recommended or
> Standard way to do it, so as to reduce administrative errors, and at least
> help improve consistency with multiadmin pgsql installations?

Not yet.  There should be support for this.  See
doc/TODO.detail/tablespaces.
        regards, tom lane


Re: Re: New Linux xfs/reiser file systems

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> Even the IMHO hardest to solve problem
> - RENAME - can 
> probably be done in a transaction-safe manner by doing a
> link(oid.<newname>) in the 
> beginning and selective unlink(oid.<newname/oldname>) at commit time.

Nope.  Consider

        begin;
        rename a to b;
        rename b to a;
        end;

And don't tell me you'll solve this by ignoring failures from link().
That's a recipe for losing your data...
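
The failure in that sequence can be reproduced at the file system level with a short Python sketch. This is only one possible reading of the link-at-start, unlink-at-commit proposal, not anyone's actual design. rename_via_link and the file names are hypothetical. Because the old name is kept until commit, the second rename finds its target name still present and link() fails.

```python
import os
import tempfile


def rename_via_link(directory, oid, old, new):
    """Hard-link <oid>.<old> to <oid>.<new>; the old name stays until commit."""
    os.link(os.path.join(directory, f"{oid}.{old}"),
            os.path.join(directory, f"{oid}.{new}"))


def run_scenario():
    """Replay 'rename a to b; rename b to a' before any commit-time unlink."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "121575.a"), "w").close()
        rename_via_link(d, 121575, "a", "b")          # succeeds
        try:
            rename_via_link(d, 121575, "b", "a")      # 121575.a still exists
        except FileExistsError:
            return "second link() failed", sorted(os.listdir(d))
        return "no failure", sorted(os.listdir(d))
```

At this point both names exist and the code has to decide at commit which ones to unlink. Getting that decision wrong removes the last name of the file, which is where data loss would come from.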

I would ask people who think they have a solution to please go back and
reread the very long discussions we have had on this point in the past.
Nobody particularly likes numeric filenames, but there really isn't any
other workable answer.
        regards, tom lane


Re: Re: New Linux xfs/reiser file systems

From
Lincoln Yeoh
Date:
At 12:03 PM 5/6/01 -0400, Tom Lane wrote:
>Hannu Krosing <hannu@tm.ee> writes:
>> Even the IMHO hardest to solve problem
>> - RENAME - can 
>> probably be done in a transaction-safe manner by doing a
>> link(oid.<newname>) in the 
>> beginning and selective unlink(oid.<newname/oldname>) at commit time.
>
>Nope.  Consider
>
>    begin;
>    rename a to b;
>    rename b to a;
>    end;
>
>And don't tell me you'll solve this by ignoring failures from link().
>That's a recipe for losing your data...
>
>I would ask people who think they have a solution to please go back and
>reread the very long discussions we have had on this point in the past.
>Nobody particularly likes numeric filenames, but there really isn't any
>other workable answer.

OK. Found one of the discussions at:
http://postgresql.readysetnet.com/mhonarc/pgsql-hackers/2000-03/threads.html#00088

Conclusion: calling stuff oid.relname doesn't really work. Sorry to have
brought it up again.

Another idea that's probably more messy than it's worth: 

Main object still called <oid> with a symlink called <oid.originalrelname>.
DB really just uses <oid>.

Rename= adds symlink called <oid.newrelname>, doesn't remove symlinks
(symlinks more for show!). 

Committed drop table does what 7.1 does with the main oid entry. 

Vacuum cleans up the symlinks leaving just a single valid one or zaps all
if the table has been dropped. 

For windows create empty files named oid.relname instead of symlinks.
Windows will definitely like .verylongrelname extensions ;).

Kinda messy and kludgy. Throw in the performance reduction and Ick! 

I probably have to think harder :), maybe there's just no good way :(. 

Ah well,
Link.



Re: TABLE RENAME/NUMERIC FILENAMES (Was: New Linux xfs/reiser file systems)

From
Hannu Krosing
Date:
Tom Lane wrote:
> 
> Hannu Krosing <hannu@tm.ee> writes:
> > Even the IMHO hardest to solve problem
> > - RENAME - can
> > probably be done in a transaction-safe manner by doing a
> > link(oid.<newname>) in the
> > beginning and selective unlink(oid.<newname/oldname>) at commit time.
> 
> Nope.  Consider
> 
>         begin;
>         rename a to b;
>         rename b to a;
>         end;
> 
> And don't tell me you'll solve this by ignoring failures from link().
> That's a recipe for losing your data...

I guess link() failures can be safely ignored _as long as_ we check that 
we have the right link after doing it. I can't see how it will lose
data.

> I would ask people who think they have a solution to please go back and
> reread the very long discussions we have had on this point in the past.

I think I have now (no way to guarantee I have read _everything_ about
it, but I did hit about 10 messages on the oid_relname naming scheme).

The most serious objection seemed to be that we need to remember the
postgres tablename while it would be much easier to use only oids.

I guess we could hit some system limits here (running out of directory
entries or reaching the maximum number of links to a file) but at least
on Linux I was able to make >10000 links to one file with no problems.

Now that I think of it, I have one concern - it would require extra work
to use tablenames like "/etc/passwd" or others that use characters that
are reserved in filenames, which are ok to use in 7.1.

hannu=# create table "/etc/passwd"(
hannu(#   login text,
hannu(#   uid int,
hannu(#   gid int
hannu(# );
CREATE
hannu=# \dt
      List of relations
    Name     | Type  | Owner
-------------+-------+-------
 /etc/passwd | table | hannu

So if people start using names like these it will not be easy to go back ;)
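
One way the "extra work" for such names could look is percent-encoding the relation name before using it in a file name. The Python sketch below is purely illustrative. The oid.name layout follows the proposal in this thread, relation_filename and relation_name are hypothetical helpers, and nothing like this exists in PostgreSQL.

```python
from urllib.parse import quote, unquote


def relation_filename(oid, relname):
    """Build '<oid>.<encoded name>' with every reserved character escaped."""
    return f"{oid}.{quote(relname, safe='')}"


def relation_name(filename):
    """Invert relation_filename: split off the oid and decode the name."""
    oid, _, encoded = filename.partition(".")
    return int(oid), unquote(encoded)
```

Calling relation_filename(121575, "/etc/passwd") yields a name with no slash in it, and relation_name() recovers the original. The cost is that encoded names can grow to several times their original length, so file name length limits become a new concern.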

> Nobody particularly likes numeric filenames, but there really isn't any
> other workable answer.

At least we could put links on system relations, so it would be 
easier to find them. 

I guess one is not supposed to rename/drop system tables ?

---------------------
Hannu


Re: Re: New Linux xfs/reiser file systems

From
"Joe Conway"
Date:
> > Before we get too involved in speculating, shouldn't we actually measure
> > the performance of 7.1 on XFS and Reiserfs?  Since it's easy to disable
> > fsync, we can test whether that's the problem.  I don't think that logging
> > file systems must intrinsically give bad performance on fsync since they
> > only log metadata changes.
> >
> > I don't have a machine with XFS installed and it will be at least a week
> > before I could get around to a build.  Any volunteers?
>
> There have been multiple reports of poor PostgreSQL performance on
> Reiser and xfs.  I don't have numbers, though.  Frankly, I think we need
> xfs and reiser experts involved to figure out our options here.

I've done some testing to see how Reiserfs performs
vs ext2, and also for various values of wal_sync_method while on a
reiserfs partition. The attached graph shows the results. The y axis is
transactions per second and the x axis is the transaction number. It was
clear that, at least for my specific app, ext2 was significantly faster.

The hardware I tested on has an Athlon 1 GHz CPU and 512 MB RAM. The
hard drive is a 2 year old IDE drive. I'm running Red Hat 7 with all the
latest updates, and a freshly compiled 2.4.2 kernel with the latest Reiserfs
patch, and of course PostgreSQL 7.1. The transactions were run in a loop,
700 times per test, to insert sample data into 4 tables. I used a PHP script
running on the same machine to do the inserts.

I'd be happy to provide more detail or try a different variation if anyone
is interested.

- Joe




Re: New Linux xfs/reiser file systems

From
"Stephen C. Tweedie"
Date:
Hi,

On Fri, May 04, 2001 at 01:49:54PM -0400, Bruce Momjian wrote:
> > 
> > Performance doing what?  XFS has known performance problems doing
> > unlinks and truncates, but not synchronous IO.  The user should be
> > using fdatasync() for databases, btw, not fsync().
> 
> This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
> default if it is available on a platform.

Good --- fdatasync is defined in SingleUnix, so it's probably safe to
probe for it and use it by default if it is there.

The 2.2 Linux kernel does not have fdatasync implemented, but glibc
will fall back to fsync if that's all that the kernel supports.  2.4
implements both with the required semantics.
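
The "probe for it and fall back" idea can be sketched in a few lines of Python. This illustrates the pattern only, not PostgreSQL's actual code, which is written in C. sync_data is a hypothetical helper, while os.fdatasync and os.fsync are the real Python wrappers for the system calls.

```python
import os
import tempfile


def sync_data(fd):
    """Flush fd's data to disk, preferring fdatasync() where the platform has it."""
    if hasattr(os, "fdatasync"):       # not every platform exposes fdatasync
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)                       # fallback: also flushes all file metadata
    return "fsync"


def demo():
    """Write a record to a temporary file and report which call was used."""
    with tempfile.TemporaryFile() as f:
        f.write(b"commit record")
        f.flush()                      # move Python's buffer into the kernel first
        return sync_data(f.fileno())
```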

--Stephen


Re: Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
teg@redhat.com (Trond Eivind Glomsrød) writes:

> "Ken Hirsch" <kenhirsch@myself.com> writes:
> 
> > I don't have a machine with XFS installed and it will be at least a week
> > before I could get around to a build.  Any volunteers?
> 
> I think I could do that... any useful benchmarks to run?

In lack of bigger benchmarks, I tried postgresql 7.1 on a Red Hat
Linux 7.1 system with the SGI XFS modifications. The differences were
very small.




-- 
Trond Eivind Glomsrød
Red Hat, Inc.

Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> teg@redhat.com (Trond Eivind Glomsrød) writes:
> 
> > "Ken Hirsch" <kenhirsch@myself.com> writes:
> > 
> > > I don't have a machine with XFS installed and it will be at least a week
> > > before I could get around to a build.  Any volunteers?
> > 
> > I think I could do that... any useful benchmarks to run?
> 
> In lack of bigger benchmarks, I tried postgresql 7.1 on a Red Hat
> Linux 7.1 system with the SGI XFS modifications. The differences were
> very small.
> 

Thanks.  That is very helpful.  Seems XFS is fine.  According to Joe
Conway, reiser has some problems.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: New Linux xfs/reiser file systems

From
"Joe Conway"
Date:
> I don't mind contributing the script and schema that I used, but one thing
> I failed to mention in my first post is that the first thing the script
> does is open connections to 256 databases (all on this same machine), and
> the transactions are relatively evenly dispersed among the 256 connections.
> The test was originally written to try out an idea to allow scalability by
> partitioning the data into separate databases (which could eventually each
> live on its own server). If you are interested I can modify the test to
> use only one database and rerun the same tests this weekend.
>

I modified my test script to use just one (instead of 256) databases to be
more representative of a common installation. Then I ran more tests under
both ext2 and reiserfs. The summary follows. Short answer is that the
differences are much smaller than under the first test, but ext2 is still
faster.

-- Joe

case              rfs_fdatasync  ext_fdatasync  rfs_fdatasync  ext_fdatasync  rfs_fdatasync  ext_fdatasync
fstab             sync,noatime   sync,noatime   noatime        noatime        defaults       defaults
starting # tup    70k            70k            70k            70k            70k            70k
total time (min)  12.10          11.77          11.83          11.43          11.88          11.42
cpu util %        90-94%         95-98%         90-95%         95-99%         90-95%         95-99%
ram - stable cpu  42M            42M            42M            42M            42M            42M
ram - final       52M            52M            52M            52M            52M            52M
avg trans/sec
  10000 tup       13.77          14.16          14.08          14.58          14.03          14.60
  5000 tup        13.70          14.08          13.97          14.71          13.93          14.75
  1000 tup        11.36          11.63          11.63          13.33          11.63          13.51


Notes:
1. rfs_fdatasync: data and wal on reiserfs with wal_sync_method = fdatasync

2. ext_fdatasync: data and wal on ext2 with wal_sync_method = fdatasync

3. starting # tup: the database was pre-seeded with 70k tuples. I made a
tarball of the starting database and refreshed the pgsql/data filestructure
before each test to ensure a good comparison.

4. cpu utilization + ram - stable cpu + ram - final: I eyeballed top while
the test was running. In general cpu % increased steadily through the first
1500 or so transactions, along with ram usage. At the point when cpu
utilization stabilized, ram was pretty consistently at 42M. From there, cpu
util % varied in the ranges noted, while ram usage slowly increased to 52M.
It seemed pretty linear in that I could estimate the number of transactions
already processed based on ram usage.

5. avg trans/sec: These represent the total transactions/total elapsed time
at the given number of transactions (as opposed to some instantaneous value
at that point in time).
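
The difference between the two kinds of rate in note 5 can be shown with a short Python sketch. It assumes that the "10000 tup" row corresponds to the end of the run, so that it can be checked against the "total time (min)" row. That pairing is an assumption, although the numbers agree to within rounding. Both function names are hypothetical.

```python
def cumulative_tps(transactions, elapsed_minutes):
    """Total transactions divided by total elapsed time, as reported in the table."""
    return transactions / (elapsed_minutes * 60.0)


def instantaneous_tps(count_now, count_before, seconds_between):
    """Rate over a short recent window; this is NOT what the table reports."""
    return (count_now - count_before) / seconds_between
```

For example, cumulative_tps(10000, 12.10) gives about 13.77, matching the first reiserfs column, while an instantaneous rate would need two transaction counts taken a few seconds apart.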




Re: Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
teg@redhat.com (Trond Eivind Glomsrød) writes:

> teg@redhat.com (Trond Eivind Glomsrød) writes:
>
> > "Ken Hirsch" <kenhirsch@myself.com> writes:
> >
> > > I don't have a machine with XFS installed and it will be at least a week
> > > before I could get around to a build.  Any volunteers?
> >
> > I think I could do that... any useful benchmarks to run?
>
> In lack of bigger benchmarks, I tried postgresql 7.1 on a Red Hat
> Linux 7.1 system with the SGI XFS modifications. The differences were
> very small.

And here is the one for ReiserFS - same kernel, but recompiled to turn
off debugging


When compared to the earlier ones (including XFS), you'll note that ReiserFS
performance is rather poor in some of the tests  - it takes 37 vs. 13
seconds for 8192 inserts, when the inserts are different transactions.
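
Expressed per transaction, those two figures can be worked out with the small Python sketch below. It only restates the arithmetic on the 37 and 13 second totals quoted above. per_commit_ms is a hypothetical helper.

```python
def per_commit_ms(total_seconds, transactions):
    """Average wall-clock cost of one single-insert transaction, in milliseconds."""
    return total_seconds / transactions * 1000.0


REISERFS_MS = per_commit_ms(37, 8192)   # about 4.5 ms per transaction
OTHERS_MS = per_commit_ms(13, 8192)     # about 1.6 ms per transaction
SLOWDOWN = REISERFS_MS / OTHERS_MS      # same as 37 / 13
```

So each commit costs roughly 3 ms more on ReiserFS in this test, which is the kind of per-transaction overhead a slow sync path would produce.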
--
Trond Eivind Glomsrød
Red Hat, Inc.

Attachment

Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> 
> When compared to the earlier ones (including XFS), you'll note that ReiserFS
> performance is rather poor in some of the tests  - it takes 37 vs. 13
> seconds for 8192 inserts, when the inserts are different transactions.

That is all the fsync delay, probably, and it should be using fdatasync()
on that kernel.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Re: New Linux xfs/reiser file systems

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> > 
> > When compared to the earlier ones (including XFS), you'll note that ReiserFS
> > performance is rather poor in some of the tests  - it takes 37 vs. 13
> > seconds for 8192 inserts, when the inserts are different transactions.
> 
> That is all the fsync delay, probably, and it should be using fdatasync()
> on that kernel.

And it does seem to work that way with XFS...

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: Re: New Linux xfs/reiser file systems

From
Martín Marqués
Date:
Quoting Trond Eivind Glomsrød <teg@redhat.com>:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > > 
> > > When compared to the earlier ones (including XFS), you'll note that
> ReiserFS
> > > performance is rather poor in some of the tests  - it takes 37 vs. 13
> > > seconds for 8192 inserts, when the inserts are different transactions.
> > 
> > That is all the fsync delay, probably, and it should be using fdatasync()
> > on that kernel.
> 
> And it does seem to work that way with XFS...

I'm concerned about this because we are going to switch our first server to a
journaling FS (on Linux).
Searching and asking around, I found out that for our short-term work we need
ReiserFS (it's for a proxy server).
But the interesting thing was that for large (very large) files, everybody
recommends XFS.
The drawback of XFS is that it's very, very sloooow when deleting files.

Regards... :-)

-- 
The best operating system is the one that feeds you.
Watch your diet.
-----------------------------------------------------------------
Martin Marques                  |        mmarques@unl.edu.ar
Programmer, Administrator       |       Centro de Telematica
       Universidad Nacional
            del Litoral
-----------------------------------------------------------------


Re: Re: New Linux xfs/reiser file systems

From
Bruce Momjian
Date:
> I'm concerned about this because we are going to switch our
> first server to a Journaling FS (on Linux).  Searching and asking
> I found out that for our short term work we need ReiserFS (it's
> for a proxy server).  But the interesting thing was that for
> large (very large) files, everybody recommends XFS.  The drawback
> of XFS is that it's very, very sloooow when deleting files.

Why do all these file systems seem to have one major negative?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Re: New Linux xfs/reiser file systems

From
Martín Marqués
Date:
Quoting Bruce Momjian <pgman@candle.pha.pa.us>:

> > I'm concerned about this because we are going to switch our
> > first server to a Journaling FS (on Linux).  Searching and asking
> > I found out that for our short term work we need ReiserFS (it's
> > for a proxy server).  But the interesting thing was that for
> > large (very large) files, everybody recommends XFS.  The drawback
> > of XFS is that it's very, very sloooow when deleting files.
> 
> Why do all these file systems seem to have one major negative?

In the case of XFS they told me that it was slow at deleting, but I guess that
they were trying to tell me that reiser would do the job on a proxy cache
better than XFS.
Everybody gave their thumbs-up to XFS when talking about databases (because of
the large file size).

Regards... :-)

-- 
The best operating system is the one that feeds you.
Watch your diet.
-----------------------------------------------------------------
Martin Marques                  |        mmarques@unl.edu.ar
Programmer, Administrator       |       Centro de Telematica
       Universidad Nacional
            del Litoral
-----------------------------------------------------------------


Re: Re: New Linux xfs/reiser file systems

From
"Rod Taylor"
Date:
Makes it more fun :)  Kinda like a lottery ticket:

- reliable (cherry)
- fast (cherry)
- resource hog (lemon)
--
Rod Taylor  BarChord Entertainment Inc.
----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: "Martín Marqués" <martin@bugs.unl.edu.ar>
Cc: "Trond Eivind Glomsrød" <teg@redhat.com>;
<pgsql-hackers@postgresql.org>
Sent: Wednesday, May 09, 2001 1:24 PM
Subject: Re: [HACKERS] Re: New Linux xfs/reiser file systems


> > I'm concerned about this because we are going to switch our
> > first server to a Journaling FS (on Linux).  Searching and asking
> > I found out that for our short term work we need ReiserFS (it's
> > for a proxy server).  But the interesting thing was that for
> > large (very large) files, everybody recommends XFS.  The drawback
> > of XFS is that it's very, very sloooow when deleting files.
>
> Why do all these file systems seem to have one major negative?
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>



Re: New Linux xfs/reiser file systems

From
Yaacov Akiba Slama
Date:
Hello !
I am forwarding the following from lkml

It seems that the only case where XFS is slow is 'rm -rf linux'
[which can be considered a good sign for linux]. For all other
operations XFS is the winner.

YAS

<MessageFromLKML>
From: Ricardo Galli (gallir@uib.es)
Date: Wed May 09 2001 - 20:45:46 EDT

> It would be great to see a table of ReiserFS/XFS/Ext2+index performance
> results. Well, to make it really fair it should be Ext3+index so I'd
> better add 'backport the patch to 2.2' or 'bug Stephen and friends to
> hurry up' to my to-do list.
 

You can find a simple benchmark (an average of three samples) among reiser,
ext2, xfs and fat32 under Linux:

http://bulma.lug.net/body.phtml?nIdNoticia=626

Although it is in Spanish, the tables are easy to understand.

The benchmark was carried out by Guillem Cantallops, a student at the
University of the Balearic Islands and a member of the local LUG...

BASIC WORDS ;-)
Escritura: Writing
Lectura: Reading
Borrado: Deletion
Copia: Copy
Extracción: Extraction

Regards,

--ricardo
http://m3d.uib.es/~gallir/

</MessageFromLKML>


Bruce Momjian wrote:

>>I'm concerned about this because we are going to switch our
>>first server to a Journaling FS (on Linux).  Searching and asking
>>I found out that for our short term work we need ReiserFS (it's
>>for a proxy server).  But the interesting thing was that for
>>large (very large) files, everybody recommends XFS.  The drawback
>>of XFS is that it's very, very sloooow when deleting files.
>>
> 
> Why do all these file systems seem to have one major negative?
> 
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>