Thread: ext3 filesystem / linux 7.3

ext3 filesystem / linux 7.3

From
Shankar K
Date:
hi there,

I was reading bruce's 'postgresql hardware performance
tuning' article and he has suggested ext3 filesystem
with data mode = writeback for high performance.

I would really appreciate it if anyone could share their
experiences with ext3 from a production standpoint, or
any other suggestions for best read/write performance.

Our application is a hybrid of heavy inserts/updates
and DSS queries.

version - postgres 7.3.2
hardware - raid 5 (5 x 73 GB hardware raid), 4 GB ram,
2 x 2.8 GHz cpu, redhat 7.3

Note: we don't have the luxury of raid 1+0 (dedicated
disks) for the xlog and clog files to start with. Down
the line we might look into those options, but for now
I've planned on having them on local drives rather than
raid 5.

thanks for any inputs,
Shankar








Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
What is the URL of that article?  I understood that ext2 was faster with PG
and so I went to a lot of trouble of creating an ext2 partition just for PG
and gave up the journalling to do that.  Something about double effort since
PG already does a lot of that.

Bruce, is there a final determination of which is faster/safer?

    Jeff

----- Original Message -----
From: "Shankar K" <shan0075@yahoo.com>
To: <pgsql-performance@postgresql.org>
Sent: Monday, March 31, 2003 3:55 PM
Subject: [PERFORM] ext3 filesystem / linux 7.3




Re: ext3 filesystem / linux 7.3

From
Shankar K
Date:
hi jeff,

go to
http://www.ca.postgresql.org/docs/momjian/hw_performance/
under 'filesystems' slide.

snip

File system choice is particularly difficult on Linux
because there are so many file system choices, and
none of them are optimal: ext2 is not entirely
crash-safe, ext3, XFS, and JFS are journal-based, and
Reiser is optimized for small files and does
journalling. The journalling file systems can be
significantly slower than ext2 but when crash recovery
is required, ext2 isn't an option. If ext2 must be
used, mount it with sync enabled. Some people
recommend XFS or an ext3 filesystem mounted with
data=writeback.

/snip



Re: ext3 filesystem / linux 7.3

From
eric soroos
Date:
On Tue, 1 Apr 2003 09:39:17 -0800 (PST) in message
<20030401173917.19476.qmail@web21101.mail.yahoo.com>, Shankar K
<shan0075@yahoo.com> wrote:
> hi jeff,
>
> go to
> http://www.ca.postgresql.org/docs/momjian/hw_performance/
> under 'filesystems' slide.
>

I suspect that is what he's seen.

From my experience, ext3 is only a percent or two slower than ext2 under pgbench. It saves an amazing amount of time
on startup after a failure by not having to fsck to confirm that the filesystem is in a consistent state.

I believe that ext3 is a metadata journaling system, and not a data journaling system. This would indicate that the PG
transactioning is complementary to the filesystem journaling, not duplication.

eric


Re: ext3 filesystem / linux 7.3

From
Bruce Momjian
Date:
I have heard XFS with the mount option is fastest.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: ext3 filesystem / linux 7.3

From
Andrew Sullivan
Date:
On Tue, Apr 01, 2003 at 12:33:15PM -0500, Jeffrey D. Brower wrote:
> What is the URL of that article?  I understood that ext2 was faster with PG
> and so I went to a lot of trouble of creating an ext2 partition just for PG
> and gave up the journalling to do that.  Something about double effort since
> PG already does a lot of that.

I don't know how ext3 could be faster than ext2, since it has to do
more work.

But ext2 is not crash-safe.  So your data could well be hosed if you
come back from a crash on ext2.

Actually, I have my doubts about _any_ of the journaling filesystems
for Linux: ext3 has a reputation for being slow if you journal in the
real-safe mode, and there have been so many unrepeatable reiserfs
problem reports that I'm loath to use it for real systems.  I had
exceptionally good experiences with xfs when I was administering SGI
boxes, but that's not part of the standard Linux kernel distribution,
and with no idea why, I think my managers would get grumpy with me
for using it.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: ext3 filesystem / linux 7.3

From
Bruce Momjian
Date:
eric soroos wrote:
> On Tue, 1 Apr 2003 09:39:17 -0800 (PST) in message
> <20030401173917.19476.qmail@web21101.mail.yahoo.com>, Shankar
> K <shan0075@yahoo.com> wrote:
> > hi jeff,
> >
> > go to
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/
> > under 'filesystems' slide.
> >
>
> I suspect that is what he's seen.
>
> From my experience, ext3 is only a percent or two slower than ext2 under
> pgbench. It saves an amazing amount of time on startup after a failure by
> not having to fsck to confirm that the filesystem is in a consistent state.
>
> I believe that ext3 is a metadata journaling system, and not a
> data journaling system. This would indicate that the PG
> transactioning is complementary to the filesystem journaling,
> not duplication.

Ext3 is only metadata journaling if you set the mount flags as
described.  I also don't think pgbench is the best test for testing file
system performance.



Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
OK so am I hearing:

XFS is the fastest (but is it the safest?) but does not come on Linux.

Ext2 does less work than Ext3 so is fastest among what DOES come with
Linux - but if you have a crash that fsck can't fix you're hosed.

Ext3 is quite a bit slower if set to be real safe, a wee bit slower if run
with standard options which makes it more crash-safe, and much slower if the
mount flags are set to metadata journaling but that is much safer as a file
system because the metadata journaling is complementary to the PG
transactioning.

To determine which you want you must choose which one feels to you like the
right balance of speed and the setup work you are willing to perform and
maintain.

Do I have it right?

   Jeff


Re: ext3 filesystem / linux 7.3

From
"Keith Bottner"
Date:
FYI, I believe that XFS will be included in the 2.6 kernel.

Keith Bottner
kbottner@istation.com

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Andrew
Sullivan
Sent: Tuesday, April 01, 2003 11:55 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3




Re: ext3 filesystem / linux 7.3

From
"Christopher Kings-Lynne"
Date:
Just switch to FreeBSD and use UFS ;)

Chris

----- Original Message ----- 
From: "Jeffrey D. Brower" <jeff@pointhere.net>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>; "eric soroos" <eric-psql@soroos.net>
Cc: "Shankar K" <shan0075@yahoo.com>; <pgsql-performance@postgresql.org>
Sent: Wednesday, April 02, 2003 4:42 AM
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3



Re: ext3 filesystem / linux 7.3

From
Sean Chittenden
Date:
> Just switch to FreeBSD and use UFS ;)

I must say, I found this whole discussion rather amusing on the
sidelines given it's largely a non-problem for non-Linux users.  :)

"Better performance through engineering elegance."

-sc

--
Sean Chittenden
seanc@FreeBSD.org


Re: ext3 filesystem / linux 7.3

From
Shridhar Daithankar
Date:
On Wednesday 02 April 2003 07:19, you wrote:
> > Just switch to FreeBSD and use UFS ;)
>
> I must say, I found this whole discussion rather amusing on the
> sidelines given it's largely a non-problem for non-Linux users.  :)
>
> "Better performance through engineering elegance."

Well, this may sound like a troll, but I have said this before and will say
it again. I found reiserfs to be faster than ext2, up to 40% at times, when
we tried a quasi closed source benchmark on a quad Xeon machine with SCSI
RAID.

Everything else being the same and defaults used out of the box, reiserfs on
Mandrake 9 was far faster in every respect than ext2.

I personally find FreeBSD with UFS to be a better combo based on my
workstation tests. I believe FreeBSD has a better I/O scheduler that utilises
disk bandwidth in an optimal manner. Scratching (my poor IDE) disk like mad
does not happen with FreeBSD but Linux does it plenty. But I didn't benchmark
it for throughput.

 Shridhar


Re: ext3 filesystem / linux 7.3

From
Andreas Kostyrka
Date:
On Tue, 2003-04-01 at 19:53, eric soroos wrote:
> On Tue, 1 Apr 2003 09:39:17 -0800 (PST) in message
> <20030401173917.19476.qmail@web21101.mail.yahoo.com>, Shankar K
> <shan0075@yahoo.com> wrote:
> > hi jeff,
> >
> > go to
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/
> > under 'filesystems' slide.
> >
>
> I suspect that is what he's seen.
>
> From my experience, ext3 is only a percent or two slower than ext2 under
> pgbench. It saves an amazing amount of time on startup after a failure by
> not having to fsck to confirm that the filesystem is in a consistent state.
>
> I believe that ext3 is a metadata journaling system, and not a data
> journaling system. This would indicate that the PG transactioning is
> complementary to the filesystem journaling, not duplication.
It's both. See the -o data=journal|data=ordered|data=writeback mount
time option.
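
To make the three modes concrete, here is a minimal sketch that reads a
mount-option string and reports which ext3 journaling mode it implies. The
function name ext3_data_mode is made up for illustration, and the sketch
assumes that a missing data= option means data=ordered (the default named in
the mount man page); it is not part of any ext3 tool.

```python
# Sketch: report the ext3 journaling mode implied by a mount-option string.
# Assumption: when no data= option is given, ext3 uses data=ordered.

EXT3_DATA_MODES = {
    "journal": "data and metadata both go through the journal",
    "ordered": "metadata is journaled; data is written before the metadata commit",
    "writeback": "metadata is journaled; data ordering is not preserved",
}


def ext3_data_mode(options: str) -> str:
    """Return 'journal', 'ordered' or 'writeback' for an options string."""
    mode = "ordered"  # assumed default when data= is absent
    for opt in options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "data":
            if value not in EXT3_DATA_MODES:
                raise ValueError(f"unknown ext3 data mode: {value!r}")
            mode = value
    return mode


if __name__ == "__main__":
    for opts in ("rw", "rw,noatime,data=writeback", "rw,data=journal"):
        mode = ext3_data_mode(opts)
        print(f"{opts:28} -> {mode}: {EXT3_DATA_MODES[mode]}")
```

Only data=journal journals file contents; the other two modes journal
metadata only, which is why "metadata journaling" and "full journaling" can
both be true of ext3.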

Andreas


Re: ext3 filesystem / linux 7.3

From
Andreas Kostyrka
Date:
On Tue, 2003-04-01 at 19:55, Andrew Sullivan wrote:
> I don't know how ext3 could be faster than ext2, since it has to do
> more work.
Depending upon certain parameters, it can be faster, because it writes
the data to the journal serially without head movement. The kernel might
be able to write that data to its final spot later, when the hdd would be idle.

So yes, in certain cases, ext3 might be faster than ext2.

>
> Actually, I have my doubts about _any_ of the journaling filesystems
> for Linux: ext3 has a reputation for being slow if you journal in the
Well, journaled filesystem usually means only meta-data journaling. ext3
is the only LinuxFS (AFAIK) that offers a fully journaled fs.
> real-safe mode, and there have been so many unrepeatable reiserfs
> problem reports that I'm loathe to use it for real systems.  I had
Well, I've been using ReiserFS now for years, and never had any problems
with it.

Andreas
--
Andreas Kostyrka
Josef-Mayer-Strasse 5
83043 Bad Aibling


Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
... and what *exactly* is the difference?


Re: ext3 filesystem / linux 7.3

From
Andreas Kostyrka
Date:
On Wed, 2003-04-02 at 17:37, Jeffrey D. Brower wrote:
> ... and what *exactly* is the difference?
Between what? (how about a bit more context?)

Andreas

Re: ext3 filesystem / linux 7.3

From
Andrew Sullivan
Date:
On Wed, Apr 02, 2003 at 05:18:26PM +0200, Andreas Kostyrka wrote:

> Well, I've been using ReiserFS now for years, and never had any problems
> with it.

Me too.  But the "known failure modes" that people keep reporting
about have to do with completely trashing, say, a whole page of data.
Your directories are fine, but the data is all hosed.

I've never had it happen.  I've never seen anyone who can
consistently reproduce it.  But I've certainly read about it often
enough to have pretty serious reservations about relying on the
filesystem for data I can't afford to lose.

A



Re: ext3 filesystem / linux 7.3

From
Andreas Pflug
Date:
Are there any comments on JFS regarding real-life safety and speed?


Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
>> This would indicate that the PG transactioning is complementary to the
>> filesystem journaling, not duplication.

> It's both. See the -o data=journal|data=ordered|data=writeback mount
> time option.

I did a RTFM on that but I am now confused again.

I am wondering what the *best* setting is with ext3.  When I RTFM  the man
page for mount, the data=writeback option says plainly that it is fastest
but in a crash old data is quite possibly on the dataset.  The safest
*looks* to be data=journal since the journaling happens before writes are
committed to the file (and presumably the journal is used to update the file
on the disk to apply the journal entry to the disk file?) and the default is
data=ordered which says write to the disk AND THEN to the journal (which
seems bizarre to me).

How all of that works WITH and/or AGAINST PostgreSQL and what metadata
REALLY means is my bottom line quandary.  Obviously that is where finding
the warm and fuzzy place between speed and safety is found.

     Jeff


Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Jeff,

> How all of that works WITH and/or AGAINST PostgreSQL and what metadata
> REALLY means is my bottom line quandary.  Obviously that is where finding
> the warm and fuzzy place between speed and safety is found.

For your $PGDATA directory, your only need for filesystem journaling is to
prevent a painful fsck process on an unexpected power-out.  You are not, as a
rule, terribly concerned with journaling the data as PostgreSQL already
provides some data recovery protection through WAL.

As a result, on my one server where I have to use Ext3 (I use Reiser on most
machines, and have never had a problem except for one disaster when upgrading
Reiser versions), the $PGDATA is mounted "noatime,data=writeback"
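
For anyone wanting to try the same options, here is a rough sketch of the
matching /etc/fstab entry. The device name and mount point are hypothetical
placeholders, and fstab_line is a helper invented for this example; only the
"noatime,data=writeback" option string comes from the setup described above.

```python
# Sketch: build an /etc/fstab entry for a dedicated $PGDATA filesystem.
# The device and mount point used below are hypothetical placeholders.

def fstab_line(device, mount_point, fstype="ext3",
               options=("noatime", "data=writeback"), dump=0, passno=2):
    """Return one fstab line: device, mount point, type, options, dump, pass."""
    fields = [device, mount_point, fstype, ",".join(options),
              str(dump), str(passno)]
    return "\t".join(fields)


if __name__ == "__main__":
    print(fstab_line("/dev/sdb1", "/var/lib/pgsql/data"))
```

The same option string can be passed to mount with -o for a one-off test
before committing it to fstab.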

(BTW, I found that combining "data=writeback" with Linux LVM on RedHat 8.0
resulted in system-fatal mounting errors.   Anyone else have this problem?)

Of course, if you have a machine with a $60,000 disk array and disk I/O is
unlimited, then maybe you want to enable data=journal just for the protection
against corruption of the WAL and clog files.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
Thanks for that Josh.

I had previously understood that ext3 was a bad thing with PostgreSQL and I
went way above and beyond to create it on an Ext2 filesystem (the only one
on the server) and mount that.

Should I undo that work and go back to Ext3?

   Jeff

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Jeffrey D. Brower" <jeff@pointhere.net>; "Andreas Kostyrka"
<andreas@mtg.co.at>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>;
<pgsql-performance@postgresql.org>; "Shankar K" <shan0075@yahoo.com>; "eric
soroos" <eric-psql@soroos.net>
Sent: Wednesday, April 02, 2003 3:05 PM
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3




Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Jeff,

> Thanks for that Josh.

Welcome

> I had previously understood that ext3 was a bad thing with PostgreSQL and I
> went way above and beyond to create it on an Ext2 filesystem (the only one
> on the server) and mount that.
>
> Should I undo that work and go back to Ext3?

I would.  Not necessarily Ext3, mind you; you might want to consider Reiser or
JFS, too.  My experience has been better with Reiser than Ext3 with Postgres,
but I can't back that up with any statistics.

(DISCLAIMER:  This is not professional advice, and comes with no warranty.  If
you want professional advice, pay me.)



Re: ext3 filesystem / linux 7.3

From
Chris Hedemark
Date:
On Tuesday, April 1, 2003, at 03:42 PM, Jeffrey D. Brower wrote:

> OK so am I hearing:

Enough...

...there is waaay too much hearsay going on in this thread.  Let's come
up with an acceptable test battery and actually settle it once and for
all with good hard numbers.  It would be worth my while to spend some
time on this since the developers I support currently hate pgsql due to
performance complaints (on servers that predate my employment there).
So if I am going to move them to better servers it would be worth my
while to do some homework on what OS and FS is best.

I'm not qualified at all to define the tests.  I am willing to try it
on any OS that will run on a Sun Ultra 5, which would include Linux,
several BSDs and Solaris to name a few.  It also runs the gamut of
filesystems that have been talked about here.  The machine isn't a
barnstormer but I'm willing to put in an 18GB SCSI drive and try this
with many different OS's and FS's if someone qualified will put
together an acceptable test suite and it doesn't meet with too much
opposition by the gurus here.

The test machine:

    Sun UltraSPARC 5
    333MHz UltraSPARC CPU, 2MB cache
    256MB RAM
    whatever SCSI card I can find most quickly
    either a 9GB or 18GB SCSI drive (whichever I can find most quickly)

The test client would likely be an Apple Powerbook G4 800MHz, 512MB,
running OS X 10.2.4.  Yes the client runs rings around the server but I
can afford to abuse the server.

While the server is admittedly an older machine, for the purpose of
this test it should not matter as long as the hardware configuration is
equal for all tests.  If we agree on a test suite there is nothing to
stop someone from running the same suite on their own hardware and
reporting their own results.

Anyone game to give a go at this?

--

"What difference does it make to the dead, the orphans and the
homeless, whether the mad destruction is wrought under the name of
totalitarianism or the holy name of liberty or democracy?" - Mahatma
Gandhi


Re: ext3 filesystem / linux 7.3

From
Andrew Sullivan
Date:
On Wed, Apr 02, 2003 at 09:44:31PM -0500, Chris Hedemark wrote:

> While the server is admittedly an older machine, for the purpose of
> this test it should not matter as long as the hardware configuration is
> equal for all tests.  If we agree on a test suite there is nothing to

That's false.

One of the big problems with a lot of tuning info is that it tends
not to take into consideration hardware, &c.  I can tell you for sure
that if you have a giant-cache array connected by fibre channel, _it
makes no difference_ what the filesystem is.  The array is so fast
that you can't really fill the cache under normal load anyway.
Similarly, if you have enough memory, every read test is going to be
as fast as any other: you'll get 100% cache hits, and the same memory
configured the same way will always respond at about the same speed.

That said, I think you're right to demand some tests, and to say that
holding the machine constant and changing filesystems is a good
filesystem test.

So here are some suggested things, in no real order:

1.    Make sure you run out of buffers before you start to read
(for read filesystem speed tests).
2.    Pull the power plug repeatedly while the server is under
load.  Judge robustness.
3.    Put WAL and data area on different filesystems (to be fair,
this should probably be different spindles, but I'll take what I can
get) and configure the filesystems in various ways (including, say,
writeback for data and full journalling for WAL).  See tests above.
4.    Make sure your controller doesn't lie about fsync.
5.    Test under different loads.  10% writes vs. 90% reads;
20% writes; &c.  Compare simple INSERT write with UPDATE write.
Compare UPDATE writes where the UPDATEd row is the same one over and
over.  Make sure you do (2) several times.
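
One rough way to approach item 4 is to time repeated write-plus-fsync calls
against the same block. This is only a heuristic sketch: it assumes a single
rotating disk, where a rewrite of the same block that really reaches the
platter can complete at most about once per revolution. The function names
and the default sizes are invented for this example, and a battery-backed
controller cache can legitimately exceed the bound.

```python
# Sketch: measure fsyncs per second and compare with a rotational bound.
# Heuristic only; assumes one spindle and no battery-backed write cache.
import os
import tempfile
import time


def fsyncs_per_second(directory=None, writes=200, size=8192):
    """Time write+fsync pairs on a scratch file; return fsyncs per second.

    Point `directory` at the filesystem under test; the default is the
    system temp directory, which may not be the disk you care about.
    """
    block = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.pwrite(fd, block, 0)  # rewrite the same block each time
            os.fsync(fd)             # ask the OS to force it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed


def max_honest_rate(rpm):
    """Rough upper bound on same-block synced rewrites per second."""
    return rpm / 60.0


if __name__ == "__main__":
    rate = fsyncs_per_second()
    bound = max_honest_rate(7200)  # hypothetical 7200 rpm drive
    print(f"measured {rate:.0f} fsyncs/s, rotational bound about {bound:.0f}/s")
    if rate > 2 * bound:
        print("something between fsync and the platter is probably caching writes")
```

A measured rate far above the bound does not say which layer is caching, only
that the power-plug test in (2) deserves extra attention on that hardware.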

Lots of these are artificial.  But it seems they might reveal
something.  I'd be particularly keen to hear about what _really_ is
up with reiserfs.

A



Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Chris,

> ...there is waaay too much hearsay going on in this thread.  Let's come
> up with an acceptable test battery and actually settle it once and for
> all with good hard numbers.  It would be worth my while to spend some
> time on this since the developers I support currently hate pgsql due to
> performance complaints (on servers that predate my employment there).
> So if I am going to move them to better servers it would be worth my
> while to do some homework on what OS and FS is best.

You're not going to be able to determine this for certain, but at least you
should be able to debunk some myths.  Here are my suggested tests:

1) Read-only test -- numerous small rapidfire queries in the fashion of a PHP
web application.  PGBench already does this one test ok, maybe you could use
that.

2) Complex query test -- run a few 12-table queries with CASE statements,
custom functions and subselects and/or UNIONs.

3) Transaction Test -- hit the database with numerous rapid-fire single row
updates to a few tables.

4) OLAP Test -- do a few massive updates to thousands of rows based on related
data and/or cascading updates to multiple tables and dozens-hundreds of rows.
Create large temp tables based on Joe Conway's Crosstab.

5) Mixed use test: combine 1, 2, & 3 in a ratio of 70% 10% 20% on several
simultaneous connections.
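
The 70% / 10% / 20% mix in test 5 could be scheduled with a weighted random draw. A rough sketch in Python, assuming each connection simply picks its next test type at random; the names `MIX`, `pick_tests` and `observed_ratio` are made up for illustration and the actual queries are left out:

```python
import random

# Weights from test 5: 70% read-only, 10% complex query, 20% transaction.
MIX = {"read_only": 0.70, "complex_query": 0.10, "transaction": 0.20}

def pick_tests(n, seed=None):
    """Draw n test names at random according to MIX."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

def observed_ratio(schedule):
    """Fraction of the schedule taken by each test type."""
    return {name: schedule.count(name) / len(schedule) for name in MIX}

if __name__ == "__main__":
    print(observed_ratio(pick_tests(10000, seed=1)))
```

Passing a fixed seed keeps the schedule identical between runs, so two filesystems see the same sequence of test types.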

Of course this requires us to have a sample database with at least 100,000
rows of data in one or two tables plus at least 5-10 additional tables with
realistically complex relationships.  Donor, anyone?

Also, we'll have to talk about .conf files ...

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: ext3 filesystem / linux 7.3

From
Seth Robertson
Date:
In message <200304022133.44511.josh@agliodbs.com>, Josh Berkus writes:

    Chris,

    > ...there is waaay too much hearsay going on in this thread.  Let's come
    > up with an acceptable test battery and actually settle it once and for
    > all with good hard numbers.  It would be worth my while to spend some
    > time on this since the developers I support currently hate pgsql due to
    > performance complaints (on servers that predate my employment there).
    > So if I am going to move them to better servers it would be worth my
    > while to do some homework on what OS and FS is best.

    You're not going to be able to determine this for certain, but at
    least you should be able to debunk some myths.  Here's my
    suggested tests:

    [...]

    Also, we'll have to talk about .conf files ...

When I installed my postgres, I ran a test program I wrote against all
four values of wal_sync_method on my RedHat Linux 8.0 ext3 filesystem
(default mount options).  In my toy test, open_sync performed the best
for me.  Thus, I would suggest adding wal_sync_method as another
axis for your testing.
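
For anyone wanting a similar toy comparison, here is a rough sketch in Python. It is not the test program mentioned above and not PostgreSQL's WAL code; it only times small writes followed by `os.fsync` or `os.fdatasync`. The open_sync variants would need the file opened with `O_SYNC` instead, which is left out here. The block size and count are arbitrary assumptions:

```python
import os
import tempfile
import time

def time_sync_writes(sync_call, count=50, size=8192):
    """Write `count` blocks of `size` bytes, calling sync_call(fd) after each.

    Returns the elapsed wall-clock time in seconds.
    """
    block = b"x" * size
    fd, path = tempfile.mkstemp()   # lands in the default temp directory
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            sync_call(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

def compare():
    """Time each sync call that this platform provides."""
    methods = {"fsync": os.fsync}
    if hasattr(os, "fdatasync"):    # not available on every platform
        methods["fdatasync"] = os.fdatasync
    return {name: time_sync_writes(call) for name, call in methods.items()}

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, round(seconds, 4))
```

Note that the temporary file may sit on a memory-backed filesystem, in which case the numbers say nothing about the disk; pointing `tempfile.mkstemp(dir=...)` at the filesystem under test would be needed for a meaningful result.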

                                        -Seth Robertson
                                         seth@sysd.com


Re: ext3 filesystem / linux 7.3

From
Chris Hedemark
Date:
On Thursday, April 3, 2003, at 12:33 AM, Josh Berkus wrote:

> You're not going to be able to determine this for certain, but at
> least you
> should be able to debunk some myths.  Here's my suggested tests:
[snip]

Being a mere sysadmin, it is creation of the test cases (perl script,
maybe?) that I'll have to ask someone else with more of a development
bent to help with.  My talent is more along the lines of system
administration.  Plus I am willing to take the time to go through these
tests over & over with a different OS or different tuning parameters on
the same OS, different FS's, etc.  Someone else needs to come up with
the test code.  The client machine has pgsql on it also if the results
are going into a db that won't go away after every test. :)

--

"What difference does it make to the dead, the orphans and the
homeless, whether the mad destruction is wrought under the name of
totalitarianism or the holy name of liberty or democracy?" - Mahatma
Gandhi


Re: ext3 filesystem / linux 7.3

From
Chris Sutton
Date:
On Wed, 2 Apr 2003, Josh Berkus wrote:

> > I had previously understood that ext3 was a bad thing with PostgreSQL and I
> > went way above and beyond to create it on an Ext2 filesystem (the only one
> > on the server) and mount that.

We recently started using Postgres on a new database server running RH 7.3
and ext3.  Due to some kernel problems the machine would crash at random
times.  Each time it crashed it came back up extremely easily with no data
loss.  If we were on ext2 coming back up after a crash probably wouldn't
have been quite as easy.

We have since given up on RH 7.3 and gone with RH Enterprise ES.  Just an
FYI for any of you out there thinking about moving to RH 7.3 or those that
are having problems with 7.3 and ext3.

Chris


Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Chris,

> Being a mere sysadmin, it is creation of the test cases (perl script,
> maybe?) that I'll have to ask someone else with more of a development
> bent to help with.

I'll write the test queries and perl scripts if someone else can supply the
database.   Unfortunately, while I have a few databases that meet the
criteria, they are all NDA.

Criteria again:
Must have at least 100,000 rows with 12+ columns in "main" table.
Must have at least 10-12 additional tables, some with FK relationships to the
main table and each other.
Must be OK to make contents public.
More is better up to 500MB.

--
Josh Berkus
Aglio Database Solutions
San Francisco


Re: ext3 filesystem / linux 7.3

From
Chris Hedemark
Date:
On Thursday, April 3, 2003, at 11:52 AM, Josh Berkus wrote:

> Unfortunately, while I have a few databases that meet the
> criteria, they are all NDA.

I'm in the same boat.

--

"What difference does it make to the dead, the orphans and the
homeless, whether the mad destruction is wrought under the name of
totalitarianism or the holy name of liberty or democracy?" - Mahatma
Gandhi


Re: ext3 filesystem / linux 7.3

From
"Jeffrey D. Brower"
Date:
Can't we generate data?  Random data stored in random formats at random
sizes would stress the file system, wouldn't it?

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Chris Hedemark" <chrish@trilug.org>; <pgsql-performance@postgresql.org>
Sent: Thursday, April 03, 2003 11:52 AM
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3


> Chris,
>
> > Being a mere sysadmin, it is creation of the test cases (perl script,
> > maybe?) that I'll have to ask someone else with more of a development
> > bent to help with.
>
> I'll write the test queries and perl scripts if someone else can supply the
> database.   Unfortunately, while I have a few databases that meet the
> criteria, they are all NDA.
>
> Criteria again:
> Must have at least 100,000 rows with 12+ columns in "main" table.
> Must have at least 10-12 additional tables, some with FK relationships to the
> main table and each other.
> Must be OK to make contents public.
> More is better up to 500MB.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


Re: ext3 filesystem / linux 7.3

From
"scott.marlowe"
Date:
On Thu, 3 Apr 2003, Chris Sutton wrote:

> On Wed, 2 Apr 2003, Josh Berkus wrote:
>
> > > I had previously understood that ext3 was a bad thing with PostgreSQL and I
> > > went way above and beyond to create it on an Ext2 filesystem (the only one
> > > on the server) and mount that.
>
> We recently started using Postgres on a new database server running RH 7.3
> and ext3.  Due to some kernel problems the machine would crash at random
> times.  Each time it crashed it came back up extremely easily with no data
> loss.  If we were on ext2 coming back up after a crash probably wouldn't
> have been quite as easy.
>
> We have since given up on RH 7.3 and gone with RH Enterprise ES.  Just an
> FYI for any of you out there thinking about moving to RH 7.3 or those that
> are having problems with 7.3 and ext3.

We're still running RH 7.2 due to issues we had with 7.3 as well.


Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Jeffrey,

> Can't we generate data?  Random data stored in random formats at random
> sizes would stress the file system wouldn't it?

In my experience, randomly generated data tends to resemble real data very
little in distribution patterns and data types.  This is one of the
limitations of PGBench.

Surely there must be an OSS project out there with a medium-large PG database
which is OSS-licensed?

I'll post on GENERAL

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: ext3 filesystem / linux 7.3

From
Shankar K
Date:
Hi Scott,

Could you please share with us the problems you had
with Red Hat Linux 7.3?

I would be really interested to know the kernel configs
and ext3 filesystem modes.

Shankar

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Thu, 3 Apr 2003, Chris Sutton wrote:
>
> > On Wed, 2 Apr 2003, Josh Berkus wrote:
> >
> > > > I had previously understood that ext3 was a bad thing with PostgreSQL and I
> > > > went way above and beyond to create it on an Ext2 filesystem (the only one
> > > > on the server) and mount that.
> >
> > We recently started using Postgres on a new database server running RH 7.3
> > and ext3.  Due to some kernel problems the machine would crash at random
> > times.  Each time it crashed it came back up extremely easily with no data
> > loss.  If we were on ext2 coming back up after a crash probably wouldn't
> > have been quite as easy.
> >
> > We have since given up on RH 7.3 and gone with RH Enterprise ES.  Just an
> > FYI for any of you out there thinking about moving to RH 7.3 or those that
> > are having problems with 7.3 and ext3.
>
> We're still running RH 7.2 due to issues we had with 7.3 as well.




Re: ext3 filesystem / linux 7.3

From
"scott.marlowe"
Date:
On Thu, 3 Apr 2003, Shankar K wrote:

> Hi Scott,
>
> Could you please share with us the problems you had
> with linux 7.3
>
> would be really interested to know the kernel configs
> and ext3 filesystem modes

Actually, I had a couple of problems with it, one of which was that I
couldn't get it to boot with ext3 file systems properly.  I think it was
something to do with ext3 on linux kernel RAID sets that wouldn't work
right.  There's probably a fix for it, but 7.2 is pretty stable, and we
can wait for 8.0 or maybe look at another distro.

I remember there being some other issues I had with configuration stuff
like this, but now that it's been many months since I played with it I
can't remember them all.

My personal problem was that redhat stopped including linuxconf as an rpm
package, and the only configuration programs they include don't seem to
work well from a command line; they seem to prefer to be used in X11.


Re: ext3 filesystem / linux 7.3

From
Will LaShell
Date:
Hey guys,

On Thu, 2003-04-03 at 13:19, scott.marlowe wrote:
> On Thu, 3 Apr 2003, Shankar K wrote:
>
> > Hi Scott,
> >
> > Could you please share with us the problems you had
> > with linux 7.3
> >
> > would be really interested to know the kernel configs
> > and ext3 filesystem modes
>
> Actually, I had a couple of problems with it, one of which was that I
> couldn't get it to boot with ext3 file systems properly.  I think it was
> something to do with ext3 on linux kernel RAID sets that wouldn't work
> right.  There's probably a fix for it, but 7.2 is pretty stable, and we
> can wait for 8.0 or maybe look at another distro.
>

Normally I stay far far away from the distro wars / filesystem
discussions. However I'd like to offer information about the systems we
use here at OFS. The 2 core database servers are a matched pair of
systems with the following specifications:
Dual AMD MP 1800's
Tyan Thunder K7x motherboard
LSI Megaraid Elite 1650 controller w/ battery pack & 128 MB cache
5 Seagate Cheetah 10k 36 GB drives configured in RAID 1+0 w/ hot
spare.

Both are using the stock redhat 7.3 kernel w/ the latest LSI megaraid
drivers and firmware.

The postgresql cluster itself contains the records and information
necessary to process loans and loan applications.

We are using rserv ( from contrib ) to replicate data from three
databases in the cluster between the two servers. ( Hahah, I think we
may be the only people using this in production or something.  )

At any rate we use ext3 on the filesystems and we've had no problems at
all with the systems. Everything is stable and runs. We keep the
machines running and available 24/7 with scheduled downtime transitions
to the redundant servers as we need to for whatever kind of
enhancements.

The largest table in the cluster, btw, has 4.2 million tuples in it, and
it's the rserv log table.

Hope this gives you some additional information to base your decisions
on.

Sincerely,
Will LaShell

<snip>


Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Will,


> At any rate we use ext3 on the filesystems and we've had no problems at
> all with the systems. Everything is stable and runs. We keep the
> machines running and available 24/7 with scheduled downtime transitions
> to the redundant servers as we need to for whatever kind of
> enhancements.

Hey, can we use you as a case study for advocacy.openoffice.org?

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco


Re: ext3 filesystem / linux 7.3

From
Andreas Kostyrka
Date:
On Wed, 2003-04-02 at 17:56, Andrew Sullivan wrote:
> On Wed, Apr 02, 2003 at 05:18:26PM +0200, Andreas Kostyrka wrote:
>
> > Well, I've been using ReiserFS now for years, and never had any problems
> > with it.
>
> Me too.  But the "known failure modes" that people keep reporting
> about have to do with completely trashing, say, a whole page of data.
> Your directories are fine, but the data is all hosed.
>
> I've never had it happen.  I've never seen anyone who can
> consistently reproduce it.  But I've certainly read about it often
> enough to have pretty serious reservations about relying on the
> filesystem for data I can't afford to lose.
Well, then backups and statistics are your only solution.
Only way to know if something works is to test it for some time. (You
never know if something in your use doesn't trigger some border case of
malfunction in the kernel.)

Andreas
--
Andreas Kostyrka
Josef-Mayer-Strasse 5
83043 Bad Aibling


Re: ext3 filesystem / linux 7.3

From
Will LaShell
Date:
Yes, I think we'd be willing to do that.

( 480 967 7530 ) is the phone contact for the company,
IT manager is Trevor Mantle
and you can ask for me as well.

wlashell@outsourcefinancial.com  is my work email you can feel free to
use.

Sincerely,

Will LaShell

On Thu, 2003-04-03 at 16:12, Josh Berkus wrote:
> Will,
>
>
> > At any rate we use ext3 on the filesystems and we've had no problems at
> > all with the systems. Everything is stable and runs. We keep the
> > machines running and available 24/7 with scheduled downtime transitions
> > to the redundant servers as we need to for whatever kind of
> > enhancements.
>
> Hey, can we use you as a case study for advocacy.openoffice.org?
>
> --
> -Josh Berkus
>
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                         Josh Berkus
>    Complete information technology     josh@agliodbs.com
>     and data management solutions     (415) 565-7293
>    for law firms, small businesses      fax 621-2533
>     and non-profit organizations.     San Francisco



Re: ext3 filesystem / linux 7.3

From
Robert Treat
Date:
We've had 2 crashes on red hat 7.3 in about 9 months of running. Both
instances required manual power off/on of the server, but everything
came up nice and ready to go. The problems seemed to stem from i/o load
with the kernel (not postgresql specific), but should be resolved with
the latest Red Hat kernel. If you search on buffer_jdirty in bugzilla
you'll see a couple of reports.

Robert Treat

On Thu, 2003-04-03 at 14:45, Shankar K wrote:
> Hi Scott,
>
> Could you please share with us the problems you had
> with linux 7.3
>
> would be really interested to know the kernel configs
> and ext3 filesystem modes
>
> Shankar
>
> --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > On Thu, 3 Apr 2003, Chris Sutton wrote:
> >
> > > On Wed, 2 Apr 2003, Josh Berkus wrote:
> > >
> > > > > I had previously understood that ext3 was a bad thing with PostgreSQL and I
> > > > > went way above and beyond to create it on an Ext2 filesystem (the only one
> > > > > on the server) and mount that.
> > >
> > > We recently started using Postgres on a new database server running RH 7.3
> > > and ext3.  Due to some kernel problems the machine would crash at random
> > > times.  Each time it crashed it came back up extremely easily with no data
> > > loss.  If we were on ext2 coming back up after a crash probably wouldn't
> > > have been quite as easy.
> > >
> > > We have since given up on RH 7.3 and gone with RH Enterprise ES.  Just an
> > > FYI for any of you out there thinking about moving to RH 7.3 or those that
> > > are having problems with 7.3 and ext3.
> >
> > We're still running RH 7.2 due to issues we had with 7.3 as well.


Re: ext3 filesystem / linux 7.3

From
Kevin Brown
Date:
Josh Berkus wrote:
> Jeffrey,
>
> > Can't we generate data?  Random data stored in random formats at random
> > sizes would stress the file system wouldn't it?
>
> In my experience, randomly generated data tends to resemble real data very
> little in distribution patterns and data types.  This is one of the
> limitations of PGBench.

Okay, from this it sounds like what we need is information on the data
types typically used for real world applications and information on
the distribution patterns for each type (the latter could get
quite complex and varied, I'm sure, but since we're after something
that's typical, we only need a few examples).

So perhaps the first step in this is to write something that will show
what the distribution pattern for data in a table is?  With that
information, we *could* randomly generate data that would conform to
the statistical patterns seen in the real world.

In fact, even though the databases you have access to are all
proprietary, I'm pretty sure their owners would agree to let you run a
program that would gather statistical distribution information about
them.  Then (as long as they agree) you could copy the schema itself,
recreate it on the test system, and randomly generate the data.
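
To make the idea concrete, here is a minimal sketch in Python of the two halves: profile a column, then generate random values that follow the same frequencies. It assumes a low-cardinality column held in memory as a list; `column_profile` and `generate_column` are hypothetical names, and nothing here talks to a database:

```python
import random
from collections import Counter

def column_profile(values):
    """Summarize a column as a mapping of value -> relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items()}

def generate_column(profile, n, seed=None):
    """Draw n random values whose frequencies follow the profile."""
    rng = random.Random(seed)
    values = list(profile)
    weights = [profile[value] for value in values]
    return rng.choices(values, weights=weights, k=n)

if __name__ == "__main__":
    real = ["CA"] * 8 + ["NY"] * 2           # stand-in for a real column
    profile = column_profile(real)
    fake = generate_column(profile, 1000, seed=42)
    print(profile, column_profile(fake))
```

A profile like this still contains the real values themselves, so for proprietary data the values would presumably have to be replaced with placeholders before the profile leaves the owner's system. High-cardinality or numeric columns would need histograms or similar summaries rather than exact value counts.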



--
Kevin Brown                          kevin@sysexperts.com


Re: ext3 filesystem / linux 7.3

From
Josh Berkus
Date:
Kevin,

> So perhaps the first step in this is to write something that will show
> what the distribution pattern for data in a table is?  With that
> information, we *could* randomly generate data that would conform to
> the statistical patterns seen in the real world.

Sure.  But I think it'll be *much* easier just to use portions of the FCC
database.   You want to start working on converting it to PostgreSQL?

--
Josh Berkus
Aglio Database Solutions
San Francisco