Thread: Recomended FS

Recomended FS

From
"Ben-Nes Michael"
Date:
Hi

I'm upgrading the DB server hardware and also the Linux OS.

My questions are:

1. What is the preferred FS to go with? ext3, ReiserFS, JFS, XFS? (speed,
efficiency)
2. What is the most important part of the hardware: a fast HD, a lot of memory,
or maybe a strong CPU?

Thanks in Advance

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------


Re: Recomended FS

From
Shridhar Daithankar
Date:
Ben-Nes Michael wrote:

> Hi
>
> I'm upgrading the DB sever hardware and also the Linux OS.
>
> My Questions are:
>
> 1. What is the preferred FS to go with ? EXT3, Reiseref, JFS, XFS ? ( speed,
> efficiency )

That's flamebait; people never agree, because their experiences differ. Besides, it
depends on what kind of database you are dealing with.

Your best bet is to benchmark with your own application. ReiserFS/XFS/JFS are all good. Ext3
requires selecting the proper journalling mode; with that done it's almost equally good. You decide what
works best for you.
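(For example, ext3's journalling mode is chosen at mount time or in /etc/fstab; a rough sketch, with the device and mount point as placeholders:)

    # metadata-only journalling (often cited as the fastest ext3 mode)
    mount -t ext3 -o data=writeback /dev/sda2 /var/lib/pgsql
    # the default, somewhat safer mode
    mount -t ext3 -o data=ordered /dev/sda2 /var/lib/pgsql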

> 2. What is the most importent part in the Hardware ? fast HD, alot of mem,
> or maybe strong cpu ?

A fast HD with a good RAID controller. Budget permitting, SCSI drives are a better buy
than IDE, and so is hardware SCSI RAID.

  Shridhar


Re: Recomended FS

From
Peter Childs
Date:

On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
>
> A fast HD with a good RAID controller. Subject to budget, SCSI are beter buy
> than IDE. So does hardware SCSI RAID.
>
    I hate asking this again. But WHY?

    What do SCSI drives gain by spinning at 15000 RPM and having larger buffers? They
lose on capacity, and on a slower bus. I would like to see some proof. Sorry.

IDE hard disk, 40 GB, 7200 RPM   = 133 MB/s bus = 50 UKP
SCSI hard disk, 36 GB, 10000 RPM = 160 MB/s bus = 110 UKP

    Is that extra 27 MB/s worth the price of another IDE disk? And while you can get
bigger, faster SCSI disks, prices go through the roof. It's no longer RAID
but RAED (Redundant Array of Expensive Disks).

    My advice, not that I've got any proof, is that the money is better
spent on a good disk controller and many disks rather than on each individual disk.

    In short, if you have money to burn then by all means get SCSI, but
most people are better off spending

$200 disk controller + several $100 40 GB disks
    rather than
$200 disk controller + one $200 40 GB disk

Prices are only approximate.

Peter Childs

Re: Recomended FS

From
Shridhar Daithankar
Date:
Peter Childs wrote:

>
> On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
>
>>A fast HD with a good RAID controller. Subject to budget, SCSI are beter buy
>>than IDE. So does hardware SCSI RAID.
>>
>
>     I hate asking this again. But WHY?

OK... I have handled only a few SCSI disks, so take this with a grain of
salt.

1. The SCSI bus shares bandwidth much better than IDE. Put two IDE disks on the
same channel and two SCSI disks on theirs, and see which combination performs better.
2. <Unconfirmed> SCSI disks are individually tested while IDE disks are only sampled,
which makes a big difference in reliability. I know that for some people IDE disks never
crash, but the majority consider SCSI more reliable than IDE.
3. SCSI disks have tagged command queueing and the like, which makes them better at
handling load.

Technically, if you don't know the load, SCSI is the better choice. If
you know your load very well and it is predictable, IDE might be an option.

I would personally prefer an IDE disk array with a hardware RAID controller, because I
can put it in my home machine, unlike SCSI. But every developer I have asked
around here says that IDE performance starts dropping once you hit real-world load.

  Shridhar


Re: Recomended FS

From
Nick Burrett
Date:
Peter Childs wrote:
>
> On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
>
>>A fast HD with a good RAID controller. Subject to budget, SCSI are beter buy
>>than IDE. So does hardware SCSI RAID.
>>
>
>     I hate asking this again. But WHY?

The duty cycle of SCSI drives is 100%.  The duty cycle of IDE drives is
around 30-40%.  Therefore one uses SCSI drives in mail and news servers
where disk access is more-or-less permanent.  IDE drives usually degrade
or fail faster under such load.

From experience I have noticed that IDE drives that initially performed
at 30 Mbyte/sec dropped to around 10 Mbyte/sec after a year or so.

>     What SCSI gain in spinning at 15000RPM and larger buffers. They
> lose in Space, and a slower bus. I would like to see some profe. Sorry.
>
> IDE Hard Disk 40Gb 7200RPM   = 133Mbs = 50UKP
> SCSI Hard Disk 36Gb 10000RPM = 160Mbs = 110UKP

On new servers doing a software RAID1 sync between two disks, I find the
following sustained transfer rates:

SuperMicro 6013P-i ATA 133 80Gb IDE 7200rpm: 39000kbytes/sec.
SuperMicro 6013P-8 SCSI 320 72Gb SCSI 10000rpm: 65000kbytes/sec.

The IDE drives are on separate buses.  The SCSI drives are on the same bus.

I think that the 320 MB/s SCSI buses are a bit faster than the 133 MB/s ATA
buses.
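(For anyone who wants to reproduce numbers like these, the Linux software-RAID resync rate can be read straight from the md driver, and hdparm gives a rough per-disk read figure; the device names below are only examples:)

    # current resync/rebuild speed of the md array
    cat /proc/mdstat
    # buffered sequential read timing of a single member disk
    hdparm -t /dev/hda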


>     Is that extra 27Mbs worth another IDE Disk. and while you can get
> bigger faster SCSI disks prices go through the roof. Its no longer RAID
> but RAED (Redundant Array of Expensive Disks)
>
>     My advise not that I've got any proof is that the money is better
> spent on a good disk controller and many disks than on each disk.
>
>     In short if you have money to burn then by all means get SCSI but
> most people are better of spending

I suppose that's your choice.  Another way of looking at things is to
consider the worth of the server to your business and factor that into
how much you should consider spending on equipment.

e.g. if the server can be attributed £10,000/year of business, then perhaps a
cheap PC will do.  If £1 million of your business relies on the server,
then perhaps you should look into investing more in it.


Regards,


Nick.


--
Nick Burrett
Network Engineer, Designer Servers Ltd.   http://www.dsvr.co.uk


Re: Recomended FS

From
Peter Childs
Date:

On Mon, 20 Oct 2003, Nick Burrett wrote:

> Peter Childs wrote:
> >
> > On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
> >
> >>A fast HD with a good RAID controller. Subject to budget, SCSI are beter buy
> >>than IDE. So does hardware SCSI RAID.
> >>
> >
> >     I hate asking this again. But WHY?
>
> The duty cycle of SCSI drives is 100%.  The duty cycle of IDE drives is
> around 30-40%.  Therefore one uses SCSI drives in mail and news servers
> where disk access is more-or-less permanent.  IDE drives usually degrade
> or fail faster under such load.
>
>  From experience I have noticed that IDE drives that initially perform
> at 30Mbyte/sec dropped to around 10Mbyte/sec after a year or so.
>
> >     What SCSI gain in spinning at 15000RPM and larger buffers. They
> > lose in Space, and a slower bus. I would like to see some profe. Sorry.
> >
> > IDE Hard Disk 40Gb 7200RPM   = 133Mbs = 50UKP
> > SCSI Hard Disk 36Gb 10000RPM = 160Mbs = 110UKP
>
> On new servers doing a software RAID1 sync between two disks, I find the
> following sustained transfer rates:
>
> SuperMicro 6013P-i ATA 133 80Gb IDE 7200rpm: 39000kbytes/sec.
> SuperMicro 6013P-8 SCSI 320 72Gb SCSI 10000rpm: 65000kbytes/sec.
>
> The IDE drives are on seperate busses.  The SCSI drives are on the same bus.
>
> I think that the 320Mhz SCSI busses are a bit faster than the 133Mhz ATA
> busses.
>
>
> >     Is that extra 27Mbs worth another IDE Disk. and while you can get
> > bigger faster SCSI disks prices go through the roof. Its no longer RAID
> > but RAED (Redundant Array of Expensive Disks)
> >
> >     My advise not that I've got any proof is that the money is better
> > spent on a good disk controller and many disks than on each disk.
> >
> >     In short if you have money to burn then by all means get SCSI but
> > most people are better of spending
>
> I suppose that's your choice.  Another way of looking that things is to
> consider the worth the server has to your business and factor that into
> how much you should consider spending on equipment.
>
> e.g. if the server can be attributed to £10,000/year, then perhaps a
> cheap PC will do.  If £1 million of your business relies on the server,
> then perhaps you should look into investing more into it.
>
>
    At last, someone with the real answers that I thought ought to be
true all along. It's a shame nobody can give hard and fast numbers
that I can get the budget people to understand!

Peter Childs

Re: Recomended FS

From
"Ben-Nes Michael"
Date:
I'm not an HD specialist, but I know SCSI can handle load much better than IDE.

I read a benchmark recently (I don't really remember where) comparing SATA
against U160 SCSI. The results showed that SATA gives better performance at first,

but later on the SCSI drive takes the lead, running at about 30% load while the SATA
drive is at 100% load for the same task.

So it seems fairly obvious to me: if it's a server serving lots of files, with
the disks working against lots of users, I'll go for SCSI.
For a workstation or backup server I'll go for IDE.

But the biggest question is still which FS to put on it.

I heard ReiserFS can handle small files very quickly.
--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------
----- Original Message -----
From: "Peter Childs" <blue.dragon@blueyonder.co.uk>
To: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
Cc: "Ben-Nes Michael" <miki@canaan.co.il>; "postgresql"
<pgsql-general@postgresql.org>
Sent: Monday, October 20, 2003 11:51 AM
Subject: Re: [GENERAL] Recomended FS


>
>
> On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
> >
> > A fast HD with a good RAID controller. Subject to budget, SCSI are beter
buy
> > than IDE. So does hardware SCSI RAID.
> >
> I hate asking this again. But WHY?
>
> What SCSI gain in spinning at 15000RPM and larger buffers. They
> lose in Space, and a slower bus. I would like to see some profe. Sorry.
>
> IDE Hard Disk 40Gb 7200RPM   = 133Mbs = 50UKP
> SCSI Hard Disk 36Gb 10000RPM = 160Mbs = 110UKP
>
> Is that extra 27Mbs worth another IDE Disk. and while you can get
> bigger faster SCSI disks prices go through the roof. Its no longer RAID
> but RAED (Redundant Array of Expensive Disks)
>
> My advise not that I've got any proof is that the money is better
> spent on a good disk controller and many disks than on each disk.
>
> In short if you have money to burn then by all means get SCSI but
> most people are better of spending
>
> $200 Disk Controller $200 Disk Controller
> $100 40Gb Disks Than $200 40Gb Disk
>
> Prices only approx.
>
> Peter Childs
>


Re: Recomended FS

From
Jeff
Date:
On Mon, 20 Oct 2003 11:07:20 +0100
Nick Burrett <nick@dsvr.net> wrote:


>  From experience I have noticed that IDE drives that initially perform
>
> at 30Mbyte/sec dropped to around 10Mbyte/sec after a year or so.
>

Yes, this is very true. A good demonstration of IDE falling apart
is to start up one client and watch it go very fast, then start up 20
and see what happens :)

Also, you can easily have many, many more SCSI devices (including external
SCSI devices) than IDE. More platters / disks == faster I/O.


> >
> > IDE Hard Disk 40Gb 7200RPM   = 133Mbs = 50UKP
> > SCSI Hard Disk 36Gb 10000RPM = 160Mbs = 110UKP
>

If you don't mind refurbished disks that still have a warranty, check out
eBay.  Friday I won a lot of 10 18 GB disks for $96 + $27
insured shipping.   But yeah, new SCSI is quite expensive, though it can be
worth it...  IMHO SCSI is to be used in a RAID, not alone.  No single disk
can saturate the bandwidth offered (by either IDE or SCSI).


--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: Recomended FS

From
Nick Burrett
Date:
Ben-Nes Michael wrote:

> But still the greatest question is what FS to put on ?
>
> I heard Reiesref can handle small files very quickly.

Switching from ext3 to reiserfs for our name servers reduced the time
taken to load 110,000 zones from 45 minutes to 5 minutes.

However for a database, I don't think you can really factor this type of
stuff into the equation.  The performance benefits you get from
different filesystem types are going to be small compared to the
modifications that you can make to your database structure, queries and
applications.  The actual algorithms used in processing the data will be
much slower than the time taken to fetch the data off disk.

--
Nick Burrett
Network Engineer, Designer Servers Ltd.   http://www.dsvr.co.uk


Re: Recomended FS

From
"Ben-Nes Michael"
Date:
----- Original Message -----
From: "Nick Burrett" <nick@dsvr.net>
To: "Ben-Nes Michael" <miki@canaan.co.il>
Cc: "Peter Childs" <blue.dragon@blueyonder.co.uk>; "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in>; "postgresql"
<pgsql-general@postgresql.org>
Sent: Monday, October 20, 2003 2:08 PM
Subject: Re: [GENERAL] Recomended FS


> Ben-Nes Michael wrote:
>
> > But still the greatest question is what FS to put on ?
> >
> > I heard Reiesref can handle small files very quickly.
>
> Switching from ext3 to reiserfs for our name servers reduced the time
> taken to load 110,000 zones from 45 minutes to 5 minutes.
>
> However for a database, I don't think you can really factor this type of
> stuff into the equation.  The performance benefits you get from
> different filesystem types are going to be small compared to the
> modifications that you can make to your database structure, queries and
> applications.  The actual algorithms used in processing the data will be
> much slower than the time taken to fetch the data off disk.

So you're saying the FS has no real speed impact on the DB?

In my pg data folder I have 2367 files, some big, some small.
>
> --
> Nick Burrett
> Network Engineer, Designer Servers Ltd.   http://www.dsvr.co.uk


Re: Recomended FS

From
Nick Burrett
Date:
Ben-Nes Michael wrote:
> ----- Original Message -----
> From: "Nick Burrett" <nick@dsvr.net>
>>Ben-Nes Michael wrote:
>>
>>
>>>But still the greatest question is what FS to put on ?
>>>
>>>I heard Reiesref can handle small files very quickly.
>>
>>Switching from ext3 to reiserfs for our name servers reduced the time
>>taken to load 110,000 zones from 45 minutes to 5 minutes.
>>
>>However for a database, I don't think you can really factor this type of
>>stuff into the equation.  The performance benefits you get from
>>different filesystem types are going to be small compared to the
>>modifications that you can make to your database structure, queries and
>>applications.  The actual algorithms used in processing the data will be
>>much slower than the time taken to fetch the data off disk.
>
>
> So you say the FS has no real speed impact on the SB ?
>
> In my pg data folder i have 2367 files, some big some small.

I'm saying: don't expect your DB performance to improve by leaps and bounds
just because you changed to a different filesystem.  If you've
got speed problems then it might help to look elsewhere first.

--
Nick Burrett
Network Engineer, Designer Servers Ltd.   http://www.dsvr.co.uk


Re: Recomended FS

From
"Ben-Nes Michael"
Date:
----- Original Message -----
From: "Nick Burrett" <nick@dsvr.net>
To: "Ben-Nes Michael" <miki@canaan.co.il>
Cc: "postgresql" <pgsql-general@postgresql.org>
Sent: Monday, October 20, 2003 2:54 PM
Subject: Re: [GENERAL] Recomended FS

> >>>But still the greatest question is what FS to put on ?
> >>>
> >>>I heard Reiesref can handle small files very quickly.
> >>
> >>Switching from ext3 to reiserfs for our name servers reduced the time
> >>taken to load 110,000 zones from 45 minutes to 5 minutes.
> >>
> >>However for a database, I don't think you can really factor this type of
> >>stuff into the equation.  The performance benefits you get from
> >>different filesystem types are going to be small compared to the
> >>modifications that you can make to your database structure, queries and
> >>applications.  The actual algorithms used in processing the data will be
> >>much slower than the time taken to fetch the data off disk.
> >
> >
> > So you say the FS has no real speed impact on the SB ?
> >
> > In my pg data folder i have 2367 files, some big some small.
>
> I'm saying: don't expect your DB performance to come on leaps and bounds
> just because you changed to a different filesystem format.  If you've
> got speed problems then it might help to look elsewhere first.
>
I don't expect miracles :)
But I still have to choose one, so why shouldn't I choose the one which fits
best?


Re: Recomended FS

From
Shridhar Daithankar
Date:
Ben-Nes Michael wrote:

>>I'm saying: don't expect your DB performance to come on leaps and bounds
>>just because you changed to a different filesystem format.  If you've
>>got speed problems then it might help to look elsewhere first.
>>
>
> I dont expect miracles :)
> but still i have to choose one,so why shouldnt i choose the one which best
> fit ?

All things being equal, you should optimise your application design and database
tuning before you choose a file system.

If something already works well for you, it will just work better on a better file
system. That's the point.

  Shridhar


Re: Recomended FS

From
"Ben-Nes Michael"
Date:
----- Original Message -----
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
To: "Ben-Nes Michael" <miki@canaan.co.il>
Cc: "Nick Burrett" <nick@dsvr.net>; "postgresql"
<pgsql-general@postgresql.org>
Sent: Monday, October 20, 2003 3:06 PM
Subject: Re: [GENERAL] Recomended FS


> Ben-Nes Michael wrote:
>
> >>I'm saying: don't expect your DB performance to come on leaps and bounds
> >>just because you changed to a different filesystem format.  If you've
> >>got speed problems then it might help to look elsewhere first.
> >>
> >
> > I dont expect miracles :)
> > but still i have to choose one,so why shouldnt i choose the one which
best
> > fit ?
>
> All things being equal, you should optimise your application design and
database
> tuning before you choose file system.
>
> If a thing works well for you, with a better file system it will just work
> better. That's the point.
>

I agree, but I'll still have to choose an FS. Does the list have any opinion
on which FS to choose?

>   Shridhar
>
>


Re: Recomended FS

From
"Arjen van der Meijden"
Date:
> Peter Childs wrote:
>
> On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
> >
> > A fast HD with a good RAID controller. Subject to budget, SCSI are
> > beter buy than IDE. So does hardware SCSI RAID.
> >
>     I hate asking this again. But WHY?
>
>     What SCSI gain in spinning at 15000RPM and larger
> buffers. They lose in Space, and a slower bus. I would like
> to see some profe. Sorry.
They win easily on random disk accesses and mixed reads and writes.
And the bus is much faster, not slower.

> IDE Hard Disk 40Gb 7200RPM   = 133Mbs = 50UKP
> SCSI Hard Disk 36Gb 10000RPM = 160Mbs = 110UKP
>
>     Is that extra 27Mbs worth another IDE Disk. and while
> you can get bigger faster SCSI disks prices go through the
> roof. Its no longer RAID but RAED (Redundant Array of Expensive Disks)
You're looking at the bus speed, not the actual speed the disk achieves.
My guess is that that SCSI disk is, in some areas, twice as fast as the
IDE one, and on average 10-30% faster.

>     My advise not that I've got any proof is that the money
> is better spent on a good disk controller and many disks than
> on each disk.
This heavily depends on your setup and tasks.

- SCSI disks have a (supposedly) longer lifetime, due to (much) better disk
components.
- SCSI disks are designed for server tasks (many random accesses) and
have their queue management tuned (better) for that. This also applies
to mixed reads and writes.
- SCSI disks often have smaller, thicker platters which can spin
more stably and at higher RPMs.
- The SCSI bus allows all the disks to operate at maximum speed (as far
as the PCI bus can handle it, of course), while an IDE bus is shared
between its two disks.
- SCSI allows more disks and longer cables on the same controller.

Anyway, you don't always need all those advantages, since the
major disadvantage is of course the price tag.
For simple backup solutions (lots of storage with reasonable
performance at an acceptable price), IDE in RAID5 or so is quite good.
For a high-performance database, you really want to look into a RAID
setup with SCSI (or at least WD Raptor IDE disks or something like
that).

>     In short if you have money to burn then by all means
> get SCSI but most people are better of spending
Also get it if you don't have money to burn but simply need the higher
performance (which is really there), for instance for random disk
accesses.

Best regards,

Arjen




Re: Recomended FS

From
Christopher Browne
Date:
Quoth miki@canaan.co.il ("Ben-Nes Michael"):
> I'm not a HD specialist but I know scsi can handle load much better the IDE.
>
> I read a benchmark lately ( don't really remember where ) checking SATA
> against U160, the result show that SATA give better performance at start.
> but later on the SCSI take it while HD cpu load is 30% and the SATA is 100%
> load for the same task.
>
> So I see its kinda obvious for me, if its a server serve lots of files and
> the HD will work against lots of users ill go for the SCSI.
> For a workstation or backup server ill go for IDE.
>
> But still the greatest question is what FS to put on ?
>
> I heard Reiesref can handle small files very quickly.

ReiserFS was designed to cope with having huge hordes of tiny files.
PostgreSQL doesn't create files in that pattern; it only creates
fairly large files, and that tends to be the pathological case where
ReiserFS works somewhat badly.

When I ran some transaction-heavy benchmarks between ext3, XFS, and
JFS, I found JFS to be pretty consistently faster.  I didn't bother
trying reiserfs because:
 a) It has a history of being slower for big files;
 b) I have had some cases of losing data to it, diminishing my trust
    of it.
--
output = ("cbbrowne" "@" "ntlug.org")
http://www.ntlug.org/~cbbrowne/unix.html
"sic transit discus mundi"
-- From the System Administrator's Guide, by Lars Wirzenius

Re: Recomended FS

From
Murthy Kambhampaty
Date:
You'd be well served if you could benchmark several filesystems and see
which one gives the best "performance" (talk about a loaded term) for your
application. Having said that, however, I'd recommend XFS for its
combination of performance and userspace tools (particularly xfsdump and
xfs_freeze).
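(As a rough illustration of those tools, with the mount point and dump destination as placeholders:)

    # suspend writes to the filesystem (e.g. while taking a volume snapshot), then thaw it
    xfs_freeze -f /var/lib/pgsql
    xfs_freeze -u /var/lib/pgsql
    # level-0 dump of the filesystem to a file
    xfsdump -l 0 -f /backup/pgdata.xfsdump /var/lib/pgsql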

Cheers,
    Murthy


>-----Original Message-----
>From: Ben-Nes Michael [mailto:miki@canaan.co.il]
>Sent: Monday, October 20, 2003 04:48
>To: postgresql
>Subject: [GENERAL] Recomended FS
>
>
>Hi
>
>I'm upgrading the DB sever hardware and also the Linux OS.
>
>My Questions are:
>
>1. What is the preferred FS to go with ? EXT3, Reiseref, JFS,
>XFS ? ( speed,
>efficiency )
>2. What is the most importent part in the Hardware ? fast HD,
>alot of mem,
>or maybe strong cpu ?
>
>Thanks in Advance
>
>--------------------------
>Canaan Surfing Ltd.
>Internet Service Providers
>Ben-Nes Michael - Manager
>Tel: 972-4-6991122
>Fax: 972-4-6990098
>http://www.canaan.net.il
>--------------------------
>
>

Re: Recomended FS

From
"scott.marlowe"
Date:
On Mon, 20 Oct 2003, Peter Childs wrote:

>
>
> On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
> >
> > A fast HD with a good RAID controller. Subject to budget, SCSI are beter buy
> > than IDE. So does hardware SCSI RAID.
> >
>     I hate asking this again. But WHY?
>
>     What SCSI gain in spinning at 15000RPM and larger buffers. They
> lose in Space, and a slower bus. I would like to see some profe. Sorry.

SCSI beats IDE hands down for databases, and for one reason above all the
rest.  They don't generally lie about fsync.

With SCSI, you can initiate 'pgbench -c 100 -t 1000000' and pull the plug
on your server, and voila, the whole thing will come back up (assuming a
journaling file system, and solid hardware.)

Do that with IDE with write cache enabled and you WILL have a scrambled
database that needs to be re-initdbed and restored.

Now turn off the write cache on the IDE drive, which will make it solid
and reliable like the SCSI drive, and compare speed; it's not even close.

Until the IDE drive manufacturers start making IDE drives that actually
report fsync properly, they're a toy that should not be used for your
database unless you know the dangers they present.
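(On Linux the IDE write cache is normally toggled with hdparm; a quick sketch, with the device name as a placeholder:)

    # turn the drive's write cache off (safe but much slower writes)
    hdparm -W0 /dev/hda
    # turn it back on
    hdparm -W1 /dev/hda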


Re: Recomended FS

From
Peter Eisentraut
Date:
Ben-Nes Michael writes:

> 1. What is the preferred FS to go with ? EXT3, Reiseref, JFS, XFS ? ( speed,
> efficiency )

PostgreSQL might work better on "simple" file systems, so you avoid making
the disk head run all over the place writing both the file system's own log and the
PostgreSQL log.  Some have even suggested FAT for the data files.  Good bets for
improving performance are putting the WAL logs and the indexes on a different
spindle from the table files.  Of course, certain RAID configurations
achieve a similar effect.
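(A common way to put the WAL on its own spindle is a symlink; just a sketch, with $PGDATA and the second disk's mount point assumed, and the server stopped first:)

    # move the WAL directory to another disk and symlink it back into place
    mv $PGDATA/pg_xlog /disk2/pg_xlog
    ln -s /disk2/pg_xlog $PGDATA/pg_xlog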

> 2. What is the most importent part in the Hardware ? fast HD, alot of mem,
> or maybe strong cpu ?

Lots of memory, so you can cache a large fraction of the data in memory.
A good hard disk, if you do a lot of updates and/or your memory is not big
enough to cache most of the data.  CPU is not as important.
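(To let PostgreSQL actually use that memory, the usual postgresql.conf knobs are roughly these; the values are only illustrative, not recommendations:)

    shared_buffers = 8192           # shared buffer cache, in 8 kB pages
    effective_cache_size = 65536    # planner's estimate of the OS cache, in 8 kB pages
    sort_mem = 8192                 # per-sort memory, in kB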

--
Peter Eisentraut   peter_e@gmx.net


Re: Recomended FS

From
Unihost Web Hosting
Date:
Hi Ben,

You asked so here's my take on the subject, but I've gotta say that you can't go far wrong with reading Bruce Momjian's paper at:

http://www.ca.postgresql.org/docs/momjian/hw_performance/

But with that aside.

1. Unless you're doing major-league DB work, the FS shouldn't make more than a marginal difference; if it's journalled then it's good.  You can take all the time benchmarking that you want, just be sure your ROI is worth the time you invest.  My favourite FS is ReiserFS, but in the cold light of day, ext3 is supported in more places.  My first choice is ReiserFS, since I used it even when it was "unstable" on production servers and it never let me down.  I often use one or the other.

2.  Bruce's article really is good for this question, but in a nutshell you need to get as much of the DB as close to the CPU as possible.  As with any serious application, you can't beat a good L1/L2 cache, then plenty of RAM; DBs love RAM, the more the merrier.  Lastly, fast and wide disk access: remember that disk access will be the slowest part of the system, and in an ideal world you'd fit nearly all of your DB in RAM if that were practical and safe.

You'd probably gain more from taking the time to really ensure that your DB is designed flawlessly, and all your indexes are where they're needed.  All of the basics come into play, but a well built RDBMS system is greater than the sum of its parts.

For further reading check out:

http://www.argudo.org/postgresql/soft-tuning.html

It all adds up!

Good Luck

Tony.



Ben-Nes Michael wrote:
Hi

I'm upgrading the DB sever hardware and also the Linux OS.

My Questions are:

1. What is the preferred FS to go with ? EXT3, Reiseref, JFS, XFS ? ( speed,
efficiency )
2. What is the most importent part in the Hardware ? fast HD, alot of mem,
or maybe strong cpu ?

Thanks in Advance

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------



Re: Recomended FS

From
Mike Benoit
Date:
This site:

http://fsbench.netnation.com/

has some decent generic file system benchmarks that may help with your
decision.

On Mon, 2003-10-20 at 10:32, Murthy Kambhampaty wrote:
> You'd be well served if you could benchmark several filesystems and see
> which one gives the best "performance" (talk about a loaded term) for your
> application. Having said that, however, I'd recommend XFS for its
> combination of performance and userspace tools (particularly xfsdump and
> xfs_freeze).
>
> Cheers,
>     Murthy
>
>
> >-----Original Message-----
> >From: Ben-Nes Michael [mailto:miki@canaan.co.il]
> >Sent: Monday, October 20, 2003 04:48
> >To: postgresql
> >Subject: [GENERAL] Recomended FS
> >
> >
> >Hi
> >
> >I'm upgrading the DB sever hardware and also the Linux OS.
> >
> >My Questions are:
> >
> >1. What is the preferred FS to go with ? EXT3, Reiseref, JFS,
> >XFS ? ( speed,
> >efficiency )
> >2. What is the most importent part in the Hardware ? fast HD,
> >alot of mem,
> >or maybe strong cpu ?
> >
> >Thanks in Advance
> >
> >--------------------------
> >Canaan Surfing Ltd.
> >Internet Service Providers
> >Ben-Nes Michael - Manager
> >Tel: 972-4-6991122
> >Fax: 972-4-6990098
> >http://www.canaan.net.il
> >--------------------------
> >
> >
--
Best Regards,

Mike Benoit



Re: Recomended FS

From
Steve Crawford
Date:
On Monday 20 October 2003 10:28 am, scott.marlowe wrote:
> On Mon, 20 Oct 2003, Peter Childs wrote:
> > On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
> > > A fast HD with a good RAID controller. Subject to budget, SCSI
> > > are beter buy than IDE. So does hardware SCSI RAID.
> >
> >     I hate asking this again. But WHY?
> >
> >     What SCSI gain in spinning at 15000RPM and larger buffers. They
> > lose in Space, and a slower bus. I would like to see some profe.
> > Sorry.
>
> SCSI beats IDE hands down for databases, and for one reason above
> all the rest.  They don't generally lie about fsync.
>....

Talk about timing...this article posted today seems quite apropos
(spoiler: SCSI beats IDE):

http://hardware.devchannel.org/hardwarechannel/03/10/20/1953249.shtml?tid=20&tid=38&tid=49

Cheers,
Steve


Re: Recomended FS

From
"Matthew D. Fuller"
Date:
On Mon, Oct 20, 2003 at 08:09:34AM -0400 I heard the voice of
Jeff, and lo! it spake thus:
>
> insured shipping.   But yeah, new scsi is quite expensive, but it can be
> worth it...  IMHO scsi is to be used in a raid, not alone.  No one disk
> can saturate the bw offered. (both ide and scsi).

The difference is that IDE *HAS* to be able to saturate the bus (which it
can't, of course; show me an IDE drive that pushes even 66MB/sec off the
platter) for the bus speed to matter, since IDE doesn't support
disconnection.  Multiple SCSI drives can be stuffing data over the SCSI
channel all at once.  They don't have to be RAID'd, they can be different
filesystems accessed in parallel.


--
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

"The only reason I'm burning my candle at both ends, is because I
      haven't figured out how to light the middle yet"

Re: Recomended FS

From
Mark Kirkwood
Date:
Some sort of ATA RAID is probably worth considering -

e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
Promise TX2000.

The disks themselves give fairly poor performance when attached to the
standard IDE channels:

sequential write 15 MB/s
sequential read 20 MB/s

But attached to the Promise card using RAID 0 they do considerably better:

sequential write 52 MB/s
sequential read 52 MB/s

Now you would probably not use RAID 0 for a "real" system (unless you
had good backups), but the difference is interesting.

Note that even including the card, this is a very cheap setup.
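(For anyone who wants to compare, sequential figures of this sort can be produced with a crude dd test along these lines; the file path and size are arbitrary:)

    # sequential write of ~1 GB, then sequential read of the same file
    dd if=/dev/zero of=/data/ddtest bs=1048576 count=1000
    dd if=/data/ddtest of=/dev/null bs=1048576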

(I have not gotten around to testing random reads and writes, but if
anybody is interested I can test this and supply figures.)

regards

Mark


Steve Crawford wrote:

>
>
>Talk about timing...this article posted today seems quite apropos
>(spoiler: SCSI beats IDE):
>
>http://hardware.devchannel.org/hardwarechannel/03/10/20/1953249.shtml?tid=20&tid=38&tid=49
>
>
>
>


Re: Recomended FS

From
Shridhar Daithankar
Date:
On Tuesday 21 October 2003 11:26, Mark Kirkwood wrote:
> (I have not gotten around to testing random read and writes, but if
> anybody is interested I can test this and supply figures)

Can you compare pgbench results for the RAID and single IDE disks? It would be
great if you could also turn off write caching on the individual drives in the RAID and
test that as well.

I think for a lot of databases IDE RAID could be a good compromise. Just
remember it's not the best out there, so use it when you have good
backups.. :-)

 Shridhar


Re: Recomended FS

From
"Markus Wollny"
Date:
Hi!

> -----Ursprüngliche Nachricht-----
> Von: Shridhar Daithankar [mailto:shridhar_daithankar@persistent.co.in]
> Gesendet: Dienstag, 21. Oktober 2003 08:08
> An: pgsql-general@postgresql.org
> Betreff: Re: [GENERAL] Recomended FS

> Can you compare ogbench results for the RAID and single IDE
> disks? It would be
> great if you could turn off write caching of individual
> drives in RAID and
> test it as well.

One thing I can say from previous experience is that the type of RAID
does matter quite a lot. RAID5, even with a quite expensive Adaptec
SCSI hardware controller, is not always the best solution for a
database, particularly if there are a lot of INSERTs and UPDATEs going on.
If you're not too dependent on raw storage size, your best bet is to use
the space-consuming RAID0+1 instead; the reasoning behind this is
probably that on RAID5 the controller has to calculate the parity data
for every write access, whereas on RAID0+1 it just mirrors and distributes the
data, reducing overall load on the controller and making use of more
spindles and two-channel SCSI.

We're hosting some DB-intensive websites (>12M impressions/month) on two
PostgreSQL servers (one Dell PowerEdge 6400, 4x Pentium III Xeon@550MHz,
2GB RAM, 4x18GB SCSI in RAID0+1, 1 hot spare, and one Dell PowerEdge
6650, 4x Intel Xeon@1.40GHz, 4GB RAM, 4x36GB SCSI in RAID0+1, 1
hot spare), and when I switched the 5-disc RAID5 config over to a
4-disc RAID0+1 plus one hot spare, I noticed system load dropping by a
very considerable amount. I haven't got any benchmark figures to show
off though; it's just experience from a real-world application.

Regards

    Markus

Re: Recomended FS

From
Holger Marzen
Date:
On Tue, 21 Oct 2003, Markus Wollny wrote:

> Hi!
>
> > -----Ursprüngliche Nachricht-----
> > Von: Shridhar Daithankar [mailto:shridhar_daithankar@persistent.co.in]
> > Gesendet: Dienstag, 21. Oktober 2003 08:08
> > An: pgsql-general@postgresql.org
> > Betreff: Re: [GENERAL] Recomended FS
>
> > Can you compare ogbench results for the RAID and single IDE
> > disks? It would be
> > great if you could turn off write caching of individual
> > drives in RAID and
> > test it as well.
>
> One thing I can say from previous experiences is that the type of RAID
> does matter quite a lot. RAID5, even with a quite expensive Adaptec
> SCSI-hardware-controller, is not always the best solution for a
> database, particularly if there's a lot of INSERTs and UPDATEs going on.
> If you're not too dependant on raw storage size, your best bet is to use
> the space-consuming RAID0+1 instead; the reasoning behind this is
> probably that on RAID5 the controller has to calculate the parity-data
> for every write-access, on RAID0+1 it just mirrors and distributes the
> data, reducing overall load on the controller and making use of more
> spindles and two-channel-SCSI.

Theory vs. real life. In theory, RAID5 is faster because less data has
to be written to disk. But it's true, many RAID5 controllers don't have
enough CPU power.

Re: Recomended FS

From
"Markus Wollny"
Date:
> Theory vs. real life. In Theory, RAID5 is faster because less
> data have
> to be written to disk. But it's true, many RAID5 controllers
> don't have
> enough CPU power.

I think it might not be just the CPU power of the controller. For RAID0+1
you have just two disk I/Os per write access: writing to the original set
and to the mirror set. For RAID5 you have four disk I/Os per write
access: 1. read the original data block, 2. read the parity
block (then calculate the new parity block, which is not a disk I/O), 3.
write the updated data block, and 4. write the updated parity block. Thus
recommendations by IBM for DB2 and by several Oracle consultants state
that RAID5 is the best compromise between storage and transaction speed, but
if your main concern is the latter, you're always best off with RAID0+1;
RAID0+1 does indeed always and reproducibly have better write
performance than RAID5, and read performance is almost always also
slightly better.
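(Rough illustrative arithmetic, not a benchmark: 1000 random block updates cost about 2000 disk I/Os on RAID0+1 but about 4000 on RAID5; at roughly 100 random I/Os per second per spindle, that is the difference between ~5 and ~10 seconds of pure disk work on a four-disk array.)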

Re: Recomended FS

From
"Ben-Nes Michael"
Date:
What about mirroring only, i.e. RAID 1?

I always thought that RAID 1 is the fastest; am I right?

I don't really need more than 3GB of data and I have two 36GB HDs, so I don't
need level 0 or level 5 unless RAID 1 is slower.

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------
----- Original Message -----
From: "Markus Wollny" <Markus.Wollny@computec.de>
To: <holger@marzen.de>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, October 21, 2003 11:00 AM
Subject: Re: [GENERAL] Recomended FS


> Theory vs. real life. In Theory, RAID5 is faster because less
> data have
> to be written to disk. But it's true, many RAID5 controllers
> don't have
> enough CPU power.

I think it might not be just CPU-power of the controller. For RAID0+1
you just have two disc-I/O per write-access: writing to the original set
and the mirror-set. For RAID5 you have three additional
disc-I/O-processes: 1. Read the original data block, 2. read the parity
block (and calculate the new parity block, which is not a disk I/O), 3.
write the updated data block and 4. write the updated parity block. Thus
recommendations by IBM for DB/2 and several Oracle-consultants state
that RAID5 is the best compromise for storage vs. transaction speed, but
if your main concern is the latter, you're always best of with RAID0+1;
RAID0+1 does indeed always and reproducably have better write
performance that RAID0+1 and read-performance is almost always also
slightly better.



Re: Recomended FS

From
Peter Childs
Date:

On Tue, 21 Oct 2003, Ben-Nes Michael wrote:

> what about mirroring only ? raid 1 ?
>
> I always thought that raid 1 is the fastest, am I true ?
>
> I don't really need more then 3GB data and I have two 36GB HD. so I don't
> need lvl 0 nor lvl 5 unless raid 1 is slower.

    RAID 1 should not be slower than RAID 5. Hence:

RAID 0
Write = decide which disk, write
Read = decide which disk, read

RAID 1
Write = write disk 1, write disk 2
Read = read (doesn't matter which one)

RAID 5
Write = write disk 1, write disk 2, calc checksum, write disk 3
Read = read disk 1, read disk 2, regenerate data

Peter Childs

>
> --------------------------
> Canaan Surfing Ltd.
> Internet Service Providers
> Ben-Nes Michael - Manager
> Tel: 972-4-6991122
> Fax: 972-4-6990098
> http://www.canaan.net.il
> --------------------------
> ----- Original Message -----
> From: "Markus Wollny" <Markus.Wollny@computec.de>
> To: <holger@marzen.de>
> Cc: <pgsql-general@postgresql.org>
> Sent: Tuesday, October 21, 2003 11:00 AM
> Subject: Re: [GENERAL] Recomended FS
>
>
> > Theory vs. real life. In Theory, RAID5 is faster because less
> > data have
> > to be written to disk. But it's true, many RAID5 controllers
> > don't have
> > enough CPU power.
>
> I think it might not be just CPU-power of the controller. For RAID0+1
> you just have two disc-I/O per write-access: writing to the original set
> and the mirror-set. For RAID5 you have three additional
> disc-I/O-processes: 1. Read the original data block, 2. read the parity
> block (and calculate the new parity block, which is not a disk I/O), 3.
> write the updated data block and 4. write the updated parity block. Thus
> recommendations by IBM for DB/2 and several Oracle-consultants state
> that RAID5 is the best compromise for storage vs. transaction speed, but
> if your main concern is the latter, you're always best of with RAID0+1;
> RAID0+1 does indeed always and reproducably have better write
> performance that RAID0+1 and read-performance is almost always also
> slightly better.
>

Re: Recomended FS

From
Andrew Sullivan
Date:
On Tue, Oct 21, 2003 at 06:56:52PM +1300, Mark Kirkwood wrote:
> Some sort of ATA Raid is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000

We had some reasonably good luck with RAID on a 2-way Promise card,
but multi-disk ATA RAID has been a great disappointment.  If I were
doing it again, I'd buy 2 or 3 ATA controllers and do the RAID in
software.

That said, even the 2-way RAID became almost uselessly slow when
multiple queries were running -- indeed, dramatically slower than a
plain single IDE drive.  This is not at all the experience we have
with SCSI, so either the IDE RAID people haven't worked it all out,
or (more likely IMHO) there are limitations in IDE which make it
ill-suited to the access patterns of a database under multiple
simultaneous (divergent) queries.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Recomended FS

From
"James Moe"
Date:

On Tue, 21 Oct 2003 18:56:52 +1300, Mark Kirkwood wrote:
>
>Note that even including the card, this is a very cheap setup.
>
  Yes, this is the single advantage of IDE vs SCSI. If the price of the storage system
is the *only* consideration, IDE is the way to go.
  SCSI has a long history of providing sustained throughput for server systems.
  IDE has a short history of providing very cheap storage for desktops.


--
jimoe at sohnen-moe dot com
pgp/gpg public key: http://www.keyserver.net/en/



Re: Recomended FS

From
"scott.marlowe"
Date:
On Tue, 21 Oct 2003, Peter Childs wrote:

>
>
> On Tue, 21 Oct 2003, Ben-Nes Michael wrote:
>
> > what about mirroring only ? raid 1 ?
> >
> > I always thought that raid 1 is the fastest, am I true ?
> >
> > I don't really need more then 3GB data and I have two 36GB HD. so I don't
> > need lvl 0 nor lvl 5 unless raid 1 is slower.
>
>     Raid 1 should not be slower than raid 5. hence
>
> Raid 0
> Write = Deciede which disk, Write
> Read = Deciede Which disk, Read
>
> Raid 1
> Write = Write Disk 1, Write Dist 2
> Read = Read (Don't matter which one)
>
> Raid 5
> Write = Write Disk 1, Write Disk 2, Calc Check Sum, Write Disk 3
> Read = Read Disk 1, Read Disk 2, Regenate Data.

That's not quite right.

Raid 5
Write:
    Read Old Checksum Disk 1
    XOR old Checksum with new data
    Write Checksum to Disk 1
    Write Data to Disk 2
Read = Read Data from Disk 1.  That is all.

Raid 5 is lightning fast in a mostly-read database (reporting databases) but
a little slower at writes, especially when there are only a few disks.
When the number of disks gets high enough to allow multiple reads and
writes to mostly hit different disks, you can get good parallel
performance until you saturate your bandwidth.


Re: Recomended FS

From
"scott.marlowe"
Date:
On Tue, 21 Oct 2003, Mark Kirkwood wrote:

> Some sort of ATA Raid is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000
>
> The disks themselves give fairly poor performance when attached to the
> std IDE channels :
>
> sequential write 15Mb/s
> sequential read 20Mb/s
>
> But attached to the Promise card using RAID 0 do considerably better:
>
> sequential write 52Mb/s
> sequential read 52MB/s
>
> Now you would probably not use RAID 0 for a "real" system (unless you
> had good backups), but the difference is interesting
>
> Note that even including the card, this is a very cheap setup.
>
> (I have not gotten around to testing random read and writes, but if
> anybody is interested I can test this and supply figures)

OK, but here's the real test.  As the postgres user, run 'pgbench -i',
then after that runs, run 'pgbench -c 50 -t 1000000'.  While it's running
and settled ('ps aux|grep postgres|wc -l' should show a number of ~54 or
so) pull the plug. Wait for the hard drives to spin down, then plug it
back in and power it on.  With SCSI you will still have a coherent
database.

If you want a coherent database on IDE drives under PostgreSQL you will
need to issue this command: 'hdparm -W0 /dev/hdx', where x is the letter of
each drive in the RAID array, to turn off write caching.  This will slow
them to a crawl on writes.
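(A sketch of that sequence, with the device letter and paths assumed:)

    # make the IDE drives honest about writes first (repeat per drive)
    hdparm -W0 /dev/hda
    # as the postgres user: initialise, then run a long, heavy pgbench
    pgbench -i
    pgbench -c 50 -t 1000000 &
    ps aux | grep postgres | wc -l    # wait until ~54 backends show up
    # ...pull the power plug, wait, power back on, and see whether the DB is coherent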

And there are plenty of uses for RAID 0 in real systems, just not generally
in real 24/7 systems.  But for high-speed batches that might take a week to
run on RAID5 but run in an hour on RAID0, that would be an acceptable
risk.  Think of machines that read in all their data off a NAS, crunch
it, and dump it back out in flat files when they're done.

For things like that, IDE drives and RAID 0 make a nice fit.  But don't put
the payroll on them.  :-)


Re: Recomended FS

From
Richard Ellis
Date:
On Tue, Oct 21, 2003 at 06:56:52PM +1300, Mark Kirkwood wrote:
> Some sort of ATA Raid is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000
> ...
> But attached to the Promise card using RAID 0 do considerably
> better:
>
> sequential write 52Mb/s
> sequential read 52MB/s
> ...
> Note that even including the card, this is a very cheap setup.

You may also want to consider the 3ware IDE RAID cards
(www.3ware.com).  Unlike the Promise card, they are full hardware
RAID with onboard CPUs to handle all the RAID work and offload it
from your PC's main CPU.  They are a bit more expensive than
the Promise offerings, but when you consider that the larger cards do
RAID 0/1/5 entirely in hardware on the card, the price difference is
not so great after all.

Some of their (3ware's) larger cards allow you to attach up to 12 IDE
disks to the card as well as giving you hot-swap capability.


Re: Recomended FS

From
"Joshua D. Drake"
Date:
Hello,

   Actually if you were to get off that Promise controller and on to a
3Ware or other "real" hardware RAID... you would probably
see even better performance.

Sincerely,

Joshua Drake


Mark Kirkwood wrote:

> Some sort of ATA Raid is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000
>
> The disks themselves give fairly poor performance when attached to the
> std IDE channels :
>
> sequential write 15Mb/s
> sequential read 20Mb/s
>
> But attached to the Promise card using RAID 0 do considerably better:
>
> sequential write 52Mb/s
> sequential read 52MB/s
>
> Now you would probably not use RAID 0 for a "real" system (unless you
> had good backups), but the difference is interesting
>
> Note that even including the card, this is a very cheap setup.
>
> (I have not gotten around to testing random read and writes, but if
> anybody is interested I can test this and supply figures)
>
> regards
>
> Mark
>
>
> Steve Crawford wrote:
>
>>
>>
>> Talk about timing...this article posted today seems quite apropos
>> (spoiler: SCSI beats IDE):
>>
>> http://hardware.devchannel.org/hardwarechannel/03/10/20/1953249.shtml?tid=20&tid=38&tid=49
>>
>>
>>
>>
>>
>
>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org



Re: Recomended FS

From
Mark Kirkwood
Date:
Yes indeed - have come to that conclusion too (see other email)

Joshua D. Drake wrote:

> Hello,
>
>   Actually if you were to get off that Promise controller and on to a
> 3Ware or other "real" hardware RAID... you would probably
> see even better performance.
>
> Sincerely,
>
> Joshua Drake
>
>


Re: Recomended FS

From
Mark Kirkwood
Date:
I have found this as well -

I have a nice, simple example: a program that loops and occasionally
writes a block to a file.
On a 2-CPU machine, running 2 of these processes in parallel takes twice
as long as running just 1 process!
However, if I comment out the I/O, then 2 processes take the same elapsed
time as 1.

My conclusion is that there exists some sort of "big" lock on access to the
ATA array.
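(A crude stand-in for that test, not the original program, could look something like the following; run one copy alone and then two in parallel with "time sh -c './iotest.sh & ./iotest.sh & wait'":)

    #!/bin/sh
    # loop, occasionally appending one 8 kB block to a scratch file
    f=/data/iotest.$$
    i=0
    while [ $i -lt 200000 ]; do
        i=$((i+1))
        if [ $((i % 1000)) -eq 0 ]; then
            dd if=/dev/zero bs=8k count=1 >> $f 2>/dev/null
        fi
    done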

I believe that 3ware have a non-blocking implementation of ATA RAID -
I intend to sell the Promise card and obtain a 3ware in the next month or so
and test this out.

regards

Mark

Andrew Sullivan wrote:

>That said, even the 2-way RAID became almost uselessly slow when
>multiple queries were running -- indeed, dramatically slower than a
>plain single IDE drive.
>
>


Re: Recomended FS

From
Mark Kirkwood
Date:
scott.marlowe wrote:

>
>OK, but here's the real test.  As the postgres user, run 'pgbench -i',
>then after that runs, run 'pgbench -c 50 -t 1000000'.  While it's running
>and settled (pg aux|grep postgres|wc -l should show a number of ~54 or
>so.) pull the plug. Wait for the hard drives to spin down, then plug it
>back in and power it one.  With SCSI you will still have a coherent
>database.
>
>
Agreed in principle - pgbench is the most interesting test... for this
mailing list anyway :-).
However, s = 1 makes a tiny database that fits into the file buffer cache
on most machines, which is not a very realistic situation.

e.g. the Dell gets tps = 250 for s = 1, c = 5, t = 1000. This number
looks great, but it doesn't have much to do with I/O...

I am happier with s = 10 - 50 for machines with 512+ MB of RAM.

From memory the Dell gets tps = 36 for s = 10, c = 5, t = 100000. This
result seems more believable!
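(Roughly, the two runs compared above; exact flags for the contrib pgbench of that era may differ slightly:)

    pgbench -i -s 10                # initialise at scale factor 10 (about 1 million accounts rows)
    pgbench -s 10 -c 5 -t 100000    # 5 clients, 100000 transactions each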


>If you want a coherent database on IDE drives under postgresql you will
>need to issue this command: 'hdparm -W0 /dev/hdx' where x is the letter of
>the drives under the RAID array to turn off write caching.  This will slow
>them to a crawl on writes.
>
>
I should have said that I was using FreeBSD 4.8 with write caching off.
The question of whether the disk *actually* turned it off is the
significant issue, so yes, "use with care" should preface any comments
about IDE usage!

best wishes

Mark


Re: Recomended FS

From
"Joshua D. Drake"
Date:
> I believe that 3ware have a non blocking implementation of ATA RAID -
> I intend to sell the Promise and obtain a 3ware in the next month of
> so and test this out.


I use 3ware exclusively for my ATA RAID solutions. The nice thing about
them is that they are REAL hardware RAID and they use the SCSI layer within
Linux, so you address them as standard SCSI devices.

Also, their support is in the kernel... no weird, experimental patching.

On a dual Athlon MP 2000+ I was able to sustain 50 MB/sec over large
copies (4+ GB). Very, very happy with them.

Sincerely,

Joshua Drake




> regards
>
> Mark
>
> Andrew Sullivan wrote:
>
>> That said, even the 2-way RAID became almost uselessly slow when
>> multiple queries were running -- indeed, dramatically slower than a
>> plain single IDE drive.
>>
>>
>
>



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org



Re: Recomended FS (correction)

From
Mark Kirkwood
Date:

Mark Kirkwood wrote:

> I should have said that I was using Freebsd 4.8 with write caching off.

Write caching *on*, that is - I got myself confused about what the value "1"
means....


Re: Recomended FS

From
"scott.marlowe"
Date:
On Thu, 23 Oct 2003, Mark Kirkwood wrote:

>
> scott.marlowe wrote:
>
> >
> >OK, but here's the real test.  As the postgres user, run 'pgbench -i',
> >then after that runs, run 'pgbench -c 50 -t 1000000'.  While it's running
> >and settled (pg aux|grep postgres|wc -l should show a number of ~54 or
> >so.) pull the plug. Wait for the hard drives to spin down, then plug it
> >back in and power it one.  With SCSI you will still have a coherent
> >database.
> >
> >
> Agreed in principle -  pgbench is the most interesting test... for this
> mailing list anyway :-).
> However s = 1 makes a tiny database that fits into the file buffer cache
> on most machines, which is not a very realistic situation.
>
>  e.g. the Dell gets tps = 250 for s = 1 c = 5 t = 1000. This number
> looks great but its not too much to do with IO....
>
> I am happier about  s = 10 - 50 for machines with 512+ Mb of RAM.
>
>  From memory the Dell gets tps = 36 for s = 10 c = 5 t = 100000. This
> result seems more believable!

You missed my point there.  I didn't care what kind of numbers you got
back at all.  My point was: if you place the database under fairly
high transactional load and pull the plug, is the database still coherent
when it comes back up?

I generally test with -s10 through -s50, but for this test it makes no
difference that I can see, i.e. if the thing is going to get scrambled at -s50,
it'll get scrambled at -s1 as well, and take less time to test.

> >If you want a coherent database on IDE drives under postgresql you will
> >need to issue this command: 'hdparm -W0 /dev/hdx' where x is the letter of
> >the drives under the RAID array to turn off write caching.  This will slow
> >them to a crawl on writes.
> >
> >
> I should have said that I was using Freebsd 4.8 with write caching off.
> The question of whether the disk *actually* turned it off is the
> significant issue, so yes, "use with care" should preface any comments
> about IDE usage!

-- NOTE in a correction Mark stated that caching was on, not off --

Assuming that the caching was on, I'm betting your database won't survive
a power plug pull in the middle of transactions like the test I put up
above.


Re: Recomended FS

From
"scott.marlowe"
Date:
On Wed, 22 Oct 2003, Joshua D. Drake wrote:

>
> > I believe that 3ware have a non blocking implementation of ATA RAID -
> > I intend to sell the Promise and obtain a 3ware in the next month of
> > so and test this out.
>
>
> I use 3Ware exclusively for my ATA-RAID solutions. The nice thing about
> them is that
> they are REAL hardware RAID and the use the SCSI layer within Linux so
> you address
> them as a standard SCSI device.
>
> Also their support is in the kernel... no wierd, experimental patching.
>
> On a Dual 2000 Athlon MP I was able to sustain 50MB/sec over large
> copys (4+ gigs). Very, Very happy with them.

Do they survive the power plug pulling test I was talking about elsewhere
in this thread?


Re: Recomended FS

From
Mark Kirkwood
Date:
It's worth checking - isn't it?

I appreciate that you may have performed such tests previously - but as
hardware and software evolve it's often worth repeating such tests (goes
away to do the suggested one tonight).

Note that I am not trying to argue away the issue of write caching -
it *has* to increase the risk of database corruption following a power
failure. However, if your backups are regular and reliable, this may be a
risk worth taking to achieve acceptable performance at a low price.

regards

Mark


scott.marlowe wrote:

>
>Assuming that the caching was on, I'm betting your database won't survive
>a power plug pull in the middle of transactions like the test I put up
>above.
>
>
>


Re: Recomended FS

From
Christopher Browne
Date:
rellis9@yahoo.com (Richard Ellis) wrote:
> Some of their (3Ware's) larger cards allow you to attach up to 12 IDE
> disks to the card as well as giving you hot swap capability.

This is all well and good, but may not sufficiently cover over the
Vital Problem with IDE drives, namely that they are likely to cache
writes and not tell the 3Ware controller about that.

It would doubtless be a slick thing to have an IDE RAID controller
with cache (that might well overcome some of the traditional problems
with IDE), but that only forcibly helps if you can turn off write
cacheing on the drives.
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/advocacy.html
"What this list needs is a good five-dollar plasma weapon."
--paraphrased from `/usr/bin/fortune`

Re: Recomended FS

From
Bruce Momjian
Date:
Mark Kirkwood wrote:
> It's worth checking - isn't it?
>
> I appreciate that you may have performed such tests previously - but as
> hardware and software evolve it's often worth repeating such tests (goes
> away to do the suggested one tonight).
>
> Note that I am not trying to argue away the issue about write caching -
> it *has* to increase the risk of database corruption following a power
> failure. However, if your backups are regular and reliable, this may be a
> risk worth taking to achieve acceptable performance at a low price.

Sure, but how many people are taking that risk and not knowing it!

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Recomended FS

From
Mark Kirkwood
Date:
I suspect almost everyone using IDE drives -

We the "consumers" of this technology need to demand that the vendors:

1. Be honest about these limitations / bugs
2. Work to fix obvious bugs - e.g. drives lying about write cache status
need to have their behaviour changed as soon as possible.

In the meantime I guess all we can do is try to understand the issue and
raise awareness.

regards

Mark

Bruce Momjian wrote:

>Mark Kirkwood wrote:
>
>
>>It's worth checking - isn't it?
>>
>>I appreciate that you may have performed such tests previously - but as
>>hardware and software evolve it's often worth repeating such tests (goes
>>away to do the suggested one tonight).
>>
>>Note that I am not trying to argue away the issue about write caching -
>>it *has* to increase the risk of database corruption following a power
>>failure. However, if your backups are regular and reliable, this may be a
>>risk worth taking to achieve acceptable performance at a low price.
>>
>>
>
>Sure, but how many people are taking that risk and not knowing it!
>
>
>


Re: Recomended FS

From
"scott.marlowe"
Date:
On Mon, 20 Oct 2003, Ben-Nes Michael wrote:

> ----- Original Message -----
> From: "Nick Burrett" <nick@dsvr.net>
> To: "Ben-Nes Michael" <miki@canaan.co.il>
> Cc: "postgresql" <pgsql-general@postgresql.org>
> Sent: Monday, October 20, 2003 2:54 PM
> Subject: Re: [GENERAL] Recomended FS
>
> > >>>But still the greatest question is what FS to put on ?
> > >>>
> > >>>I heard ReiserFS can handle small files very quickly.
> > >>
> > >>Switching from ext3 to reiserfs for our name servers reduced the time
> > >>taken to load 110,000 zones from 45 minutes to 5 minutes.
> > >>
> > >>However for a database, I don't think you can really factor this type of
> > >>stuff into the equation.  The performance benefits you get from
> > >>different filesystem types are going to be small compared to the
> > >>modifications that you can make to your database structure, queries and
> > >>applications.  The actual algorithms used in processing the data will be
> > >>much slower than the time taken to fetch the data off disk.
> > >
> > >
> > > So you say the FS has no real speed impact on the DB ?
> > >
> > > In my pg data folder I have 2367 files, some big some small.
> >
> > I'm saying: don't expect your DB performance to come on leaps and bounds
> > just because you changed to a different filesystem format.  If you've
> > got speed problems then it might help to look elsewhere first.
> >
> I don't expect miracles :)
> but still I have to choose one, so why shouldn't I choose the one which
> best fits?

I agree.  I also think that at the top of that logic development tree you
should ask yourself the first question:

"Is it OK that, should the machine suffer a sudden catastrophic shutdown
for any reason, I would have a corrupted database and would have to
re-initdb/restore from scratch?"

While I agree that in many instances this is acceptable, in many it is
not.  And if you do need that guarantee one day, SCSI is so much faster
than IDE with the write cache turned off that the IDE machine ends up
running at about half speed.

I pitted two systems against each other.

Machine A:   < Clone of our current production box
Dual PIII-750MHz
1.5 Gig PC133 memory
dual 18 gig 10Krpm USCSI 160 drives

Machine B:  < New machine intended to replace production box
Dual PIV Xeons-2.4GHz
2 Gig 400MHz memory
dual 80 gig 7200 RPM UDMA 133 drives

With two configs (all fresh 'initdb --locale=C') and these postgresql.conf
settings: wal_sync_method = open_sync, buffers = 4000.
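That is roughly (a sketch; 'buffers' presumably refers to shared_buffers):

    wal_sync_method = open_sync     # sync WAL writes with O_SYNC
    shared_buffers = 4000           # 4000 8k buffers, i.e. about 32 MB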

Config 1:
/db on one partition (on IDE this always had write cache on.)
/pg_xlog on another (write cache on or off (W0/W1))

Config 2:
everything on /db/, which is a RAID-1 (write cache either on or off, W1/W0,
on IDE).  The software RAID-1 was allowed to finish syncing on both
machines before starting the tests.

With two possible IDE settings:

W0: Write cache off
W1: Write cache on

Note that W1 does not guarantee data integrity if power is lost while a
transaction is in progress (i.e. it's like running with fsync=false all
the time)

I ran pgbench -i -s 5 then pgbench -c 5 -t 1000 several times to
settle the machine, then ran pgbench -c 5 -t 1000 three times and chose
the median result of those three.
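Spelled out, that sequence is roughly (a sketch; pgbench here runs against
the default database for the invoking user):

    pgbench -i -s 5            # initialize the test tables at scale factor 5
    pgbench -c 5 -t 1000       # run a few times first to settle the machine
    pgbench -c 5 -t 1000       # then three measured runs; report the median tps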

MachineA Config1:
141 tps

MachineB Config1 W0:
60 tps

MachineB Config1 W1:
112 tps

MachineA Config2:
101 tps

MachineB Config2 W0:
44 tps

MachineB Config2 W1:
135 tps

Just some numbers someone might find useful.  I'll try to test both setups
in the same box later on if I get a chance.  But it would seem that RAID
is performing better.  I've tested all these configurations with the "pull
the plug" test.  The SCSI survives in both configurations, while the IDE
will only survive uncorrupted when Write cache is off (W0).


Re: Recomended FS

From
Michael Teter
Date:
Here are some recent benchmarks on different Linux filesystems.  As with
any benchmarks, take what you will from the numbers.

Note the Summary section, and then the detailed benchmark numbers (if
you have a stomach for huge tables of pure numbers :)

http://fsbench.netnation.com/






Re: Recomended FS

From
"scott.marlowe"
Date:
On Fri, 24 Oct 2003, Michael Teter wrote:

> Here are some recent benchmarks on different Linux filesystems.  As with
> any benchmarks, take what you will from the numbers.
>
> Note the Summary section, and then the detailed benchmark numbers (if
> you have a stomach for huge tables of pure numbers :)
>
> http://fsbench.netnation.com/

Right, but NONE of the benchmarks I've seen have been with IDE drives with
their cache disabled, which is the only way to make them reliable under
postgresql should something bad happen.  but thanks for the benchmarks,
I'll look them over.


Re: Recomended FS

From
Scott Chapman
Date:
On Friday 24 October 2003 16:23, scott.marlowe wrote:
> Right, but NONE of the benchmarks I've seen have been with IDE drives with
> their cache disabled, which is the only way to make them reliable under
> postgresql should something bad happen.  but thanks for the benchmarks,
> I'll look them over.

I don't recall seeing anyone explain how to disable caching on a drive in this
thread.  Did I miss that?  'Would be useful.  I'm running a 3Ware mirror of 2
IDE drives.

Scott

Re: Recomended FS

From
Murthy Kambhampaty
Date:
This has been discussed on the XFS list. Basically, IIRC, you have to get a
drive tool like OnTrak, attach the drive via the IDE controller, disable the
cache, then reconnect it to the 3-ware controller (which does not include an
option to disable write caching; pester 3ware).



>-----Original Message-----
>From: Scott Chapman [mailto:scott_list@mischko.com]
>Sent: Friday, October 24, 2003 21:38
>To: scott.marlowe; Michael Teter
>Cc: postgresql
>Subject: Re: [GENERAL] Recomended FS
>
>
>On Friday 24 October 2003 16:23, scott.marlowe wrote:
>> Right, but NONE of the benchmarks I've seen have been with
>IDE drives with
>> their cache disabled, which is the only way to make them
>reliable under
>> postgresql should something bad happen.  but thanks for the
>benchmarks,
>> I'll look them over.
>
>I don't recall seeing anyone explain how to disable caching on
>a drive in this
>thread.  Did I miss that?  'Would be useful.  I'm running a
>3Ware mirror of 2
>IDE drives.
>
>Scott
>

Re: Recomended FS

From
Bruno Wolff III
Date:
On Tue, Oct 21, 2003 at 11:42:50 +0200,
  Ben-Nes Michael <miki@canaan.co.il> wrote:
> what about mirroring only ? RAID 1 ?
>
> I always thought that RAID 1 is the fastest, am I right?

If you have more than two disks then mirroring plus striping can be faster.

Re: Recomended FS

From
Mark Kirkwood
Date:
Got around to doing this today, after a small delay due to the arrival of
new disks.

So the system is 2x700Mhz PIII, 512 Mb, Promise TX2000, 2x40G ATA-133
Maxtor Diamond+8.
The relevant software is FreeBSD 4.8 and PostgreSQL 7.4 Beta 2.

Two runs of 'pgbench -c 50 -t 1000000 -s 10 bench' with a power cord
removal after about 2 minutes were performed, one with hw.ata.wc = 1
(write cache enabled) and the other with hw.ata.wc = 0 (disabled).

In *both* cases the Pg server survived - i.e. it came up and performed
automatic recovery. Subsequent 'vacuum full' and further runs of pgbench
completed with no issues.

I would conclude that it is not *always* the case that power failure
renders the database unusable.

I have just noticed a similar posting from Scott where he finds the cache
enabled case has a dead database after power failure. It seems that it's
a question of how *likely* it is that the database will survive a power
failure...

The other interesting possibility is that FreeBSD with soft updates
helped things remain salvageable in the cache enabled case (as some
writes *must* be lost at power off in this case)....

regards

Mark

scott.marlowe wrote:

>
>OK, but here's the real test.  As the postgres user, run 'pgbench -i',
>then after that runs, run 'pgbench -c 50 -t 1000000'.  While it's running
>and settled (ps aux|grep postgres|wc -l should show a number of ~54 or
>so.) pull the plug. Wait for the hard drives to spin down, then plug it
>back in and power it on.  With SCSI you will still have a coherent
>database.
>
>
>


Re: Recomended FS

From
"James Moe"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:

>I would conclude that it is not *always* the case that power failure
>renders the database unusable.
>
>I have just noticed a similar posting from Scott where he finds the cache
>enabled case has a dead database after power failure.
>
  Other posts have noted that SCSI never fails under this condition. Apparently SCSI
drives sense an impending power loss and flush the cache before power completely
disappears. Speed *and* reliability. Hm.
  Of course, anyone serious about a server would have it backed up with a UPS and
appropriate software to shut the system down during an extended power outage. This just
leaves people tripping over the power cords or maliciously pulling the plugs.


- --
jimoe at sohnen-moe dot com
pgp/gpg public key: http://www.keyserver.net/en/
-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 5.0 OS/2 for non-commercial use
Comment: PGP 5.0 for OS/2
Charset: cp850

wj8DBQE/m2PQsxxMki0foKoRAjsOAJ0ed1MV8FcWcALoxIJk66wn40EEvwCfVTPB
n/rxejkV2upgeZmoy3yipes=
=fDes
-----END PGP SIGNATURE-----



Re: Recomended FS

From
Martijn van Oosterhout
Date:
On Sat, Oct 25, 2003 at 11:04:00PM -0700, James Moe wrote:
>   Other posts have noted that SCSI never fails under this condition. Apparently SCSI
> drives sense an impending power loss and flush the cache before power completely
> disappears. Speed *and* reliability. Hm.

I understood it differently. PostgreSQL has WAL to deal with this situation.
The issue is that it only works as long as the drive doesn't lie about which
blocks have been written and which are merely in cache. Apparently IDE disks
lie and SCSI disks don't. It may be a protocol thing.
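For reference, the server-side settings this relies on look roughly like
this in postgresql.conf (a sketch; option names as of the 7.x series):

    fsync = true                # force WAL to disk at commit
    wal_sync_method = fsync     # or fdatasync / open_sync / open_datasync
    # none of these help if the drive acknowledges writes that are still only in its cache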

The other alternative is battery backed memory. i.e. keep the blocks in
memory hoping that power will return to the drive before it fails. Some RAID
cards do this.

Another thing is that 3ware RAID controllers stick a SCSI interface in
front of the IDE drives, so perhaps they have more scope to deal with this
issue.

Remember, when power fails the first thing that happens is the system
cancels any DMA transfer in progress, as memory is the part most sensitive
to power fluctuations.

>   Of course, anyone serious about a server would have it backed up with a UPS and
> appropriate software to shut the system down during an extended power outage. This just
> leaves people tripping over the power cords or maliciously pulling the plugs.

If you start adding up the points of failure it's quite a lot. But you
should be able to proof the system against even malicious tampering.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato


Re: Recomended FS

From
"Ben-Nes Michael"
Date:
Don't forget that the power supply can fail too, so it's not all about UPS
and cords.

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------


Re: Recomended FS

From
Fernando Schapachnik
Date:
In an earlier message, Scott Chapman wrote:
> I don't recall seeing anyone explain how to disable caching on a drive in this
> thread.  Did I miss that?  'Would be useful.  I'm running a 3Ware mirror of 2
> IDE drives.

In FreeBSD, add "hw.ata.wc=0" to /boot/loader.conf.
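A minimal sketch of doing that and verifying it afterwards (hw.ata.wc is a
boot-time tunable, so the change takes effect at the next reboot):

    # /boot/loader.conf
    hw.ata.wc=0

    # after rebooting, check that it took effect:
    sysctl hw.ata.wc        # should report 0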

Regards.

Re: Recomended FS

From
"scott.marlowe"
Date:
On Sat, 25 Oct 2003, James Moe wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
>
> >I would conclude that it is not *always* the case that power failure
> >renders the database unusable.
> >
> >I have just noticed a similar posting from Scott where he finds the cache
> >enabled case has a dead database after power failure.
> >
>   Other posts have noted that SCSI never fails under this condition. Apparently SCSI
> drives sense an impending power loss and flush the cache before power completely
> disappears. Speed *and* reliability. Hm.

Actually, it would appear that the SCSI drives simply don't lie about
fsync.  I.e. when they tell the OS that they wrote the data, they wrote
the data.  Some of them may have cache flushing with lying about fsync
built in, but the performance looks more like just good fsyncing to me.
It's all a guess without examining the microcode though... :-)

>   Of course, anyone serious about a server would have it backed up with a UPS and
> appropriate software to shut the system down during an extended power outage. This just
> leaves people tripping over the power cords or maliciously pulling the plugs.

Or a CPU frying, or a power supply dying, or a motherboard failure, or a
kernel panic, or any number of other possibilities.  Admittedly, the first
line of defense is always good backups, but it's nice knowing that if one
of my CPUs fries, I can pull it, put in the terminator / replacement, and my
whole machine will likely come back up.

But anyone serious about a server will also likely be running on SCSI as
well as on a UPS.  We use a hosting center with 3 UPS and a Diesel
generator, and we still managed to lose power about a year ago when one
UPS went haywire, browned out the circuits of the other two, and the
diesel generator's switch burnt out.  Millions of dollars worth of UPS /
high reliability equipment, and a $50 switch brought it all down.


Re: Recomended FS

From
"scott.marlowe"
Date:
On Sun, 26 Oct 2003, Mark Kirkwood wrote:

> Got around to doing this today, after a small delay due to the arrival of
> new disks.
>
> So the system is 2x700Mhz PIII, 512 Mb, Promise TX2000, 2x40G ATA-133
> Maxtor Diamond+8.
> The relevant software is FreeBSD 4.8 and PostgreSQL 7.4 Beta 2.
>
> Two runs of 'pgbench -c 50 -t 1000000 -s 10 bench' with a power cord
> removal after about 2 minutes were performed, one with hw.ata.wc = 1
> (write cache enabled) and the other with hw.ata.wc = 0 (disabled).
>
> In *both* cases the Pg server survived - i.e. it came up and performed
> automatic recovery. Subsequent 'vacuum full' and further runs of pgbench
> completed with no issues.

Sweet.  It may be that the Promise is turning off the cache, or that the
new generation of IDE drives is finally reporting fsync correctly.  Was
there a performance difference in the set with write cache on or off?

> I would conclude that it is not *always* the case that power failure
> renders the database unusable.

But it usually is if write cache is enabled.

> I have just noticed a similar posting from Scott where he finds the cache
> enabled case has a dead database after power failure. It seems that it's
> a question of how *likely* it is that the database will survive a power
> failure...
>
> The other interesting possibility is that FreeBSD with soft updates
> helped things remain salvageable in the cache enabled case (as some
> writes *must* be lost at power off in this case)....

FreeBSD may be the reason here.  If its soft updates are ordered in the
right way, it may be that even with write caching on, the drives "do the
right thing" under BSD.  Time to get out my 5.0 disks and start playing
with my test server.  Thanks for the test!


Re: Recomended FS

From
"scott.marlowe"
Date:
On Fri, 24 Oct 2003, Scott Chapman wrote:

> On Friday 24 October 2003 16:23, scott.marlowe wrote:
> > Right, but NONE of the benchmarks I've seen have been with IDE drives with
> > their cache disabled, which is the only way to make them reliable under
> > postgresql should something bad happen.  but thanks for the benchmarks,
> > I'll look them over.
>
> I don't recall seeing anyone explain how to disable caching on a drive in this
> thread.  Did I miss that?  'Would be useful.  I'm running a 3Ware mirror of 2
> IDE drives.
>
> Scott

Each OS has its own methods, and some IDE RAID cards don't give you
direct access to the drives to enable / disable write cache.

On Linux you can disable write cache like so:

hdparm -W0 /dev/hda

back on:

hdparm -W1 /dev/hda
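To check the current setting, and to make the change stick across power
cycles, something along these lines works (a sketch; the device names and
init script location are just examples):

    hdparm -W /dev/hda              # report the current write-caching setting
    hdparm -W0 /dev/hda /dev/hdc    # disable it on both drives of the array
    # drives typically re-enable the cache after a power cycle, so re-apply
    # the hdparm -W0 line at boot, e.g. from /etc/rc.d/rc.local or your
    # distro's equivalent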


Re: Recomended FS

From
Greg Stark
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:

> Sweet.  It may be that the Promise is turning off the cache, or that the
> new generation of IDE drives is finally reporting fsync correctly.  Was
> there a performance difference in the set with write cache on or off?

Check out this thread. It seems the ATA standard does not include any way to
make fsync work properly without destroying performance. At least on Linux
even that much is impossible without disabling caching entirely, as the
operation required isn't exposed to user-space. There is some hope for the
future though.

http://www.ussg.iu.edu/hypermail/linux/kernel/0310.2/0163.html

> > The other interesting possibility is that FreeBSD with soft updates
> > helped things remain salvageable in the cache enabled case (as some
> > writes *must* be lost at power off in this case)....
>
> FreeBSD may be the reason here.  If its soft updates are ordered in the
> right way, it may be that even with write caching on, the drives "do the
> right thing" under BSD.  Time to get out my 5.0 disks and start playing
> with my test server.  Thanks for the test!

I thought soft updates applied only to directory metadata changes.

--
greg

Re: Recomended FS

From
Bruce Momjian
Date:
Greg Stark wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> > Sweet.  It may be that the Promise is turning off the cache, or that the
> > new generation of IDE drives is finally reporting fsync correctly.  Was
> > there a performance difference in the set with write cache on or off?
>
> Check out this thread. It seems the ATA standard does not include any way to
> make fsync work properly without destroying performance. At least on linux
> even that much is impossible without disabling caching entirely as the
> operation required isn't exposed to user-space. There is some hope for the
> future though.
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0310.2/0163.html

I thought the operating system had to write the block and force it to
disk, and that this happened the same way with SCSI and IDE.  I didn't
assume the drive would associate multiple blocks with the fsync.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Recomended FS

From
Greg Stark
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:

> Or a CPU frying, or a power supply dying, or a motherboard failure, or a
> kernel panic, or any number of other possibilities.  Admittedly, the first
> line of defense is always good backups, but it's nice knowing that if one
> of my CPUs fry, I can pull it, put in the terminator / replacement, and my
> whole machine will likely come back up.

Well, note that in all of those cases the disk drive would still have a chance
to sync its buffers to disk. Linux isn't lying about fsync as far as its
buffers getting flushed, only the drive itself.

In theory even in those cases there's no guarantee of exactly how long the
drive will hold the buffers without committing them, but in practice I think
any sane drive will commit pretty damn soon or else normal power-off wouldn't
work.

--
greg

Re: Recomended FS

From
Mark Kirkwood
Date:

scott.marlowe wrote:

>Was there a performance difference in the set with write cache on or off?
>
>
Yes - just in the process of a little study concerning this - I will
post some preliminary results soon

cheers

Mark


Re: Recomended FS

From
Lynn.Tilby@asu.edu
Date:
Really solid microcode actually reads the sectors
just written and confirms the write at the hardware level
by comparing it with what is in the controller memory.
It then returns with a successful confirmation or an error
if differences were detected.

Any data storage device controller, disk, memory stick, whatever
that does not follow this fundamental common sense protocol is
not reliable and should not be used, period!

Perhaps the IDE designers have folded to management pressure
and tried to make their drives "seem" faster by not taking
the time to actually confirm the write at the hardware level.
I don't know, but it looks like this may be a possibility.

Lynn

Quoting "scott.marlowe" <scott.marlowe@ihs.com>:

> On Sat, 25 Oct 2003, James Moe wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
> >
> > >I would conclude that it not *always* the case that power failure
> > >renders the database unuseable.
> > >
> > >I have just noticed a similar posting from Scott were he finds the
> cache
> > >enabled case has a dead database after power failure.
> > >
> >   Other posts have noted that SCSI never fails under this condition.
> Apparently SCSI
> > drives sense an impending power loss and flush the cache before power
> completely
> > disappears. Speed *and* reliability. Hm.
>
> Actually, it would appear that the SCSI drives simply don't lie about
> fsync.  I.e. when they tell the OS that they wrote the data, they wrote
>
> the data.  Some of them may have caching flushing with lying about fsync
>
> built in, but the performance looks more like just good fsyncing to me.
>
> It's all a guess without examining the microcode though... :-)
>
> >   Of course, anyone serious about a server would have it backed up
> with a UPS and
> > appropriate software to shut the system down during an extended power
> outage. This just
> > leaves people tripping over the power cords or maliciously pulling the
> plugs.
>
> Or a CPU frying, or a power supply dying, or a motherboard failure, or a
>
> kernel panic, or any number of other possibilities.  Admittedly, the
> first
> line of defense is always good backups, but it's nice knowing that if
> one
> of my CPUs fry, I can pull it, put in the terminator / replacement, and
> my
> whole machine will likely come back up.
>
> But anyone serious about a server will also likely be running on SCSI as
>
> well as on a UPS.  We use a hosting center with 3 UPS and a Diesel
> generator, and we still managed to lose power about a year ago when one
>
> UPS went haywire, browned out the circuits of the other two, and the
> diesel generator's switch burnt out.  Millions of dollars worth of UPS /
>
> high reliability equipment, and a $50 switch brought it all down.
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>


Re: Recomended FS

From
Mark Kirkwood
Date:
Maybe it is a little late to be posting on this thread - but I was doing
pgbench runs with a RAID 0 ATA system and thought the results might be
interesting.

So here they are: pgbench -c 5 -t 1000 -s 5, median of 3 runs on a
Dual PIII 700 512Mb 2x7200 RPM ATA 133 Promise TX200
(same method / Pg configuration parameters as Scott's):

2 disk Raid0 W0
66 tps

2 disk Raid0 W1
220 tps

I was expecting a slightly better result for W0 (write caching off);
mind you, the point could be made that you get about half the performance
of the SCSI system - for about half the price.

And the W1 result - that's fast. When (or if) that little power-saving
capacitor arrives for these drives, we could see performance, reliability
*and* economy....

regards

Mark

scott.marlowe wrote:

>
>MachineA Config1:
>141 tps
>
>MachineB Config1 W0:
>60 tps
>
>MachineB Config1 W1:
>112 tps
>
>MachineA Config2:
>101 tps
>
>MachineB Config2 W0:
>44 tps
>
>MachineB Config2 W1:
>135 tps
>
>
>
>