Thread: SCSI disk: still the way to go?
Hi guys,
I have to update a Linux box with PostgreSQL on it, essentially for data warehousing purposes. I set it up about 3 years ago, and at that time the best solution recommended to me was SCSI disks with hardware RAID controllers.
Is this still the way to go, or have things changed recently? Any other suggestions/advice? What about SAN?
Thanks.
Cheers,
Riccardo
On Tue, 2006-05-30 at 16:28, Riccardo Inverni wrote:
> Hi guys,
>
> I have to update a Linux box with PostgreSQL on it, essentially for
> data warehousing purposes. I set it up about 3 years ago, and at
> that time the best solution recommended to me was SCSI disks with
> hardware RAID controllers.
>
> Is this still the way to go, or have things changed recently? Any
> other suggestions/advice? What about SAN?

Actually, modern SATA server drives are now considered competitive, given a proper RAID controller. Nowadays most people seem to recommend the Areca controllers. I haven't used them myself, but would be happy to test them some day.
SAS and SATA will give you the best total throughput for your array; U320 is limited to 320MB/s per channel.
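A rough back-of-the-envelope comparison (a sketch only; the ~80MB/sec per-drive rate is an assumption, in line with the single-drive figure later in this thread):

# Back-of-the-envelope only -- both throughput figures are assumptions.
# A U320 channel is a shared 320MB/s bus; SATA gives each drive its own
# link, so the array total scales with spindle count until the
# controller or host bus becomes the limit.
DRIVE_MBPS = 80        # assumed sequential rate of one drive
U320_BUS_MBPS = 320    # shared by every drive on the channel
SATA_LINK_MBPS = 150   # per-drive, point-to-point

for n in (4, 8, 14):
    u320 = min(n * DRIVE_MBPS, U320_BUS_MBPS)
    sata = n * min(DRIVE_MBPS, SATA_LINK_MBPS)
    print("%2d drives: U320 %4d MB/s, SATA %4d MB/s" % (n, u320, sata))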
Alex
On 5/30/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
On Tue, 2006-05-30 at 16:28, Riccardo Inverni wrote:
> Hi guys,
>
> I have to update a Linux box with PostgreSQL on it, essentially for
> data warehousing purposes. I set it up about 3 years ago, and at
> that time the best solution recommended to me was SCSI disks with
> hardware RAID controllers.
>
> Is this still the way to go, or have things changed recently? Any
> other suggestions/advice? What about SAN?
Actually, modern SATA server drives are now considered competitive,
given a proper RAID controller.
Nowadays most people seem to recommend the Areca controllers. I haven't
used them myself, but would be happy to test them some day.
How much money do you want to spend? If you don't care, SAN is probably the way to go.

How much data do you have to store? If you can afford to fit it onto SCSI, SCSI is probably still the way to go.

Otherwise, SATA arrays have come a long way in 3 years, and they are by FAR the cheapest solution out there. Do some research and see if they're good enough for you.

On Tue, 30 May 2006, Riccardo Inverni wrote:
> Hi guys,
>
> I have to update a Linux box with PostgreSQL on it, essentially for data
> warehousing purposes. I set it up about 3 years ago, and at that time
> the best solution recommended to me was SCSI disks with hardware RAID
> controllers.
>
> Is this still the way to go, or have things changed recently? Any other
> suggestions/advice? What about SAN?
>
> Thanks.
>
> Cheers,
> Riccardo
Scott Marlowe wrote:
> On Tue, 2006-05-30 at 16:28, Riccardo Inverni wrote:
>> Hi guys,
>>
>> I have to update a Linux box with PostgreSQL on it, essentially for
>> data warehousing purposes. I set it up about 3 years ago, and at
>> that time the best solution recommended to me was SCSI disks with
>> hardware RAID controllers.
>>
>> Is this still the way to go, or have things changed recently? Any
>> other suggestions/advice? What about SAN?
>
> Actually, modern SATA server drives are now considered competitive,
> given a proper RAID controller.

And for a DW application they give you the most megabytes per dollar you can buy.

> Nowadays most people seem to recommend the Areca controllers. I haven't
> used them myself, but would be happy to test them some day.

I have heard good things about the Areca, but I have never used them. I have had excellent luck with the LSI controllers, however.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Compare these two drives:
http://www.storagereview.com/php/benchmark/suite_v4.php?typeID=10&testbedID=4&osID=6&raidconfigID=1&numDrives=1&devID_0=279&devID_1=308&devCnt=2
Prices:
http://www.cdw.com/shop/products/default.aspx?EDC=984588 - SAS - ~$950
http://www.cdw.com/shop/products/default.aspx?EDC=912784 - SATA - ~$320
For a third of the price you get about 90% of the throughput, and sequential throughput is probably where a data warehouse will stress its drives most.
I have only seen good benchmarks from LSI's MegaRAID controllers for SCSI in Linux; I have seen good results from LSI, 3Ware (now AMCC) and Areca in Linux for their SATA products (in RAID 10). There are plenty of large-drive-count chassis out there with SATA hot-swap bays if you want them. Tyan makes a great dual-CPU board with two independent PCI-X buses, each giving 1066MB/sec of total throughput, from which I have great benchmark numbers.
It's possible to reach these numbers with a SAN, but it will cost major $$s. Each FC line in a SAN is typically 2Gb/s last time I checked, so you need multiple channels to achieve the maximum of 1066MB/sec throughput per PCI-X bus. If you run the numbers, you theoretically need 24 drives in a RAID 10 to get max throughput (Areca makes a 24-channel SATA card: http://www.newegg.com/Product/Product.asp?Item=N82E16816151004 - although I couldn't find one with multilane support). I have seen chassis that can hold 40 drives. If you go for the 74GB cousin of that drive, which has similar throughput and can be had OEM for $160 each, you are talking about $6400 in drives, plus about $4k for the chassis (http://rackmountmart.stores.yahoo.net/rm8uracchasw.html), plus about $5k for other components (depending on RAM/CPU). So a massively kick-ass whitebox can be had for about $16k that will come close to the maximum theoretical MB/sec throughput achievable in a single server.
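To make the arithmetic above explicit, here is a quick Python sketch. The ~80MB/sec per-drive figure is an assumption (it matches the single-drive number later in this thread), not a benchmark:

# Sketch of the numbers above; the per-drive rate is assumed, not measured.
PCIX_BUS_MBPS = 1066    # one PCI-X bus, per the Tyan board mentioned above
DRIVE_MBPS = 80         # assumed per-drive sequential throughput
DRIVE_COST = 160        # OEM price quoted above for the 74GB drive
CHASSIS_COST = 4000
OTHER_COST = 5000       # RAM/CPU/controller, roughly

drives = 40                          # the 40-bay chassis mentioned above
pairs = drives // 2                  # RAID 10 = mirrored pairs, striped
write_mbps = min(pairs * DRIVE_MBPS, PCIX_BUS_MBPS)   # writes hit both mirrors
read_mbps = min(drives * DRIVE_MBPS, PCIX_BUS_MBPS)   # reads can use every spindle
cost = drives * DRIVE_COST + CHASSIS_COST + OTHER_COST

print("write ~%d MB/s, read ~%d MB/s, total ~$%d" % (write_mbps, read_mbps, cost))
# -> both saturate the 1066MB/s bus; the total comes to ~$15,400 (about $16k)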
Now there are arguments to be had about splitting up tablespaces etc., but I present this as a concrete example of components that can be had for not a lot of money to build a majorly kick-ass server using SATA technology.
Alex
On 5/30/06, Ben <bench@silentmedia.com> wrote:
> How much money do you want to spend? If you don't care, SAN is probably
> the way to go.
>
> How much data do you have to store? If you can afford to fit it onto
> SCSI, SCSI is probably still the way to go.
>
> Otherwise, SATA arrays have come a long way in 3 years, and they are by
> FAR the cheapest solution out there. Do some research and see if they're
> good enough for you.
>
> On Tue, 30 May 2006, Riccardo Inverni wrote:
>> Hi guys,
>>
>> I have to update a Linux box with PostgreSQL on it, essentially for data
>> warehousing purposes. I set it up about 3 years ago, and at that time
>> the best solution recommended to me was SCSI disks with hardware RAID
>> controllers.
>>
>> Is this still the way to go, or have things changed recently? Any other
>> suggestions/advice? What about SAN?
>>
>> Thanks.
>>
>> Cheers,
>> Riccardo
Hello,
Three weeks ago I installed a PostgreSQL 8.1.3 server on Windows 2003 Server Standard Edition.
The box is a NEC Express5800 TM800 with four SATA/300 250GB 7200rpm drives in RAID 10 (0+1).
It works fine, faster than the old ALTOS server with Ultra160 SCSI-3 10000rpm disks.
I upgraded to 8.1.4.
The embedded RAID controller is based on an Intel chipset.
SATA seems convenient.
Luc
----- Original Message -----
From: Riccardo Inverni
Sent: Tuesday, May 30, 2006 11:28 PM
Subject: [GENERAL] SCSI disk: still the way to go?

Hi guys,
I have to update a Linux box with PostgreSQL on it, essentially for data warehousing purposes. I set it up about 3 years ago, and at that time the best solution recommended to me was SCSI disks with hardware RAID controllers.
Is this still the way to go, or have things changed recently? Any other suggestions/advice? What about SAN?
Thanks.
Cheers,
Riccardo
riccardo.inverni@gmail.com ("Riccardo Inverni") writes:
> I have to update a Linux box with PostgreSQL on it, essentially
> for data warehousing purposes. I set it up about 3 years ago, and
> at that time the best solution recommended to me was SCSI disks
> with hardware RAID controllers. Is this still the way to go, or
> have things changed recently? Any other suggestions/advice?
> What about SAN?

You're probably better off with SATA now. SCSI disks may individually be faster and more reliable than SATA disks, but you can probably get 3x as many SATA disks for the price of the SCSI disks, and 3x more *probably* makes up for the deficiencies, given a good SATA host adapter. (Areca and 3Ware are both well regarded.)

SAN doesn't change the question; you'll still hold much the same debate, whether to compose the SAN of SCSI or SATA disks, and the answers will be similar.

The challenge you'll see on Linux is that Very Large Filesystems are somewhat novel. When we were trying to do DW stuff on Linux + Opteron + FibreChannel + EMC DiskArray, we too frequently found filesystems keeling over. It was neither cheap nor reliable.

At some point, I want to try FreeBSD + Opteron + Areca + SATA Array, and see if that gives a better answer for this. I'm afraid I don't trust Linux for this sort of thing anymore :-(.

--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Rules of the Evil Overlord #32. "I will not fly into a rage and kill a
messenger who brings me bad news just to illustrate how evil I really am.
Good messengers are hard to come by." <http://www.eviloverlord.com/>
> When we were trying to do DW stuff on Linux + Opteron + FibreChannel +
> EMC DiskArray, we too frequently found filesystems keeling over. It
> was neither cheap nor reliable.

When is "when"? Not trying to start a flame war, but I am curious about the specifics of the setup. Was it kernel 2.4 or 2.6? If 2.6, which? What filesystem are we talking about?

Are we talking the last 12 months? Or earlier than that?

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
jd@commandprompt.com ("Joshua D. Drake") writes:
>> When we were trying to do DW stuff on Linux + Opteron + FibreChannel +
>> EMC DiskArray, we too frequently found filesystems keeling over. It
>> was neither cheap nor reliable.
>
> When is "when"? Not trying to start a flame war, but I am curious
> about the specifics of the setup. Was it kernel 2.4 or 2.6? If 2.6,
> which? What filesystem are we talking about?
>
> Are we talking the last 12 months? Or earlier than that?

The project ended last fall, roughly speaking. And that was with kernel 2.6; 2.4 was a complete non-starter as far as Opteron was concerned. If memory serves, 2.6.13 was about the best option, but it turned out to be pretty easy to toast filesystems.

When Josh presented last year at OSCON <http://conferences.oreillynet.com/cs/os2005/view/e_sess/6574>, he had a "sidebar" where he discussed the contortions of kernel versioning he had to go through in order to get PostgreSQL to play reasonably well with Opteron + Disk Array; it seemed quite similar to our experience, particularly in that he had to pick very specific kernel versions in order to get a modicum of stability.

The trouble seems to be that, what with the vast amounts of hacking Gitting into the Linux kernel, somewhere in between [FibreChannel drivers | SCSI processing layer | VFS | filesystems], things aren't anywhere near completely stable on AMD64. There's not one place to pin down: it's somewhere in the interfacing between all of these "layers." If you take out any of the "exotic" parts, things get better:

- Opteron introduces 64-bittedness, and changes memory addressing over "plain old Intel."
- "Everyone" runs ATA, so funky FibreChannel is exotic enough that it doesn't get used enough to get easily debugged.

But for real high performance, you *want* 64 bits and FibreChannel interfaces. And Linux just isn't ready for that. Nor is *BSD, I expect, for that matter, but they're more straightforward about documenting what *isn't* expected to work.

--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://cbbrowne.com/info/wp.html
If anyone ever markets a really well-documented Unix that doesn't
require babysitting by a phalanx of provincial Unix clones, there'll
be a lot of unemployable, twinky-braindamaged misfits out deservedly
pounding the pavements.
On 5/31/06, Joshua D. Drake <jd@commandprompt.com> wrote:
> > When we were trying to do DW stuff on Linux + Opteron + FibreChannel +
> > EMC DiskArray, we too frequently found filesystems keeling over. It
> > was neither cheap nor reliable.

I completely agree with the above statements, with a small objection.

At our place we tried out a $70k SAN from a major vendor and hooked up all the 2Gb fibre cables, only to find out the box could only do around 50MB/sec in real-world bonnie++/dd tests (a rough sketch of that kind of test follows below). It was a huge mess... the performance team from the vendor couldn't do anything about it except try to upsell us to the $200k product. All the time we could never get hard numbers about what the box was supposed to do, etc. Meanwhile the sales reps were lecturing us about 'enterprise this, enterprise that'... barf.

Now for the objection: I think you guys need to take a look at Xyratex, specifically their FC-attached SAS enclosure. It is dual 4Gb FC and can hook up SAS or SATA drives. Best of all, it's cost-competitive with attached SCSI for total system cost. The flexibility of being able to hook up SATA or SAS is great.

merlin
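As an illustration, here is a minimal Python sketch of the kind of raw sequential-read check mentioned above. The path is a placeholder, and the target file must be much larger than RAM, or the OS page cache will inflate the number:

import time

PATH = "/mnt/array/testfile"   # placeholder: a multi-GB file on the array under test
BLOCK = 8 * 1024 * 1024        # read in 8MB chunks

total_bytes = 0
start = time.time()
with open(PATH, "rb") as f:
    while True:
        chunk = f.read(BLOCK)
        if not chunk:
            break
        total_bytes += len(chunk)
elapsed = time.time() - start
mb = total_bytes / (1024.0 * 1024.0)
print("read %.0f MB in %.1f s -> %.1f MB/s" % (mb, elapsed, mb / elapsed))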
mmoncure@gmail.com ("Merlin Moncure") writes:
> Xyratex

From their web site, they sound like they'll be as challenging to get straight answers from as any of the other disk array vendors :-(.

And there's nothing about what I see there that seems to address anything at all about the "instability hiding in there somewhere" problem. I see no reason at all for a Xyratex FC array to be the slightest bit more stable than any other vendor's product.

The only reason I'd be interested is if I knew that their products were priced at some fraction substantially lower than 1/1 of the competing products from EMC, IBM, and such. And it seems pretty clear that that would involve the usual irritating sets of vendor visits and negotiations, which amount to, "No, it's not gonna be cheap."

--
output = reverse("gro.gultn" "@" "enworbbc")
http://cbbrowne.com/info/sap.html
"Note that if I can get you to `su and say' something just by asking,
you have a very serious security problem on your system and you should
look into it." -- Paul Vixie, vixie-cron 3.0.1 installation notes
Hi Alex,
thanks for the answer (thanks to the other guys too!).
Is there a particular reason why you chose a SATA-150 drive? What about SATA-300?
Cheers,
Riccardo
On 5/31/06, Chris Browne <cbbrowne@acm.org> wrote:
> mmoncure@gmail.com ("Merlin Moncure") writes:
> > Xyratex
>
> From their web site, they sound like they'll be as challenging to get
> straight answers from as any of the other disk array vendors :-(.

Valid concerns. I don't have an answer yet, except to say that it is price-competitive with attached SCSI... much (much) cheaper than the major SAN vendors. Let's put it this way: we were quoted a price about half what a major SAN vendor charges for their 2Gbit FC product with less cache. Also, at 16 drives in 3U of space, it's about as dense as storage gets.

They were willing (through their retailer) to set us up with a 30-day trial on the box. Results to follow.

merlin
Maximum throughput of a single drive is around 80MB/second; a 300MB/sec interface won't change that.
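Concretely (using the same assumed per-drive figure):

# A single drive can't even fill a SATA-150 link, so a faster interface
# changes nothing for one spindle (assumed figures, for illustration).
DRIVE_MBPS = 80
for link_mbps in (150, 300):
    delivered = min(DRIVE_MBPS, link_mbps)
    print("SATA-%d: drive delivers %d MB/s (%d%% of the link)"
          % (link_mbps, delivered, 100 * delivered // link_mbps))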
Alex
On 6/1/06, Riccardo Inverni <riccardo.inverni@gmail.com> wrote:
> Hi Alex,
>
> thanks for the answer (thanks to the other guys too!).
>
> Is there a particular reason why you chose a SATA-150 drive? What about
> SATA-300?
>
> Cheers,
> Riccardo