Thread: suggestions for postgresql setup on Dell 2950 , PERC6i controller
Hi,

I am going to get a Dell 2950 with a PERC6i and 8 * 73 GB 15K SAS drives, plus 300 GB of EMC SATA SAN storage. I seek suggestions from users sharing their experience with similar hardware, if any. I have the following specific concerns:

1. On the list I read that the RAID10 function in the PERC5 is not really striping but spanning and does not give a performance boost. Is that still true of the PERC6i?
2. I am planning a RAID10 array of 8 drives for the entire database (including pg_xlog); the controller has a 256MB write-back cache. Is that a good idea, or is it better to use 6 drives as 3 HW RAID1 mirrors with RAID0 across them in s/w, and leave 2 drives (RAID1) for the OS?
3. Is there any preferred stripe size for RAID0 for PostgreSQL usage?
4. Although I will benchmark (with bonnie++), how would the EMC SATA SAN storage compare with locally attached SAS storage for the purpose of hosting the data? I am hiring the SAN primarily for storing base backups and log archives for a PITR implementation, as the rental of a separate machine was higher than the SATA SAN.

Regds
mallah.
On Wed, Feb 4, 2009 at 11:45 AM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote: > Hi, > > I am going to get a Dell 2950 with PERC6i with > 8 * 73 15K SAS drives + > 300 GB EMC SATA SAN STORAGE, > > I seek suggestions from users sharing their experience with > similar hardware if any. I have following specific concerns. > > 1. On list i read that RAID10 function in PERC5 is not really > striping but spanning and does not give performance boost > is it still true in case of PERC6i ? I have little experience with the 6i. I do have experience with all the Percs from the 3i/3c series to the 5e series. My experience has taught me that a brand new, latest model $700 Dell RAID controller is about as good as a $150 LSI, Areca, or Escalade/3Ware controller. I.e. a four or five year old design. And that's being generous. > 2. I am planning for RAID10 array of 8 drives for entrire database > ( including pg_xlog) , the controller has a write back cache (256MB) > is it a good idea ? > or is it better to have 6 drives in HW RAID1 and RAID0 of 3 mirrors > in s/w and leave 2 drives (raid1) for OS ? Hard to say without testing. Some controllers work fine with all the drives in one big RAID 10 array, some don't. What I'd do is install the OS on a separate drive from the RAID controller, and start benchmarking the performance of your RAID controller with various configurations, like RAID-10, RAID-5 and RAID-6 (assuming it supports all three) and how it behaves when the array is degraded. You may well find that your machine is faster if you either run the controller in JBOD mode and do all the RAID in the kernel, or with a mix, with the RAID controller running a bunch of RAID-1 mirrors and the OS building a RAI(D)-0 on top of that. With larger arrays and busy dbs I usually always put the OS and pg_xlog on either a single mirror set or two different mirrorsets. Whether or not this will be faster for you depends greatly on your usage scenario, which I don't think you've mentioned. For transactional databases it's almost always a win to split out the pg_xlog from the main array. Unless you have a LOT of disks, a single RAID-1 pair is usually sufficient. > 3. Is there any preferred Stripe Size for RAID0 for postgresql usage ? You'll really have to test that with your controller, as on some it makes a difference to change it and on others, the default setting is as good as it ever gets. > 4. Although i would benchmark (with bonnie++) how would the EMC > SATA SAN storage compare with locally attached SAS storage for the > purpose of hosting the data , i am hiring the storage primarily for > storing base base backups and log archives for PITR implementation. > as retal of separate machine was higher than SATA SAN. That really depends on how the SAN is implemented I'd think. I only have a bit of experience with storage arrays, and that experience hasn't been all that great in terms of performance. -- When fascism comes to America, it will be the intolerant selling it as diversity.
Rajesh Kumar Mallah wrote: > Hi, > > I am going to get a Dell 2950 with PERC6i with > 8 * 73 15K SAS drives + > 300 GB EMC SATA SAN STORAGE, > > I seek suggestions from users sharing their experience with > similar hardware if any. I have following specific concerns. > > 1. On list i read that RAID10 function in PERC5 is not really > striping but spanning and does not give performance boost > is it still true in case of PERC6i ? It's long been our policy to buy Dell servers and I agree with most people here that the performance of the PERCs (5 and earlier) has been generally pretty poor. However, they seem to have listened and got it right, or at least a lot better, with the PERC6. I have recently installed Ubuntu server on 2 Dell 2950s with 8GB RAM and six 2.5 inch 15K rpm SAS disks in a single RAID10. I only got a chance to run bonnie++ on them a few times, but I was consistently getting around 200MB/sec for both sequential read and write (16GB file). A similar setup with the older Dell 2850 (PERC5, 6 x 15K rpm 3.5 inch SCSI) gave only around 120MB/sec whatever I did. Hope this helps. Cheers, Gary.
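For anyone wanting to reproduce this kind of test, a minimal bonnie++ run along the lines below should be enough; the mount point and user are assumptions, and the file size should be roughly twice RAM (16GB here for an 8GB box) so the OS page cache doesn't inflate the numbers.

    # run against the array's mount point; -s is the test file size in MB,
    # -n 0 skips the small-file tests, -u sets the user when run as root
    bonnie++ -d /mnt/array/bonnie -s 16384 -n 0 -u postgres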
From: Arjen van der Meijden
On 4-2-2009 21:09 Scott Marlowe wrote: > I have little experience with the 6i. I do have experience with all > the Percs from the 3i/3c series to the 5e series. My experience has > taught me that a brand new, latest model $700 Dell RAID controller is > about as good as a $150 LSI, Areca, or Escalade/3Ware controller. > I.e. a four or five year old design. And that's being generous. Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're not identical in layout etc), so it would be a bit weird if they performed much less than the similar LSI's wouldn't you think? And as far as I can remember, our Perc 5/e actually performed similar to a LSI with similar specs (external sas, 256MB ram, etc) we had at the time of testing. Areca may be the fastest around right now, but if you'd like to get it all from one supplier, its not too bad to be stuck with Dell's perc 5 or 6 series. Best regards, Arjen
On Wed, Feb 4, 2009 at 2:11 PM, Arjen van der Meijden <acmmailing@tweakers.net> wrote: > On 4-2-2009 21:09 Scott Marlowe wrote: >> >> I have little experience with the 6i. I do have experience with all >> the Percs from the 3i/3c series to the 5e series. My experience has >> taught me that a brand new, latest model $700 Dell RAID controller is >> about as good as a $150 LSI, Areca, or Escalade/3Ware controller. >> I.e. a four or five year old design. And that's being generous. > > Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're not > identical in layout etc), so it would be a bit weird if they performed much > less than the similar LSI's wouldn't you think? > And as far as I can remember, our Perc 5/e actually performed similar to a > LSI with similar specs (external sas, 256MB ram, etc) we had at the time of > testing. > Areca may be the fastest around right now, but if you'd like to get it all > from one supplier, its not too bad to be stuck with Dell's perc 5 or 6 > series. We purchased the Perc 5E, which Dell wanted $728 for last fall, with 8 SATA disks in an MD-1000, and the performance is just terrible. No matter what we do, the best throughput on any RAID setup was about 30 MB/second write and 60 MB/second read. I can get that from a mirror set of the same drives under Linux kernel software RAID. This was with battery-backed cache enabled. It could be an interaction issue with the MD-1000 or something, but the numbers are just awful. We have a Perc 6 (i or e, not sure) on a 6 disk SAS array and it's a little better, getting into the hundred MB/second range, but nothing spectacular. They're stable, which is more than I can say for a lot of older PERCs and the servers they came in (x600 series with Perc 3i for instance).
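A quick cross-check of raw sequential throughput, independent of any benchmark suite, is a pair of dd runs like the sketch below; the device name and sizes are assumptions, and the write test is destructive, so it is only for a scratch array.

    # sequential write -- DESTROYS data on /dev/sdb, scratch volumes only
    dd if=/dev/zero of=/dev/sdb bs=1M count=8192 oflag=direct
    # sequential read, bypassing the page cache
    dd if=/dev/sdb of=/dev/null bs=1M count=8192 iflag=direct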
Sorry for the top posts, I don’t have a client that is inline post friendly.
Most PERCs are rebranded LSI’s lately. The difference between the 5 and 6 is PCIX versus PCIe LSI series, relatively recent ones. Just look at the OpenSolaris drivers for the PERC cards for a clue to what is what.
Bonnie ++ is a horrible benchmark IMO (for server disk performance checks beyond very basic sanity). I’ve tried iozone, dd, fio, and manual shell script stuff...
Fio is very good, there’s one quirk with how it does random writes (sparsely) that can make XFS freak out, don’t test it with sparse random writes — Postgres doesn’t do this, it writes random re-writes and only appends to files to grow.
FIO is also good because you can make useful profiles, such as multiple concurrent readers of different types, or mix in some writes. A real Postgres benchmark may be better, but the more sophisticated synthetic loads were able to show how far the PERC falls from the ideal compared to a better card, in a way the simpler tools could not.
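As a rough sketch of the kind of profiles meant here (the paths, sizes and queue depths are assumptions, not a recommendation): the overwrite=1 option makes fio lay the file out first, so the random writes are re-writes rather than the sparse writes mentioned above.

    # sequential read profile
    fio --name=seqread --directory=/mnt/array --rw=read --bs=1M --size=8g \
        --ioengine=libaio --direct=1 --iodepth=16
    # mixed random read / re-write profile, closer to a database-style load
    fio --name=randmix --directory=/mnt/array --rw=randrw --rwmixread=70 --bs=8k \
        --size=8g --overwrite=1 --ioengine=libaio --direct=1 --iodepth=32 \
        --numjobs=4 --group_reporting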
My experience with 12 nearline-SAS 7200 RPM drives and a Perc 6e, then the same system with another card:
Ext3, out of the box, 12 drives raid 10: ~225MB/sec.
ext3, os readahead tuned up: 350MB/sec.
XFS, out of the box, 12 drives raid 10: ~300MB/sec.
Tune OS readahead (24576 or so), with xfs, 410MB/sec.
Higher Linux device readahead did not impact the random access performance, and the defaults are woeful for the PERC cards.
10 disk and 8 disk setups performed the same. PERC did not really scale past 8 disks in raid 10, I did not try 6 disks. Each disk can do 115MB/sec or so at the front of the drive with JBOD tests tuned with the right readahead Linux filesystem value.
All tests were done on the first 20% or so carved out, to limit the effects of transfer rate decrease on higher LBA’s and be fair between file systems (otherwise, ext3 looks worse, as it is more likely than xfs to allocate you some stuff way out on the disk in a somewhat empty partition).
Adaptec card (5445), untuned readahead, 500MB/sec +
Tuned readahead, 600MB/sec (and xfs now the same as dd, with 100GB files+), at the maximum expectation for this sort of raid 10 (that can’t use all drives for reading, like zfs).
I did not get much higher random IOPS out of smaller block sizes than the default. 15K SAS drives will be more likely to benefit from smaller blocks, but I don’t have experience with that on a PERC. General experience says that going below 64K on any setup is a waste of time with today’s hardware. Reading 64K takes less than 1ms.
Do not bother with the PERC BIOS’ read-ahead setting, it just makes things worse, the Linux block device readahead is far superior.
Best performance achieved on a set of 20 drives in my testing was to use two Adaptec cards, each with moderate sized raid 10 sets (adaptec 10 drives) and software linux ‘md’ raid 0 on top of that. It takes at least two concurrent sequential readers to max the I/O in this case, and 1000MB/sec to 1150MB/sec is the peak depending on the mix of sequential readers. In the real world, that only happens when writes are low and there are about 4 concurrent sequential scans on large (multi-GB) tables. Most people will be optimizing for much higher random access rates rather than sequential scans mixed with random access.
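For what it's worth, a rough sketch of that layering, assuming the two cards expose their RAID 10 sets as /dev/sdb and /dev/sdc; the device names and chunk size are placeholders rather than the exact configuration used.

    # stripe the two hardware RAID 10 LUNs together with md
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=256 /dev/sdb /dev/sdc
    # md usually defaults to a larger readahead than plain devices, but check it anyway
    blockdev --getra /dev/md0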
Placing the xlogs on a separate volume helped quite a bit in the real world postgres tests with mixed load.
On 2/4/09 12:09 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
On Wed, Feb 4, 2009 at 11:45 AM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:
> Hi,
>
> I am going to get a Dell 2950 with PERC6i with
> 8 * 73 15K SAS drives +
> 300 GB EMC SATA SAN STORAGE,
>
> I seek suggestions from users sharing their experience with
> similar hardware if any. I have following specific concerns.
>
> 1. On list i read that RAID10 function in PERC5 is not really
> striping but spanning and does not give performance boost
> is it still true in case of PERC6i ?
I have little experience with the 6i. I do have experience with all
the Percs from the 3i/3c series to the 5e series. My experience has
taught me that a brand new, latest model $700 Dell RAID controller is
about as good as a $150 LSI, Areca, or Escalade/3Ware controller.
I.e. a four or five year old design. And that's being generous.
Sorry for the top post --
Assuming Linux --
1: PERC 6 is still a bit inferior to other options, but not that bad. Its random IOPS is fine, sequential speeds are noticeably less than say the latest from Adaptec or Areca.
2: Random iops will probably scale ok from 6 to 8 drives, but depending on your use case, having a single mirror for OS and xlogs can be a significant performance improvement. My base suggestion would be to go with 6 drives in raid 10, perhaps try 3 mirrors with software raid 0 on top to compare, and leave one mirror for the OS and xlogs (separate partitions, use ext2 for the xlogs and that can be a fairly small partition; see the sketch after this list). There isn’t any way to know which of these options will be best for you, it is very dependent on the data and applications accessing it.
3. No, it’s too hardware dependent to have an ideal raid block size. It’s slightly usage dependent too. The default is probably going to be best based on my PERC 6 experience. You’ll gain a lot more from tuning other things.
4. Can’t say much about the SAN. High end ones can do good iops, the one listed looks more like archival storage to me though.
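As referenced in point 2, here is a minimal sketch of moving pg_xlog onto a small ext2 partition on the OS mirror; the device and data directory paths are assumptions, and the cluster must be stopped first.

    # stop postgres, then relocate the WAL onto the ext2 partition
    mkfs.ext2 /dev/sdb3                      # assumed xlog partition on the OS mirror
    mkdir /pg_xlog && mount /dev/sdb3 /pg_xlog
    mv /var/lib/pgsql/data/pg_xlog/* /pg_xlog/
    rmdir /var/lib/pgsql/data/pg_xlog
    ln -s /pg_xlog /var/lib/pgsql/data/pg_xlog
    chown -R postgres:postgres /pg_xlog
    # restart postgres and check that new WAL segments appear under /pg_xlog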
Make sure you tune the Linux block device readahead, it makes a huge difference in sequential access performance (see blockdev --getra <device>). 1MB to 4MB per raid spindle ‘width’ is usually ideal. The default is 128k; the default with software raid is 4MB, and often the performance difference when using software raid is largely just this setting. If you’re comfortable with XFS, it works well for the postgres data files.
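A concrete example of checking and raising that setting (device name assumed; the value is in 512-byte sectors and does not survive a reboot, so it belongs in rc.local or an init script):

    blockdev --getra /dev/sda        # show current readahead in 512-byte sectors
    blockdev --setra 8192 /dev/sda   # ~4MB; re-run the sequential benchmarks to compare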
On 2/4/09 10:45 AM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:
Hi,
I am going to get a Dell 2950 with PERC6i with
8 * 73 15K SAS drives +
300 GB EMC SATA SAN STORAGE,
I seek suggestions from users sharing their experience with
similar hardware if any. I have following specific concerns.
1. On list i read that RAID10 function in PERC5 is not really
striping but spanning and does not give performance boost
is it still true in case of PERC6i ?
2. I am planning for RAID10 array of 8 drives for entrire database
( including pg_xlog) , the controller has a write back cache (256MB)
is it a good idea ?
or is it better to have 6 drives in HW RAID1 and RAID0 of 3 mirrors
in s/w and leave 2 drives (raid1) for OS ?
3. Is there any preferred Stripe Size for RAID0 for postgresql usage ?
4. Although i would benchmark (with bonnie++) how would the EMC
SATA SAN storage compare with locally attached SAS storage for the
purpose of hosting the data , i am hiring the storage primarily for
storing base base backups and log archives for PITR implementation.
as retal of separate machine was higher than SATA SAN.
Regds
mallah.
Scott Carey wrote: Sorry for the top post --
Assuming Linux --
1: PERC 6 is still a bit inferior to other options, but not that bad. Its random IOPS is fine, sequential speeds are noticeably less than say the latest from Adaptec or Areca.
In the archives there was a big thread about this very setup; it is very similar to mine, although mine is getting close to a year old:
http://archives.postgresql.org/pgsql-performance/2008-03/thrd3.php#00264
Arjen van der Meijden wrote: > Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're > not identical in layout etc), so it would be a bit weird if they > performed much less than the similar LSI's wouldn't you think? I've recently had to replace a PERC4/DC with the exact same card made by LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10 actually appears to be concatenated RAID1 arrays. Since replacing it and rebuilding the array on the LSI card, performance has been considerably better (14 disk SCSI shelf) > Areca may be the fastest around right now, but if you'd like to get it > all from one supplier, its not too bad to be stuck with Dell's perc 5 or > 6 series. The PERC6 isn't too bad, however it grinds to a halt when the IO queue gets large and it has the serious limitation of not supporting more than 8 spans, so trying to build a RAID10 array greater than 16 disks is pointless if you're not just after the extra capacity. Are there any reasonable choices for bigger (3+ shelf) direct-connected RAID10 arrays, or are hideously expensive SANs the only option? I've checked out the latest Areca controllers, but the manual available on their website states there's a limitation of 32 disks in an array...
--- On Thu, 5/2/09, Matt Burke <mattblists@icritical.com> wrote: > From: Matt Burke <mattblists@icritical.com> > Subject: Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller > To: pgsql-performance@postgresql.org > Date: Thursday, 5 February, 2009, 12:40 PM > Arjen van der Meijden wrote: > > > Afaik the Perc 5/i and /e are more or less rebranded > LSI-cards (they're > > not identical in layout etc), so it would be a bit > weird if they > > performed much less than the similar LSI's > wouldn't you think? > > I've recently had to replace a PERC4/DC with the exact > same card made by > LSI (320-2) because the PERCs firmware was crippled. Its > idea of RAID10 > actually appears to be concatenated RAID1 arrays. > Did you try flashing the PERC with the LSI firmware? I tried flashing a PERC3/dc with LSI firmware; it worked fine but I saw no difference in performance, so I assumed it must be something else on the board that cripples it.
Matt Burke wrote: > Arjen van der Meijden wrote: > >> Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're >> not identical in layout etc), so it would be a bit weird if they >> performed much less than the similar LSI's wouldn't you think? > > I've recently had to replace a PERC4/DC with the exact same card made by > LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10 > actually appears to be concatenated RAID1 arrays. > > Since replacing it and rebuilding the array on the LSI card, performance > has been considerably better (14 disk SCSI shelf) > >> Areca may be the fastest around right now, but if you'd like to get it >> all from one supplier, its not too bad to be stuck with Dell's perc 5 or >> 6 series. > > The PERC6 isn't too bad, however it grinds to a halt when the IO queue > gets large and it has the serious limitation of not supporting more than > 8 spans, so trying to build a RAID10 array greater than 16 disks is > pointless if you're not just after the extra capacity. > > Are there any reasonable choices for bigger (3+ shelf) direct-connected > RAID10 arrays, or are hideously expensive SANs the only option? I've > checked out the latest Areca controllers, but the manual available on > their website states there's a limitation of 32 disks in an array... In the context of RAID 10, what are the drawbacks of sticking several such controllers and use them only for hardware RAID 1 arraylets, running RAID 0 across them in software? You'd lose booting from the array but data safety should be about the same since the hardware is mirroring data, right?
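For what it's worth, that layering is straightforward with Linux md; the sketch below assumes the controller exports four hardware RAID-1 pairs as /dev/sdb through /dev/sde, which are placeholder names.

    # software RAID-0 across four hardware RAID-1 mirrors
    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=256 \
          /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs /dev/md0
    # as noted above, the OS/boot volume has to stay on a plain controller volume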
Glyn Astill wrote: > Did you try flashing the PERC with the LSI firmware? > > I tried flashing a PERC3/dc with LSI firmware, it worked fine but I > saw no difference in performance so I assumed it must be somethign > else on the board that cripples it. No, for a few reasons: 1. I read somewhere on the interwebs that doing so would brick the card 2. I don't have access to a DOS/Windows machine 3. Dodgy hardware isn't what you want when dealing with large databases If it's not just a firmware issue it wouldn't surprise me if you could just link a couple of pins/contacts/etc on the card and gain the LSIs capabilities, but it's not an idea I'd entertain outside of personal use... --
Scott Marlowe <scott.marlowe@gmail.com> writes: > We purchased the Perc 5E, which dell wanted $728 for last fall with 8 > SATA disks in an MD-1000 and the performance is just terrible. No > matter what we do the best throughput on any RAID setup was about 30 > megs/second write and 60 Megs/second read. Is that sequential or a mix of random and sequential (it's too high to be purely random i/o)? A single consumer drive should be able to beat those numbers on sequential i/o. If it's a mix of random and sequential then performance will obviously depend on the mix. > I can get that from a mirror set of the same drives under linux kernel > software RAID. Why is that surprising? I would expect software raid to be able to handle 8 drives perfectly well, assuming you have a controller and bus you aren't saturating. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Thu, 2009-02-05 at 12:40 +0000, Matt Burke wrote: > Arjen van der Meijden wrote: > > Are there any reasonable choices for bigger (3+ shelf) direct-connected > RAID10 arrays, or are hideously expensive SANs the only option? I've > checked out the latest Areca controllers, but the manual available on > their website states there's a limitation of 32 disks in an array... > HP P800. -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
From: Rajesh Kumar Mallah
On Thu, Feb 5, 2009 at 6:10 PM, Matt Burke <mattblists@icritical.com> wrote: > Arjen van der Meijden wrote: > >> Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're >> not identical in layout etc), so it would be a bit weird if they >> performed much less than the similar LSI's wouldn't you think? > > I've recently had to replace a PERC4/DC with the exact same card made by > LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10 > actually appears to be concatenated RAID1 arrays. > > Since replacing it and rebuilding the array on the LSI card, performance > has been considerably better (14 disk SCSI shelf) > >> Areca may be the fastest around right now, but if you'd like to get it >> all from one supplier, its not too bad to be stuck with Dell's perc 5 or >> 6 series. > > The PERC6 isn't too bad, however it grinds to a halt when the IO queue > gets large and it has the serious limitation of not supporting more than > 8 spans, so trying to build a RAID10 array greater than 16 disks is > pointless if you're not just after the extra capacity. > > Are there any reasonable choices for bigger (3+ shelf) direct-connected > RAID10 arrays, or are hideously expensive SANs the only option? I've > checked out the latest Areca controllers, but the manual available on > their website states there's a limitation of 32 disks in an array... Where exactly is the limitation of 32 drives? The datasheet of the 1680 states support for up to 128 drives using enclosures.

regds
rajesh kumar mallah.
On 2/5/09 4:40 AM, "Matt Burke" <mattblists@icritical.com> wrote:
> Are there any reasonable choices for bigger (3+ shelf) direct-connected
> RAID10 arrays, or are hideously expensive SANs the only option? I've
> checked out the latest Areca controllers, but the manual available on
> their website states there's a limitation of 32 disks in an array...

What I’m using currently:
Adaptec / Areca cards + Promise V-Trac J610S (for 3.5” drives, if total storage is your concern). Multiple cards if necessary and you want dual-path to each drive.
http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=190
http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=189
Using two of the former with two Adaptec cards (Software raid 0 on top of them) with great success.
There’s 2.5” drive ones too from other manufacturers ... The Promise one here scared me at first until I got confirmation from several experts actually using them in place of Dell MD1000 and HP SAS expander boxes because of higher device compatibility (Dells only works with PERC, etc) and reasonable cost.
You probably don’t want a single array with more than 32 drives anyway, its almost always better to start carving out chunks and using software raid 0 or 1 on top of that for various reasons. I wouldn’t put more than 16 drives in one array on any of these RAID cards, they’re just not optimized for really big arrays and tend to fade between 6 to 16 in one array, depending on the quality.
High quality SAS expander boxes compatible with good, non-proprietary RAID cards are not those from T1 vendors. The Promise above has a large compatibility list, since it uses ‘standard’ controller chips, etc. There are several others. See the Adaptec and Areca SAS expander compatibility lists. Dual redundant path to drives is nice.
You can do direct-attached storage to 100+ drives or more if you want. The price and manageability cost go up a lot if it gets too big however. Having global hot spare drives is critical. Not that the cost of using SAN’s and such is low... SAS expanders have made DAS with large arrays very accessible though.
Sun has some nice solutions here too, but like all T1 vendors the compatibility lists are smaller. Their RAID card they sell is an OEM’d Adaptec and performs nicely. The Sun 4150 with a direct-attached SAS storage makes a pretty good DB server. And yes, you can run Linux on it or Solaris or OpenSolaris or Windows or some BSD variants.
Rajesh Kumar Mallah wrote: >> I've checked out the latest Areca controllers, but the manual >> available on their website states there's a limitation of 32 disks >> in an array... > > Where exactly is there limitation of 32 drives. the datasheet of > 1680 states support upto 128drives using enclosures. The 1680 manual: http://www.areca.us//support/download/RaidCards/Documents/Manual_Spec/SAS_Manual.zip Page 25: > Note: > > 1. The maximum no. is 32 disk drived included in a single RAID set Page 49: > 1. Up to 32 disk drives can be included in a single RAID set. > 2. Up to 8 RAID sets can be created per controller (point 2 meaning you can't do s/w RAID over umpteen h/w RAID1 pairs) Page 50: > To create RAID 30/50/60 volume, you need create multiple RAID sets > first with the same disk members on each RAID set. The max no. disk > drives per volume set: 32 for RAID 0/1/10/3/5/6 and 128 for RAID > 30/50/60. ...and a few more times saying the same thing --
Scott Carey wrote: > You probably don’t want a single array with more than 32 drives anyway, > its almost always better to start carving out chunks and using software > raid 0 or 1 on top of that for various reasons. I wouldn’t put more than > 16 drives in one array on any of these RAID cards, they’re just not > optimized for really big arrays and tend to fade between 6 to 16 in one > array, depending on the quality. This is what I'm looking at now. The server I'm working on at the moment currently has a PERC6/e and 3xMD1000s which needs to be tested in a few setups. I need to code a benchmarker yet (I haven't found one yet that can come close to replicating our DB usage patterns), but I intend to try: 1. 3x h/w RAID10 (one per shelf), sofware RAID0 2. lots x h/w RAID1, software RAID0 if the PERC will let me create enough arrays 3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks 4. 2x h/w RAID30, software RAID0 I'm not holding much hope out for the last one :) I'm just glad work on a rewrite of my inherited backend systems should start soon; get rid of the multi-TB MySQL hell and move to a distributed PG setup on dirt cheap Dell R200s/blades > You can do direct-attached storage to 100+ drives or more if you want. > The price and manageability cost go up a lot if it gets too big > however. Having global hot spare drives is critical. Not that the cost > of using SAN’s and such is low... SAS expanders have made DAS with > large arrays very accessible though. For large storage arrays (RAID60 or similar) you can't beat a RAID controller and disk shelf(s), especially if you keep the raidsets small and use cheap ludicrous capacity SATA disks You just need to be aware that performance doesn't scale well/easily over 1-2 shelves on the things --
On Fri, Feb 6, 2009 at 2:04 AM, Matt Burke <mattblists@icritical.com> wrote: > Scott Carey wrote: >> You probably don't want a single array with more than 32 drives anyway, >> its almost always better to start carving out chunks and using software >> raid 0 or 1 on top of that for various reasons. I wouldn't put more than >> 16 drives in one array on any of these RAID cards, they're just not >> optimized for really big arrays and tend to fade between 6 to 16 in one >> array, depending on the quality. > > This is what I'm looking at now. The server I'm working on at the moment > currently has a PERC6/e and 3xMD1000s which needs to be tested in a few > setups. I need to code a benchmarker yet (I haven't found one yet that > can come close to replicating our DB usage patterns), but I intend to try: > > 1. 3x h/w RAID10 (one per shelf), sofware RAID0 Should work pretty well. > 2. lots x h/w RAID1, software RAID0 if the PERC will let me create > enough arrays I don't recall the max number arrays. I'm betting it's less than that. > 3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks Look for JBOD mode. > 4. 2x h/w RAID30, software RAID0 > > I'm not holding much hope out for the last one :) Me either. :)
Matt Burke wrote: > Scott Carey wrote: > > You probably don’t want a single array with more than 32 drives anyway, > > its almost always better to start carving out chunks and using software > > raid 0 or 1 on top of that for various reasons. I wouldn’t put more than > > 16 drives in one array on any of these RAID cards, they’re just not > > optimized for really big arrays and tend to fade between 6 to 16 in one > > array, depending on the quality. > > This is what I'm looking at now. The server I'm working on at the moment > currently has a PERC6/e and 3xMD1000s which needs to be tested in a few > setups. I need to code a benchmarker yet (I haven't found one yet that > can come close to replicating our DB usage patterns), but I intend to try: Stupid question, but why do people bother with the Perc line of cards if the LSI brand is better? It seems the headache of trying to get the Perc cards to perform is not worth any money saved. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
--- On Fri, 6/2/09, Bruce Momjian <bruce@momjian.us> wrote: > Stupid question, but why do people bother with the Perc > line of cards if > the LSI brand is better? It seems the headache of trying > to get the > Perc cards to perform is not worth any money saved. I think in most cases the dell cards actually cost more; people end up stuck with them because they come bundled with their servers - they find out too late that they've got a lemon. Up until recently those in charge of buying hardware where I work insisted everything be supplied from Dell. Fortunately that policy is no more; I have enough paperweights.
Glyn Astill wrote: >> Stupid question, but why do people bother with the Perc line of >> cards if the LSI brand is better? It seems the headache of trying >> to get the Perc cards to perform is not worth any money saved. > > I think in most cases the dell cards actually cost more, people end > up stuck with them because they come bundled with their servers - > they find out too late that they've got a lemon. That's what's been happening with me... The fact Dell prices can have a fair bit of downward movement when you get the account manager on the phone makes them especially attractive to the people controlling the purse strings. The biggest reason for me however is the lack of comparative reviews. I struggled to get the LSI card to replace the PERC3 because all I had to go on was qualitative mailing list/forum posts from strangers. The only way I got it was to make the argument that other than trying the LSI, we'd have no choice other than replacing the server+shelf+disks. I want to see just how much better a high-end Areca/Adaptec controller is, but I just don't think I can get approval for a £1000 card "because some guy on the internet said the PERC sucks". Would that same person say it sucked if it came in Areca packaging? Am I listening to the advice of a professional, or a fanboy?
Matt Burke wrote: > Glyn Astill wrote: > >> Stupid question, but why do people bother with the Perc line of > >> cards if the LSI brand is better? It seems the headache of trying > >> to get the Perc cards to perform is not worth any money saved. > > > > I think in most cases the dell cards actually cost more, people end > > up stuck with them because they come bundled with their servers - > > they find out too late that they've got a lemon. > > That's what's been happening with me... The fact Dell prices can have a > fair bit of downward movement when you get the account manager on the > phone makes them especially attractive to the people controlling the > purse strings. > > The biggest reason for me however is the lack of comparative reviews. I > struggled to get the LSI card to replace the PERC3 because all I had to > go on was qualitative mailing list/forum posts from strangers. The only > way I got it was to make the argument that other than trying the LSI, > we'd have no choice other than replacing the server+shelf+disks. > > I want to see just how much better a high-end Areca/Adaptec controller > is, but I just don't think I can get approval for a ?1000 card "because > some guy on the internet said the PERC sucks". Would that same person > say it sucked if it came in Areca packaging? Am I listening to the > advice of a professional, or a fanboy? The experiences I have heard is that Dell looks at server hardware in the same way they look at their consumer gear, "If I put in a cheaper part, how much will it cost Dell to warranty replace it". Sorry, but I don't look at my performance or downtime in the same way Dell does. ;-) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Matt Burke wrote:
>> we'd have no choice other than replacing the server+shelf+disks.
>> I want to see just how much better a high-end Areca/Adaptec controller
>> is, but I just don't think I can get approval for a £1000 card "because
>> some guy on the internet said the PERC sucks". Would that same person
>> say it sucked if it came in Areca packaging? Am I listening to the
>> advice of a professional, or a fanboy?
>
> The experiences I have heard is that Dell looks at server hardware in
> the same way they look at their consumer gear, "If I put in a cheaper
> part, how much will it cost Dell to warranty replace it". Sorry, but I
> don't look at my performance or downtime in the same way Dell does. ;-)

It always boils down to money. To communicate with the ones controlling the purse strings, talk dollar bills. To get what one wants from the purse-string holders, give examples like this.

Buying cheap hardware can result in a complete shutdown, resulting in lost sales and/or non-productive labor being spent.

Example: a company generates 100 sales orders an hour averaging $100 = $10,000; if the server is down for 8 hours (1 business day) that's $80,000 lost in business. Now let's throw in labor at an average hourly rate of, say, $15.00 an hour for 10 people = $150.00 for 8 hours = $1200 in lost labor. Now throw in overtime to get caught up ($1800); total labor cost = $3000.

The $200 to $300 saved on the card was a good decision :-(

Now the argument can be made that hardware failures are rare, so that goes out the door.

Your next best argument is showing the waste in lost productivity. Let's say, because of the cheap hardware purchased, the users must sit idle 3 seconds per transaction times 100 transactions per day = 300 seconds lost x 10 people = 3000 seconds per day x 235 working days = 705000/60/60 = 196 hours lost per year, times 3 years for the average life span of the server = 588 hours x an average pay rate of $15 = $8820.00 in lost labor.

Again, smart thinking.

There are all kinds of ways to win these arguments to push for higher quality hardware.
On Fri, Feb 6, 2009 at 8:19 AM, Matt Burke <mattblists@icritical.com> wrote: > Glyn Astill wrote: >>> Stupid question, but why do people bother with the Perc line of >>> cards if the LSI brand is better? It seems the headache of trying >>> to get the Perc cards to perform is not worth any money saved. >> >> I think in most cases the dell cards actually cost more, people end >> up stuck with them because they come bundled with their servers - >> they find out too late that they've got a lemon. > > That's what's been happening with me... The fact Dell prices can have a > fair bit of downward movement when you get the account manager on the > phone makes them especially attractive to the people controlling the > purse strings. > > The biggest reason for me however is the lack of comparative reviews. I > struggled to get the LSI card to replace the PERC3 because all I had to > go on was qualitative mailing list/forum posts from strangers. The only > way I got it was to make the argument that other than trying the LSI, > we'd have no choice other than replacing the server+shelf+disks. > > I want to see just how much better a high-end Areca/Adaptec controller > is, but I just don't think I can get approval for a £1000 card "because > some guy on the internet said the PERC sucks". Would that same person > say it sucked if it came in Areca packaging? Am I listening to the > advice of a professional, or a fanboy? The best reviews I've seen have been on Tweakers first, then tomshardware. I am both a professional who recommends the Areca 16xx series and a bit of a fanboy, mainly because they saved out bacon this last year, compared to the crapware we'd have had to buy from Dell at twice to price to come even close to it in performance. A $11.5k white box with the Areac is a match for over $20k worth of Dell hardware, and it just works. Can you get an evaluation unit from a supplier?
From: Arjen van der Meijden
On 4-2-2009 22:36 Scott Marlowe wrote: > We purhcased the Perc 5E, which dell wanted $728 for last fall with 8 > SATA disks in an MD-1000 and the performance is just terrible. No > matter what we do the best throughput on any RAID setup was about 30 > megs/second write and 60 Megs/second read. I can get that from a > mirror set of the same drives under linux kernel software RAID. This > was with battery backed cache enabled. Could be an interaction issue > with the MD-1000, or something, but the numbers are just awful. We > have a Perc 6(i or e not sure) on a 6 disk SAS array and it's a little > better, getting into the hundred meg/second range, but nothing > spectacular. They're stable, which is more than I can say for a lot > of older PERCs and the servers they came in (x600 series with Perc 3i > for instance). When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, my colleague actually spend some time benchmarking the PERC and a ICP Vortex (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't have too many comparable results, but it basically boiled down to quite good scores for the PERC and a bit less for the ICP Vortex. IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). The results for a 10, 12 and to 14 disk configuration also showed nice increments in performance. So we've based our purchase on my colleague's earlier bad experience with Adaptec (much worse results than LSI) and weren't dissapointed by Dell's scores. I have no idea whether Adaptec's results have increased over time, unfortunately we haven't had a larger scale disk IO-benchmark for quite some time. If you're able to understand Dutch, you can click around here: http://tweakers.net/benchdb/test/90 Best regards, Arjen
PERC 6 does not have JBOD mode exposed. Dell disables the feature from the LSI firmware in their customization. However, I have been told that you can convince them to tell you the ‘secret handshake’ or whatever that allows JBOD to be enabled. The more adventurous flash the card with the LSI firmware, though I’m sure that voids all sorts of support from DELL.

>> 3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks
>
> Look for JBOD mode.
From: Arjen van der Meijden
On 6-2-2009 16:27 Bruce Momjian wrote: > The experiences I have heard is that Dell looks at server hardware in > the same way they look at their consumer gear, "If I put in a cheaper > part, how much will it cost Dell to warranty replace it". Sorry, but I > don't look at my performance or downtime in the same way Dell does. ;-) I'm pretty sure all major server-suppliers will have some form of risk-analysis for their servers, especially in the high volume x86 market where most servers are replaced in three years time anyway. And although Dell's image for quality hardware isn't too good, the servers we have from them all reached high uptimes before we did hardware unrelated reboots. Our Dell-desktops/workstations have seen a bit more support-technician's though, so we're not becoming fanboys any time soon ;-) They seem to be much more serious on quality for their servers compared to the other stuff. Best regards, Arjen
On Fri, 6 Feb 2009, Bruce Momjian wrote: > Stupid question, but why do people bother with the Perc line of cards if > the LSI brand is better? Because when you're ordering a Dell server, all you do is click a little box and you get a PERC card with it. There aren't that many places that carry the LSI cards either, so most people are looking at "get the PERC as part of a supported package from Dell" vs. "have one rogue component I bought from random reseller mixed in, complicating all future support calls". The failure in this logic is assuming that "support" from Dell includes making sure the box actually performs well. If it works at any speed, that's good enough for Dell. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On 2/6/09 9:53 AM, "Arjen van der Meijden" <acmmailing@tweakers.net> wrote:
I have no idea if the Vortex referred to correlates with this newer generation card or the old generation. But if it was in relation to a Perc 5, which is also an older (PCI-X, not PCIe) generation, then I’m not sure how much this relates to the Perc 6 and new Adaptecs or Areca 16xxx series which are all PCIe.
One manufacturer may be good in one generation and stink in another. For those long established providers, every generation tends to yield new winners and losers. Some are more often on the top or the bottom, but it wouldn’t be a total shocker a recent low performer like LSI’s or 3Ware had a next generation product that ended up on or near the top, or if the next PERC is one of the better ones. What is more consistent between generations is the management software and recovery process.
Adaptec’s previous generation stuff was not very good. Its the ‘5IE5’ (I = internal port count, E = external) series that is much improved, and very much like the Areca 16xxx series. Maybe not as good — I haven’t seen a head to head on those two but they are built very similarly.
Arjen van der Meijden <acmmailing@tweakers.net> writes: > When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, my > colleague actually spend some time benchmarking the PERC and a ICP Vortex > (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't > have too many comparable results, but it basically boiled down to quite good > scores for the PERC and a bit less for the ICP Vortex. > IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for > a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're about what you would expect if for a PCI-X card which was bus bandwidth limited. A PCI-e card should be able to get about 3x that from the drives. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
From: Rajesh Kumar Mallah
BTW, our machine got built with 8 15k drives in raid10; from the bonnie++ results it looks like the machine is able to do 400 Mbytes/s seq write and 550 Mbytes/s seq read. The BB cache is enabled with 256MB.

sda6 --> xfs with default formatting options.
sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7
sda8 --> ext3 (default)

It looks like the mkfs.xfs options sunit=128 and swidth=512 did not improve io throughput as such in the bonnie++ tests. It looks like ext3 with default options performed worst in my case.

regds
-- mallah

NOTE: observations made in this post are interpretations by the poster only, which may or may not be indicative of the true suitability of the filesystem.

On Mon, Feb 16, 2009 at 7:01 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> Arjen van der Meijden <acmmailing@tweakers.net> writes:
>
>> When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, my
>> colleague actually spend some time benchmarking the PERC and a ICP Vortex
>> (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't
>> have too many comparable results, but it basically boiled down to quite good
>> scores for the PERC and a bit less for the ICP Vortex.
>> IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for
>> a RAID10 (and winbench99 versions range from 400+ to 600+MB/s).
>
> FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're
> about what you would expect if for a PCI-X card which was bus bandwidth
> limited. A PCI-e card should be able to get about 3x that from the drives.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
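For reference, the sunit/swidth values used on sda7 above are given in 512-byte sectors, so they describe a 64KB stripe unit across four data spindles (consistent with an 8-drive RAID10 and a 64KB controller stripe, the latter being an assumption here). The same geometry can be written more readably as:

    # 64KB stripe unit, 4 data spindles -> swidth = 4 * sunit
    mkfs.xfs -f -d su=64k,sw=4 /dev/sda7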
From: Rajesh Kumar Mallah
The URL of the result is http://98.129.214.99/bonnie/report.html (sorry if this was a repost) On Tue, Feb 17, 2009 at 2:04 AM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote: > BTW > > our Machine got build with 8 15k drives in raid10 , > from bonnie++ results its looks like the machine is > able to do 400 Mbytes/s seq write and 550 Mbytes/s > read. the BB cache is enabled with 256MB > > sda6 --> xfs with default formatting options. > sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7 > sda8 --> ext3 (default) > > it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve > io throughtput as such in bonnie++ tests . > > it looks like ext3 with default options performed worst in my case. > > regds > -- mallah > > > NOTE: observations made in this post are interpretations by the poster > only which may or may not be indicative of the true suitablity of the > filesystem. > > > > On Mon, Feb 16, 2009 at 7:01 PM, Gregory Stark <stark@enterprisedb.com> wrote: >> Arjen van der Meijden <acmmailing@tweakers.net> writes: >> >>> When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, my >>> colleague actually spend some time benchmarking the PERC and a ICP Vortex >>> (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't >>> have too many comparable results, but it basically boiled down to quite good >>> scores for the PERC and a bit less for the ICP Vortex. >>> IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for >>> a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). >> >> FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're >> about what you would expect if for a PCI-X card which was bus bandwidth >> limited. A PCI-e card should be able to get about 3x that from the drives. >> >> -- >> Gregory Stark >> EnterpriseDB http://www.enterprisedb.com >> Ask me about EnterpriseDB's RemoteDBA services! >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> >
On Tue, 17 Feb 2009, Rajesh Kumar Mallah wrote: > sda6 --> xfs with default formatting options. > sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7 > sda8 --> ext3 (default) > > it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve > io throughtput as such in bonnie++ tests . > > it looks like ext3 with default options performed worst in my case. Of course, doing comparisons using a setup like that (on separate partitions) will skew the results, because discs' performance differs depending on the portion of the disc being accessed. You should perform the different filesystem tests on the same partition one after the other instead. Matthew -- "We did a risk management review. We concluded that there was no risk of any management." -- Hugo Mills <hugo@carfax.nildram.co.uk>
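A sketch of that procedure, reformatting the same partition for each filesystem and benchmarking them one after the other; the partition, mount point and bonnie++ parameters are assumptions.

    # /dev/sda6 is scratch space; run every filesystem on this same partition
    umount /mnt/test 2>/dev/null
    mkfs.ext3 /dev/sda6
    mount /dev/sda6 /mnt/test
    bonnie++ -d /mnt/test -s 16384 -n 0 -u postgres > bonnie.ext3
    umount /mnt/test
    mkfs.xfs -f /dev/sda6              # -f overwrites the old ext3 signature
    mount /dev/sda6 /mnt/test
    bonnie++ -d /mnt/test -s 16384 -n 0 -u postgres > bonnie.xfs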
From: Rajesh Kumar Mallah
On Tue, Feb 17, 2009 at 5:15 PM, Matthew Wakeling <matthew@flymine.org> wrote: > On Tue, 17 Feb 2009, Rajesh Kumar Mallah wrote: >> >> sda6 --> xfs with default formatting options. >> sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7 >> sda8 --> ext3 (default) >> >> it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve >> io throughtput as such in bonnie++ tests . >> >> it looks like ext3 with default options performed worst in my case. > > Of course, doing comparisons using a setup like that (on separate > partitions) will skew the results, because discs' performance differs > depending on the portion of the disc being accessed. You should perform the > different filesystem tests on the same partition one after the other > instead. point noted . will redo the test on ext3. > > Matthew > > -- > "We did a risk management review. We concluded that there was no risk > of any management." -- Hugo Mills <hugo@carfax.nildram.co.uk> > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Generally speaking, you will want to use a partition that is 25% or less of the size of the whole disk as well. If it is the whole thing, one file system can place the file you are testing in a very different place on disk and skew the results as well.

My own tests, using the first 20% of an array for all, showed that xfs with default settings beat out or equalled 'tuned' settings with hardware raid 10, and was far faster than ext3 in sequential transfer rate.

If testing STR, you will also want to tune the block device readahead value (example: /sbin/blockdev --getra /dev/sda6). This has a very large impact on sequential transfer performance (and no impact on random access). How large an impact depends quite a bit on what kernel you're on, since the readahead code has been getting better over time and requires less tuning. But it still defaults out-of-the-box to more optimal settings for a single drive than RAID. For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5 for raid 10). For SATA, try 1024 or 2048 * the number of effective spindles. The value is in blocks (512 bytes). There is documentation on the blockdev command, and here is a little write-up I found with a couple of web searches: http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
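Applied to the original poster's 8 x 15K SAS drives in RAID10 (4 effective spindles), that rule of thumb works out to roughly 1024-2048 sectors; the device name below is an assumption and, as later results in this thread show, the best value still depends on the controller, so benchmark around it.

    blockdev --getra /dev/sda          # stock default is usually 256 (128KB)
    blockdev --setra 2048 /dev/sda     # 512 * 4 effective spindles = 2048 sectors = 1MB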
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Rajesh Kumar Mallah
Date:
the raid10 volume was benchmarked again
taking in consideration above points

# fdisk -l /dev/sda

Disk /dev/sda: 290.9 GB, 290984034304 bytes
255 heads, 63 sectors/track, 35376 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          12       96358+  83  Linux
/dev/sda2              13        1317    10482412+  83  Linux
/dev/sda3            1318        1578     2096482+  83  Linux
/dev/sda4            1579       35376   271482435    5  Extended
/dev/sda5            1579        1839     2096451   82  Linux swap / Solaris
/dev/sda6            1840        7919    48837568+  83  Linux
/dev/sda7           29297       35376    48837600   83  Linux

CASE                   writes     reads
                       KB/s       KB/s

ext3 (whole disk)      244194     352093    one partition, whole disk
xfs  (whole disk)      402352     547674

25ext3                 260132     420905    partition only first 25%
25xfs                  404291     547672    (/dev/sda6)

ext3_25                227307     348237    partition specifically last 25%
xfs25                  350661     474481    (/dev/sda7)

Effect of ReadAhead Settings
disabled, 256 (default), 512, 1024

xfs_ra0                414741      66144
xfs_ra256              403647     545026    all tests on sda6
xfs_ra512              411357     564769
xfs_ra1024             404392     431168

looks like 512 was the best setting for this controller

Considering these two figures

xfs25                  350661     474481    (/dev/sda7)
25xfs                  404291     547672    (/dev/sda6)

it looks like the beginning of the drives is about 15% faster than the
ending sections. Considering this, is it worth creating a special
tablespace at the beginning of the drives? If at all done, what kind of
data objects should be placed towards the beginning: WAL, indexes,
frequently updated tables, or sequences?

regds
mallah.

On Tue, Feb 17, 2009 at 9:49 PM, Scott Carey <scott@richrelevance.com> wrote:
> Generally speaking, you will want to use a partition that is 25% or less
> the size of the whole disk as well. If it is the whole thing, one file
> system can place the file you are testing in a very different place on
> disk and skew the results as well.
>
> My own tests, using the first 20% of an array for all of them, showed that
> xfs with default settings beat out or equalled 'tuned' settings with
> hardware RAID 10, and was far faster than ext3 in sequential transfer rate.

same here.
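If such a tablespace on the fast outer section were attempted, a minimal sketch could look like the following; /dev/sda6, the /pgfast mount point, and the table names are assumptions for illustration, not recommendations from the thread:

# filesystem on the partition covering the fast (beginning) section of the array
mkfs.xfs -f /dev/sda6
mkdir -p /pgfast/tblspc
mount /dev/sda6 /pgfast
chown postgres:postgres /pgfast/tblspc

# create the tablespace and move a hot object onto it
psql -U postgres -c "CREATE TABLESPACE fast_ts LOCATION '/pgfast/tblspc'"
psql -U postgres -d mydb -c "ALTER TABLE hot_table SET TABLESPACE fast_ts"

Which objects benefit most is the open question above; note that pg_xlog cannot be moved with a tablespace, only by relocating the directory (or a symlink).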
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Rajesh Kumar Mallah
Date:
Detailed bonnie++ figures:
http://98.129.214.99/bonnie/report.html

On Wed, Feb 18, 2009 at 1:22 PM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
> the raid10 volume was benchmarked again
> taking in consideration above points
On Wed, Feb 18, 2009 at 12:52 AM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
> the raid10 volume was benchmarked again
> taking in consideration above points
>
> Effect of ReadAhead Settings
> disabled, 256 (default), 512, 1024
>
> xfs_ra0      414741,  66144
> xfs_ra256    403647, 545026    all tests on sda6
> xfs_ra512    411357, 564769
> xfs_ra1024   404392, 431168
>
> looks like 512 was the best setting for this controller

That's only known for sequential access. How did it perform under random
access, or did the numbers not change much?

> Considering these two figures
> xfs25    350661, 474481    (/dev/sda7)
> 25xfs    404291, 547672    (/dev/sda6)
>
> it looks like the beginning of the drives is about 15% faster
> than the ending sections. Considering this, is it worth
> creating a special tablespace at the beginning of the drives?

It's also good because you will be short-stroking the drives. They will
naturally have a smaller space to move back and forth in, and this can
increase random access speed at the same time.
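A rough illustration of short-stroking at the partition level, assuming a freshly built, empty array exposed as /dev/sdb (the device name and the 25% split are assumptions):

# DESTRUCTIVE: only run this on an empty array
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 25%     # fast outer zone for the database
parted -s /dev/sdb mkpart primary 25% 100%     # remainder for archives/backups
mkfs.xfs /dev/sdb1

The seek-distance benefit only holds if the remaining space stays mostly idle.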
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Rajesh Kumar Mallah
Date:
>> Effect of ReadAhead Settings
>> disabled, 256 (default), 512, 1024
>>
SEQUENTIAL
>> xfs_ra0      414741,  66144
>> xfs_ra256    403647, 545026    all tests on sda6
>> xfs_ra512    411357, 564769
>> xfs_ra1024   404392, 431168
>>
>> looks like 512 was the best setting for this controller
>
> That's only known for sequential access.
> How did it perform under random access, or did the numbers not
> change much?

RANDOM SEEKS /sec

xfs_ra0       6341.0
xfs_ra256    14642.7
xfs_ra512    14415.6
xfs_ra1024   14541.6

The value does not seem to have much effect unless it is totally disabled.

regds
mallah.
On Wed, Feb 18, 2009 at 1:44 AM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
>>> Effect of ReadAhead Settings
>>> disabled, 256 (default), 512, 1024
>>>
> SEQUENTIAL
>>> xfs_ra0      414741,  66144
>>> xfs_ra256    403647, 545026    all tests on sda6
>>> xfs_ra512    411357, 564769
>>> xfs_ra1024   404392, 431168
>>>
>>> looks like 512 was the best setting for this controller
>>
>> That's only known for sequential access.
>> How did it perform under random access, or did the numbers not
>> change much?
>
> RANDOM SEEKS /sec
>
> xfs_ra0       6341.0
> xfs_ra256    14642.7
> xfs_ra512    14415.6
> xfs_ra1024   14541.6
>
> The value does not seem to have much effect unless it is totally disabled.

Excellent. And yes, you have to dump and reload to go from 32 to 64 bit.
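A hedged sketch of that dump-and-reload when moving a cluster from a 32-bit to a 64-bit build (host, user, and path are assumptions):

# on the old 32-bit cluster
pg_dumpall -U postgres > /backup/cluster.sql

# after initdb on the new 64-bit cluster
psql -U postgres -f /backup/cluster.sql postgres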
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Grzegorz Jaśkiewicz
Date:
Have you tried hanging a bunch of RAID1 arrays off Linux's md and letting it do RAID0 for you? I have heard plenty of stories where this actually sped up performance. One notable case is YouTube's servers.
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Rajesh Kumar Mallah
Date:
On Wed, Feb 18, 2009 at 2:27 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> Have you tried hanging a bunch of RAID1 arrays off Linux's md and letting
> it do RAID0 for you?

Hmmm, I will have only 3 bunches in that case, as the system has to boot
from the first bunch and the system has only 8 drives. I think reducing
spindles will reduce performance.

I also have a SATA SAN though, from which I can boot!
But the server would need to be rebuilt in that case too.
I (may) give it a shot.

regds
--
mallah.

> I have heard plenty of stories where this actually sped up performance.
> One notable case is YouTube's servers.
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Grzegorz Jaśkiewicz
Date:
2009/2/18 Rajesh Kumar Mallah <mallah.rajesh@gmail.com>:
> On Wed, Feb 18, 2009 at 2:27 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
>> Have you tried hanging a bunch of RAID1 arrays off Linux's md and letting
>> it do RAID0 for you?
>
> Hmmm, I will have only 3 bunches in that case, as the system has to boot
> from the first bunch and the system has only 8 drives. I think reducing
> spindles will reduce performance.
>
> I also have a SATA SAN though, from which I can boot!
> But the server would need to be rebuilt in that case too.
> I (may) give it a shot.

Sure, if you do play with that, make sure to tweak the 'chunk' size too.
The default one is way too small (IMO).

--
GJ
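A minimal sketch of that layering, assuming the controller exports three hardware RAID1 mirrors as /dev/sdb, /dev/sdc and /dev/sdd (the device names and the 256 KB chunk are assumptions):

mdadm --create /dev/md0 --level=0 --raid-devices=3 --chunk=256 /dev/sdb /dev/sdc /dev/sdd
mkfs.xfs /dev/md0
blockdev --getra /dev/md0    # md devices often default to a much larger readahead than plain disks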
On 2/18/09 12:31 AM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
In my tests, I have never seen the readahead value affect random access performance (kernel 2.6.18+). At the extreme, I tried a 128MB readahead, and random I/O rates were the same. This was with CentOS 5.2; other confirmation of this would be useful. The Linux readahead algorithm is smart enough to only seek ahead after detecting sequential access. The readahead algorithm has had various improvements to reduce the need to tune it from 2.6.18 to 2.6.24, but from what I gather, this tuning is skewed towards desktop/workstation drives and not large RAID arrays.
> Effect of ReadAhead Settings
> disabled, 256 (default), 512, 1024
>
> xfs_ra0      414741,  66144
> xfs_ra256    403647, 545026    all tests on sda6
> xfs_ra512    411357, 564769
> xfs_ra1024   404392, 431168
>
> looks like 512 was the best setting for this controller
That's only known for sequential access.
How did it perform under the random access, or did the numbers not
change too much?
The readahead value DOES affect random access as a side effect in favor of sequential reads when there is mixed random/sequential load, by decreasing the ‘read fragmentation’ effect of mixing random seeks into a sequential request stream. For most database loads, this is a good thing, since it increases total bytes read per unit of time, effectively ‘getting out of the way’ a sequential read rather than making it drag on for a long time by splitting it into non-sequential I/O’s while other random access is concurrent.
One thing to note is that Linux's md sets the readahead to 8192 by default instead of 128. I've noticed that in many situations, a large chunk of the performance boost reported is due to this alone.
On 2/18/09 12:57 AM, "Grzegorz Jaśkiewicz" <gryzman@gmail.com> wrote:
Have you tried hanging a bunch of RAID1 arrays off Linux's md and letting
it do RAID0 for you?
I have heard plenty of stories where this actually sped up performance.
One notable case is YouTube's servers.
On 2/17/09 11:52 PM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:
the raid10 volume was benchmarked again
taking in consideration above points

Effect of ReadAhead Settings
disabled, 256 (default), 512, 1024

xfs_ra0      414741,  66144
xfs_ra256    403647, 545026    all tests on sda6
xfs_ra512    411357, 564769
xfs_ra1024   404392, 431168

looks like 512 was the best setting for this controller

Try 4096 or 8192 (or just to see, 32768); you should get numbers very close to a raw partition with xfs with a sufficient readahead value. It is controller dependent for sure, but I usually see a "small peak" in performance at 512 or 1024, followed by a dip, then a larger peak and plateau somewhere near (# of drives * the small peak). The higher quality the controller, the less you need to fiddle with this.
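A rough sketch of sweeping those readahead values and re-running the sequential test at each step (the device and the bonnie++ invocation are assumptions; substitute the real device and test command):

for ra in 512 1024 2048 4096 8192 32768; do
    /sbin/blockdev --setra $ra /dev/sda
    echo "readahead = $ra"
    bonnie++ -d /data/test -u postgres -f -n 0
done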
I use a script that runs fio benchmarks with the following profiles, with readahead values from 128 to 65536. The single-reader STR test peaks with a smaller readahead value than the concurrent-reader one (2 to 8 concurrent sequential readers), and the mixed random/sequential read loads become more biased towards sequential transfer (and thus higher overall throughput in bytes/sec) with larger readahead values. The choice between the cfq and deadline scheduler, however, will affect the priority of random vs sequential reads more than the readahead value does, with cfq favoring random access because it divides I/O by time slice.
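Checking and switching the scheduler mentioned above is a one-liner per device (the device name is an assumption):

cat /sys/block/sda/queue/scheduler               # e.g. noop anticipatory deadline [cfq]
echo deadline > /sys/block/sda/queue/scheduler   # favor sequential throughput for the benchmark run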
The FIO profiles I use for benchmarking are at the end of this message.
For SAS drives, it's typically a ~15% to 25% degradation from the front of the drive to the end (the last 5% is definitely slow). For SATA 3.5" drives, the last 5% has about 50% of the STR of the front. (A rough dd check of this is sketched after the links below.)
Considering these two figures
xfs25    350661, 474481    (/dev/sda7)
25xfs    404291, 547672    (/dev/sda6)
it looks like the beginning of the drives is about 15% faster
than the ending sections. Considering this, is it worth
creating a special tablespace at the beginning of the drives?
Graphs about half way down this page show what it looks like for a typical SATA drive: http://www.tomshardware.com/reviews/Seagate-Barracuda-1-5-TB,2032-5.html
And a couple figures for some SAS drives here http://www.storagereview.com/ST973451SS.sr?page=0%2C1
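A rough way to see this inner- vs outer-zone difference is two raw sequential reads with dd; the offsets below assume the ~290 GB /dev/sda described earlier and are only illustrative:

# front of the array
dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct
# last few GB of the array (~290 GB device)
dd if=/dev/sda of=/dev/null bs=1M count=4096 skip=270000 iflag=direct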
FIO benchmark profile examples (long, posting here for the archives):
*Read benchmarks, sequential:
[read-seq]
; one sequential reader reading one 64g file
rw=read
size=64g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
; two sequential readers, each concurrently reading a 32g file, for a total of 64g max
rw=read
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=2
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
; eight sequential readers, each concurrently reading a 8g file, for a total of 64g max
rw=read
size=8g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
*Read benchmarks, random 8k reads.
[read-rand]
; random access on 2g file by single reader, best case scenario.
rw=randread
size=2g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-rand]
; 8 concurrent random readers each to its own 1g file
rw=randread
size=1g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
*Mixed Load:
[global]
; one random reader concurrently with one sequential reader.
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[seq-read]
rw=read
size=64g
numjobs=1
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=1
nrfiles=1
[global]
; Four sequential readers concurrent with four random readers
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
rw=read
size=8g
numjobs=4
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=4
nrfiles=1
*Write tests
[write-seq]
rw=write
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
end_fsync=1
[write-rand]
rw=randwrite
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
; overwrite= 1 is MANDATORY for xfs, otherwise the writes are sparse random writes and can slow performance to near zero. Postgres only does random re-writes, never sparse random writes.
overwrite=1
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
end_fsync=1
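Assuming each profile above is saved as its own job file (the file names below are assumptions), they can be run one at a time and combined with the readahead sweep shown earlier:

for job in read-seq-1.fio read-seq-2.fio read-seq-8.fio read-rand-1.fio read-rand-8.fio mixed-1-1.fio mixed-4-4.fio write-seq.fio write-rand.fio; do
    echo "== $job =="
    fio $job
done

The exec_prerun lines in the profiles drop the page cache, so the runs need to be executed as root.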
Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
From
Rajesh Kumar Mallah
Date:
There has been an error in the tests: the dataset size was not 2*MEM, it was 0.5*MEM. I shall redo the tests and post the results.