Thread: Advice sought: new database server
I'd be grateful for advice on specifying the new server.

We presently have one main database server which is performing well. As
our services expand we are thinking of bringing in another database
server to work with it, and backing each up to a VM server via Postgres
9.1 streaming replication -- at present we are doing pg_dumps twice a
day and using Postgres 8.4.

The existing server is a 2 x quad-core E5420 Xeon (2.5GHz) with 8GB of
RAM and an LSI battery-backed RAID 10 array of 4 x 10K SCSI disks,
providing about 230GB of usable storage, 150GB of which is on an LV
providing reconfigurable space for the databases, which are served off
an XFS-formatted volume.

We presently have 90 databases using around 20GB of disk storage.
However, the larger databases are approaching 1GB in size, so in a year
I imagine the disk requirement will have gone up to 40GB for the same
number of databases. The server also serves some web content.

Performance is generally good, although we have a few slow-running
queries due to poor plpgsql design. We would get faster performance, I
believe, by providing more RAM. Sorry -- I should have some pgbench
output to share here.

I believe our existing server together with the new server should be
able to serve 200--300 databases of our existing type, with around 100
databases on our existing server and perhaps 150 on the new one. After
that we would be looking to get a third database server.

I'm presently looking at the following kit:

    1U chassis with 8 2.5" disk bays
    2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
    8 channel Areca ARC-1880i (PCI Express x8 card)
        presumably with BBU (can't see it listed at present)
    2 x 300GB SAS 2.5" disks for operating system
        (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
        RAID 1
    4 x 300GB SAS 2.5" storage disks
        RAID 10
    48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)

My major question about this chassis, which is 1U, is that it only
takes 2.5" disks, and presently the supplier does not show 15K SAS disk
options. Assuming that I can get the BBU for the Areca card, and that
15K SAS disks are available, I'd be grateful for comments on this
configuration.

Regards
Rory

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
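For reference, a minimal Postgres 9.1 streaming replication setup along
the lines described would look something like the following. This is a
sketch only -- the addresses, user and password are placeholders, not
details from this thread:

    # primary postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 128

    # primary pg_hba.conf -- let the standby connect for replication
    host  replication  repuser  192.168.0.20/32  md5

    # standby recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.0.10 user=repuser password=secret'

The standby is seeded with a base backup of the primary (pg_basebackup
is available from 9.1) before recovery.conf is dropped into its data
directory.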
Hey!

On 04.03.2012 10:58, Rory Campbell-Lange wrote:
> 1U chassis with 8 2.5" disk bays
> 2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
> 8 channel Areca ARC-1880i (PCI Express x8 card)
>     presumably with BBU (can't see it listed at present)
> 2 x 300GB SAS 2.5" disks for operating system
>     (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
>     RAID 1
> 4 x 300GB SAS 2.5" storage disks
>     RAID 10
> 48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)

Sorry, no answer for your question, and a bit offtopic.

Why do you take SAS disks for the OS and not much cheaper SATA ones?
I'm currently trying to get some information together on this.

Regards,
Michi
On Sun, Mar 4, 2012 at 2:58 AM, Rory Campbell-Lange
<rory@campbell-lange.net> wrote:
> I'd be grateful for advice on specifying the new server.
>
> We presently have one main database server which is performing well.
> As our services expand we are thinking of bringing in another database
> server to work with it, and backing each up to a VM server via
> Postgres 9.1 streaming replication -- at present we are doing pg_dumps
> twice a day and using Postgres 8.4.
>
> The existing server is a 2 x quad-core E5420 Xeon (2.5GHz) with 8GB of
> RAM and an LSI battery-backed RAID 10 array of 4 x 10K SCSI disks,
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases, which are served off
> an XFS-formatted volume.
>
> We presently have 90 databases using around 20GB of disk storage.
> However, the larger databases are approaching 1GB in size, so in a
> year I imagine the disk requirement will have gone up to 40GB for the
> same number of databases. The server also serves some web content.
>
> Performance is generally good, although we have a few slow-running
> queries due to poor plpgsql design. We would get faster performance, I
> believe, by providing more RAM. Sorry -- I should have some pgbench
> output to share here.

RAM is always a good thing, and it's cheap enough that you can throw 32
or 64G at a machine like this pretty cheaply.

> I believe our existing server together with the new server should be
> able to serve 200--300 databases of our existing type, with around 100
> databases on our existing server and perhaps 150 on the new one. After
> that we would be looking to get a third database server.
>
> I'm presently looking at the following kit:
>
>     1U chassis with 8 2.5" disk bays
>     2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>     8 channel Areca ARC-1880i (PCI Express x8 card)
>         presumably with BBU (can't see it listed at present)
>     2 x 300GB SAS 2.5" disks for operating system
>         (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
>         RAID 1
>     4 x 300GB SAS 2.5" storage disks
>         RAID 10
>     48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
>
> My major question about this chassis, which is 1U, is that it only
> takes 2.5" disks, and presently the supplier does not show 15K SAS
> disk options. Assuming that I can get the BBU for the Areca card, and
> that 15K SAS disks are available, I'd be grateful for comments on this
> configuration.

The 15k RPM disks aren't that big of a deal unless you're pushing the
bleeding edge on a transactional system. I'm gonna take a wild guess
that you're not doing heavy transactions, in which case the BBU on the
Areca is the single most important thing for you to get for good
performance. The Areca 1880 is a great controller and is much, much
easier to configure than the LSI. Performance-wise it's one of the
fastest DAS controllers made.

If the guys you're looking at getting this from can't do custom orders,
find a white box dealer who can, like www.aberdeeninc.com. It might not
be on their site, but they can build dang near anything you want.
On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
> On Sun, Mar 4, 2012 at 2:58 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
> > [About existing server...] We would get faster performance, I
> > believe, by providing more RAM. Sorry -- I should have some pgbench
> > output to share here.
>
> RAM is always a good thing, and it's cheap enough that you can throw
> 32 or 64G at a machine like this pretty cheaply.

Thanks for your note.

> > 1U chassis with 8 2.5" disk bays
> > 2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
> > 8 channel Areca ARC-1880i (PCI Express x8 card)
> >     presumably with BBU (can't see it listed at present)
> > 2 x 300GB SAS 2.5" disks for operating system
> >     (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
> >     RAID 1
> > 4 x 300GB SAS 2.5" storage disks
> >     RAID 10
> > 48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
> >
> > My major question about this chassis, which is 1U, is that it only
> > takes 2.5" disks, and presently the supplier does not show 15K SAS
> > disk options. Assuming that I can get the BBU for the Areca card,
> > and that 15K SAS disks are available, I'd be grateful for comments
> > on this configuration.
>
> The 15k RPM disks aren't that big of a deal unless you're pushing the
> bleeding edge on a transactional system. I'm gonna take a wild guess
> that you're not doing heavy transactions, in which case the BBU on
> the Areca is the single most important thing for you to get for good
> performance. The Areca 1880 is a great controller and is much, much
> easier to configure than the LSI. Performance-wise it's one of the
> fastest DAS controllers made.

We do have complex transactions, but I haven't benchmarked the
performance so I can't describe it. Few of the databases are at the
many-million-row size at the moment, and we are moving to an aggressive
scheme of archiving old data, so we hope to keep things fast.

However, I thought 15k disks were a prerequisite for a fast database
system, if one can afford them? I assume, all else being equal, the
1880 controller will run 20-40% faster with 15k disks in a write-heavy
application. Also, I would be grateful to learn if there is a good
reason not to use 2.5" SATA disks.

> If the guys you're looking at getting this from can't do custom
> orders, find a white box dealer who can, like www.aberdeeninc.com. It
> might not be on their site, but they can build dang near anything you
> want.

Thanks for the note about Aberdeen. I've seen the advertisements, but
not tried them yet.

Thanks for your comments
Rory

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
On 03/04/2012 03:58 AM, Rory Campbell-Lange wrote:
> I'd be grateful for advice on specifying the new server.
>
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases, which are served off
> an XFS-formatted volume.

Do you mean LVM? I've heard that LVM limits IO, so if you want full
speed you might wanna drop LVM. (And XFS supports increasing fs size,
and when are you ever really gonna want to decrease fs size?)

-Andy
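Andy's point about growing XFS in place is worth spelling out: an online
grow is just two commands. A sketch, where the LV name and mount point
(/dev/vg0/pgdata, /var/lib/postgresql) are placeholders rather than
anything from this thread:

    # extend the logical volume by 20GB, then grow the mounted XFS
    # filesystem to fill it -- xfs_growfs works on a live filesystem
    lvextend -L +20G /dev/vg0/pgdata
    xfs_growfs /var/lib/postgresql

XFS (as of this thread) has no shrink support at all, which is the other
half of Andy's point: you only ever grow it.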
On Sun, Mar 4, 2012 at 11:36 AM, Rory Campbell-Lange
<rory@campbell-lange.net> wrote:
> On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
>> The 15k RPM disks aren't that big of a deal unless you're pushing the
>> bleeding edge on a transactional system. I'm gonna take a wild guess
>> that you're not doing heavy transactions, in which case the BBU on
>> the Areca is the single most important thing for you to get for good
>> performance. The Areca 1880 is a great controller and is much, much
>> easier to configure than the LSI. Performance-wise it's one of the
>> fastest DAS controllers made.
>
> We do have complex transactions, but I haven't benchmarked the
> performance so I can't describe it.

Yeah, try to get a measurement of how many transactions per second
you're running at peak load, and whether you're currently IO bound or
CPU bound.

> Few of the databases are at the many-million-row size at the moment,
> and we are moving to an aggressive scheme of archiving old data, so we
> hope to keep things fast.

The key here is that your whole db can fit into memory. 48G is cutting
it close if you're figuring on being at 40G in a year. I'd spec it out
with 96G to start. That way if you want to set work_mem to 8 or 16M you
can, without worrying about running the machine out of memory or
trashing your OS file system cache with a few large queries etc.

> However, I thought 15k disks were a prerequisite for a fast database
> system, if one can afford them?

The heads have to seek, settle, and THEN you have to wait for the
platters to rotate under the head, i.e. latency.

> I assume, all else being equal, the 1880 controller will run 20-40%
> faster with 15k disks in a write-heavy application. Also, I would be
> grateful to learn if there is a good reason not to use 2.5" SATA
> disks.

The 10k 2.5" Seagate drives have combined seek and latency figures of
about 7ms, while the 15k 2.5" Seagate drives have a combined time of
about 5ms. Even the fastest 3.5" Seagates average 6ms seek time, but
with short stroking can get down to 4 or 5.

Now all of this becomes moot if you compare them to SSDs, where the
seek/settle time is measured in microseconds or lower. The fastest
spinning drive will look like a garbage truck next to the Formula One
car that is the SSD. Until recently, incompatibilities with RAID
controllers and firmware bugs kept most SSDs out of the hosting center,
or made the ones you could get horrifically expensive. The newest
generations of SSDs, though, seem to be working pretty well.

>> If the guys you're looking at getting this from can't do custom
>> orders, find a white box dealer who can, like www.aberdeeninc.com. It
>> might not be on their site, but they can build dang near anything you
>> want.
>
> Thanks for the note about Aberdeen. I've seen the advertisements, but
> not tried them yet.

There's lots of others to choose from. In the past I've gotten
fantastic customer service from Aberdeen, and they've never steered me
wrong. I've had my sales guy simply refuse to sell me a particular
drive because the failure rate was too high in the field, etc. They
cross-ship RAID cards overnight, and can build truly huge DAS servers
if you need them. Like a lot of white box guys they specialize more in
large storage arrays and virtualization hardware, but there's a lot of
similarity between that class of machine and a db server.
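One low-impact way to get the peak-load transaction rate Scott asks for
is to sample the commit/rollback counters in pg_stat_database twice and
divide by the interval. A sketch only; the user and the 60-second window
are arbitrary:

    psql -U postgres -t -c \
      "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"
    sleep 60
    psql -U postgres -t -c \
      "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"
    # (second value - first value) / 60 = transactions per second

Run it during the busy period; iostat or sar (as discussed below)
answers the IO-bound versus CPU-bound half of the question.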
On Sun, Mar 4, 2012 at 12:45 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 03/04/2012 03:58 AM, Rory Campbell-Lange wrote:
>> I'd be grateful for advice on specifying the new server.
>>
>> providing about 230GB of usable storage, 150GB of which is on an LV
>> providing reconfigurable space for the databases, which are served
>> off an XFS-formatted volume.
>
> Do you mean LVM? I've heard that LVM limits IO, so if you want full
> speed you might wanna drop LVM. (And XFS supports increasing fs size,
> and when are you ever really gonna want to decrease fs size?)

It certainly did in the past. I don't know if anyone's done any
conclusive testing on it recently, but circa 2005 to 2008 we were
running RHEL 4, and LVM limited the machine by quite a bit, with max
sequential throughput dropping off by 50% or more on bigger IO
subsystems. I.e. a 600MB/s system would be lucky to hit 300MB/s with an
LV on top.
On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
> On Sun, Mar 4, 2012 at 11:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
> > On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:

... [Description of system with 2 x 4-core Xeons, 8GB RAM, LSI card
with 4 x 10K SCSI drives in RAID 10. We are looking for a new server to
partner with this one.] ...

> > We do have complex transactions, but I haven't benchmarked the
> > performance so I can't describe it.
>
> Yeah, try to get a measurement of how many transactions per second
> you're running at peak load, and whether you're currently IO bound or
> CPU bound.

Our existing server rarely goes above 7% sustained IO according to sar.
Similarly, CPU at peak times is at 5-7% on the sar average (across all
8 cores). I'm not clear on how to read the memory stats, but the
average kbcommit value for this morning's work is 12420282, which
(assuming it means about 12GB of memory) is 4GB more than physical RAM.
However, the system never swaps, probably due to our rather
parsimonious Postgres memory settings.

> > Few of the databases are at the many-million-row size at the moment,
> > and we are moving to an aggressive scheme of archiving old data, so
> > we hope to keep things fast.
>
> The key here is that your whole db can fit into memory. 48G is
> cutting it close if you're figuring on being at 40G in a year. I'd
> spec it out with 96G to start. That way if you want to set work_mem
> to 8 or 16M you can, without worrying about running the machine out
> of memory or trashing your OS file system cache with a few large
> queries etc.

Thanks for this excellent point.

> > However, I thought 15k disks were a prerequisite for a fast database
> > system, if one can afford them?
>
> The heads have to seek, settle, and THEN you have to wait for the
> platters to rotate under the head, i.e. latency.
>
> > I assume, all else being equal, the 1880 controller will run 20-40%
> > faster with 15k disks in a write-heavy application. Also, I would
> > be grateful to learn if there is a good reason not to use 2.5" SATA
> > disks.
>
> The 10k 2.5" Seagate drives have combined seek and latency figures of
> about 7ms, while the 15k 2.5" Seagate drives have a combined time of
> about 5ms. Even the fastest 3.5" Seagates average 6ms seek time, but
> with short stroking can get down to 4 or 5.
>
> Now all of this becomes moot if you compare them to SSDs, where the
> seek/settle time is measured in microseconds or lower. The fastest
> spinning drive will look like a garbage truck next to the Formula One
> car that is the SSD. Until recently, incompatibilities with RAID
> controllers and firmware bugs kept most SSDs out of the hosting
> center, or made the ones you could get horrifically expensive. The
> newest generations of SSDs, though, seem to be working pretty well.

From your comments it appears there are 3 options:

1. Card + BBU + SAS disks (10K/15K doesn't matter) + lots of RAM
2. Card + BBU + Raptors + lots of RAM
3. SSDs + lots of RAM

Is this correct? If my databases are unlikely to be IO bound, might it
not be better to go for a cheaper drive subsystem (i.e. option 2) plus
lots of RAM, or alternatively SSDs, given that we don't require much
storage space? I am unclear on what the options are on the
highly-reliable SSD front, and on how to RAID SSD systems.

An ancillary point is that our systems are designed to have more rather
than fewer databases, so that we can scale easily horizontally.

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
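The sar figures quoted above come from the sysstat package; a sketch of
how to pull them (flag meanings per the sar man page; log paths vary by
distro):

    sar -u      # CPU utilisation, including %iowait
    sar -b      # IO and transfer rates (tps, bread/s, bwrtn/s)
    sar -r      # memory, including kbcommit and %commit
    sar -d -p   # per-device IO, useful for spotting a hot array

Note that kbcommit is committed virtual memory -- how much the kernel
has promised to all processes -- not resident use, which is why it can
exceed physical RAM without any swapping taking place.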
On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange <rory@campbell-lange.net> wrote:
> We do have complex transactions, but I haven't benchmarked the
> performance so I can't describe it. Few of the databases are at the
> many-million-row size at the moment, and we are moving to an
> aggressive scheme of archiving old data, so we hope to keep things
> fast.
>
> However, I thought 15k disks were a prerequisite for a fast database
> system, if one can afford them? I assume, all else being equal, the
> 1880 controller will run 20-40% faster with 15k disks in a write-heavy
> application. Also, I would be grateful to learn if there is a good
> reason not to use 2.5" SATA disks.
Without those benchmarks, you can't really say what "fast" means. There are many bottlenecks that will limit your database's performance; the disk's spinning rate is just one of them. Memory size, memory bandwidth, CPU, CPU cache size and speed, the disk I/O bandwidth in and out, the disk RPM, the presence of a BBU controller ... any of these can be the bottleneck. If you focus on the disk's RPM, you may be fixing a bottleneck that you'll never reach.
We use 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and have been very impressed by the performance: 8 drives in RAID10, two in RAID1 for the WAL, one for Linux and one spare. This is on an 8-core system with 12 GB memory:
pgbench -i -s 100 -U test
pgbench -U test -c ... -t ...
 -c      -t   TPS
  5   20000  3777
 10   10000  2622
 20    5000  3759
 30    3333  5712
 40    2500  5953
 50    2000  6141
Craig
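For clarity, each row of the table pairs -c and -t so that every run
executes 100,000 transactions in total. The first row, written out with
the same database name Craig used:

    pgbench -i -s 100 -U test        # initialise at scale factor 100
    pgbench -U test -c 5 -t 20000    # 5 clients x 20,000 txns = 100k

-c sets the number of concurrent clients and -t the transactions per
client, so -c times -t is held constant at 100,000 across the rows.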
On 05/03/12, Craig James (cjames@emolecules.com) wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
>
> > We do have complex transactions, but I haven't benchmarked the
> > performance so I can't describe it. Few of the databases are at the
> > many-million-row size at the moment, and we are moving to an
> > aggressive scheme of archiving old data, so we hope to keep things
> > fast.
> >
> > However, I thought 15k disks were a prerequisite for a fast database
> > system, if one can afford them? I assume, all else being equal, the
> > 1880 controller will run 20-40% faster with 15k disks in a
> > write-heavy application. Also, I would be grateful to learn if there
> > is a good reason not to use 2.5" SATA disks.
>
> Without those benchmarks, you can't really say what "fast" means.
> There are many bottlenecks that will limit your database's
> performance; the disk's spinning rate is just one of them. Memory
> size, memory bandwidth, CPU, CPU cache size and speed, the disk I/O
> bandwidth in and out, the disk RPM, the presence of a BBU controller
> ... any of these can be the bottleneck. If you focus on the disk's
> RPM, you may be fixing a bottleneck that you'll never reach.
>
> We use 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a
> BBU, and have been very impressed by the performance: 8 drives in
> RAID10, two in RAID1 for the WAL, one for Linux and one spare. This
> is on an 8-core system with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
>  -c      -t   TPS
>   5   20000  3777
>  10   10000  2622
>  20    5000  3759
>  30    3333  5712
>  40    2500  5953
>  50    2000  6141

Thanks for this quick guide to using pgbench. My 4-year-old SCSI server
with 4 RAID10 disks behind an LSI card achieved the following on a
contended system:

 -c      -t   TPS
  5   20000   446
 10   10000   542
 20    5000   601
 30    3333   647

These results seem pretty lousy in comparison to yours. Interesting.

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
On 04/03/12, Rory Campbell-Lange (rory@campbell-lange.net) wrote:
> I'd be grateful for advice on specifying a new server
> ...
> The existing server is a 2 x quad-core E5420 Xeon (2.5GHz) with 8GB of
> RAM and an LSI battery-backed RAID 10 array of 4 x 10K SCSI disks,
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases, which are served off
> an XFS-formatted volume.

In conversation on the list I've established that our current server
(while fine for our needs) isn't performing terribly well. It could do
with more RAM, and the disk IO seems slow.

That said, I'm keen to buy a new server to improve on the current
performance, so I've taken the liberty of replying here to my initial
mail to ask specifically about new server recommendations.

The initial plan was to share some of the load between the current and
new server, and to buy something along the following lines:

>     1U chassis with 8 2.5" disk bays
>     2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>     8 channel Areca ARC-1880i (PCI Express x8 card)
>         presumably with BBU (can't see it listed at present)
>     2 x 300GB SAS 2.5" disks for operating system
>         (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
>         RAID 1
>     4 x 300GB SAS 2.5" storage disks
>         RAID 10
>     48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)

However, after comments on the list, I realise I could get two servers
with the following specs for the same price as the above:

    2x Intel Xeon E5620 Quad-Core / 4x 2.40GHz / 12MB cache
    48.0GB DDR3 1066MHz registered ECC
    4 channel Areca ARC-1212 (PCI Express x4 card) + BBU
    4 x WD Raptors in RAID 10 (in 3.5" adapters)

In other words, for GBP 5k I can get two servers that between them may
better meet my requirements (lots of memory, reasonably fast disks)
than a single server. A salient point is that individual databases are
currently less than 1GB in size but will grow to perhaps 2GB over the
coming 18 months. The aim would be to contain all of the databases in
memory on each server.

I'd be very grateful for comments on this strategy.

Rory

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
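If the goal is to hold the whole dataset in memory on a 48GB box, the
usual community rule-of-thumb starting point (not something settled in
this thread) would look roughly like:

    # postgresql.conf sketch for 48GB RAM and ~40GB of databases
    shared_buffers = 8GB            # a conservative slice of RAM
    effective_cache_size = 36GB     # tells the planner the OS cache is big
    work_mem = 16MB                 # per sort/hash, per backend -- modest

The OS page cache does most of the "databases in memory" work here:
after a warm-up period the hot data sits in cache and reads stop
touching the disks at all.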
On 03/04/2012 03:50 AM, Michael Friedl wrote:
> Hey!
>
> On 04.03.2012 10:58, Rory Campbell-Lange wrote:
>> 1U chassis with 8 2.5" disk bays
>> 2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>> 8 channel Areca ARC-1880i (PCI Express x8 card)
>>     presumably with BBU (can't see it listed at present)
>> 2 x 300GB SAS 2.5" disks for operating system
>>     (possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
>>     RAID 1
>> 4 x 300GB SAS 2.5" storage disks
>>     RAID 10
>> 48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
>
> Sorry, no answer for your question, and a bit offtopic.
>
> Why do you take SAS disks for the OS and not much cheaper SATA ones?

Here's Intel's (very general) take. Your OS disks may not justify SAS
on performance alone, but other aspects may sway you.

http://www.intel.com/support/motherboards/server/sb/CS-031831.htm

Cheers,
Steve
On Mon, Mar 5, 2012 at 10:56 AM, Craig James <cjames@emolecules.com> wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
>>
>> We do have complex transactions, but I haven't benchmarked the
>> performance so I can't describe it. Few of the databases are at the
>> many-million-row size at the moment, and we are moving to an
>> aggressive scheme of archiving old data, so we hope to keep things
>> fast.
>>
>> However, I thought 15k disks were a prerequisite for a fast database
>> system, if one can afford them? I assume, all else being equal, the
>> 1880 controller will run 20-40% faster with 15k disks in a
>> write-heavy application. Also, I would be grateful to learn if there
>> is a good reason not to use 2.5" SATA disks.
>
> Without those benchmarks, you can't really say what "fast" means.
> There are many bottlenecks that will limit your database's
> performance; the disk's spinning rate is just one of them. Memory
> size, memory bandwidth, CPU, CPU cache size and speed, the disk I/O
> bandwidth in and out, the disk RPM, the presence of a BBU controller
> ... any of these can be the bottleneck. If you focus on the disk's
> RPM, you may be fixing a bottleneck that you'll never reach.
>
> We use 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a
> BBU, and have been very impressed by the performance: 8 drives in
> RAID10, two in RAID1 for the WAL, one for Linux and one spare. This
> is on an 8-core system with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
>  -c      -t   TPS
>   5   20000  3777
>  10   10000  2622
>  20    5000  3759
>  30    3333  5712
>  40    2500  5953
>  50    2000  6141

Those numbers are stupendous for 8-drive SATA. How much shared_buffers
do you have?

merlin
On Mon, Mar 5, 2012 at 9:56 AM, Craig James <cjames@emolecules.com> wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
>>
>> We do have complex transactions, but I haven't benchmarked the
>> performance so I can't describe it. Few of the databases are at the
>> many-million-row size at the moment, and we are moving to an
>> aggressive scheme of archiving old data, so we hope to keep things
>> fast.
>>
>> However, I thought 15k disks were a prerequisite for a fast database
>> system, if one can afford them? I assume, all else being equal, the
>> 1880 controller will run 20-40% faster with 15k disks in a
>> write-heavy application. Also, I would be grateful to learn if there
>> is a good reason not to use 2.5" SATA disks.
>
> Without those benchmarks, you can't really say what "fast" means.
> There are many bottlenecks that will limit your database's
> performance; the disk's spinning rate is just one of them. Memory
> size, memory bandwidth, CPU, CPU cache size and speed, the disk I/O
> bandwidth in and out, the disk RPM, the presence of a BBU controller
> ... any of these can be the bottleneck. If you focus on the disk's
> RPM, you may be fixing a bottleneck that you'll never reach.
>
> We use 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a
> BBU, and have been very impressed by the performance: 8 drives in
> RAID10, two in RAID1 for the WAL, one for Linux and one spare. This
> is on an 8-core system with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
>  -c      -t   TPS
>   5   20000  3777
>  10   10000  2622
>  20    5000  3759
>  30    3333  5712
>  40    2500  5953
>  50    2000  6141

Just wondering what your -c, -t etc. settings were, and whether the
tests were long enough to fill up your RAID controller's write cache or
not.
On Wed, Mar 7, 2012 at 12:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
It's actually 10 disks when you include the RAID1 for the WAL. Here are the non-default performance parameters that might be of interest.
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
I also do this at boot time (on a 12GB system):
echo 4294967296 >/proc/sys/kernel/shmmax # 4 GB shared memory
echo 4096 >/proc/sys/kernel/shmmni
echo 1572864 >/proc/sys/kernel/shmall # 6 GB max shared mem (block size is 4096 bytes)
We have two of these machines, and their performance is almost identical. One isn't doing much yet, so if you're interested in other benchmarks (that don't take me too long to run), let me know.
Craig
On 03/07/2012 03:07 PM, Craig James wrote:
> echo 4294967296 >/proc/sys/kernel/shmmax  # 4 GB shared memory
> echo 4096 >/proc/sys/kernel/shmmni
> echo 1572864 >/proc/sys/kernel/shmall     # 6 GB max shared mem
> (block size is 4096 bytes)

For what it's worth, you can just make these entries in your
/etc/sysctl.conf file and it'll do the same thing a little more
cleanly:

vm.shmmax = 4294967296
vm.shmmni = 4096
vm.shmall = 1572864

To commit changes made this way:

sysctl -p

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions
related to this email
On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Those numbers are stupendous for 8-drive SATA. How much
> shared_buffers do you have?

A couple of things to notice:

1) The benchmark can run fully in memory, although not 100% in
   shared_buffers.
2) These are 100k-transaction runs, meaning that probably no
   checkpointing was going on.
3) Given the amount of memory in the server, with dirty flush settings
   the OS will do mostly sequential writes.

I just ran a quick test. With synchronous_commit=off to simulate a BBU,
I have no trouble hitting 11k tps on a single SATA disk. It seems to be
mostly CPU bound on my workstation (Intel i5 2500K @ 3.9GHz, 16GB
memory): dirty writes stay in OS buffers, there's about 220tps/6MBps of
traffic to the xlogs, and the checkpoint dumps everything to the OS
cache, which is then flushed at about 170MB/s (which would probably do
nasty things to latency in real-world cases). Unlogged tables give me
about 12k tps, which seems to confirm mostly CPU bound.

So regardless of whether the benchmark is a good representation of the
target workload or not, it definitely isn't benchmarking the IO system.

Ants Aasma
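A sketch of how a test like Ants' can be reproduced -- the scale factor,
client count and database name here are guesses, not his actual values;
--unlogged-tables is a real pgbench option from 9.1 onwards:

    # in postgresql.conf: synchronous_commit = off, then: pg_ctl reload
    pgbench -i -s 100 bench                      # regular tables
    pgbench -c 8 -t 12500 bench                  # 100k transactions total

    pgbench -i -s 100 --unlogged-tables bench    # unlogged variant
    pgbench -c 8 -t 12500 bench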
On 08/03/12, Ants Aasma (ants.aasma@eesti.ee) wrote:
> So regardless of whether the benchmark is a good representation of the
> target workload or not, it definitely isn't benchmarking the IO
> system.

At the risk of hijacking the thread I started, I'd be grateful for
comments on the following system IO results. Rather than using pgbench
(which Ants responded about above), this uses fio.

Our workload is several small databases totalling less than 40GB of
disk space. The proposed system has 48GB RAM, 2 x quad-core E5620 @
2.40GHz and 4 WD Raptors behind an LSI SAS card. Is this IO
respectable?

LSI MegaRAID SAS 9260-8i
Firmware: 12.12.0-0090
Kernel: 2.6.39.4
Hard disks: 4x WD6000BLHX
Test done on 256GB volume
BS = blocksize in bytes

RAID 10
--------------------------------------
Read sequential
    BS          MB/s       IOPs
    512       129.26   264730.80
    1024      229.75   235273.40
    4096      363.14    92965.50
    16384     475.02    30401.50
    65536     472.79     7564.65
    131072    428.15     3425.20
--------------------------------------
Write sequential
    BS          MB/s       IOPs
    512        36.08    73908.00
    1024       65.61    67192.60
    4096      170.15    43560.40
    16384     219.80    14067.57
    65536     240.05     3840.91
    131072    243.96     1951.74
--------------------------------------
Random read
    BS          MB/s       IOPs
    512         1.50     3077.20
    1024        2.91     2981.40
    4096       11.59     2968.30
    16384      44.50     2848.28
    65536     156.96     2511.41
    131072    170.65     1365.25
--------------------------------------
Random write
    BS          MB/s       IOPs
    512         0.53     1103.60
    1024        1.15     1179.20
    4096        4.43     1135.30
    16384      17.61     1127.56
    65536      61.39      982.39
    131072     79.27      634.16
--------------------------------------

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
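The mail shows results but not the fio job itself. One plausible
invocation for the 4096-byte random-read row would be something like
the following -- the job name, file size, iodepth and directory are
assumptions, not taken from the thread:

    fio --name=randread --rw=randread --bs=4k --size=8g \
        --direct=1 --ioengine=libaio --iodepth=32 \
        --runtime=60 --directory=/srv/fio-test

Swapping --rw between read, write, randread and randwrite, and --bs
across 512 to 128k, would reproduce the grid above.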
On Thu, Mar 8, 2012 at 4:43 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Those numbers are stupendous for 8-drive SATA. How much
>> shared_buffers do you have?
>
> A couple of things to notice:
> 1) The benchmark can run fully in memory, although not 100% in
>    shared_buffers.
> 2) These are 100k-transaction runs, meaning that probably no
>    checkpointing was going on.
> 3) Given the amount of memory in the server, with dirty flush settings
>    the OS will do mostly sequential writes.
>
> I just ran a quick test. With synchronous_commit=off to simulate a
> BBU, I have no trouble hitting 11k tps on a single SATA disk.

fsync=off might be a better way to simulate a BBU.

Cheers,
Jeff
Wednesday, March 7, 2012, 11:24:25 PM you wrote:

> On 03/07/2012 03:07 PM, Craig James wrote:
>> echo 4294967296 >/proc/sys/kernel/shmmax  # 4 GB shared memory
>> echo 4096 >/proc/sys/kernel/shmmni
>> echo 1572864 >/proc/sys/kernel/shmall     # 6 GB max shared mem
>> (block size is 4096 bytes)
>
> For what it's worth, you can just make these entries in your
> /etc/sysctl.conf file and it'll do the same thing a little more
> cleanly:
>
> vm.shmmax = 4294967296
> vm.shmmni = 4096
> vm.shmall = 1572864

Shouldn't that be:

kernel.shmmax = 4294967296
kernel.shmmni = 4096
kernel.shmall = 1572864

--
Jochen Erwied     | home: jochen@erwied.eu     +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 | work: joe@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erwied@vodafone.de +49-173-5404164
On 03/08/2012 10:15 AM, Jochen Erwied wrote:
> Shouldn't that be:
>
> kernel.shmmax = 4294967296
> kernel.shmmni = 4096
> kernel.shmall = 1572864

Oops! Yes. That's definitely it. I'm too accustomed to having those set
automatically, and then setting these:

vm.swappiness = 0
vm.dirty_background_ratio = 1
vm.dirty_ratio = 10

Sorry about that!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions
related to this email
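Putting the two corrections together, the complete /etc/sysctl.conf
stanza discussed in this sub-thread would read as follows (values sized
for Craig's 12GB machine; adjust shmmax/shmall for your own RAM):

    # shared memory limits for Postgres (kernel.*, per Jochen's fix)
    kernel.shmmax = 4294967296    # 4 GB max single segment
    kernel.shmmni = 4096
    kernel.shmall = 1572864       # total, counted in 4096-byte pages

    # VM tuning from Shaun's follow-up
    vm.swappiness = 0
    vm.dirty_background_ratio = 1
    vm.dirty_ratio = 10

Apply with sysctl -p. The shmall arithmetic: 1572864 pages x 4096 bytes
per page = 6442450944 bytes = 6 GB, matching the comment in Craig's
original echo commands.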