Thread: Advice sought : new database server

Advice sought : new database server

From
Rory Campbell-Lange
Date:
I'd be grateful for advice on specifying the new server

We presently have one main database server which is performing well. As
our services expand we are thinking of bringing another database server
to work with it, and back each up via Postgres 9.1 streaming replication
each to a VM server -- at present we are doing pg_dumps twice a day and
using Postgres 8.4.

The existing server is a 2 x Quad core E5420 Xeon (2.5GHz) with 8GB of
RAM with an LSI battery-backed RAID 10 array of 4no 10K SCSI disks,
providing about 230GB of usable storage, 150GB of which is on an LV
providing reconfigurable space for the databases which are served off an
XFS formatted volume.

We presently have 90 databases using around 20GB of disk storage.
However the larger databases are approaching 1GB in size, so in a year I
imagine the disk requirement will have gone up to 40GB for the same
number of databases. The server also serves some web content.

Performance is generally good, although we have a few slow running
queries due to poor plpgsql design. We would get faster performance, I
believe, by providing more RAM. Sorry -- I should have some pgbench
output to share here.

I believe our existing server together with the new server should be
able to serve 200--300 databases of our existing type, with around 100
databases on our existing server and perhaps 150 on the new one. After
that we would be looking to get a third database server.

I'm presently looking at the following kit:

    1U chassis with 8 2.5" disk bays
    2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
    8 channel Areca ARC-1880i (PCI Express x8 card)
      presumably with BBU (can't see it listed at present)
    2 x 300GB SAS  2.5" disks for operating system
      (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
      RAID 1
    4 x 300GB SAS  2.5" storage disks
      RAID 10
    48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)

My major question about this chassis, which is 1U, is that it only takes
2.5" disks, and presently the supplier does not show 15K SAS disk
options. Assuming that I can get the BBU for the Areca card, and that
15K SAS disks are available, I'd be grateful for comments on this
configuration.

Regards
Rory

--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

Re: Advice sought : new database server

From
Michael Friedl
Date:
Hey!

On 04.03.2012 10:58, Rory Campbell-Lange wrote:
>     1U chassis with 8 2.5" disk bays
>     2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>     8 channel Areca ARC-1880i (PCI Express x8 card)
>       presumably with BBU (can't see it listed at present)
>     2 x 300GB SAS  2.5" disks for operating system
>       (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache
>       RAID 1
>     4 x 300GB SAS  2.5" storage disks
>       RAID 10
>     48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
>

Sorry, no answer for your question and a bit offtopic.


Why are you choosing SAS disks for the OS rather than much cheaper SATA ones?


I'm currently trying to gather some information on this.


Regards,
Michi


Re: Advice sought : new database server

From
Scott Marlowe
Date:
On Sun, Mar 4, 2012 at 2:58 AM, Rory Campbell-Lange
<rory@campbell-lange.net> wrote:
> I'd be grateful for advice on specifying the new server
>
> We presently have one main database server which is performing well. As
> our services expand we are thinking of bringing another database server
> to work with it, and back each up via Postgres 9.1 streaming replication
> each to a VM server -- at present we are doing pg_dumps twice a day and
> using Postgres 8.4.
>
> The existing server is a 2 x Quad core E5420 Xeon (2.5GHz) with 8GB of
> RAM with an LSI battery-backed RAID 10 array of 4no 10K SCSI disks,
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases which are served off an
> XFS formatted volume.
>
> We presently have 90 databases using around 20GB of disk storage.
> However the larger databases are approaching 1GB in size, so in a year I
> imagine the disk requirement will have gone up to 40GB for the same
> number of databases. The server also serves some web content.
>
> Performance is generally good, although we have a few slow running
> queries due to poor plpgsql design. We would get faster performance, I
> believe, by providing more RAM. Sorry -- I should have some pg_bench
> output to share here.

RAM is always a good thing, and it's cheap enough that you can throw
32 or 64G at a machine like this pretty cheaply.

> I believe our existing server together with the new server should be
> able to serve 200--300 databases of our existing type, with around 100
> databases on our existing server and perhaps 150 on the new one. After
> that we would be looking to get a third database server.
>
> I'm presently looking at the following kit:
>
>    1U chassis with 8 2.5" disk bays
>    2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>    8 channel Areca ARC-1880i (PCI Express x8 card)
>      presumably with BBU (can't see it listed at present)
>    2 x 300GB SAS  2.5" disks for operating system
>      (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache
>      RAID 1
>    4 x 300GB SAS  2.5" storage disks
>      RAID 10
>    48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
>
> My major question about this chassis, which is 1U, is that it only takes
> 2.5" disks, and presently the supplier does not show 15K SAS disk
> options. Assuming that I can get the BBU for the Areca card, and that
> 15K SAS disks are available, I'd be grateful for comments on this
> configuration.

The 15k RPM disks aren't that big of a deal unless you're pushing the
bleeding edge on a transactional system. I'm gonna take a wild guess
that you're not doing heavy transactions, in which case, the BBU on
the Areca is the single most important thing for you to get for good
performance.  The Areca 1880 is a great controller and is much, much
easier to configure than the LSI.  Performance-wise it's one of the
fastest DAS controllers made.

If the guys you're looking at getting this from can't do custom
orders, find a white box dealer who can, like www.aberdeeninc.com.  It
might not be on their site, but they can build dang near anything you
want.

Re: Advice sought : new database server

From
Rory Campbell-Lange
Date:
On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
> On Sun, Mar 4, 2012 at 2:58 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:

> > [About existing server...] We would get faster performance, I
> > believe, by providing more RAM. Sorry -- I should have some pg_bench
> > output to share here.
>
> RAM is always a good thing, and it's cheap enough that you can throw
> 32 or 64G at a machine like this pretty cheaply.

Thanks for your note.

> >    1U chassis with 8 2.5" disk bays
> >    2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
> >    8 channel Areca ARC-1880i (PCI Express x8 card)
> >      presumably with BBU (can't see it listed at present)
> >    2 x 300GB SAS  2.5" disks for operating system
> >      (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache
> >      RAID 1
> >    4 x 300GB SAS  2.5" storage disks
> >      RAID 10
> >    48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
> >
> > My major question about this chassis, which is 1U, is that it only takes
> > 2.5" disks, and presently the supplier does not show 15K SAS disk
> > options. Assuming that I can get the BBU for the Areca card, and that
> > 15K SAS disks are available, I'd be grateful for comments on this
> > configuration.
>
> The 15k RPM disks aren't that big of a deal unless you're pushing the
> bleeding edge on a transactional system. I'm gonna take a wild guess
> that you're not doing heavy transactions, in which case, the BBU on
> the areca is the single most important thing for you to get for good
> performance.  The areca 1880 is a great controller and is much much
> easier to configure than the LSI.  Performance wise it's one of the
> fastest DAS controllers made.

We do have complex transactions, but I haven't benchmarked their
performance, so I can't describe it. Few of the databases are at the
multi-million-row size at the moment, and we are moving to an aggressive
scheme of archiving old data, so we hope to keep things fast.

However I thought 15k disks were a pre-requisite for a fast database
system, if one can afford them? I assume if all else is equal the 1880
controller will run 20-40% faster with 15k disks in a write-heavy
application. Also I would be grateful to learn if there is a good reason
not to use 2.5" SATA disks.

> If the guys you're looking at getting this from can't do custom
> orders, find a white box dealer who can, like www.aberdeeninc.com.  It
> might not be on their site, but they can build dang near anything you
> want.

Thanks for the note about Aberdeen. I've seen the advertisements, but
not tried them yet.

Thanks for your comments
Rory


Re: Advice sought : new database server

From
Andy Colson
Date:
On 03/04/2012 03:58 AM, Rory Campbell-Lange wrote:
> I'd be grateful for advice on specifying the new server
>
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases which are served off an
> XFS formatted volume.
>

Do you mean LVM?  I've heard that LVM limits IO, so if you want full speed
you might want to drop LVM.  (And XFS supports increasing fs size, and when
are you ever really going to want to decrease fs size?)

-Andy

Re: Advice sought : new database server

From
Scott Marlowe
Date:
On Sun, Mar 4, 2012 at 11:36 AM, Rory Campbell-Lange
<rory@campbell-lange.net> wrote:
> On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:

>> The 15k RPM disks aren't that big of a deal unless you're pushing the
>> bleeding edge on a transactional system. I'm gonna take a wild guess
>> that you're not doing heavy transactions, in which case, the BBU on
>> the areca is the single most important thing for you to get for good
>> performance.  The areca 1880 is a great controller and is much much
>> easier to configure than the LSI.  Performance wise it's one of the
>> fastest DAS controllers made.
>
> We do have complex transactions, but I haven't benchmarked the
> performance so I can't describe it.

Yeah try to get a measurement of how many transactions per second
you're running at peak load, and if you're currently IO bound or CPU
bound.

> Few of the databases are at the many
> million row size at the moment, and we are moving to an agressive scheme
> of archiving old data, so we hope to keep things fast.

The key here is that your whole db can fit into memory.  48G is
cutting it close if you're figuring on being at 40G in a year.  I'd
spec it out with 96G to start.  That way if you want to set work_mem
to 8 or 16M you can without worrying about running the machine out of
memory / scramming your OS file system cache with a few large queries
etc.
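
As a rough sketch of that headroom argument: the 40G data size and the
8M/16M work_mem values are the ones discussed in this thread, while the
figure of 200 concurrent sort/hash operations is purely an assumed worst
case, and shared_buffers and OS overhead are ignored.

```python
# Back-of-envelope RAM headroom. Only the 40 GB data size and the
# 8/16 MB work_mem values come from the thread; the rest is assumed.

def headroom_gb(ram_gb, data_gb, work_mem_mb, concurrent_sorts):
    """RAM left after caching all data and budgeting for work_mem.

    work_mem is a per-sort/hash limit, so the worst case is roughly
    work_mem multiplied by the number of concurrent sort/hash operations.
    """
    work_mem_gb = work_mem_mb * concurrent_sorts / 1024
    return ram_gb - data_gb - work_mem_gb


DATA_GB = 40            # projected data size in a year (from the thread)
CONCURRENT_SORTS = 200  # hypothetical number of simultaneous sorts/hashes

for ram_gb in (48, 96):
    for work_mem_mb in (8, 16):
        spare = headroom_gb(ram_gb, DATA_GB, work_mem_mb, CONCURRENT_SORTS)
        print(f"{ram_gb} GB RAM, work_mem={work_mem_mb}MB: {spare:.1f} GB spare")
```

With these assumed inputs, 48G leaves only a few GB of slack once the
data is cached, whereas 96G leaves more than 50G.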

> However I thought 15k disks were a pre-requisite for a fast database
> system, if one can afford them?

The heads have to seek, settle and THEN you have to wait for the
platters to rotate under the head, i.e. latency.

> I assume if all else is equal the 1880
> controller will run 20-40% faster with 15k disks in a write-heavy
> application.
> Also I would be grateful to learn if there is a good reason
> not to use 2.5" SATA disks.

The 10k 2.5" Seagate drives have combined seek and latency figures of
about 7ms, while the 15k 2.5" Seagate drives have a combined time of
about 5ms.  Even the fastest 3.5" Seagates average 6ms seek
time, but with short stroking can get down to 4 or 5.
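
To make the arithmetic behind those figures concrete, here is a small
sketch. It assumes the usual rule of thumb that average rotational
latency is half a revolution, and that a drive completes roughly one
random I/O per combined seek-plus-latency time; the function names are
illustrative only.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to come round: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def approx_random_iops(service_time_ms):
    """Very rough random IOPS: one I/O per combined seek + latency time."""
    return 1000 / service_time_ms


for rpm in (7_200, 10_000, 15_000):
    latency = avg_rotational_latency_ms(rpm)
    print(f"{rpm:>6} RPM: {latency:.2f} ms average rotational latency")

# Combined seek + latency figures quoted above: about 7 ms (10k), 5 ms (15k).
for label, ms in (("10k", 7), ("15k", 5)):
    print(f"{label}: about {approx_random_iops(ms):.0f} random IOPS per drive")
```

On this crude model a single 15k drive manages roughly 40% more random
I/Os per second than a 10k drive, which only matters if the workload is
actually bound by random disk I/O.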

Now all of this becomes moot if you compare them to SSDs, where the
seek settle time is measured in microseconds or lower.  The fastest
spinning drive will look like a garbage truck next to the formula one
car that is the SSD.  Until recently, incompatibilities with RAID
controllers and firmware bugs kept most SSDs out of the hosting
center, or made the ones you could get horrifically expensive.  The
newest generations of SSDs though seem to be working pretty well.

>> If the guys you're looking at getting this from can't do custom
>> orders, find a white box dealer who can, like www.aberdeeninc.com.  It
>> might not be on their site, but they can build dang near anything you
>> want.
>
> Thanks for the note about Aberdeen. I've seen the advertisements, but
> not tried them yet.

There's lots of others to choose from.  In the past I've gotten
fantastic customer service from aberdeen, and they've never steered me
wrong.  I've had my sales guy simply refuse to sell me a particular
drive because the failure rate was too high in the field, etc.  They
cross ship RAID cards overnight, and can build truly huge DAS servers
if you need them.  Like a lot of white box guys they specialize more
in large storage arrays and virtualization hardware, but there's a lot
of similarity between that class of machine and a db server.

Re: Advice sought : new database server

From
Scott Marlowe
Date:
On Sun, Mar 4, 2012 at 12:45 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 03/04/2012 03:58 AM, Rory Campbell-Lange wrote:
>>
>> I'd be grateful for advice on specifying the new server
>>
>> providing about 230GB of usable storage, 150GB of which is on an LV
>> providing reconfigurable space for the databases which are served off an
>> XFS formatted volume.
>>
>
> Do you mean LVM?  I've heard that LVM limits IO, so if you want full speed
> you might wanna drop LVM.  (And XFS supports increasing fs size, and when
> are you ever really gonna want to decrease fs size?).

It certainly did in the past. I don't know if anyone's done any
conclusive testing on it recently, but circa 2005 to 2008 we were
running RHEL 4 and LVM limited the machine by quite a bit, with max
sequential throughput dropping off by 50% or more on bigger IO
subsystems.  I.e. a 600MB/s system would be lucky to hit 300MB/s with
an LV on top.

Re: Advice sought : new database server

From
Rory Campbell-Lange
Date:
On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
> On Sun, Mar 4, 2012 at 11:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
> > On 04/03/12, Scott Marlowe (scott.marlowe@gmail.com) wrote:
...
[Description of system with 2 * 4 core Xeons, 8GB RAM, LSI card with
4*15K SCSI drives in R10. We are looking for a new server to partner
with this one.]
...

> > We do have complex transactions, but I haven't benchmarked the
> > performance so I can't describe it.
>
> Yeah try to get a measurement of how many transactions per second
> you're running at peak load, and if you're currently IO bound or CPU
> bound.

Our existing server rarely goes above 7% sustained IO according to SAR.
Similarly, CPU at peak times is at 5-7% on the SAR average (across all 8
cores). I'm not clear on how to read the memory stats, but the average
kbcommit value for this morning's work is 12420282 which (assuming it
means about 12GB memory) is 4GB more than physical RAM. However the
system never swaps, probably due to our rather parsimonious postgres
memory settings.
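
A quick check of that conversion, assuming (as sar reports it) that
kbcommit is in kilobytes and that the server has the 8GB of RAM
described earlier:

```python
def kb_to_gib(kilobytes):
    """Convert a value in kilobytes (as reported by sar) to GiB."""
    return kilobytes / (1024 * 1024)


KBCOMMIT = 12_420_282   # average kbcommit value quoted above
PHYSICAL_GIB = 8        # RAM in the existing server

committed = kb_to_gib(KBCOMMIT)
print(f"committed: {committed:.2f} GiB")
print(f"over physical RAM by: {committed - PHYSICAL_GIB:.2f} GiB")
```

Committed memory is an estimate of what the workload could demand, not
what is resident, which is consistent with the machine never swapping.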

> > Few of the databases are at the many
> > million row size at the moment, and we are moving to an agressive scheme
> > of archiving old data, so we hope to keep things fast.
>
> The key here is that your whole db can fit into memory.  48G is
> cutting it close if you're figuring on being at 40G in a year.  I'd
> spec it out with 96G to start.  That way if you want to set work_mem
> to 8 or 16M you can without worrying about running the machine out of
> memory / scramming your OS file system cache with a few large queries
> etc.

Thanks for this excellent point.

> > However I thought 15k disks were a pre-requisite for a fast database
> > system, if one can afford them?
>
> The heads have to seek, settle and THEN you ahve to wait for the
> platters to rotate under the head i.e. latency.
>
> > I assume if all else is equal the 1880
> > controller will run 20-40% faster with 15k disks in a write-heavy
> > application.
> > Also I would be grateful to learn if there is a good reason
> > not to use 2.5" SATA disks.
>
> The 10k 2.5 seagate drives have combined seek and latency figures of
> about 7ms, while the15k 2.5 seagate drives have a combined time of
> about 5ms.  Even the fastest 3.5" seagates average 6ms average seek
> time, but with short stroking can get down to 4 or 5.
>
> Now all of this becomes moot if you compare them to SSDs, where the
> seek settle time is measured in microseconds or lower.  The fastest
> spinning drive will look like a garbage truck next to the formula one
> car that is the SSD.  Until recently incompatabilites with RAID
> controllers and firmware bugs kept most SSDs out of the hosting
> center, or made the ones you could get horrifically expensive.  The
> newest generations of SSDs though seem to be working pretty well.

From your comments it appears there are 3 options:

    1. Card + BBU + SAS disks (10K/15K doesn't matter) + lots of RAM
    2. Card + BBU + Raptors + lots of RAM
    3. SSDs + lots of RAM

Is this correct? If my databases are unlikely to be IO bound might it not
be better to go for cheaper drive subsystems (i.e. option 2) + lots of
RAM, or alternatively SSDs based on the fact that we don't require much
storage space? I am unclear about what the options are on the
highly-reliable SSD front, and how to RAID SSD systems.

An ancillary point is that our systems are designed to have more rather
than fewer databases so that we can scale easily horizontally.


Re: Advice sought : new database server

From
Craig James
Date:
On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange <rory@campbell-lange.net> wrote:
> We do have complex transactions, but I haven't benchmarked the
> performance so I can't describe it. Few of the databases are at the many
> million row size at the moment, and we are moving to an agressive scheme
> of archiving old data, so we hope to keep things fast.
>
> However I thought 15k disks were a pre-requisite for a fast database
> system, if one can afford them? I assume if all else is equal the 1880
> controller will run 20-40% faster with 15k disks in a write-heavy
> application. Also I would be grateful to learn if there is a good reason
> not to use 2.5" SATA disks.

Without those benchmarks, you can't really say what "fast" means.  There are many bottlenecks that will limit your database's performance; the disk's spinning rate is just one of them.  Memory size, memory bandwidth, CPU, CPU cache size and speed, the disk I/O bandwidth in and out, the disk RPM, the presence of a BBU controller ... any of these can be the bottleneck.  If you focus on the disk's RPM, you may be fixing a bottleneck that you'll never reach.

We use 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and have been very impressed by the performance.  8 drives in RAID10, two in RAID1 for the WAL, one for Linux and one spare.  This is on an 8-core system with 12 GB memory:

pgbench -i -s 100 -U test
pgbench -U test -c ... -t ...

-c  -t     TPS
5   20000  3777
10  10000  2622
20  5000   3759
30  3333   5712
40  2500   5953
50  2000   6141
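
The -c/-t pairs in the table all multiply out to roughly 100,000
transactions per run. A minimal sketch that generates the matching
command lines is below; it only prints the commands rather than running
them, and the helper name pgbench_command is made up for illustration.

```python
import shlex

TOTAL_TRANSACTIONS = 100_000  # each run above is roughly c * t = 100k


def pgbench_command(clients, user="test", total=TOTAL_TRANSACTIONS):
    """Build a pgbench argv with -t chosen so that c * t stays near total."""
    per_client = total // clients
    return ["pgbench", "-U", user, "-c", str(clients), "-t", str(per_client)]


for clients in (5, 10, 20, 30, 40, 50):
    print(shlex.join(pgbench_command(clients)))
```

Run the initialisation step (pgbench -i -s 100 -U test) once before the
sweep; pgbench reports the TPS figure at the end of each run.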

Craig

Re: Advice sought : new database server

From
Rory Campbell-Lange
Date:
On 05/03/12, Craig James (cjames@emolecules.com) wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange <
> rory@campbell-lange.net> wrote:
>
> > We do have complex transactions, but I haven't benchmarked the
> > performance so I can't describe it. Few of the databases are at the many
> > million row size at the moment, and we are moving to an agressive scheme
> > of archiving old data, so we hope to keep things fast.
> >
> > However I thought 15k disks were a pre-requisite for a fast database
> > system, if one can afford them? I assume if all else is equal the 1880
> > controller will run 20-40% faster with 15k disks in a write-heavy
> > application. Also I would be grateful to learn if there is a good reason
> > not to use 2.5" SATA disks.
>
> Without those benchmarks, you can't really say what "fast" means.  There
> are many bottlenecks that will limit your database's performance; the
> disk's spinning rate is just one of them.  Memory size, memory bandwidth,
> CPU, CPU cache size and speed, the disk I/O bandwidth in and out, the disk
> RPM, the presence of a BBU controller ... any of these can be the
> bottleneck.  If you focus on the disk's RPM, you may be fixing a bottleneck
> that you'll never reach.
>
> We 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and
> have been very impressed by the performance.  8 drives in RAID10, two in
> RAID1 for the WAL, one for Linux and one spare.  This is on an 8-core
> system with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c  -t     TPS
> 5   20000  3777
> 10  10000  2622
> 20  5000   3759
> 30  3333   5712
> 40  2500   5953
> 50  2000   6141

Thanks for this quick guide to using pgbench. My 4-year-old SCSI server
with 4 RAID10 disks behind an LSI card achieved the following on a
contended system:

-c  -t     TPS
5   20000  446
10  10000  542
20   5000  601
30   3333  647

These results seem pretty lousy in comparison to yours. Interesting.


Re: Advice sought : new database server

From
Rory Campbell-Lange
Date:
On 04/03/12, Rory Campbell-Lange (rory@campbell-lange.net) wrote:
> I'd be grateful for advice on specifying a new server
>
...

> The existing server is a 2 x Quad core E5420 Xeon (2.5GHz) with 8GB of
> RAM with an LSI battery-backed RAID 10 array of 4no 10K SCSI disks,
> providing about 230GB of usable storage, 150GB of which is on an LV
> providing reconfigurable space for the databases which are served off an
> XFS formatted volume.

In conversation on the list I've established that our current server
(while fine for our needs) isn't performing terribly well. It could do
with more RAM and the disk IO seems slow.

That said, I'm keen to buy a new server to improve on the current
performance, so I've taken the liberty of replying here to my initial
mail to ask specifically about new server recommendations. The initial
plan was to share some of the load between the current and new server,
and to buy something along the following lines:

>     1U chassis with 8 2.5" disk bays
>     2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>     8 channel Areca ARC-1880i (PCI Express x8 card)
>       presumably with BBU (can't see it listed at present)
>     2 x 300GB SAS  2.5" disks for operating system
>       (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache
>       RAID 1
>     4 x 300GB SAS  2.5" storage disks
>       RAID 10
>     48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)

However, after comments on the list, I realise I could get two servers
with the following specs for the same price as the above:

    2x Intel Xeon E5620 Quad-Core / 4x 2.40GHz / 12MB cache
    48.0GB DDR3 1066MHz registered ECC
    4 channel Areca ARC-1212 (PCI Express x4 card) + BBU
    4 x WD Raptors in RAID 10 (in 3.5" adapters)

In other words, for GBP 5k I can get two servers that, between them, may
meet my requirements (lots of memory, reasonably fast disks) better
than a single server. A salient point is that individual databases are
currently less than 1GB in size but will grow perhaps to 2GB over the
coming 18 months. The aim would be to hold all of the databases in
memory on each server.

I'd be very grateful for comments on this strategy.

Rory


Re: Advice sought : new database server

From
Steve Crawford
Date:
On 03/04/2012 03:50 AM, Michael Friedl wrote:
> Hey!
>
> On 04.03.2012 10:58, Rory Campbell-Lange wrote:
>>      1U chassis with 8 2.5" disk bays
>>      2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
>>      8 channel Areca ARC-1880i (PCI Express x8 card)
>>        presumably with BBU (can't see it listed at present)
>>      2 x 300GB SAS  2.5" disks for operating system
>>        (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache
>>        RAID 1
>>      4 x 300GB SAS  2.5" storage disks
>>        RAID 10
>>      48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)
>>
> Sorry, no answer for your question and a bit offtopic.
>
>
> Why do you take SAS disks for the OS and not much cheaper SATA ones?
>
>
>

Here's Intel's (very general) take. Your OS disks may not justify SAS on
performance alone but other aspects may sway you.
http://www.intel.com/support/motherboards/server/sb/CS-031831.htm

Cheers,
Steve

Re: Advice sought : new database server

From
Merlin Moncure
Date:
On Mon, Mar 5, 2012 at 10:56 AM, Craig James <cjames@emolecules.com> wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
>>
>> We do have complex transactions, but I haven't benchmarked the
>> performance so I can't describe it. Few of the databases are at the many
>> million row size at the moment, and we are moving to an agressive scheme
>> of archiving old data, so we hope to keep things fast.
>>
>> However I thought 15k disks were a pre-requisite for a fast database
>> system, if one can afford them? I assume if all else is equal the 1880
>> controller will run 20-40% faster with 15k disks in a write-heavy
>> application. Also I would be grateful to learn if there is a good reason
>> not to use 2.5" SATA disks.
>
>
> Without those benchmarks, you can't really say what "fast" means.  There are
> many bottlenecks that will limit your database's performance; the disk's
> spinning rate is just one of them.  Memory size, memory bandwidth, CPU, CPU
> cache size and speed, the disk I/O bandwidth in and out, the disk RPM, the
> presence of a BBU controller ... any of these can be the bottleneck.  If you
> focus on the disk's RPM, you may be fixing a bottleneck that you'll never
> reach.
>
> We 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and
> have been very impressed by the performance.  8 drives in RAID10, two in
> RAID1 for the WAL, one for Linux and one spare.  This is on an 8-core system
> with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c  -t     TPS
> 5   20000  3777
> 10  10000  2622
> 20  5000   3759
> 30  3333   5712
> 40  2500   5953
> 50  2000   6141


those numbers are stupendous for 8 drive sata.  how much shared
buffers do you have?

merlin

Re: Advice sought : new database server

From
Scott Marlowe
Date:
On Mon, Mar 5, 2012 at 9:56 AM, Craig James <cjames@emolecules.com> wrote:
> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
> <rory@campbell-lange.net> wrote:
>>
>> We do have complex transactions, but I haven't benchmarked the
>> performance so I can't describe it. Few of the databases are at the many
>> million row size at the moment, and we are moving to an agressive scheme
>> of archiving old data, so we hope to keep things fast.
>>
>> However I thought 15k disks were a pre-requisite for a fast database
>> system, if one can afford them? I assume if all else is equal the 1880
>> controller will run 20-40% faster with 15k disks in a write-heavy
>> application. Also I would be grateful to learn if there is a good reason
>> not to use 2.5" SATA disks.
>
>
> Without those benchmarks, you can't really say what "fast" means.  There are
> many bottlenecks that will limit your database's performance; the disk's
> spinning rate is just one of them.  Memory size, memory bandwidth, CPU, CPU
> cache size and speed, the disk I/O bandwidth in and out, the disk RPM, the
> presence of a BBU controller ... any of these can be the bottleneck.  If you
> focus on the disk's RPM, you may be fixing a bottleneck that you'll never
> reach.
>
> We 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and
> have been very impressed by the performance.  8 drives in RAID10, two in
> RAID1 for the WAL, one for Linux and one spare.  This is on an 8-core system
> with 12 GB memory:
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c  -t     TPS
> 5   20000  3777
> 10  10000  2622
> 20  5000   3759
> 30  3333   5712
> 40  2500   5953
> 50  2000   6141

Just wondering what your -c -t etc settings were, if the tests were
long enough to fill up your RAID controllers write cache or not.

Re: Advice sought : new database server

From
Craig James
Date:
On Wed, Mar 7, 2012 at 12:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Mar 5, 2012 at 10:56 AM, Craig James <cjames@emolecules.com> wrote:
>> On Sun, Mar 4, 2012 at 10:36 AM, Rory Campbell-Lange
>> <rory@campbell-lange.net> wrote:
>>>
>>> We do have complex transactions, but I haven't benchmarked the
>>> performance so I can't describe it. Few of the databases are at the many
>>> million row size at the moment, and we are moving to an agressive scheme
>>> of archiving old data, so we hope to keep things fast.
>>>
>>> However I thought 15k disks were a pre-requisite for a fast database
>>> system, if one can afford them? I assume if all else is equal the 1880
>>> controller will run 20-40% faster with 15k disks in a write-heavy
>>> application. Also I would be grateful to learn if there is a good reason
>>> not to use 2.5" SATA disks.
>>
>>
>> Without those benchmarks, you can't really say what "fast" means.  There are
>> many bottlenecks that will limit your database's performance; the disk's
>> spinning rate is just one of them.  Memory size, memory bandwidth, CPU, CPU
>> cache size and speed, the disk I/O bandwidth in and out, the disk RPM, the
>> presence of a BBU controller ... any of these can be the bottleneck.  If you
>> focus on the disk's RPM, you may be fixing a bottleneck that you'll never
>> reach.
>>
>> We 12 inexpensive 7K SATA drives with an LSI/3Ware 9650SE and a BBU, and
>> have been very impressed by the performance.  8 drives in RAID10, two in
>> RAID1 for the WAL, one for Linux and one spare.  This is on an 8-core system
>> with 12 GB memory:
>>
>> pgbench -i -s 100 -U test
>> pgbench -U test -c ... -t ...
>>
>> -c  -t     TPS
>> 5   20000  3777
>> 10  10000  2622
>> 20  5000   3759
>> 30  3333   5712
>> 40  2500   5953
>> 50  2000   6141
>
> those numbers are stupendous for 8 drive sata.  how much shared
> buffers do you have?

It's actually 10 disks when you include the RAID1 for the WAL.  Here are the non-default performance parameters that might be of interest.

shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30

I also do this at boot time (on a 12GB system):

echo 4294967296 >/proc/sys/kernel/shmmax        # 4 GB shared memory
echo 4096      >/proc/sys/kernel/shmmni
echo 1572864   >/proc/sys/kernel/shmall         # 6 GB max shared mem (block size is 4096 bytes)
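
For anyone adapting these values to a different amount of RAM, the
arithmetic behind them can be sketched as follows. It assumes the 4096
byte page size stated in the comment above; shmmax is in bytes while
shmall is counted in pages.

```python
PAGE_SIZE = 4096    # bytes; the block size stated in the comment above
GIB = 1024 ** 3


def shmall_pages(total_bytes, page_size=PAGE_SIZE):
    """shmall is expressed in pages, so divide the byte limit by page size."""
    return total_bytes // page_size


print(4 * GIB)                 # shmmax: 4 GiB largest single segment, in bytes
print(shmall_pages(6 * GIB))   # shmall: 6 GiB system-wide limit, in pages
```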

We have two of these machines, and their performance is almost identical.  One isn't doing much yet, so if you're interested in other benchmarks (that don't take me too long to run), let me know.

Craig

Re: Advice sought : new database server

From
Shaun Thomas
Date:
On 03/07/2012 03:07 PM, Craig James wrote:

> echo 4294967296 >/proc/sys/kernel/shmmax # 4 GB shared memory
> echo 4096 >/proc/sys/kernel/shmmni
> echo 1572864 >/proc/sys/kernel/shmall # 6 GB max shared mem (block size
> is 4096 bytes)

For what it's worth, you can just make these entries in your
/etc/sysctl.conf file and it'll do the same thing a little more cleanly:

vm.shmmax = 4294967296
vm.shmmni = 4096
vm.shmall = 1572864

To commit changes made this way:

sysctl -p

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: Advice sought : new database server

From
Ants Aasma
Date:
On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> those numbers are stupendous for 8 drive sata.  how much shared
> buffers do you have?

Couple of things to notice:
1) The benchmark can run fully in memory, although not 100% in shared_buffers.
2) These are 100k transaction runs, meaning that probably no
checkpointing was going on.
3) Given the amount of memory in the server, with  dirty flush
settings the OS will do mostly sequential writes.
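
Point 2 can be illustrated with a rough Python sketch that estimates how long each of the posted runs lasted (clients times transactions per client, divided by the reported TPS). It assumes checkpoint_timeout was left at its default of 5 minutes, which the posted settings do not show, and it only looks at elapsed time; a checkpoint triggered by filling checkpoint_segments is not ruled out by this.

```python
# Estimate the wall-clock length of each posted pgbench run.
# Figures are copied from the -c / -t / TPS table earlier in the thread.
runs = [
    (5, 20000, 3777),
    (10, 10000, 2622),
    (20, 5000, 3759),
    (30, 3333, 5712),
    (40, 2500, 5953),
    (50, 2000, 6141),
]

# Assumption: default checkpoint_timeout (5 minutes); not stated in the thread.
CHECKPOINT_TIMEOUT_S = 300

def run_seconds(clients, tx_per_client, tps):
    """Total transactions divided by throughput gives the run time in seconds."""
    return clients * tx_per_client / tps

for clients, tx, tps in runs:
    total = clients * tx
    secs = run_seconds(clients, tx, tps)
    print(f"-c {clients:2d}: {total} transactions in about {secs:.1f} s")

longest = max(run_seconds(*r) for r in runs)
print(longest < CHECKPOINT_TIMEOUT_S)
```

Every run finishes in well under a minute, so under that assumption none of them would reach a timed checkpoint.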

Just ran a quick test. With synchronous_commit=off to simulate a BBU I
have no trouble hitting 11k tps on a single SATA disk. Seems to be
mostly CPU bound on my workstation (Intel i5 2500K @ 3.9GHz, 16GB
memory), dirty writes stay in OS buffers, about 220tps/6MBps of
traffic to the xlog's, checkpoint dumps everything to OS cache which
is then flushed at about 170MB/s (which probably would do nasty things
to latency in real world cases). Unlogged tables give me about 12k
tps, which seems to confirm that it is mostly CPU bound.

So regardless if the benchmark is a good representation of the target
workload or not, it definitely isn't benchmarking the IO system.

Ants Aasma

Re: Advice sought : new database server

From
Rory Campbell-Lange
Date:
On 08/03/12, Ants Aasma (ants.aasma@eesti.ee) wrote:

> So regardless if the benchmark is a good representation of the target
> workload or not, it definitely isn't benchmarking the IO system.

At the risk of hijacking the thread I started, I'd be grateful for
comments on the following system IO results. Rather than using pgbench
(which Ants responded about above), this uses fio. Our workload is
several small databases totalling less than 40GB of disk space. The
proposed system has 48GB RAM, 2 * quad core E5620 @ 2.40GHz and 4 WD
Raptors behind an LSI SAS card. Is this IO respectable?

LSI MegaRAID SAS 9260-8i
Firmware: 12.12.0-0090
Kernel: 2.6.39.4
Hard disks: 4x WD6000BLHX
Test done on 256GB volume
BS = blocksize in bytes


RAID 10
--------------------------------------
Read sequential

    BS           MB/s             IOPs
   512        0129.26        264730.80
  1024        0229.75        235273.40
  4096        0363.14        092965.50
 16384        0475.02        030401.50
 65536        0472.79        007564.65
131072        0428.15        003425.20
--------------------------------------
Write sequential

    BS           MB/s             IOPs
   512        0036.08        073908.00
  1024        0065.61        067192.60
  4096        0170.15        043560.40
 16384        0219.80        014067.57
 65536        0240.05        003840.91
131072        0243.96        001951.74
--------------------------------------
Random read

    BS           MB/s             IOPs
   512        0001.50        003077.20
  1024        0002.91        002981.40
  4096        0011.59        002968.30
 16384        0044.50        002848.28
 65536        0156.96        002511.41
131072        0170.65        001365.25
--------------------------------------
Random write

    BS           MB/s             IOPs
   512        0000.53        001103.60
  1024        0001.15        001179.20
  4096        0004.43        001135.30
 16384        0017.61        001127.56
 65536        0061.39        000982.39
131072        0079.27        000634.16
--------------------------------------
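
For anyone reading the tables, the two columns are tied together by throughput = IOPs x BS. The following Python sketch checks that for a few rows; it assumes "MB/s" here means 2^20 bytes per second, which is what makes the posted figures agree, and the helper name mib_per_s is made up for illustration.

```python
# Cross-check a few rows of the fio tables: MB/s should equal IOPs * BS.
# Assumption: "MB" in the tables means 2**20 bytes.
MIB = 1024 * 1024

def mib_per_s(block_size, iops):
    """Throughput implied by a block size (bytes) and an IOPs figure."""
    return block_size * iops / MIB

# (block size in bytes, reported MB/s, reported IOPs)
rows = {
    "read sequential": [(512, 129.26, 264730.80), (4096, 363.14, 92965.50),
                        (131072, 428.15, 3425.20)],
    "random read": [(4096, 11.59, 2968.30), (65536, 156.96, 2511.41)],
    "random write": [(512, 0.53, 1103.60), (4096, 4.43, 1135.30),
                     (131072, 79.27, 634.16)],
}

for name, samples in rows.items():
    for bs, reported, iops in samples:
        implied = mib_per_s(bs, iops)
        print(f"{name:16s} BS={bs:6d} reported={reported:7.2f} implied={implied:7.2f}")
```

The implied and reported values agree to within about 0.01, so the columns are consistent with each other.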


--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

Re: Advice sought : new database server

From
Jeff Janes
Date:
On Thu, Mar 8, 2012 at 4:43 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> those numbers are stupendous for 8 drive sata.  how much shared
>> buffers do you have?
>
> Couple of things to notice:
> 1) The benchmark can run fully in memory, although not 100% in shared_buffers.
> 2) These are 100k transaction runs, meaning that probably no
> checkpointing was going on.
> 3) Given the amount of memory in the server, with  dirty flush
> settings the OS will do mostly sequential writes.
>
> Just ran a quick test. With synchronous_commit=off to simulate a BBU I
> have no trouble hitting 11k tps on a single SATA disk.

fsync=off might be a better way to simulate a BBU.

Cheers,

Jeff

Re: Advice sought : new database server

From
Jochen Erwied
Date:
Wednesday, March 7, 2012, 11:24:25 PM you wrote:

> On 03/07/2012 03:07 PM, Craig James wrote:

>> echo 4294967296 >/proc/sys/kernel/shmmax # 4 GB shared memory
>> echo 4096 >/proc/sys/kernel/shmmni
>> echo 1572864 >/proc/sys/kernel/shmall # 6 GB max shared mem (block size
>> is 4096 bytes)

> For what it's worth, you can just make these entries in your
> /etc/sysctl.conf file and it'll do the same thing a little more cleanly:

> vm.shmmax = 4294967296
> vm.shmmni = 4096
> vm.shmall = 1572864

Shouldn't that be:

kernel.shmmax = 4294967296
kernel.shmmni = 4096
kernel.shmall = 1572864

--
Jochen Erwied     |   home: jochen@erwied.eu     +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: joe@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erwied@vodafone.de       +49-173-5404164


Re: Advice sought : new database server

From
Shaun Thomas
Date:
On 03/08/2012 10:15 AM, Jochen Erwied wrote:

> Shouldn't that be:
>
> kernel.shmmax = 4294967296
> kernel.shmmni = 4096
> kernel.shmall = 1572864

Oops! Yes. That's definitely it. I'm too accustomed to having those set
automatically, and then setting these:

vm.swappiness = 0
vm.dirty_background_ratio = 1
vm.dirty_ratio = 10

Sorry about that!
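
To give a feel for what those two dirty ratios mean in bytes, here is a rough Python sketch for the RAM sizes mentioned in this thread (12 GB and 48 GB). It is an approximation only: the kernel applies the percentages to the memory it considers available for dirty pages, not to total RAM, so real thresholds will be somewhat lower. The function name dirty_threshold_mib is made up for illustration.

```python
# Approximate the write-back thresholds implied by the vm.dirty_* ratios.
# Assumption: the percentage is applied to total RAM; the kernel actually
# uses a smaller "available" figure, so treat these as upper bounds.

def dirty_threshold_mib(ram_gib, ratio_percent):
    """Return ratio_percent of ram_gib, expressed in MiB."""
    return ram_gib * 1024 * ratio_percent / 100

DIRTY_BACKGROUND_RATIO = 1   # background flushing starts here
DIRTY_RATIO = 10             # writers are throttled here

for ram_gib in (12, 48):
    background = dirty_threshold_mib(ram_gib, DIRTY_BACKGROUND_RATIO)
    hard = dirty_threshold_mib(ram_gib, DIRTY_RATIO)
    print(f"{ram_gib} GiB RAM: background at ~{background:.0f} MiB, "
          f"throttle at ~{hard:.0f} MiB")
```

With a low background ratio, the OS starts flushing after a smaller amount of dirty data, which should reduce the size of the write bursts sent to the controller at once.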

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com