Thread: Comments requested on IO performance : new db server

Comments requested on IO performance : new db server

From
Rory Campbell-Lange
Date:
I've taken the liberty of reposting this message because my addendum to
a long thread that I started, on the subject of adding a new db server
alongside our existing 4-year-old workhorse, got lost in the discussion.

Our workload is several small databases totalling less than 40GB of disk
space. The proposed system has 48GB RAM, 2 * quad core E5620 @ 2.40GHz
and 4 WD Raptors behind an LSI SAS card. Our supplier has just run a set
of tests on the machine we intend to buy. The test rig had the following
setup:

LSI MegaRAID SAS 9260-8i
Firmware: 12.12.0-0090
Kernel: 2.6.39.4
Hard disks: 4x WD6000BLHX
Test done on 256GB volume
BS = blocksize in bytes

The test tool is fio. I'd be grateful to know if the results below are
considered acceptable. An ancillary question is whether a 4096 block
size is a good idea. I suppose we will be using XFS which I understand
has a default block size of 4096 bytes.

RAID 10
--------------------------------------
Read sequential

    BS           MB/s             IOPs
   512        0129.26        264730.80
  1024        0229.75        235273.40
  4096        0363.14        092965.50
 16384        0475.02        030401.50
 65536        0472.79        007564.65
131072        0428.15        003425.20
--------------------------------------
Write sequential

    BS           MB/s             IOPs
   512        0036.08        073908.00
  1024        0065.61        067192.60
  4096        0170.15        043560.40
 16384        0219.80        014067.57
 65536        0240.05        003840.91
131072        0243.96        001951.74
--------------------------------------
Random read

    BS           MB/s             IOPs
   512        0001.50        003077.20
  1024        0002.91        002981.40
  4096        0011.59        002968.30
 16384        0044.50        002848.28
 65536        0156.96        002511.41
131072        0170.65        001365.25
--------------------------------------
Random write

    BS           MB/s             IOPs
   512        0000.53        001103.60
  1024        0001.15        001179.20
  4096        0004.43        001135.30
 16384        0017.61        001127.56
 65536        0061.39        000982.39
131072        0079.27        000634.16
--------------------------------------
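
A quick sanity check on the tables above: the MB/s column should equal block size times IOPS. The sketch below recomputes it for a few rows copied from the tables. It assumes "MB" means 2^20 bytes, which is an inference from the numbers and not something the supplier stated.

```python
# Rows copied from the tables above:
# (block size in bytes, reported MB/s, reported IOPS)
ROWS = [
    (512, 129.26, 264730.80),    # sequential read
    (4096, 363.14, 92965.50),    # sequential read
    (131072, 428.15, 3425.20),   # sequential read
    (4096, 4.43, 1135.30),       # random write
    (65536, 61.39, 982.39),      # random write
]

MIB = 1024 * 1024  # assumption: the "MB" in the tables is 2**20 bytes


def throughput_mib_s(block_size, iops):
    """Throughput implied by a block size and an IOPS figure."""
    return block_size * iops / MIB


def max_relative_error(rows):
    """Largest relative gap between reported and recomputed throughput."""
    return max(
        abs(throughput_mib_s(bs, iops) - reported) / reported
        for bs, reported, iops in rows
    )


if __name__ == "__main__":
    for bs, reported, iops in ROWS:
        print(f"bs={bs:6d}  reported={reported:7.2f}  "
              f"recomputed={throughput_mib_s(bs, iops):7.2f}")
```

For these rows the recomputed values agree with the reported ones to within about 0.2%, so the two columns are consistent with each other under that assumption.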


--
Rory Campbell-Lange
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

Re: Comments requested on IO performance : new db server

From
Merlin Moncure
Date:
On Fri, Mar 9, 2012 at 5:15 AM, Rory Campbell-Lange
<rory@campbell-lange.net> wrote:
> --------------------------------------
> Random write
>
>    BS           MB/s             IOPs
>   512        0000.53        001103.60
>  1024        0001.15        001179.20
>  4096        0004.43        001135.30
>  16384        0017.61        001127.56
>  65536        0061.39        000982.39
> 131072        0079.27        000634.16
> --------------------------------------

Since your RAM is larger than the database size, read performance is
essentially a non-issue. Your major gating factors are going to be
CPU-bound queries and random writes: 1000 IOPS essentially puts an
upper bound on your write TPS, especially if your writes are frequent
and randomly distributed, which is the case more or less simulated by
pgbench with large scaling factors.

Now, 1000 write TPS is quite a lot (3.6 million transactions/hour), and
your workload will drive the hardware consideration.

merlin
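
The hourly figure in the message above is plain arithmetic, sketched here for anyone who wants to plug in their own numbers. It assumes, as a simplification, that each transaction costs about one random write, so write TPS is capped by random-write IOPS. Real transactions may need more or fewer I/Os.

```python
SECONDS_PER_HOUR = 3600


def transactions_per_hour(tps):
    """Convert a sustained transactions-per-second rate to an hourly total."""
    return tps * SECONDS_PER_HOUR


if __name__ == "__main__":
    # Assumption: roughly one random write I/O per transaction, so the
    # ~1000 random-write IOPS from the fio tables caps write TPS at ~1000.
    write_tps_ceiling = 1000
    print(transactions_per_hour(write_tps_ceiling))
```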

Re: Comments requested on IO performance : new db server

From
Rory Campbell-Lange
Date:
On 09/03/12, Merlin Moncure (mmoncure@gmail.com) wrote:
> Since your RAM is larger than the database size, read performance is
> essentially a non-issue. Your major gating factors are going to be
> CPU-bound queries and random writes: 1000 IOPS essentially puts an
> upper bound on your write TPS, especially if your writes are frequent
> and randomly distributed, which is the case more or less simulated by
> pgbench with large scaling factors.
>
> Now, 1000 write TPS is quite a lot (3.6 million transactions/hour), and
> your workload will drive the hardware consideration.

Thanks for your comments, Merlin. With regard to the "gating factors" I
believe the following is pertinent:

CPU

My current server has 2 * quad Xeon E5420 @ 2.50GHz. The server
occasionally reaches 20% sustained utilisation according to sar.
This CPU has a "passmark" of 7,730.
http://www.cpubenchmark.net/cpu_lookup.php?cpu=[Dual+CPU]+Intel+Xeon+E5420+%40+2.50GHz

My proposed server has 2 * quad Xeon E5620 @ 2.40GHz, with a CPU "passmark" of 9,620
http://www.cpubenchmark.net/cpu_lookup.php?cpu=[Dual+CPU]+Intel+Xeon+E5620+%40+2.40GHz

Since the workload will be very similar, I'm hoping for about 20% better
CPU performance from the new server, which should drop max CPU load by
around 4 percentage points.
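
For reference, the estimate can be worked through from the two passmark scores quoted above. This is only a sketch: it assumes utilisation scales inversely with the passmark score, which is a rough rule of thumb and not a measured property of this workload.

```python
OLD_PASSMARK = 7730          # dual E5420, figure quoted above
NEW_PASSMARK = 9620          # dual E5620, figure quoted above
OLD_PEAK_UTILISATION = 0.20  # sustained peak reported by sar


def speedup(old_score, new_score):
    """Ratio of new to old benchmark score."""
    return new_score / old_score


def projected_utilisation(utilisation, old_score, new_score):
    """Assumption: the same work takes CPU time inversely
    proportional to the benchmark score."""
    return utilisation * old_score / new_score


if __name__ == "__main__":
    ratio = speedup(OLD_PASSMARK, NEW_PASSMARK)
    new_util = projected_utilisation(
        OLD_PEAK_UTILISATION, OLD_PASSMARK, NEW_PASSMARK)
    print(f"speedup: {ratio:.2f}x")
    print(f"projected peak utilisation: {new_util:.1%}")
    print(f"drop: {(OLD_PEAK_UTILISATION - new_util) * 100:.1f} points")
```

On these figures the score ratio is about 1.24, which would take a 20% peak down to roughly 16%.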

Random Writes

I'll have to test this. My current server (R10 4*15K SCSI) produced the
following pgbench stats while running its normal workload:

    -c  -t     TPS
    5   20000  446
    10  10000  542
    20   5000  601
    30   3333  647
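
To put those runs in context, the sketch below works out how many transactions each one executed and roughly how long it took, using only the -c, -t and TPS values from the table. The helper names are made up for illustration.

```python
# (clients, transactions per client, measured TPS) from the table above
RUNS = [
    (5, 20000, 446),
    (10, 10000, 542),
    (20, 5000, 601),
    (30, 3333, 647),
]


def total_transactions(clients, per_client):
    """pgbench runs -t transactions for each of the -c clients."""
    return clients * per_client


def runtime_seconds(clients, per_client, tps):
    """Approximate wall-clock duration implied by the measured TPS."""
    return total_transactions(clients, per_client) / tps


if __name__ == "__main__":
    for clients, per_client, tps in RUNS:
        total = total_transactions(clients, per_client)
        secs = runtime_seconds(clients, per_client, tps)
        print(f"-c {clients:2d}  total={total:6d}  ~{secs:.0f} s")
```

Each run comes to about 100,000 transactions and lasts somewhere between two and a half and four minutes, so the figures are best treated as indicative, particularly as the server was carrying its normal workload at the time.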

I'd be grateful to know what parameters I should use for a "large
scaling factor" pgbench test.
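
One possible way to pick a scale factor is to size the pgbench data set to match the real databases (under 40GB here). The sketch below does that and prints candidate command lines. The 15 MB per scale unit figure is a commonly quoted rule of thumb and an assumption here, so check the real size after initialisation. The database name "bench" and the client, thread and duration values are placeholders, not recommendations from this thread.

```python
import math

# Assumption: each pgbench scale unit (100,000 pgbench_accounts rows)
# occupies very roughly 15 MB on disk including indexes.
MB_PER_SCALE_UNIT = 15


def scale_for_size(target_gb, mb_per_unit=MB_PER_SCALE_UNIT):
    """Smallest scale factor whose data set reaches target_gb."""
    return math.ceil(target_gb * 1024 / mb_per_unit)


def init_command(scale, dbname):
    """pgbench initialisation: -i creates the tables, -s sets the scale."""
    return " ".join(["pgbench", "-i", "-s", str(scale), dbname])


def run_command(clients, threads, seconds, dbname):
    """Timed run: -c clients, -j worker threads, -T duration in seconds."""
    return " ".join(["pgbench", "-c", str(clients), "-j", str(threads),
                     "-T", str(seconds), dbname])


if __name__ == "__main__":
    scale = scale_for_size(40)          # match the ~40GB of real data
    print(init_command(scale, "bench"))  # "bench" is a hypothetical db name
    print(run_command(20, 4, 600, "bench"))
```

A timed run (-T) of several minutes or more tends to give steadier numbers than a fixed transaction count (-t), since it spans more checkpoints.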

Many thanks
Rory


Re: Comments requested on IO performance : new db server

From
Rory Campbell-Lange
Date:
Is a block size of 4096 a good idea for both the filesystem and
PostgreSQL? The analysis here:
http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/
appears to suggest that, at least for database block sizes of 4096,
read/write performance is much higher than for smaller block sizes.

Rory

On 09/03/12, Rory Campbell-Lange (rory@campbell-lange.net) wrote:
> ...An ancillary question is whether a 4096 block size is a good idea.
> I suppose we will be using XFS which I understand has a default block
> size of 4096 bytes.

Re: Comments requested on IO performance : new db server

From
Tomas Vondra
Date:
On 10.3.2012 11:51, Rory Campbell-Lange wrote:
> Is a block size of 4096 a good idea for both the filesystem and
> PostgreSQL? The analysis here:
> http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/
> appears to suggest that, at least for database block sizes of 4096,
> read/write performance is much higher than for smaller block sizes.

Hi,

interpreting those results is a bit tricky for several reasons. First,
those are 'average results' for all filesystems (and the behavior of
filesystems may vary significantly). I'd recommend checking results for
the filesystem you're going to use (http://www.fuzzy.cz/bench).

Second, the article discusses just TPC-B (OLTP-like) workload results.
It's quite probable your workload is going to mix that with other
workload types (e.g. DSS/DWH). And that's exactly where larger block
sizes are better.

To me, 8kB seems like a good compromise. Don't use other block sizes
unless you actually test the benefits for your workload.

Tomas
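
The fio runs posted at the start of this thread did not include an 8 kB block size, but the 4096 and 16384 rows bracket it. The sketch below compares the random IOPS at those two sizes, using figures copied from the RAID 10 tables. These are raw device numbers from the supplier's test rig, so they say nothing directly about how PostgreSQL behaves at different block sizes.

```python
# Random IOPS copied from the RAID 10 fio tables earlier in the thread,
# keyed by block size in bytes.
RANDOM_READ_IOPS = {4096: 2968.30, 16384: 2848.28}
RANDOM_WRITE_IOPS = {4096: 1135.30, 16384: 1127.56}


def relative_drop(iops_small_block, iops_large_block):
    """Fraction of IOPS lost when moving to the larger block size."""
    return (iops_small_block - iops_large_block) / iops_small_block


if __name__ == "__main__":
    for name, table in (("random read", RANDOM_READ_IOPS),
                        ("random write", RANDOM_WRITE_IOPS)):
        drop = relative_drop(table[4096], table[16384])
        print(f"{name}: {drop:.1%} fewer IOPS at 16384 than at 4096")
```

The drop is about 4% for random reads and under 1% for random writes, so on this hardware the raw random I/O rate looks largely insensitive to block size in the 4 kB to 16 kB range.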
