Thread: linux distro for better pg performance

linux distro for better pg performance

From
pginfo
Date:
Hi,

I am using pg from 3 y. and generaly I do not have big problems with it.

I am searching for best pg distro to run pg (7.4.1).
At the moment I am using RedHat AS 3.0, but I think it have some
performance problems (I am not sure).
My configuration:
P4 2.8 GHz
1 GB RAM
120 GB IDE 7200 disk.
Kernel version 2.4.21-4.EL (it is the installation vesrion for rh 3.0) .

My problems:

If I run some query with many reads, I see a massive disk transfer :
procs                      memory      swap          io
system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
id wa
 0  0      0 261724   3252 670748    0    0     0     4  105    19  0  0
100  0
 0  0      0 261724   3252 670748    0    0     0     0  101    11  0  0
100  0
 0  0      0 261724   3260 670748    0    0     0     4  104    19  0  0
100  0
 0  1      0 259684   3268 674112    0    0   964     7  131    57  0  0
95  4
 1  0      0 119408   3288 808540    0    0 27960     0  572   630 13 14
24 49
 1  1      0  15896   3292 914436    0    0  7984 44744  531   275 11 18
24 47
 0  2      0  16292   3296 924996    0    0  4145  6413  384   176  2
5  0 92
 0  1      0  19928   3316 928844    0    0 11805 13335  497   388  5
9  5 81
 0  3      0  19124   3296 924452    0    0  3153 19164  287   295  5 11
16 68
 0  1      0  15956   3304 932984    0    0   536  6812  366   123  4
6  3 87
 0  2      0  24956   3300 921416    0    0  1931 22936

And if I run top, I see a big iowait % (some times 70-80) and very low
user % (10-15).

I readet many docs about this problem, but do not find any solution.

My question:

If some one is using RH 3.0, pls post some result or suggestions for it
performance with pg .

What is the best linux distro for pg?

Can I get better performance by using 15K SCSI disk ?
Or it will be better to have more RAM (2 or 3 GB) ?

regards,
ivan.


Re: linux distro for better pg performance

From
"Gavin M. Roy"
Date:
>I am searching for best pg distro to run pg (7.4.1).
>
>
This is generally based upon opinion.  Honestly though, your kernel
version is more important for performance than the distro.  Personally I
use gentoo, love gentoo, and would recommend very few other distros
(Slackware) for servers.  RedHat and others seem to include
kitchensinkd, when it's not needed.

>At the moment I am using RedHat AS 3.0, but I think it have some
>performance problems (I am not sure).
>My configuration:
>P4 2.8 GHz
>1 GB RAM
>120 GB IDE 7200 disk.
>
>
Your IDE drive is the biggest hardward bottleneck here.  RPM's and bus
transfers are slower than SCSI or SATA.

>Kernel version 2.4.21-4.EL (it is the installation vesrion for rh 3.0) .
>
>
Jump to 2.6, it's much better for performance related issues, in my
experience.

>My problems:
>
>If I run some query with many reads, I see a massive disk transfer :
>procs                      memory      swap          io
>system         cpu
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
>id wa
> 0  0      0 261724   3252 670748    0    0     0     4  105    19  0  0
>100  0
> 0  0      0 261724   3252 670748    0    0     0     0  101    11  0  0
>100  0
> 0  0      0 261724   3260 670748    0    0     0     4  104    19  0  0
>100  0
> 0  1      0 259684   3268 674112    0    0   964     7  131    57  0  0
>95  4
> 1  0      0 119408   3288 808540    0    0 27960     0  572   630 13 14
>24 49
> 1  1      0  15896   3292 914436    0    0  7984 44744  531   275 11 18
>24 47
> 0  2      0  16292   3296 924996    0    0  4145  6413  384   176  2
>5  0 92
> 0  1      0  19928   3316 928844    0    0 11805 13335  497   388  5
>9  5 81
> 0  3      0  19124   3296 924452    0    0  3153 19164  287   295  5 11
>16 68
> 0  1      0  15956   3304 932984    0    0   536  6812  366   123  4
>6  3 87
> 0  2      0  24956   3300 921416    0    0  1931 22936
>
>And if I run top, I see a big iowait % (some times 70-80) and very low
>user % (10-15).
>
>
again, this is your harddrive, and the kernel can play into that.

>I readet many docs about this problem, but do not find any solution.
>
>My question:
>
>If some one is using RH 3.0, pls post some result or suggestions for it
>performance with pg .
>
>What is the best linux distro for pg?
>
>
There's no best, just personal preference.

>Can I get better performance by using 15K SCSI disk ?
>
>
Absolutely

>Or it will be better to have more RAM (2 or 3 GB) ?
>
>
Better to have a fast drive, but more ram can be helpful.

>regards,
>ivan.
>
>
>
HTH,

Gavin

Re: linux distro for better pg performance

From
"J. Andrew Rogers"
Date:
On Thu, 2004-04-15 at 06:39, Gavin M. Roy wrote:
> Your IDE drive is the biggest hardward bottleneck here.  RPM's and bus
> transfers are slower than SCSI or SATA.


Individual disk throughput generally has very little bearing on database
performance compared to other factors.  In fact, IDE bandwidth
performance is perfectly adequate for databases, and for database
purposes indistinguishable from SATA.  I would say that average access
and read/write completion times, especially under load, are by far the
most limiting factors, and disk RPM is only one component of this.  In
fact, disk RPM is a very expensive way to get marginally better
throughput in this regard, and I would suggest 10k rather than 15k
drives for the money.

There are really only two features that are worth buying in your disk
subsystem which many people ignore: TCQ and independently managed I/O
with a large battery-backed write-back cache.  Currently, the only place
to really get this is with SCSI RAID.  You can get 10k SATA drives, so
when you are buying SCSI you are really buying these features.

Do these features make a difference?  Far more than you would imagine.
On one postgres server I just upgraded, we went from a 3Ware 8x7200-RPM
RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
cache, and got a 3-5x performance improvement in the disk subsystem
under full database load.  SCSI RAID can service a lot of I/O requests
far more efficiently than current IDE/SATA RAID controllers, and it
shows in the stats.  Under these types of loads, the actually bandwidth
utilized by the disks doesn't come anywhere close to even their rated
performance, never mind the theoretical performance of the bus.  Service
times for IDE/SATA RAID increases dramatically under load, whereas SCSI
tends not to under the same load.

Considering that very good SCSI RAID controllers (e.g. the LSI 320-2
that I mention above) are only marginally more expensive than nominally
equivalent IDE/SATA controller solutions, using SCSI RAID with 10k
drives is pretty much the price-performance sweet spot if you use your
disk system hard (like we do).  For databases with low disk I/O
intensity, stay with IDE/SATA and save a little money.  For databases
that have high disk I/O intensity, use SCSI.  The price premium for SCSI
is about 50%, but the performance difference is an integer factor under
load.


j. andrew rogers






Re: linux distro for better pg performance

From
Joseph Shraibman
Date:
J. Andrew Rogers wrote:

> Do these features make a difference?  Far more than you would imagine.
> On one postgres server I just upgraded, we went from a 3Ware 8x7200-RPM
> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M

Is raid 5 much faster than raid 10?  On a 4 disk array with 3 data disks
and 1 parity disk, you have to write 4/3rds the original data, while on
raid 10 you have to write 2 times the original data, so logically raid 5
should be faster.

Re: linux distro for better pg performance

From
James Thornton
Date:
Joseph Shraibman wrote:

> Is raid 5 much faster than raid 10?  On a 4 disk array with 3 data disks
> and 1 parity disk, you have to write 4/3rds the original data, while on
> raid 10 you have to write 2 times the original data, so logically raid 5
> should be faster.

RAID 5 will give you more capacity, but is usually not recommended for
write intensive applications since RAID 5 writes require four I/O
operations: parity and data disks must be read, new data is compared to
data already on the drive and changes are noted, new parity is
calculated, both the parity and data disks are written to. Furthermore,
if a disk fails, performance is severely affected since all remaining
drives must be read for each I/O in order to recalculate the missing
disk drives data.

RAID 0+1 has the same performance and capacity as RAID 1+0 (10), but
less reliability since "a single drive failure will cause the whole
array to become, in essence, a RAID Level 0 array" so I don't know why
anyone would choose it over RAID 10 where multiple disks can fail.

RAID 1 has the same capacity as RAID 10 (n/2), but RAID 10 has better
performance so if you're going to have more than one drive pair, why not
go for RAID 10 and get the extra performance from striping?

I have been researching how to configure Postgres for a RAID 10 SAME
configuration as described in the Oracle paper "Optimal Storage
Configuration Made Easy"
(http://otn.oracle.com/deploy/availability/pdf/oow2000_same.pdf). Has
anyone delved into this before?

The filesystem choice is also a key element in database performance
tuning. In another Oracle paper entitled Tuning an "Oracle8i Database
Running Linux"
(http://otn.oracle.com/oramag/webcolumns/2002/techarticles/scalzo_linux02.html),
Dr. Bert Scalzo says, "The trouble with these tests-for example, Bonnie,
Bonnie++, Dbench, Iobench, Iozone, Mongo, and Postmark-is that they are
basic file system throughput tests, so their results generally do not
pertain in any meaningful fashion to the way relational database systems
access data files." Instead he suggests users benchmarking filesystems
for database applications should use these two well-known and widely
accepted database benchmarks:

AS3AP (http://www.benchmarkresources.com/handbook/5.html): a scalable,
portable ANSI SQL relational database benchmark that provides a
comprehensive set of tests of database-processing power; has built-in
scalability and portability for testing a broad range of systems;
minimizes human effort in implementing and running benchmark tests; and
provides a uniform, metric, straightforward interpretation of the results.

TPC-C (http://www.tpc.org/): an online transaction processing (OLTP)
benchmark that involves a mix of five concurrent transactions of various
types and either executes completely online or queries for deferred
execution. The database comprises nine types of tables, having a wide
range of record and population sizes. This benchmark measures the number
of transactions per second.

I encourage you to read the paper -- Dr. Scalzo's results will surprise
you; however, while he benchmarked ext2, ext3, ReiserFS, JFS, and RAW,
he did not include XFS.

SGI and IBM did a more detailed study on Linux filesystem performance,
which included XFS, ext2, ext3 (various modes), ReiserFS, and JRS, and
the results are presented in a paper entitled "Filesystem Performance
and Scalability in Linux 2.4.17"
(http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf). This
paper goes over the details on how to properly conduct a filesystem
benchmark and addresses scaling and load more so than Dr. Scalzo's tests.

For further study, I have compiled a list of Linux filesystem resources
at: http://jamesthornton.com/hotlist/linux-filesystems/.
--

  James Thornton
______________________________________________________
Internet Business Consultant, http://jamesthornton.com


Re: linux distro for better pg performance

From
Alan Stange
Date:
Joseph Shraibman wrote:

> J. Andrew Rogers wrote:
>
>> Do these features make a difference?  Far more than you would
>> imagine. On one postgres server I just upgraded, we went from a 3Ware
>> 8x7200-RPM
>> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
>
> Is raid 5 much faster than raid 10?  On a 4 disk array with 3 data
> disks and 1 parity disk, you have to write 4/3rds the original data,
> while on raid 10 you have to write 2 times the original data, so
> logically raid 5 should be faster.

I think this comparison is a bit simplistic.   For example, most raid5
setups have full stripes that are more than 8K  (the typical IO size in
postgresql), so one might have to read in portions of the stripe in
order to compute the parity.   The needed bits might be in some disk or
controller cache;  if it's not then you lose.   If one is able to
perform full stripe writes then the raid5 config should be faster for
writes.

Note also that the mirror has 2 copies of the data, so that the read IOs
would be divided across 2 (or more) spindles using round robin or a more
advanced algorithm to reduce seek times.

Of course, I might be completely wrong...

-- Alan

Re: linux distro for better pg performance

From
Greg Stark
Date:
Joseph Shraibman <jks@selectacast.net> writes:

> J. Andrew Rogers wrote:
>
> > Do these features make a difference?  Far more than you would imagine. On one
> > postgres server I just upgraded, we went from a 3Ware 8x7200-RPM
> > RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
>
> Is raid 5 much faster than raid 10?  On a 4 disk array with 3 data disks and 1
> parity disk, you have to write 4/3rds the original data, while on raid 10 you
> have to write 2 times the original data, so logically raid 5 should be faster.

In RAID5 every write needs to update the parity disk as well. In order to do
that for a small random access write you often need read in the rest of the
data block being modified to calculate the parity bits. This means writes
often have higher latency on RAID5 because they first have to do an extra
read. This is where RAID5 got its bad reputation.

Good modern RAID5 controllers can minimize this problem but I think it's still
an issue for a lot of lower end hardware. I wonder if postgres's architecture
might minimize it already just because of the pattern of writes it generates.

--
greg

Re: linux distro for better pg performance

From
"Aaron Werman"
Date:
The comparison is actually dead on. If you have lots of write through / read
behind cache, RAID 5 can run very quickly, until the write rate overwhelms
the cache - at which point the 4 I/O per write / 2 per read stops it. This
means that RAID 5 works, except when stressed, which is a bad paradigm.

If you do streaming sequential writes on RAID5 on a 4 drive RAID5, 4 writes
become:

- read drive 1 for data
- read drive 3 for parity
- write changes to drive 1
- write changes to drive 3

- read drive 2 for data
- read drive 4 for parity
- write changes to drive 2
- write changes to drive 4

- read drive 3 for data
- read drive 1 for parity
- write changes to drive 3
- write changes to drive 1

- read drive 4 for data
- read drive 2 for parity
- write changes to drive 4
- write changes to drive 2

or

drive 1: 2 reads, 2 writes
drive 2: 2 reads, 2 writes
drive 3: 2 reads, 2 writes
drive 4: 2 reads, 2 writes

in other words, evenly distributed 16 I/Os. These have to be ordered to be
recoverable (otherwise the parity scheme is broken and you can't recover),
and thus are quasi synchronous.

The same on RAID 10 is

- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2

or

drive 1: 4 I/Os
drive 2: 4 I/Os

in other words 4 I/Os in parallel. There is no wait on streaming I/O on RAID
10, and this fact is the other main reason RAID 10 gives an order of
magnitude  better performance.

If you are writing full blocks in a streaming mode, RAID 3 will be the
fastest - it is RAID 0 with a parity drive. In every situation I've seen it,
RAID 5 was either generally slow or got applications into trouble during
stress: bulk loads, etc. Most DBAs end up on RAID 10 for it's predictability
and performance.

/Aaron

----- Original Message -----
From: "Alan Stange" <stange@rentec.com>
To: "Joseph Shraibman" <jks@selectacast.net>
Cc: "J. Andrew Rogers" <jrogers@neopolitan.com>;
<pgsql-performance@postgresql.org>
Sent: Monday, May 03, 2004 11:03 PM
Subject: Re: [PERFORM] linux distro for better pg performance


> Joseph Shraibman wrote:
>
> > J. Andrew Rogers wrote:
> >
> >> Do these features make a difference?  Far more than you would
> >> imagine. On one postgres server I just upgraded, we went from a 3Ware
> >> 8x7200-RPM
> >> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
> >
> > Is raid 5 much faster than raid 10?  On a 4 disk array with 3 data
> > disks and 1 parity disk, you have to write 4/3rds the original data,
> > while on raid 10 you have to write 2 times the original data, so
> > logically raid 5 should be faster.
>
> I think this comparison is a bit simplistic.   For example, most raid5
> setups have full stripes that are more than 8K  (the typical IO size in
> postgresql), so one might have to read in portions of the stripe in
> order to compute the parity.   The needed bits might be in some disk or
> controller cache;  if it's not then you lose.   If one is able to
> perform full stripe writes then the raid5 config should be faster for
> writes.
>
> Note also that the mirror has 2 copies of the data, so that the read IOs
> would be divided across 2 (or more) spindles using round robin or a more
> advanced algorithm to reduce seek times.
>
> Of course, I might be completely wrong...
>
> -- Alan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>