Thread: linux distro for better pg performance
Hi, I have been using pg for 3 years and generally have no big problems with it. I am searching for the best Linux distro to run pg (7.4.1). At the moment I am using RedHat AS 3.0, but I think it has some performance problems (I am not sure).

My configuration:
P4 2.8 GHz
1 GB RAM
120 GB IDE 7200 RPM disk
Kernel version 2.4.21-4.EL (it is the installation version for RH 3.0).

My problem: if I run a query with many reads, I see massive disk transfer:

procs                       memory      swap           io     system       cpu
 r  b  swpd    free  buff   cache   si  so    bi     bo   in   cs  us sy  id wa
 0  0     0  261724  3252  670748    0   0     0      4  105   19   0  0 100  0
 0  0     0  261724  3252  670748    0   0     0      0  101   11   0  0 100  0
 0  0     0  261724  3260  670748    0   0     0      4  104   19   0  0 100  0
 0  1     0  259684  3268  674112    0   0   964      7  131   57   0  0  95  4
 1  0     0  119408  3288  808540    0   0 27960      0  572  630  13 14  24 49
 1  1     0   15896  3292  914436    0   0  7984  44744  531  275  11 18  24 47
 0  2     0   16292  3296  924996    0   0  4145   6413  384  176   2  5   0 92
 0  1     0   19928  3316  928844    0   0 11805  13335  497  388   5  9   5 81
 0  3     0   19124  3296  924452    0   0  3153  19164  287  295   5 11  16 68
 0  1     0   15956  3304  932984    0   0   536   6812  366  123   4  6   3 87
 0  2     0   24956  3300  921416    0   0  1931  22936

And if I run top, I see a big iowait % (sometimes 70-80) and a very low user % (10-15). I have read many docs about this problem but have not found any solution.

My questions:

If someone is using RH 3.0, please post some results or suggestions for its performance with pg.

What is the best Linux distro for pg?

Can I get better performance by using a 15K SCSI disk?

Or would it be better to have more RAM (2 or 3 GB)?

regards,
ivan.
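For readers trying to reproduce this diagnosis: the iowait figure is the last (`wa`) field of each vmstat data row, so a quick awk pass can average it over an interval. A minimal sketch, using four of the under-load rows copied from the vmstat output above as sample data (on a live box you would pipe `vmstat 5` instead):

```shell
#!/bin/sh
# Average the iowait ("wa") column -- the last field of each vmstat data row.
# Sample rows are copied from the vmstat output in this thread.
vmstat_sample() {
cat <<'EOF'
 1  0  0 119408 3288 808540 0 0 27960     0 572 630 13 14 24 49
 1  1  0  15896 3292 914436 0 0  7984 44744 531 275 11 18 24 47
 0  2  0  16292 3296 924996 0 0  4145  6413 384 176  2  5  0 92
 0  1  0  19928 3316 928844 0 0 11805 13335 497 388  5  9  5 81
EOF
}

# $NF is the last field of each row: the "wa" (iowait) percentage.
vmstat_sample | awk '{ sum += $NF; n++ } END { printf "avg iowait: %.0f%%\n", sum / n }'
```

Sustained averages in this range mean the CPU is mostly idle waiting on the disk, which matches the low user % seen in top.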
>I am searching for best pg distro to run pg (7.4.1).

This is generally based upon opinion. Honestly though, your kernel version is more important for performance than the distro. Personally I use Gentoo, love Gentoo, and would recommend very few other distros (Slackware) for servers. RedHat and others seem to include kitchensinkd when it's not needed.

>At the moment I am using RedHat AS 3.0, but I think it have some
>performance problems (I am not sure).
>My configuration:
>P4 2.8 GHz
>1 GB RAM
>120 GB IDE 7200 disk.

Your IDE drive is the biggest hardware bottleneck here. RPMs and bus transfers are slower than SCSI or SATA.

>Kernel version 2.4.21-4.EL (it is the installation vesrion for rh 3.0).

Jump to 2.6; it's much better for performance-related issues, in my experience.

>My problems:
>
>If I run some query with many reads, I see a massive disk transfer:
>[vmstat output snipped]
>
>And if I run top, I see a big iowait % (some times 70-80) and very low
>user % (10-15).

Again, this is your hard drive, and the kernel can play into that.

>I readet many docs about this problem, but do not find any solution.
>
>If some one is using RH 3.0, pls post some result or suggestions for it
>performance with pg.
>
>What is the best linux distro for pg?

There's no best, just personal preference.

>Can I get better performance by using 15K SCSI disk?

Absolutely.

>Or it will be better to have more RAM (2 or 3 GB)?

Better to have a fast drive, but more RAM can be helpful.

>regards,
>ivan.

HTH,
Gavin
On Thu, 2004-04-15 at 06:39, Gavin M. Roy wrote:
> Your IDE drive is the biggest hardward bottleneck here. RPM's and bus
> transfers are slower than SCSI or SATA.

Individual disk throughput generally has very little bearing on database performance compared to other factors. In fact, IDE bandwidth performance is perfectly adequate for databases, and for database purposes indistinguishable from SATA.

I would say that average access and read/write completion times, especially under load, are by far the most limiting factors, and disk RPM is only one component of this. In fact, disk RPM is a very expensive way to get marginally better throughput in this regard, and I would suggest 10k rather than 15k drives for the money.

There are really only two features that are worth buying in your disk subsystem which many people ignore: TCQ and independently managed I/O with a large battery-backed write-back cache. Currently, the only place to really get this is with SCSI RAID. You can get 10k SATA drives, so when you are buying SCSI you are really buying these features.

Do these features make a difference? Far more than you would imagine. On one postgres server I just upgraded, we went from a 3Ware 8x7200-RPM RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M cache, and got a 3-5x performance improvement in the disk subsystem under full database load.

SCSI RAID can service a lot of I/O requests far more efficiently than current IDE/SATA RAID controllers, and it shows in the stats. Under these types of loads, the actual bandwidth utilized by the disks doesn't come anywhere close to even their rated performance, never mind the theoretical performance of the bus. Service times for IDE/SATA RAID increase dramatically under load, whereas SCSI tends not to under the same load.

Considering that very good SCSI RAID controllers (e.g. the LSI 320-2 that I mention above) are only marginally more expensive than nominally equivalent IDE/SATA controller solutions, using SCSI RAID with 10k drives is pretty much the price-performance sweet spot if you use your disk system hard (like we do).

For databases with low disk I/O intensity, stay with IDE/SATA and save a little money. For databases that have high disk I/O intensity, use SCSI. The price premium for SCSI is about 50%, but the performance difference is an integer factor under load.

j. andrew rogers
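The service-time difference Rogers describes can be watched directly with `iostat -x` (from the sysstat package), whose `await` column reports the average milliseconds each request spends queued plus being serviced. A minimal sketch with hypothetical sample numbers, not real measurements, and a simplified column layout (real `iostat -x` output has more fields); on a live box you would pipe `iostat -x 5` instead:

```shell
#!/bin/sh
# Compare average I/O completion time ("await", in ms) between two devices
# from iostat -x style lines. The rows below are made-up sample data.
# Fields assumed here: device, r/s, w/s, rkB/s, wkB/s, await(ms)
iostat_sample() {
cat <<'EOF'
hda  142.0  38.0  12480.0  4240.0  210.4
sda  410.0  95.0  30110.0  9870.0    8.7
EOF
}

iostat_sample | awk '{ printf "%s: await %.1f ms\n", $1, $6 }'
```

A device whose await climbs into the hundreds of milliseconds under load is saturated regardless of how good its raw sequential throughput looks.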
J. Andrew Rogers wrote:
> Do these features make a difference? Far more than you would imagine.
> On one postgres server I just upgraded, we went from a 3Ware 8x7200-RPM
> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M

Is raid 5 much faster than raid 10? On a 4 disk array with 3 data disks and 1 parity disk, you have to write 4/3rds the original data, while on raid 10 you have to write 2 times the original data, so logically raid 5 should be faster.
Joseph Shraibman wrote:
> Is raid 5 much faster than raid 10? On a 4 disk array with 3 data disks
> and 1 parity disk, you have to write 4/3rds the original data, while on
> raid 10 you have to write 2 times the original data, so logically raid 5
> should be faster.

RAID 5 will give you more capacity, but is usually not recommended for write-intensive applications since RAID 5 writes require four I/O operations: the parity and data disks must be read, new data is compared to data already on the drive and changes are noted, new parity is calculated, and both the parity and data disks are written to. Furthermore, if a disk fails, performance is severely affected since all remaining drives must be read for each I/O in order to recalculate the missing disk drive's data.

RAID 0+1 has the same performance and capacity as RAID 1+0 (10), but less reliability since "a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array", so I don't know why anyone would choose it over RAID 10, where multiple disks can fail.

RAID 1 has the same capacity as RAID 10 (n/2), but RAID 10 has better performance, so if you're going to have more than one drive pair, why not go for RAID 10 and get the extra performance from striping?

I have been researching how to configure Postgres for a RAID 10 SAME configuration as described in the Oracle paper "Optimal Storage Configuration Made Easy" (http://otn.oracle.com/deploy/availability/pdf/oow2000_same.pdf). Has anyone delved into this before?

The filesystem choice is also a key element in database performance tuning. In another Oracle paper entitled "Tuning an Oracle8i Database Running Linux" (http://otn.oracle.com/oramag/webcolumns/2002/techarticles/scalzo_linux02.html), Dr. Bert Scalzo says, "The trouble with these tests -- for example, Bonnie, Bonnie++, Dbench, Iobench, Iozone, Mongo, and Postmark -- is that they are basic file system throughput tests, so their results generally do not pertain in any meaningful fashion to the way relational database systems access data files."

Instead he suggests that users benchmarking filesystems for database applications use these two well-known and widely accepted database benchmarks:

AS3AP (http://www.benchmarkresources.com/handbook/5.html): a scalable, portable ANSI SQL relational database benchmark that provides a comprehensive set of tests of database-processing power; has built-in scalability and portability for testing a broad range of systems; minimizes human effort in implementing and running benchmark tests; and provides a uniform, metric, straightforward interpretation of the results.

TPC-C (http://www.tpc.org/): an online transaction processing (OLTP) benchmark that involves a mix of five concurrent transactions of various types and either executes completely online or queries for deferred execution. The database comprises nine types of tables, having a wide range of record and population sizes. This benchmark measures the number of transactions per second.

I encourage you to read the paper -- Dr. Scalzo's results will surprise you; however, while he benchmarked ext2, ext3, ReiserFS, JFS, and RAW, he did not include XFS.

SGI and IBM did a more detailed study on Linux filesystem performance, which included XFS, ext2, ext3 (various modes), ReiserFS, and JFS, and the results are presented in a paper entitled "Filesystem Performance and Scalability in Linux 2.4.17" (http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf). This paper goes over the details of how to properly conduct a filesystem benchmark and addresses scaling and load more so than Dr. Scalzo's tests.
For further study, I have compiled a list of Linux filesystem resources at: http://jamesthornton.com/hotlist/linux-filesystems/

--
James Thornton
Internet Business Consultant, http://jamesthornton.com
Joseph Shraibman wrote:
> J. Andrew Rogers wrote:
>
>> Do these features make a difference? Far more than you would
>> imagine. On one postgres server I just upgraded, we went from a 3Ware
>> 8x7200-RPM
>> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
>
> Is raid 5 much faster than raid 10? On a 4 disk array with 3 data
> disks and 1 parity disk, you have to write 4/3rds the original data,
> while on raid 10 you have to write 2 times the original data, so
> logically raid 5 should be faster.

I think this comparison is a bit simplistic. For example, most raid5 setups have full stripes that are more than 8K (the typical IO size in postgresql), so one might have to read in portions of the stripe in order to compute the parity. The needed bits might be in some disk or controller cache; if it's not then you lose. If one is able to perform full stripe writes then the raid5 config should be faster for writes.

Note also that the mirror has 2 copies of the data, so that the read IOs would be divided across 2 (or more) spindles using round robin or a more advanced algorithm to reduce seek times.

Of course, I might be completely wrong...

-- Alan
Joseph Shraibman <jks@selectacast.net> writes:
> J. Andrew Rogers wrote:
>
> > Do these features make a difference? Far more than you would imagine. On one
> > postgres server I just upgraded, we went from a 3Ware 8x7200-RPM
> > RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
>
> Is raid 5 much faster than raid 10? On a 4 disk array with 3 data disks and 1
> parity disk, you have to write 4/3rds the original data, while on raid 10 you
> have to write 2 times the original data, so logically raid 5 should be faster.

In RAID5 every write needs to update the parity disk as well. In order to do that for a small random access write you often need to read in the rest of the data block being modified to calculate the parity bits. This means writes often have higher latency on RAID5 because they first have to do an extra read. This is where RAID5 got its bad reputation.

Good modern RAID5 controllers can minimize this problem but I think it's still an issue for a lot of lower end hardware. I wonder if postgres's architecture might minimize it already just because of the pattern of writes it generates.

--
greg
The comparison is actually dead on. If you have lots of write-through / read-behind cache, RAID 5 can run very quickly, until the write rate overwhelms the cache - at which point the 4 I/Os per write / 2 per read stops it. This means that RAID 5 works, except when stressed, which is a bad paradigm.

If you do streaming sequential writes on a 4 drive RAID 5, 4 writes become:

- read drive 1 for data
- read drive 3 for parity
- write changes to drive 1
- write changes to drive 3
- read drive 2 for data
- read drive 4 for parity
- write changes to drive 2
- write changes to drive 4
- read drive 3 for data
- read drive 1 for parity
- write changes to drive 3
- write changes to drive 1
- read drive 4 for data
- read drive 2 for parity
- write changes to drive 4
- write changes to drive 2

or:

drive 1: 2 reads, 2 writes
drive 2: 2 reads, 2 writes
drive 3: 2 reads, 2 writes
drive 4: 2 reads, 2 writes

In other words, 16 evenly distributed I/Os. These have to be ordered to be recoverable (otherwise the parity scheme is broken and you can't recover), and thus are quasi-synchronous.

The same on RAID 10 is:

- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2
- write changes to drive 1
- write copy of changes to drive 2

or:

drive 1: 4 I/Os
drive 2: 4 I/Os

In other words, 4 I/Os in parallel. There is no wait on streaming I/O on RAID 10, and this fact is the other main reason RAID 10 gives an order of magnitude better performance.

If you are writing full blocks in a streaming mode, RAID 3 will be the fastest - it is RAID 0 with a parity drive.

In every situation I've seen it, RAID 5 was either generally slow or got applications into trouble during stress: bulk loads, etc. Most DBAs end up on RAID 10 for its predictability and performance.
/Aaron

----- Original Message -----
From: "Alan Stange" <stange@rentec.com>
To: "Joseph Shraibman" <jks@selectacast.net>
Cc: "J. Andrew Rogers" <jrogers@neopolitan.com>; <pgsql-performance@postgresql.org>
Sent: Monday, May 03, 2004 11:03 PM
Subject: Re: [PERFORM] linux distro for better pg performance

> Joseph Shraibman wrote:
> > J. Andrew Rogers wrote:
> >
> >> Do these features make a difference? Far more than you would
> >> imagine. On one postgres server I just upgraded, we went from a 3Ware
> >> 8x7200-RPM
> >> RAID-10 configuration to an LSI 320-2 SCSI 3x10k RAID-5, with 256M
> >
> > Is raid 5 much faster than raid 10? On a 4 disk array with 3 data
> > disks and 1 parity disk, you have to write 4/3rds the original data,
> > while on raid 10 you have to write 2 times the original data, so
> > logically raid 5 should be faster.
>
> I think this comparison is a bit simplistic. For example, most raid5
> setups have full stripes that are more than 8K (the typical IO size in
> postgresql), so one might have to read in portions of the stripe in
> order to compute the parity. The needed bits might be in some disk or
> controller cache; if it's not then you lose. If one is able to
> perform full stripe writes then the raid5 config should be faster for
> writes.
>
> Note also that the mirror has 2 copies of the data, so that the read IOs
> would be divided across 2 (or more) spindles using round robin or a more
> advanced algorithm to reduce seek times.
>
> Of course, I might be completely wrong...
>
> -- Alan
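The small-write tally Aaron walks through reduces to simple arithmetic: each small random write costs four physical I/Os on RAID 5 (read data, read parity, write data, write parity) but only two on RAID 10 (original plus mirror copy). A minimal sketch of that accounting, which applies to small random writes only since full-stripe writes avoid the RAID 5 reads:

```shell
#!/bin/sh
# Back-of-the-envelope count of physical I/Os per batch of logical writes,
# following the read-modify-write sequence described in the thread.
writes=4
raid5=$(( writes * 4 ))    # read data + read parity + write data + write parity
raid10=$(( writes * 2 ))   # write original + write mirror copy

echo "RAID 5:  $raid5 physical I/Os for $writes writes"
echo "RAID 10: $raid10 physical I/Os for $writes writes"
```

The 2x gap in raw I/O count understates the real difference, since as Aaron notes the RAID 5 operations must also be ordered for recoverability while the RAID 10 writes can proceed in parallel.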