Re: Scaling further up - Mailing list pgsql-performance
From | Marty Scholes |
---|---|
Subject | Re: Scaling further up |
Date | |
Msg-id | 404FA4F3.6080903@outputservices.com Whole thread Raw |
In response to | Scaling further up ("Anjan Dave" <adave@vantage.com>) |
List | pgsql-performance |
I have some suggestions based on my anecdotal experience. 1. This is a relatively small DB -- the working set will likely be in RAM at any moment in time, making read I/O time mostly irrelevant. 2. The killer will be write times -- specifically log writes. Small and heavily synchronized writes, log and data writes, will drag down an impressive hardware RAID setup. We run mirrored hardware RAID 5 arrays with write back cache and are constantly seeking ways to improve write performance. We do a lot of batch processing, though, so we do a lot of write I/Os. 3. Be very careful with "battery backed write cache." It usually works as advertised. More than once in the past decade I have seen spontaneous cache corruption after power losss. The corruption usually happens when some admin, including me, has assumed that the cache will ALWAYS survive a power failure unblemished and has no "plan B." Make sure you have a contingency plan for corruption, or don't enable the cache. 4. RAID 10 will likely have bigger stripe sizes on the RAID 0 portion of the setup, and might hinder, not help small write I/O performance. 5. Most (almost all) of the I/O time will be due to the access time (head seek + head settle + rotational latency) and very little of the I/O time will due to data transfer time. In other words, getting drives that provide faster transfer rates will barely improve performance. The secret is lowering the access time. 6. A relatively cheap way to drastically drop the access time is to get large drive(s) and only use a portion of them for storage. The less space used on the drive, the less area the heads need to cover for seeks. At one extreme, you could make the partition the size of a single cylinder. This would make access time (ignoring OS and controller overhead) identical to rotational latency, which is as low as 4.2 ms for a cheap 7200 RPM drive. 7. A drive with a 5 ms average service time, servicing 8 KB blocks, will yield as much as 1.6 MB/s sustained write throughput. Not bad for a cheap uncached solution. Any OS aggregation of writes during the fsync() call will further improve this number -- it is basically a lower bound for throughput. 8. Many people, especially managers, cannot stomach buying disk space and only using a portion of it. In many cases, it seems more palatable to purchase a much more expensive solution to get to the same speeds. Good luck. scott.marlowe wrote: > On Wed, 3 Mar 2004, Paul Thomas wrote: > > > > > On 02/03/2004 23:25 johnnnnnn wrote: > > > [snip] > > > random_page_cost should be set with the following things taken into > > > account: > > > - seek speed > > > > Which is not exactly the same thing as spindle speed as it's a > combination > > of spindle speed and track-to-track speed. I think you'll find that a > 15K > > rpm disk, whilst it will probably have a lower seek time than a 10K rpm > > disk, won't have a proportionately (i.e., 2/3rds) lower seek time. > > There are three factors that affect how fast you can get to the next > sector: > > seek time > settle time > rotational latency > > Most drives only list the first, and don't bother to mention the other > two. > > On many modern drives, the seek times are around 5 to 10 milliseconds. > The settle time varies as well. the longer the seek, the longer the > settle, generally. This is the time it takes for the head to stop shaking > and rest quietly over a particular track. > Rotational Latency is the amount of time you have to wait, on average, for > the sector you want to come under the heads. > > Assuming an 8 ms seek, and 2 ms settle (typical numbers), and that the > rotational latency on average is 1/2 of a rotation: At 10k rpm, a > rotation takes 1/166.667 of a second, or 6 mS. So, a half a rotation is > approximately 3 mS. By going to a 15k rpm drive, the latency drops to 2 > mS. So, if we add them up, on the same basic drive, one being 10k and one > being 15k, we get: > > 10krpm: 8+2+3 = 13 mS > 15krpm: 8+2+2 = 12 mS > > So, based on the decrease in rotational latency being the only advantage > the 15krpm drive has over the 10krpm drive, we get an decrease in access > time of only 1 mS, or only about an 8% decrease in actual seek time. > > So, if you're random page cost on 10krpm drives was 1.7, you'd need to > drop it to 1.57 or so to reflect the speed increase from 15krpm drives. > > I.e. it's much more likely that going from 1 gig to 2 gigs of ram will > make a noticeable difference than going from 10k to 15k drives. > > >
pgsql-performance by date: