Re: Scaling further up - Mailing list pgsql-performance

From Marty Scholes
Subject Re: Scaling further up
Date
Msg-id 404FA4F3.6080903@outputservices.com
Whole thread Raw
In response to Scaling further up  ("Anjan Dave" <adave@vantage.com>)
List pgsql-performance
I have some suggestions based on my anecdotal experience.

1. This is a relatively small DB -- the working set will likely be in
RAM at any moment in time, making read I/O time mostly irrelevant.

2. The killer will be write times -- specifically log writes.  Small and
heavily synchronized writes, log and data writes, will drag down an
impressive hardware RAID setup.  We run mirrored hardware RAID 5 arrays
with write back cache and are constantly seeking ways to improve write
performance.  We do a lot of batch processing, though, so we do a lot of
write I/Os.

3. Be very careful with "battery backed write cache."  It usually works
as advertised.  More than once in the past decade I have seen
spontaneous cache corruption after power losss.  The corruption usually
happens when some admin, including me, has assumed that the cache will
ALWAYS survive a power failure unblemished and has no "plan B."  Make
sure you have a contingency plan for corruption, or don't enable the cache.

4. RAID 10 will likely have bigger stripe sizes on the RAID 0 portion of
the setup, and might hinder, not help small write I/O performance.

5. Most (almost all) of the I/O time will be due to the access time
(head seek + head settle + rotational latency) and very little of the
I/O time will due to data transfer time.  In other words, getting drives
that provide faster transfer rates will barely improve performance.  The
secret is lowering the access time.

6. A relatively cheap way to drastically drop the access time is to get
large drive(s) and only use a portion of them for storage.  The less
space used on the drive, the less area the heads need to cover for
seeks.  At one extreme, you could make the partition the size of a
single cylinder.  This would make access time (ignoring OS and
controller overhead) identical to rotational latency, which is as low as
4.2 ms for a cheap 7200 RPM drive.

7. A drive with a 5 ms average service time, servicing 8 KB blocks, will
yield as much as 1.6 MB/s sustained write throughput.  Not bad for a
cheap uncached solution.  Any OS aggregation of writes during the
fsync() call will further improve this number -- it is basically a lower
bound for throughput.

8. Many people, especially managers, cannot stomach buying disk space
and only using a portion of it.  In many cases, it seems more palatable
to purchase a much more expensive solution to get to the same speeds.

Good luck.

scott.marlowe wrote:
> On Wed, 3 Mar 2004, Paul Thomas wrote:
>
>  >
>  > On 02/03/2004 23:25 johnnnnnn wrote:
>  > > [snip]
>  > > random_page_cost should be set with the following things taken into
>  > > account:
>  > >   - seek speed
>  >
>  > Which is not exactly the same thing as spindle speed as it's a
> combination
>  > of spindle speed and track-to-track speed. I think you'll find that a
> 15K
>  > rpm disk, whilst it will probably have a lower seek time than a 10K rpm
>  > disk, won't have a proportionately (i.e., 2/3rds) lower seek time.
>
> There are three factors that affect how fast you can get to the next
> sector:
>
> seek time
> settle time
> rotational latency
>
> Most drives only list the first, and don't bother to mention the other
> two.
>
> On many modern drives, the seek times are around 5 to 10 milliseconds.
> The settle time varies as well.  the longer the seek, the longer the
> settle, generally.  This is the time it takes for the head to stop shaking
> and rest quietly over a particular track.
> Rotational Latency is the amount of time you have to wait, on average, for
> the sector you want to come under the heads.
>
> Assuming an 8 ms seek, and 2 ms settle (typical numbers), and that the
> rotational latency on average is 1/2 of a rotation:  At 10k rpm, a
> rotation takes 1/166.667 of a second, or 6 mS.  So, a half a rotation is
> approximately 3 mS.  By going to a 15k rpm drive, the latency drops to 2
> mS.  So, if we add them up, on the same basic drive, one being 10k and one
> being 15k, we get:
>
> 10krpm: 8+2+3 = 13 mS
> 15krpm: 8+2+2 = 12 mS
>
> So, based on the decrease in rotational latency being the only advantage
> the 15krpm drive has over the 10krpm drive, we get an decrease in access
> time of only 1 mS, or only about an 8% decrease in actual seek time.
>
> So, if you're random page cost on 10krpm drives was 1.7, you'd need to
> drop it to 1.57 or so to reflect the speed increase from 15krpm drives.
>
> I.e. it's much more likely that going from 1 gig to 2 gigs of ram will
> make a noticeable difference than going from 10k to 15k drives.
>
>
>



pgsql-performance by date:

Previous
From: Joseph Shraibman
Date:
Subject: Re: Postgresql on SAN
Next
From: Maneesha Nunes
Date:
Subject: Drop Tables Very Slow in Postgresql 7.2.1