Re: RAID arrays and performance - Mailing list pgsql-performance

From Mark Mielke
Subject Re: RAID arrays and performance
Date
Msg-id 47557C30.10809@mark.mielke.cc
In response to Re: RAID arrays and performance  (Matthew <matthew@flymine.org>)
List pgsql-performance
Matthew wrote:
On Tue, 4 Dec 2007, Mark Mielke wrote: 
The larger the set of requests, the closer the performance will scale to
the number of discs     
This assumes that you can know which pages to fetch ahead of time -
which you do not except for sequential read of a single table.   
There are circumstances where it may be hard to locate all the pages ahead
of time - that's probably when you're doing a nested loop join. However,
if you're looking up in an index and get a thousand row hits in the index,
then there you go. Page locations to load. 
Sure.
Please show one of your query plans and how you as a person would design
which pages to request reads for.   
How about the query that "cluster <skrald@amossen.dk>" was trying to get
to run faster a few days ago? Tom Lane wrote about it:

| Wouldn't help, because the accesses to "questions" are not the problem.
| The query's spending nearly all its time in the scan of "posts", and
| I'm wondering why --- doesn't seem like it should take 6400msec to fetch
| 646 rows, unless perhaps the data is just horribly misordered relative
| to the index.

Which is exactly what's going on. The disc is having to seek 646 times
fetching a single row each time, and that takes 6400ms. He obviously has a
standard 5,400 or 7,200 rpm drive with a seek time around 10ms. 
Your proposal would not necessarily improve his case unless he also purchased additional disks, at which point his execution time may be different. More speculation. :-)

It seems reasonable - but still a guess.
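
For what it's worth, the arithmetic behind that guess can be sketched as below. This is only a back-of-the-envelope model, not a measurement: the function name, the 10ms seek time, and the assumption that requests spread perfectly evenly over independent discs are all illustrative.

```python
import math


def random_fetch_time_ms(rows, seek_ms=10.0, discs=1):
    """Idealised time to fetch `rows` randomly placed rows.

    Assumes one seek per row and that the requests are spread perfectly
    evenly over `discs` independent spindles. This is a best case for
    the multi-disc setup, not a prediction for any real array.
    """
    seeks_per_disc = math.ceil(rows / discs)
    return seeks_per_disc * seek_ms


# One disc, 646 single-row fetches at ~10ms each.
print(random_fetch_time_ms(646))           # 6460.0, close to the reported 6400ms
# Hypothetical four-disc array with all requests issued up front.
print(random_fetch_time_ms(646, discs=4))  # 1620.0
```

The single-disc figure lines up with the reported 6400msec, which is why the seek-bound explanation seems reasonable; the four-disc figure is an upper bound on the benefit and says nothing about whether the planner could actually issue those requests in advance.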

Or in a similar vein, fill a table with completely random values, say ten
million rows with a column containing integer values ranging from zero to
ten thousand. Create an index on that column, analyse it. Then pick a
number between zero and ten thousand, and

"SELECT * FROM table WHERE that_column = the_number_you_picked"
This isn't a real use case. Optimizing for the worst case scenario is not always valuable.
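
To put a rough number on why that experiment is a worst case, here is a small sketch estimating how many distinct pages the matching rows would land on. The figure of 100 rows per page is an assumption made purely for illustration (it depends on row width), as is the helper name; the row and value counts are the ones from the quoted example.

```python
def expected_pages_touched(matching_rows, total_pages):
    """Expected number of distinct pages holding `matching_rows` rows
    when each row sits on a uniformly random page."""
    return total_pages * (1.0 - (1.0 - 1.0 / total_pages) ** matching_rows)


total_rows = 10_000_000
distinct_values = 10_001   # integers from zero to ten thousand inclusive
rows_per_page = 100        # assumption, not from the thread

matching_rows = total_rows / distinct_values
total_pages = total_rows // rows_per_page

pages = expected_pages_touched(matching_rows, total_pages)
print(round(matching_rows), "matching rows on about", round(pages), "pages")
```

Under those assumptions nearly every matching row sits on its own page, so the query degenerates into roughly one random read per row.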

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>
