Re: RAID arrays and performance - Mailing list pgsql-performance

From Mark Mielke
Subject Re: RAID arrays and performance
Msg-id 475559F4.7060104@mark.mielke.cc
In response to Re: RAID arrays and performance  (Matthew <matthew@flymine.org>)
Responses Re: RAID arrays and performance
List pgsql-performance
Matthew wrote:
> On Tue, 4 Dec 2007, Gregory Stark wrote:
>
>> "Matthew" <matthew@flymine.org> writes:
>>> Does Postgres issue requests to each random access in turn, waiting for
>>> each one to complete before issuing the next request (in which case the
>>> performance will not exceed that of a single disc), or does it use some
>>> clever asynchronous access method to send a queue of random access
>>> requests to the OS that can be distributed among the available discs?
>>>
>> Sorry, it does the former, at least currently.
>> That said, this doesn't really come up nearly as often as you might think.
>>
> Shame. It comes up a *lot* in my project. A while ago we converted a task
> that processes a queue of objects to processing groups of a thousand
> objects, which sped up the process considerably. So we run an awful lot of
> queries with IN lists with a thousand values. They hit the indexes, then
> fetch the rows by random access. A full table sequential scan would take
> much longer. It'd be awfully nice to have those queries go twelve times
> faster.
>
A bitmap scan reads the table in physical order, which lets it partially
benefit from sequential reads. I'm not sure whether a bitmap scan is
optimal for your situation, or whether your situation would let you take
advantage of this.
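To make the "ordered reads" point concrete, here is a simplified Python model (not PostgreSQL source) of how a bitmap heap scan differs from a plain index scan: the tuple identifiers (TIDs) produced by the index are collected, merged per heap block, and the blocks are then visited once each in ascending order. The `(block, offset)` TID representation, the function names and the sample TIDs are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical TID model: (heap_block_number, line_offset_within_block).
TID = tuple[int, int]


def plain_index_order(index_tids: list[TID]) -> list[int]:
    """Heap blocks fetched by a plain index scan: one fetch per TID,
    in whatever order the index returned them."""
    return [block for block, _ in index_tids]


def bitmap_heap_order(index_tids: list[TID]) -> list[tuple[int, list[int]]]:
    """Group TIDs by heap block and return the blocks in ascending order,
    so each block is read once and the table is walked front to back."""
    pages: dict[int, set[int]] = defaultdict(set)
    for block, offset in index_tids:
        pages[block].add(offset)  # duplicates collapse, like bits in a bitmap
    return [(block, sorted(offsets)) for block, offsets in sorted(pages.items())]


if __name__ == "__main__":
    # Hypothetical TIDs, e.g. from an index lookup driven by an IN list.
    tids = [(907, 3), (12, 1), (455, 7), (12, 4), (907, 1), (3, 2)]
    print(plain_index_order(tids))  # [907, 12, 455, 12, 907, 3]
    print(bitmap_heap_order(tids))  # [(3, [2]), (12, [1, 4]), (455, [7]), (907, [1, 3])]
```

The sorted order is what lets the OS readahead and the disc heads help, but each block read is still issued one at a time, which is the limitation Matthew is asking about.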

>> Normally queries fit mostly in either the large batch query domain or the
>> small quick oltp query domain. For the former Postgres tries quite hard to do
>> sequential i/o which the OS will do readahead for and you'll get good
>> performance. For the latter you're normally running many simultaneous such
>> queries and the raid array helps quite a bit.
>>
> Having twelve discs will certainly improve the sequential IO throughput!
>
> However, if this was implemented (and I have *no* idea whatsoever how hard
> it would be), then large index scans would scale with the number of discs
> in the system, which would be quite a win, I would imagine. Large index
> scans can't be that rare!
>
Do you know that there is a problem, or are you speculating about one? I
think your case would be far more compelling if you could show a
problem. :-)

I would think that, at a minimum, having 12 disks in RAID 0 or RAID 1+0
would allow your insane queries to run concurrently with up to 12 other
queries. Unless your insane query is the only query running on the
system, I think you may be speculating about a nearly non-existent
problem. Just a suggestion...
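A rough way to see the concurrency argument: if each query has exactly one synchronous random read outstanding, a lone query keeps at most one disk busy, while many concurrent queries spread over the array. The sketch below is a simplified balls-into-bins model, assuming each outstanding request lands on a uniformly random disk independently; it ignores stripe size, mirror selection in RAID 1+0, caching and queueing, and the function name is hypothetical.

```python
import math


def expected_busy_disks(outstanding: int, disks: int) -> float:
    """Expected number of disks with at least one request, assuming each of
    `outstanding` requests hits a uniformly random disk, independently.
    A simplified model only; real RAID behaviour is more complicated."""
    if disks <= 0:
        raise ValueError("disks must be positive")
    if outstanding <= 0:
        return 0.0
    # Probability that one particular disk receives none of the requests.
    p_idle = (1.0 - 1.0 / disks) ** outstanding
    return disks * (1.0 - p_idle)


if __name__ == "__main__":
    for n in (1, 4, 12, 24, 48):
        busy = expected_busy_disks(n, 12)
        print(f"{n:3d} outstanding reads -> ~{busy:.1f} of 12 disks busy")
```

Under this model a single synchronous query uses one disk, while 12 concurrent queries keep roughly 7.8 of 12 disks busy on average, so the array's random-read capacity is largely used once enough queries run at the same time.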

I recall talk of more intelligent table-scanning algorithms, and of using
asynchronous I/O to benefit from RAID arrays, but the numbers prepared to
convince people that the change would have an effect have been less than
impressive.
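As an illustration of the difference Matthew asked about (one request at a time versus a queue of requests the OS can distribute among the discs), here is a Python sketch contrasting serial positional reads with keeping several reads in flight through a thread pool. This is not how PostgreSQL performs I/O; the 8 kB block size, the function names and the thread-pool approach are assumptions for illustration (POSIX AIO, or `posix_fadvise` with `POSIX_FADV_WILLNEED` prefetch hints, would be other ways to queue requests). `os.pread` is Unix-only.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # assumed page size for this illustration


def read_block(fd: int, block: int) -> bytes:
    """Synchronous positional read of one block; returns when it completes."""
    return os.pread(fd, BLOCK_SIZE, block * BLOCK_SIZE)


def read_blocks_serially(fd: int, blocks: list[int]) -> list[bytes]:
    """One request outstanding at a time: at most one disc is working."""
    return [read_block(fd, b) for b in blocks]


def read_blocks_concurrently(fd: int, blocks: list[int], in_flight: int = 12) -> list[bytes]:
    """Keep up to `in_flight` requests outstanding so the OS and RAID layer
    can service them on different discs in parallel. os.pread releases the
    GIL while blocked, so the threads overlap their waits. Results are
    returned in the same order as `blocks`."""
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        return list(pool.map(lambda b: read_block(fd, b), blocks))


if __name__ == "__main__":
    # Demo on a small throwaway file: each block is filled with its own index.
    # A real speed-up would only appear for uncached random reads on an array.
    with tempfile.TemporaryFile() as f:
        for i in range(64):
            f.write(bytes([i]) * BLOCK_SIZE)
        f.flush()
        wanted = [40, 3, 17, 63, 8]
        data = read_blocks_concurrently(f.fileno(), wanted, in_flight=4)
        print([blk[0] for blk in data])  # [40, 3, 17, 63, 8]
```

Whether such a change pays off in practice is exactly the open question above; the sketch only shows the mechanism, not a measured benefit.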

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>
