Re: RAID arrays and performance - Mailing list pgsql-performance
| From | Matthew |
|---|---|
| Subject | Re: RAID arrays and performance |
| Date | |
| Msg-id | Pine.LNX.4.58.0712041454110.3731@aragorn.flymine.org |
| In response to | Re: RAID arrays and performance (Mark Mielke <mark@mark.mielke.cc>) |
| Responses | Re: RAID arrays and performance; Re: RAID arrays and performance |
| List | pgsql-performance |
On Tue, 4 Dec 2007, Mark Mielke wrote:

> The disk head has less theoretical distance to travel if always moving
> in a single direction instead of randomly seeking back and forth.

True... and false. The head can move pretty quickly, but it also has rotational latency and settling time to deal with. This means that there are cases where it is actually faster to move across the disc and read a block, then move back and read a block, than to read them in order. So, if you hand requests one by one to the disc, it will almost always be faster to order them. On the other hand, if you hand a huge long list of requests to a decent SCSI or SATA-NCQ disc in one go, it will reorder the reads itself, and it will do it much better than you can.

> The time to seek to a particular sector does not reduce 12X with 12
> disks. It is still approximately the same, only it can handle 12X the
> concurrency. This makes RAID best for concurrent loads. In your
> scenario, you are trying to make a single query take advantage of this
> concurrency by taking advantage of the system cache.

Kind of. The system cache is just a way to make this simpler to explain - I don't know the operating system interfaces, but it's likely that the actual call is something more like "transfer these blocks to these memory locations and tell me when they're all finished." I'm trying to make a single query concurrent by using the knowledge of a *list* of accesses to be made, and getting the operating system to perform all of them concurrently.

> The problem is that a 12X speed for 12 disks seems unlikely except under
> very specific loads (such as a sequential scan of a single table).

I'll grant you that 12X may not actually be reached, but it'll be somewhere between 1X and 12X. I'm optimistic.

> Each of the indexes may need to be scanned or searched in turn, then
> each of the tables would need to be scanned or searched in turn,
> depending on the query plan.
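A minimal sketch of that "hand the OS a list of accesses" idea, using ordinary positioned reads issued from a thread pool (Python; the twelve-worker pool and 8kB block size are illustrative assumptions, not PostgreSQL's actual I/O code):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8192  # one PostgreSQL-sized page


def read_blocks(path, block_numbers):
    """Issue one positioned read per requested block, all in flight at once.

    With a whole batch queued, the kernel's I/O scheduler and the drive
    itself (SCSI TCQ / SATA NCQ) are free to reorder the reads to
    minimise head movement - the caller just waits for them all.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=12) as pool:
            futures = [pool.submit(os.pread, fd, BLOCK, n * BLOCK)
                       for n in block_numbers]
            return [f.result() for f in futures]
    finally:
        os.close(fd)


# Demo on a small temporary file: four blocks, block n filled with byte n.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(bytes([n]) * BLOCK for n in range(4)))

blocks = read_blocks(tmp.name, [3, 0, 2, 1])  # deliberately out of order
print([b[0] for b in blocks])  # → [3, 0, 2, 1]
os.unlink(tmp.name)
```

The results come back in request order regardless of how the reads were actually scheduled, which is exactly the property a single query would want.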
Yes, the indexes would also have to be accessed concurrently, and that will undoubtedly be harder to code than accessing the tables concurrently.

> There is no guarantee that the index rows or the table rows are equally
> spread across the 12 disks.

Indeed. However, you will get an advantage if they are spread out at all. Statistically, the larger the number of pages to be retrieved in the set, the more evenly spread across the discs they will be. It's precisely when there are a large number of pages to retrieve that this will be most useful.

> CPU processing becomes involved, and is currently limited to a single
> processor thread.

On the contrary: this is a problem at the moment for sequential table scans, but it would not be a problem for random accesses. If you have twelve discs all throwing 80MB/s at the CPU, it's understandable that the CPU won't keep up. However, when you're making random accesses with, say, a 15,000rpm disc, retrieving a single 8k page on every access, each disc will produce a maximum of 2MB per second, which can be handled quite easily by modern CPUs. Index scans are limited by the disc, not the CPU.

Matthew

--
A. Top Posters
> Q. What's the most annoying thing in the world?
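The 2MB/s ceiling above follows from the rotation rate alone; a back-of-envelope check (assuming, as a rough rule of thumb, at most one random 8k read completes per revolution, and ignoring seek time, which only lowers the figure):

```python
RPM = 15_000
PAGE = 8 * 1024              # bytes fetched per random access

revs_per_sec = RPM / 60      # 250 revolutions per second
max_reads_per_sec = revs_per_sec      # ~250 random reads/s per disc
throughput = max_reads_per_sec * PAGE  # bytes per second

print(throughput / 1_000_000)  # → 2.048 (MB/s), matching the 2MB/s figure
```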