Re: tablespaces and DB administration - Mailing list pgsql-hackers
From | pgsql@mohawksoft.com |
---|---|
Subject | Re: tablespaces and DB administration |
Date | |
Msg-id | 16839.24.91.171.78.1085761016.squirrel@mail.mohawksoft.com |
In response to | Re: tablespaces and DB administration (Andreas Pflug <pgadmin@pse-consulting.de>) |
Responses | Re: tablespaces and DB administration |
List | pgsql-hackers |
> pgsql@mohawksoft.com wrote:
>
>>>pgsql@mohawksoft.com wrote:
>>>
>>>>What you are missing is that the RAID is dealing with the multiple
>>>>drives as one drive. Two operations have to happen serially,
>>>>
>>>You're kidding or vastly underestimating raid controllers. The average
>>>db access is well served with a single block of data, stored on a single
>>>drive. Nicely parallelizable by a raid controller if it has a minimum of
>>>smartness.
>>>
>>
>>The data contained on a RAID is spread across all the drives in the raid,
>>is this not true?
>>
> Data is spread *blockwise*, usually 32k or 64k blocks of data. This
> means, that typically 8 to 16 database blocks will reside on a *single*
> disk, with additional parity data on other disks.

That may or may not be true depending on the RAID OEM, setup, and caching
parameters.

>
>>To access data on a drive, one must get the data off all of the drives at
>>the same time, is this not true?
>>
> The data is usually completely on a single drive.

That may or may not be true, and you *don't* know that because the RAID
shields you from it.

>
>>
>>If you perform two different operations on the RAID, you must access each
>>RAID drive twice.
>>
>>If you perform different operations on multiple different drives, you can
>>access the same amount of data as you would with the RAID, but have
>>parallelized operations.
>>
>>This is a fact. It is *the* drawback to RAID system. If you do not
>>understand this, then you do not understand RAID systems.
>>
> You indicate clearly that it's you having strange opinions of raid
> controller/subsystem functionality executing multiple commands.

Wait, it gets better.

>
>>Perform any benchmark you want. Take any RAID system you want. Or,
>>actually, I have a factual reason why RAID systems perform worse than
>>multiple single drives, I have written a quick program to show it. I have
>>even double checked on my own RAID system here.
>>
>
> As I said, the "benchmark" you wrote does by no means simulate DBMS
> access patterns, it might be good to show video streaming performance or
> so.
> Please do read dbms disk io white papers, e.g.
> http://msdn.microsoft.com/archive/en-us/dnarsqlsg/html/sqlperftune.asp
> Teaching hardware issues is OT for this list.

From the top of the very article you cite:

"Archived content. No warranty is made as to technical accuracy"

Typical Microsoft hogwash, but they do have a few nuggets:

"Note As a general rule of thumb, be sure to stripe across as many disks as
necessary to achieve solid performance. Windows NT/SQL Performance Monitor
will indicate if Windows NT disk I/O is bottlenecking on a particular RAID
array. Be ready to add disks and redistribute data across RAID arrays
and/or SCSI channels as necessary to balance disk I/O and maximize
performance."

They are suggesting that you use multiple RAID arrays or data channels.
Hmmm, sound familiar? Isn't that EXACTLY what I've been saying?

How about this heading title:

"Creating as Much Disk I/O Parallelism as Possible"

"Distinct disk I/O channels refer mainly to distinct sets of hard drives or
distinct RAID arrays, because hard drives are the most likely point of disk
I/O bottleneck. But also consider distinct sets of RAID or SCSI controllers
and distinct sets of PCI buses as ways to separate SQL Server activity if
additional RAID controllers and PCI buses are available."

Your own documents don't even support your claims.
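
For readers following the striping argument, here is a rough sketch of how a simple striped layout could map a read onto member disks. It is not taken from either poster's benchmark or from any particular controller. The 64k stripe unit comes from the range quoted in the thread; the 8k database block, the four-disk width, the absence of parity rotation, and the function names are all assumptions made for illustration. As noted above, a real array may behave differently depending on the OEM, setup, and caching.

```python
# Illustrative model only: plain striping, no parity rotation, no caching.
STRIPE_UNIT = 64 * 1024  # bytes per stripe unit (thread mentions 32k or 64k)
DB_BLOCK = 8 * 1024      # bytes per database block (assumed)
NUM_DISKS = 4            # number of member disks (assumed)


def disks_touched(offset, length, stripe_unit=STRIPE_UNIT, num_disks=NUM_DISKS):
    """Return the sorted disk indexes that a read of `length` bytes
    starting at logical byte `offset` would touch in this model."""
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    # Stripe units are dealt out to the disks round-robin.
    return sorted({unit % num_disks for unit in range(first_unit, last_unit + 1)})


def blocks_per_stripe_unit(stripe_unit=STRIPE_UNIT, db_block=DB_BLOCK):
    """How many whole database blocks fit in one stripe unit."""
    return stripe_unit // db_block


if __name__ == "__main__":
    # One aligned database block lands on one disk in this model.
    print(disks_touched(0, DB_BLOCK))
    # A large sequential read spans every disk.
    print(disks_touched(0, NUM_DISKS * STRIPE_UNIT))
    print(blocks_per_stripe_unit())
```

Under these assumptions a single aligned block read occupies one disk and leaves the others free, while a long sequential read occupies all of them at once. Which of those two patterns dominates a given workload is the point in dispute here, and the model does not settle it.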