Re: tablespaces and DB administration - Mailing list pgsql-hackers

From pgsql@mohawksoft.com
Subject Re: tablespaces and DB administration
Msg-id 16839.24.91.171.78.1085761016.squirrel@mail.mohawksoft.com
In response to Re: tablespaces and DB administration  (Andreas Pflug <pgadmin@pse-consulting.de>)
Responses Re: tablespaces and DB administration
List pgsql-hackers
> pgsql@mohawksoft.com wrote:
>
>>>pgsql@mohawksoft.com wrote:
>>>
>>>>What you are missing is that the RAID is dealing with the multiple
>>>>drives as one drive. Two operations have to happen serially,
>>>
>>>You're kidding or vastly underestimating raid controllers. The average
>>>db access is well served with a single block of data, stored on a single
>>>drive. Nicely parallelizable by a raid controller if it has a minimum of
>>>smartness.
>>>
>>
>>The data contained on a RAID is spread across all the drives in the raid,
>>is this not true?
>>
> Data is spread *blockwise*, usually 32k or 64k blocks of data. This
> means, that typically 8 to 16 database blocks will reside on a *single*
> disk, with additional parity data on other disks.

That may or may not be true depending on the RAID OEM, setup, and caching
parameters.

>
>>To access data on a drive, one must get the data off all of the drives at
>>the same time, is this not true?
>>
> The data is usually completely on a single drive.

That may or may not be true, and you *don't* know that because the RAID
shields you from it.
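For illustration only, here is a minimal sketch of how a simple striped array (RAID 0 style, ignoring where any parity goes) might map a logical byte offset onto its member disks. The 4-disk array, the 64 KiB stripe unit, and the helper names are my own assumptions, not taken from this thread or from any particular controller; real arrays vary, which is exactly the point.

```python
# Hypothetical model of a striped array. ASSUMPTIONS: plain RAID 0 style
# round-robin striping, no parity rotation, 4 member disks, 64 KiB stripe
# unit. Real controllers may lay data out differently.
NUM_DISKS = 4
STRIPE_UNIT = 64 * 1024  # bytes written to one disk before moving to the next
PAGE_SIZE = 8 * 1024     # PostgreSQL's default block size


def pages_per_stripe_unit() -> int:
    """How many consecutive database pages land on the same disk."""
    return STRIPE_UNIT // PAGE_SIZE


def disks_for_request(offset: int, length: int) -> set[int]:
    """Return the set of member disks a read of `length` bytes at `offset` touches."""
    first_chunk = offset // STRIPE_UNIT
    last_chunk = (offset + length - 1) // STRIPE_UNIT
    return {chunk % NUM_DISKS for chunk in range(first_chunk, last_chunk + 1)}


def on_disjoint_disks(req_a: tuple[int, int], req_b: tuple[int, int]) -> bool:
    """True if two (offset, length) requests touch no disk in common."""
    return disks_for_request(*req_a).isdisjoint(disks_for_request(*req_b))


if __name__ == "__main__":
    print(pages_per_stripe_unit())                      # 8
    print(disks_for_request(3 * PAGE_SIZE, PAGE_SIZE))  # {0}
    print(disks_for_request(0, 256 * 1024))             # {0, 1, 2, 3}
    print(on_disjoint_disks((3 * PAGE_SIZE, PAGE_SIZE),
                            (9 * PAGE_SIZE, PAGE_SIZE)))  # True
```

Under these assumptions a single 8 KiB page read lands on one disk, two such reads may or may not collide on the same disk, and a 256 KiB sequential read touches every disk. What the model cannot tell you is whether a given controller actually services disjoint requests concurrently.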

>
>>
>>If you perform two different operations on the RAID, you must access each
>>RAID drive twice.
>>
>>If you perform different operations on multiple different drives, you can
>>access the same amount of data as you would with the RAID, but have
>>parallelized operations.
>>
>>This is a fact. It is *the* drawback to RAID system. If you do not
>>understand this, then you do not understand RAID systems.
>>
> You indicate clearly that it's you having strange opinions of raid
> controller/subsystem functionality executing multiple commands.

Wait, it gets better.

>
>>Perform any benchmark you want. Take any RAID system you want. Or,
>>actually, I have a factual reason why RAID systems perform worse than
>>multiple single drives, I have written a quick program to show it. I have
>>even double checked on my own RAID system here.
>>
>
> As I said, the "benchmark" you wrote does by no means simulate DBMS
> access patterns, it might be good to show video streaming performance or
> so.
> Please do read dbms disk io white papers, e.g.
> http://msdn.microsoft.com/archive/en-us/dnarsqlsg/html/sqlperftune.asp
> Teaching hardware issues is OT for this list.

From the top of the very article you cite:
"Archived content. No warranty is made as to technical accuracy"
Typical Microsoft hogwash, but they do have a few nuggets:

"Note: As a general rule of thumb, be sure to stripe across as many disks
as necessary to achieve solid performance. Windows NT/SQL Performance
Monitor will indicate if Windows NT disk I/O is bottlenecking on a
particular RAID array. Be ready to add disks and redistribute data across
RAID arrays and/or SCSI channels as necessary to balance disk I/O and
maximize performance."

They are suggesting that you use multiple RAID arrays or data channels.
Hmmm, sound familiar? Isn't that EXACTLY what I've been saying?

How about this heading:
"Creating as Much Disk I/O Parallelism as Possible"
"Distinct disk I/O channels refer mainly to distinct sets of hard drives
or distinct RAID arrays, because hard drives are the most likely point of
disk I/O bottleneck. But also consider distinct sets of RAID or SCSI
controllers and distinct sets of PCI buses as ways to separate SQL Server
activity if additional RAID controllers and PCI buses are available."

Your own documents don't even support your claims.




