Re: For the ametures. (related to 'Are we losing momentum?') - Mailing list pgsql-hackers

From pgsql@mohawksoft.com
Subject Re: For the ametures. (related to 'Are we losing momentum?')
Date
Msg-id 4128.68.162.220.216.1051022711.squirrel@mail.mohawksoft.com
In response to Re: For the ametures. (related to "Are we losing momentum?")  (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
List pgsql-hackers
> On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
>> If I wanted to divide the postmaster read() calls evenly to files
>> located over several physical disks, how would you suggest
>> distributing the data-space?  Would it be as simple as putting each
>> child directory in 'data/base' on a different physical disk in a
>> round-robbin fasion using symbolic links:  Or is it more involved...
>>
>> data/base/1 -> /dev/hda
>> data/base/2 -> /dev/hdb
>> data/base/3 -> /dev/hdc
>> data/base/4 -> /dev/hda
>> data/base/5 -> /dev/hdb
>> data/base/6 -> /dev/hdc (etc)
>
> Don't bother splitting across disks unless you put them on different
> IDE  channels as IDE channel bandwidth is shared.

While that is electrically "true", it is not the whole story. Modern IDE hard
disks are quite advanced and have large read-ahead caches. Combined with
IDE-DMA access, low seek times, and faster spin rates, that means you can
still get good performance from two IDE drives on the same channel.

For instance, take two databases, one on HDA and the other on HDB.
Successive reads interleaved HDA/HDB/HDA/HDB etc. will share electrical
bandwidth (as they would on SCSI). AFAIK, there is no standard asynchronous
command structure for IDE; however, the internal read-ahead cache on each
drive will usually have a pretty good guess at the "next" block based on some
predictive caching algorithm.

So, the "next" read from the drive has a good chance of coming from cache.
Plus, the OS may "scatter gather" larger requests into smaller successive
requests (so a pure "read-ahead" will work great). Then consider
write-caching (if you dare).
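
To make the read-ahead argument concrete, here is a toy model, not a
description of any real drive firmware. It assumes each drive keeps a single
read-ahead window of a fixed number of blocks (the Drive class, the window
size of 8, and the function name are all made up for illustration) and that
reads alternate HDA/HDB while each database is scanned sequentially.

```python
class Drive:
    """Toy drive with one read-ahead window (an assumption, not real firmware)."""

    def __init__(self, readahead=8):
        self.readahead = readahead
        self.cached = range(0)  # empty window to start with
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cached:
            self.hits += 1
        else:
            # Miss: fetch the block and pre-load the blocks that follow it.
            self.misses += 1
            self.cached = range(block, block + self.readahead)


def interleaved_hit_rate(blocks_per_drive=64, readahead=8):
    """Alternate sequential reads between two drives; return the cache hit rate."""
    hda, hdb = Drive(readahead), Drive(readahead)
    for block in range(blocks_per_drive):
        hda.read(block)  # HDA/HDB/HDA/HDB ...
        hdb.read(block)
    return (hda.hits + hdb.hits) / (2 * blocks_per_drive)


if __name__ == "__main__":
    print(interleaved_hit_rate())
```

Because each drive has its own cache, interleaving the two streams does not
disturb either window in this model; only the first read of each window goes
to the platters. Random access patterns would of course look much worse.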

It is still true that you want one IDE drive per IDE channel, but these
days two drives on a channel are not as bad as they once were. The penalty
was never about the shared electrical bandwidth (all bus systems suffer from
that) but about the electrical protocol used to address the drives. ATA and
EIDE have made strides in this area.

>
> If you have that many disk, put them on IDE RAID. That is a much
> simpler  solution.

A hardware RAID system is obviously an "easier" solution, and
www.infortrend.com makes a very cool system, but if you have additional IDE
channels, spreading multiple databases across multiple IDE drives and
controllers will probably provide higher overall performance than forcing
all the I/O through one controller (IDE or SCSI) channel.

Pretty good PCI/EIDE-DMA controllers are cheap, $50~$100, and you can fit a
bunch of them into a server system. Provided your OS has a reentrant driver
model, it should be possible for PostgreSQL to perform as many I/O
operations concurrently as you have drive controllers, whereas with an
IDE->SCSI RAID controller you may still be limited by how well your
specific driver handles concurrency within one driver instance.

The "best" solution is one hardware RAID per I/O channel per database, but
that is expensive. One IDE drive per IDE channel per database is the next
best thing. Two IDE drives per channel, one drive per database, is very
workable if you make sure that the more active databases are on separate
controllers.
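
As a rough sketch of that last rule, the snippet below places one database
per drive and puts the busiest databases on the least-loaded controller
first. The database names, load figures, and controller names are invented
for illustration, and the greedy heuristic is just one simple way to do it.

```python
def place_databases(activity, controllers, drives_per_controller=2):
    """Map each database to a (controller, drive_index) slot.

    activity: dict of database name -> relative I/O load (hypothetical numbers).
    controllers: list of controller names, e.g. ["ide0", "ide1"].
    """
    slots_left = {c: drives_per_controller for c in controllers}
    load = {c: 0.0 for c in controllers}
    placement = {}

    # Busiest databases first, each onto the controller with the least load.
    for db, io in sorted(activity.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [c for c in controllers if slots_left[c] > 0]
        if not candidates:
            raise ValueError("more databases than drives")
        target = min(candidates, key=lambda c: load[c])
        placement[db] = (target, drives_per_controller - slots_left[target])
        slots_left[target] -= 1
        load[target] += io
    return placement


if __name__ == "__main__":
    demo = {"sales": 90, "web": 70, "logs": 10, "test": 5}
    for db, slot in place_databases(demo, ["ide0", "ide1"]).items():
        print(db, slot)
```

The resulting mapping would then be applied by hand, for example with the
symbolic links under data/base discussed earlier in the thread.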


