Re: Performance query about large tables, lots of concurrent access - Mailing list pgsql-performance

From Jean-David Beyer
Subject Re: Performance query about large tables, lots of concurrent access
Date
Msg-id 467FBDEF.1070701@verizon.net
Whole thread Raw
In response to Re: Performance query about large tables, lots of concurrent access  (Michael Stone <mstone+postgres@mathom.us>)
List pgsql-performance
Michael Stone wrote:
> On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>> I checked the disk picture - this is a RAID disk array with 6 drives,
>> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>> hard to get more/faster disk than that.
>
> Well, it's not hard to more disk than that, but you'd probably have to
> look at an external storage array (or more than one). A larger number of
> larger/slower drives, splitting indices away from data, etc., will almost
> certainly outperform 6 disks, 15k RPM or not.
>
I also have 6 hard drives (Four of these are 10,000RPM Ultra/320 SCSI hard
drives, and the other two will be soon), 4 of which are dedicated
exclusively to the DBMS, and the other two are for everything else. I am
currently running IBM DB2, using the 4 SCSI drives in raw mode and letting
DB2 do the IO (except for the bottom level device drivers). I have all the
Indices on one drive, and most of the data on the other three, except for
some very small, seldom used tables (one has two rows, one has about 10
rows) that are managed by the OS on the other drives. I have tested this and
the bottleneck is the logfiles. For this reason, I am about to upgrade the
"everything else" drives to SCSI drives (the logfiles are on one of these).
They are currently 7200 rpm EIDE drives, but the SCSI ones are sitting on
top of the computer now, ready to be installed.

When I upgrade from RHEL3 to RHEL5 (disks for that are also sitting on top
of the computer), I will be switching from DB2 to postgreSQL, and that will
be an opportunity to lay out the disks differently. I think the partitions
will end up being about the same, but for the four main data drives, I am
thinking about doing something like this, where D is data and X is Index,
and Ti is table.

Drive 3    Drive 4    Drive 5    Drive 6
DT1    XT1
    DT2    XT2
        DT3    Xt3
XT4            DT4
etc.

Now once that is set up and populated, it might make sense to move things
around somewhat to further reduce seek contention. But that would require
actually populating the database and measuring it.
This setup would probably be pretty good if using just T1 and T3, for
example, but less good if using just T1 and T2. So ideal, such as it is,
would depend on the accesses being made by the usual program to the database.

These drives are about 17 GBytes each, which is enough for the database in
question. (The other two are about 80 GBytes each, which is enough to run
Linux and my other stuff on.)

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:45:01 up 4 days, 16:20, 3 users, load average: 4.23, 4.24, 4.21

pgsql-performance by date:

Previous
From: "Sabin Coanda"
Date:
Subject: Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control
Next
From: "Simon Riggs"
Date:
Subject: Re: PITR Backups