Thread: RAID or manual split?
It seems that if I know the type and frequency of the queries a database
will be seeing, I could split the database by hand over multiple disks and
get better performance than I would with a RAID array with similar hardware.
Most of the data is volatile and easily replaceable (and the rest is backed
up independently), so redundancy isn't important, and I'm willing to do some
ongoing maintenance if I can get a decent speed boost. Am I misguided, or
might this work? Details of my setup are below:

Six large (3-7 Mrow) 'summary' tables, each being updated continuously by
5-20 processes with about 0.5 transactions/second/process.

Periodically (currently every two weeks), join queries are performed
between one of the 'summary' tables (the same one each time) and each of
the other five. Each join touches most rows of both tables, and indexes
aren't used. Results are written into a separate group of 'inventory'
tables (about 500 Krow each), one for each join.

There are frequent (100-1000/day) queries of both the inventory and summary
tables using the primary key -- always using the index and returning
< 10 rows.

We're currently getting (barely) acceptable performance from a single 15k
U160 SCSI disk, but db size and activity are growing quickly. I've got more
disks and a battery-backed LSI card on order.

-mike
--
Mike Glover
GPG Key ID BFD19F2C <mpg4@duluoz.net>
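Purely as an illustration of the fortnightly job described above, it might
look something like the sketch below; the table and column names
(summary_main, summary_1, inventory_1, item_id, qty) are invented here, not
the actual schema:

    -- Hypothetical sketch only: names are made up for illustration.
    INSERT INTO inventory_1 (item_id, total_qty)
    SELECT s.item_id, sum(o.qty)
    FROM   summary_main AS s
    JOIN   summary_1    AS o USING (item_id)  -- most rows touched, so sequential scans
    GROUP  BY s.item_id;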
> It seems that if I know the type and frequency of the queries a
> database will be seeing, I could split the database by hand over
> multiple disks and get better performance than I would with a RAID array
> with similar hardware.

Unlikely, but possible if you had radically different hardware for
different tables.

> Six large (3-7 Mrow) 'summary' tables, each being updated continuously
> by 5-20 processes with about 0.5 transactions/second/process.

Well, you should get close to an order of magnitude better performance on
those queries from a RAID controller with a write-back cache.

> Periodically (currently every two weeks), join queries are
> performed between one of the 'summary' tables (the same one each time)
> and each of the other five. Each join touches most rows of both tables,
> and indexes aren't used. Results are written into a separate group of
> 'inventory' tables (about 500 Krow each), one for each join.

The more disks the data is spread over, the better (the RAID controller
will help here with striping).

> There are frequent (100-1000/day) queries of both the
> inventory and summary tables using the primary key -- always using the
> index and returning < 10 rows.

RAM is what you need here, to cache the data and indexes, and then as much
CPU power as you can get.

> We're currently getting (barely) acceptable performance from a single
> 15k U160 SCSI disk, but db size and activity are growing quickly.
> I've got more disks and a battery-backed LSI card on order.

3 or more disks in a stripe set, with write-back caching, will almost
certainly give a huge performance boost. Try that first, and only if you
have issues should you think about futzing with symlinks etc.

M
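On the caching side of that advice, a minimal sketch of the PostgreSQL knobs
involved (the values below are placeholders, not tuned recommendations):

    -- shared_buffers can only be changed in postgresql.conf (restart needed);
    -- effective_cache_size just tells the planner how much OS cache to assume.
    SHOW shared_buffers;
    SHOW effective_cache_size;
    SET effective_cache_size = 100000;  -- placeholder; units/format vary by version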
> There are frequent (100-1000/day) queries of both the
> inventory and summary tables using the primary key -- always using the
> index and returning < 10 rows.

For this query frequency I don't think splitting the drives will do much --
you just need more IO. Look at optimizing the queries themselves,
specifically at ensuring the useful information is already in memory. If
the < 10 rows returned are recent, you might try partial indexes covering
just the last day's worth of information instead of an index across the
entire table.

> We're currently getting (barely) acceptable performance from a single
> 15k U160 SCSI disk, but db size and activity are growing quickly.
> I've got more disks and a battery-backed LSI card on order.

Configure for RAID 10 and you're off.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc
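A minimal sketch of that partial-index idea, assuming the summary table has
a timestamp column; "summary_main", "item_id" and "updated_at" are invented
names, not the actual schema:

    -- The predicate must be a constant, so the index has to be recreated (or
    -- a new one built) periodically to keep it covering only recent rows.
    CREATE INDEX summary_main_recent_idx
        ON summary_main (item_id)
        WHERE updated_at >= '2003-11-24';

    -- A lookup needs a matching date condition for the planner to pick the
    -- partial index:
    SELECT *
    FROM   summary_main
    WHERE  item_id = 42
      AND  updated_at >= '2003-11-24';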