Thread: Tables on multiple disk drives
Hi! Does PostgreSQL allow the tables and indexes of a single database to be placed on multiple disk drives to increase performance, as Oracle does? If symbolic links are the only method, then the next question is: how can I determine which file belongs to which table and index?
On Tuesday 17 February 2004 12:54, Konstantin Tokar wrote:
> Hi! Does PostgreSQL allow the tables and indexes of a single database
> to be placed on multiple disk drives to increase performance, as
> Oracle does? If symbolic links are the only method, then the next
> question is: how can I determine which file belongs to which table
> and index?

Yep, symlinks are the way at present (though I think someone is working
on tablespace support). The files are named after the OID of the object
they represent; there is a useful oid2name utility in the contrib/
folder. You might want to check the archives, though, and see what RAID
setups people prefer - less trouble to maintain than symlinking.

-- 
Richard Huxton
Archonet Ltd
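To make that concrete, here is a minimal sketch of mapping tables to their on-disk files with the tools Richard mentions. The database name, table name, and paths are placeholders, and oid2name's exact flags may differ between releases:

    # Placeholder names throughout; adjust to your own database and table.

    # Option 1: the contrib utility (build and install it from contrib/oid2name).
    oid2name              # with no arguments, typically lists databases and their OIDs
    oid2name -d mydb      # list relations in "mydb" with their OIDs

    # Option 2: ask the system catalog directly.
    psql -d mydb -c "SELECT relname, relfilenode FROM pg_class WHERE relname = 'my_big_table';"

    # The table's data file then lives under the cluster's data directory:
    #   $PGDATA/base/<database_oid>/<relfilenode>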
On Tue, 17 Feb 2004, Konstantin Tokar wrote:
> Hi! Does PostgreSQL allow the tables and indexes of a single database
> to be placed on multiple disk drives to increase performance, as
> Oracle does? If symbolic links are the only method, then the next
> question is: how can I determine which file belongs to which table
> and index?

Your life will be simpler, and your setup will be faster without having
to muck about with it, if you just buy a good RAID controller with
battery-backed cache. LSI/MegaRAID and Adaptec both make serviceable
controllers for reasonable prices, and as you add drives the speed just
goes up, with no muddling around with symlinks.
> On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>> Hi! Does PostgreSQL allow the tables and indexes of a single database
>> to be placed on multiple disk drives to increase performance, as
>> Oracle does? If symbolic links are the only method, then the next
>> question is: how can I determine which file belongs to which table
>> and index?
>
> Your life will be simpler, and your setup will be faster without having
> to muck about with it, if you just buy a good RAID controller with
> battery-backed cache. LSI/MegaRAID and Adaptec both make serviceable
> controllers for reasonable prices, and as you add drives the speed just
> goes up, with no muddling around with symlinks.

This works to a limited extent. For very large databases, maximum I/O
throughput is the paramount factor in database performance. With RAID
controllers, your LUN is still limited to a small number of disks.
PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2,
etc. can write directly to disk (raw I/O). With a very large table it is
advantageous to spread it across hundreds of disks, and I don't know of
any manufacturer that makes a 100-disk RAID array yet.

Some of the problem can be addressed by using a volume manager (such as
LVM on Linux, or Veritas on Unix-like systems). This allows one to
create a volume using partitions from many disks; one can then create a
file system and mount it on the volume. However, to get the best
performance, raw I/O capability is the best way to go.
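To make the volume-manager suggestion concrete, a rough sketch of striping a logical volume across several disks with Linux LVM and mounting it for the cluster might look like this. The device names, volume size, stripe settings, filesystem, and mount point are all placeholders; adjust to your own hardware:

    # Placeholders throughout; run as root.
    pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1         # prepare the partitions for LVM
    vgcreate pgvg /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1    # pool them into one volume group
    lvcreate -n pgdata -i 4 -I 64 -L 200G pgvg               # logical volume striped across 4 disks
    mkfs -t ext3 /dev/pgvg/pgdata                            # or xfs/jfs, as preferred
    mount /dev/pgvg/pgdata /var/lib/pgsql/data               # mount it where PGDATA will live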
On Tue, 17 Feb 2004, Craig Thomas wrote:
>> On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>>> Hi! Does PostgreSQL allow the tables and indexes of a single database
>>> to be placed on multiple disk drives to increase performance, as
>>> Oracle does? If symbolic links are the only method, then the next
>>> question is: how can I determine which file belongs to which table
>>> and index?
>>
>> Your life will be simpler, and your setup will be faster without having
>> to muck about with it, if you just buy a good RAID controller with
>> battery-backed cache. LSI/MegaRAID and Adaptec both make serviceable
>> controllers for reasonable prices, and as you add drives the speed just
>> goes up, with no muddling around with symlinks.
>
> This works to a limited extent. For very large databases, maximum I/O
> throughput is the paramount factor in database performance. With RAID
> controllers, your LUN is still limited to a small number of disks.
> PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2,
> etc. can write directly to disk (raw I/O). With a very large table it
> is advantageous to spread it across hundreds of disks, and I don't know
> of any manufacturer that makes a 100-disk RAID array yet.

You can run up to four LSI/MegaRAID cards in one box, each with 3 UW
SCSI interfaces, and they act as one unit. That's 3*4*15 = 180 disks
max.

With FC-AL connections and four cards, it would be possible to approach
1000 drives.

Of course, I'm not sure how fast any RAID card setup is gonna be with
that many drives, but ya never know. My guess is that before you go
there you buy a big external RAID box built for speed. We have a couple
of 200+ drive external RAID5 storage boxes at work that are quite
impressive.

> Some of the problem can be addressed by using a volume manager (such as
> LVM on Linux, or Veritas on Unix-like systems). This allows one to
> create a volume using partitions from many disks; one can then create a
> file system and mount it on the volume.

Pretty much RAID arrays in software, which means no battery-backed
cache, which means it'll be fast at reading but probably pretty slow at
writes, especially if there's a lot of parallel access waiting to write
to the database.

> However, to get the best performance, raw I/O capability is the best
> way to go.

Unsupported statement made as fact. I'm not saying it can't be or isn't
true, but my experience has been that large RAID5 arrays are a great
compromise between maximum performance and reliability, giving a good
measure of each. It doesn't take 100 drives to do well; even a dozen to
two dozen will get you in the same basic range as splitting out files by
hand with symlinks, without all the headache of chasing down the files,
shutting down the database, linking it over, etc.
> On Tue, 17 Feb 2004, Craig Thomas wrote:
>> [snip: original question and RAID-controller discussion]
>>
>> This works to a limited extent. For very large databases, maximum I/O
>> throughput is the paramount factor in database performance. With RAID
>> controllers, your LUN is still limited to a small number of disks.
>> PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2,
>> etc. can write directly to disk (raw I/O). With a very large table it
>> is advantageous to spread it across hundreds of disks, and I don't
>> know of any manufacturer that makes a 100-disk RAID array yet.
>
> You can run up to four LSI/MegaRAID cards in one box, each with 3 UW
> SCSI interfaces, and they act as one unit. That's 3*4*15 = 180 disks
> max.
>
> With FC-AL connections and four cards, it would be possible to approach
> 1000 drives.
>
> Of course, I'm not sure how fast any RAID card setup is gonna be with
> that many drives, but ya never know. My guess is that before you go
> there you buy a big external RAID box built for speed. We have a couple
> of 200+ drive external RAID5 storage boxes at work that are quite
> impressive.

That's a good point. But it seems that the databases leading the TPC
numbers are the Oracles of the world. I know that a former company I
worked for published TPC numbers using Oracle with raw I/O to get the
performance up.

However, it would be interesting for us to conduct a small-scale test
using a couple of HW RAID systems configured so that a single file
system can be mounted, then run the OSDL dbt workloads. The results
could then be compared with the current results that have been captured.

>> Some of the problem can be addressed by using a volume manager (such as
>> LVM on Linux, or Veritas on Unix-like systems). This allows one to
>> create a volume using partitions from many disks; one can then create a
>> file system and mount it on the volume.
>
> Pretty much RAID arrays in software, which means no battery-backed
> cache, which means it'll be fast at reading but probably pretty slow at
> writes, especially if there's a lot of parallel access waiting to write
> to the database.
>
>> However, to get the best performance, raw I/O capability is the best
>> way to go.
>
> Unsupported statement made as fact. I'm not saying it can't be or isn't
> true, but my experience has been that large RAID5 arrays are a great
> compromise between maximum performance and reliability, giving a good
> measure of each. It doesn't take 100 drives to do well; even a dozen to
> two dozen will get you in the same basic range as splitting out files by
> hand with symlinks, without all the headache of chasing down the files,
> shutting down the database, linking it over, etc.

Whoops, you're right. I was typing faster than I was thinking.
I was assuming a JBOD setup rather than a RAID storage subsystem. SAN
units such as an EMC or a Shark usually have 4-16 GB of cache, so the
I/Os go pretty quickly for really large databases.
On Tue, 17 Feb 2004, Craig Thomas wrote:
> [snip: earlier discussion of RAID controllers, raw I/O, and large arrays]
>
> That's a good point. But it seems that the databases leading the TPC
> numbers are the Oracles of the world. I know that a former company I
> worked for published TPC numbers using Oracle with raw I/O to get the
> performance up.

But keep in mind that in the TPC benchmarks, doing things that require
lots of DBA work doesn't tend to drive up the cost in the test (you can
hide a lot of admin work in those figures), while in real life it does
drive up the real cost of maintenance.

I'd imagine that with PostgreSQL coming along nicely, it may well be
that in a year or two, in the real world, you can just take the money
you'd have spent on Oracle licenses and Oracle DBAs and throw more
drives at the problem instead - and still spend less money than you
would on Oracle. :-)
lists2@tokar.ru (Konstantin Tokar) wrote:
> Hi! Does PostgreSQL allow the tables and indexes of a single database
> to be placed on multiple disk drives to increase performance, as
> Oracle does? If symbolic links are the only method, then the next
> question is: how can I determine which file belongs to which table
> and index?

It is possible to do this, albeit not trivially easily, by shutting down
the database, moving the index to another filesystem, and using a
symbolic link to connect it back in. The system table pg_class contains
the relevant linkages.

But it seems likely to me that using a smart RAID controller (e.g.
LSILogic MegaRAID) to link a whole lot of disks together into one
enormous striped filesystem would be a more effective strategy in the
long run. Doing that, with a substantial array of disk drives, allows
your disk subsystem to provide an analogous sort of performance increase
without any need for a DBA to fiddle around with anything.

If you have the DBA do the work, it means consuming a not-insubstantial
amount of time for analysis as well as downtime for maintenance. And it
will be necessary to keep a somewhat fragile "registry" of configuration
information recording which customizations were done. In contrast,
throwing a smarter RAID controller at the problem costs only a few
hundred dollars and requires little or no analysis effort, and the RAID
controller helps in _all_ cases where striping tables across multiple
drives could benefit I/O.

-- 
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/x.html
The way to a man's heart is through the left ventricle.
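A minimal sketch of the shut-down/move/symlink procedure described above, assuming the real file name has already been looked up via pg_class or oid2name (the OIDs and paths here are placeholders):

    # Placeholders throughout; stop the postmaster before touching any files.
    pg_ctl stop -D $PGDATA

    # Suppose pg_class told us the index is file 1234567 under database OID 16384:
    cd $PGDATA/base/16384
    mv 1234567 /disk2/pg_indexes/1234567          # move the file to the other drive
    ln -s /disk2/pg_indexes/1234567 1234567       # link it back into place

    pg_ctl start -D $PGDATA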
Konstantin,

>>>> Does PostgreSQL allow the tables and indexes of a single database
>>>> to be placed on multiple disk drives to increase performance, as
>>>> Oracle does? If symbolic links are the only method, then the next
>>>> question is: how can I determine which file belongs to which table
>>>> and index?

Howdy! I bet you're a bit taken aback by the discussion that ensued, and
even more confused than before. You are actually asking about two
related features:

Tablespaces, which allow designating different directories/volumes for
specific tables and indexes at creation time, and:

Partitioned tables, which allow the division of large tables and/or
indexes horizontally along pre-defined criteria.

The first, tablespaces, is under development and may make it for 7.5, or
maybe not, but it will certainly be in the version after that. The
second, partitioned tables, is NOT under development, because the
feature lacks both a programmer and a clear specification.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco
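Tablespaces did eventually ship in the release after 7.4 (i.e. 8.0), with syntax roughly like the sketch below. The database, tablespace, table, and directory names are placeholders, the directory must already exist and be owned by the postgres user, and none of this is available in the 7.x releases discussed above:

    # Run as a database superuser; names and paths are placeholders.
    psql -d mydb -c "CREATE TABLESPACE fastdisk LOCATION '/disk2/pgdata';"
    psql -d mydb -c "CREATE TABLE big_table (id int, payload text) TABLESPACE fastdisk;"
    psql -d mydb -c "CREATE INDEX big_table_id_idx ON big_table (id) TABLESPACE fastdisk;"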