Re: PostgreSQL clustering VS MySQL clustering - Mailing list pgsql-performance
From | Alex Turner |
---|---|
Subject | Re: PostgreSQL clustering VS MySQL clustering |
Date | |
Msg-id | 33c6269f05012009236f451499@mail.gmail.com Whole thread Raw |
In response to | Re: PostgreSQL clustering VS MySQL clustering (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-performance |
The problem is very large ammounts of data that needs to be both read and updated. If you replicate a system, you will need to intelligently route the reads to the server that has the data in RAM or you will always be hitting DIsk which is slow. This kind of routing AFAIK is not possible with current database technology, and you are still stuck for writes. Writes are always going to be the bane of any cluster. Clustering can give better parallel read performance i.e. large no. of clients accessing data simultaneously, but your write performance is always going to be bound by the underlying disk infrastructure, not even Oracle RAC can get around this (It uses multiple read nodes accessing the same set of database files underneath) Google solved the problem by building this intelligence into the middle tier, and using a distributed file system. Java Entity Beans are supposed to solve this problem somewhat by distributing the data across multiple servers in a cluster and allowing you to defer write syncing, but it really doesn't work all that well. The only way I know to solve this at the RDBMS layer is to configure a very powerfull disk layer, which is basicaly going to a SAN mesh with multiple cards on a single system with multiple IO boards, or an OS that clusters at the base level, thinking HP Superdome or z900. Even Opteron w/PCI-X cards has a limit of about 400MB/sec throughput on a single IO channel, and there are only two independent channels on any boards I know about. The other solution is to do what google did. Implement your own middle tier that knows how to route queries to the appropriate place. Each node can then have it's own independant database with it's own independant disk subsystem, and your throughput is only limited by your network interconnects, and your internet pipe. This kind of middle tier is really not that hard to if your data can easily be segmented. Each node runs it's own query sort and filter independantly, and supplies the result to the central data broker, which then collates the results and supplies them back to the user. Updated work in a similar fasion. The update comes into the central broker that decides which nodes it will affect, and then issues updates to those nodes. I've built this kind of architecture, if you want to do it, don't use Java unless you want to pay top dollar for your programmers, because it's hard to make it work well in Java (most JMS implementations suck, look at MQueue or a custom queue impl, forget XML it's too slow to serialize and deserialize requests). Alex Turner NetEconomist On Thu, 20 Jan 2005 11:13:25 -0500, Stephen Frost <sfrost@snowman.net> wrote: > * Richard_D_Levine@raytheon.com (Richard_D_Levine@raytheon.com) wrote: > > I think maybe a SAN in conjunction with tablespaces might be the answer. > > Still need one honking server. > > That's interesting- can a PostgreSQL partition be acress multiple > tablespaces? > > Stephen > > >
pgsql-performance by date: