Re: Few questions on postgresql (dblink, 2pc, clustering) - Mailing list pgsql-general
From | Christopher Browne |
---|---|
Subject | Re: Few questions on postgresql (dblink, 2pc, clustering) |
Date | |
Msg-id | m3vffbqoby.fsf@wolfe.cbbrowne.com Whole thread Raw |
In response to | Re: Few questions on postgresql (dblink, 2pc, clustering) (Jim Worke <jimworke@inbox.lv>) |
Responses |
Re: Few questions on postgresql (dblink, 2pc, clustering)
|
List | pgsql-general |
After a long battle with technology, mendola@bigfoot.com (Gaetano Mendola), an earthling, wrote: > | MySQL does not allow cross-server database connection such as dblink. So, > | we're thinking of 3 alternatives: > | > | 1) Wait for MySQL clustering to be stable and put all our databases in the > | cluster > | 2) Migrate to PostgreSQL and use dblink to solve the referential integrity > | 3) Migrate to PostgreSQL clustering solution > > May I know why are you sticky on the idea of spread your database > among various servers ? Free your mysql-minded. If you idea is an > horizontal scale solution then open your wallet and buy Oracle. > Postgresql scale very well vertically. Indeed. It seems quite unclear what the proposed merit of "clustering" is _supposed_ to be, and I think it bears asking Jim Worke to explain in more detail what they're trying to accomplish by it. - If the goal is improved redundancy, then I'd argue that doing all the updates on the "master" and having several "hot replicas" being maintained using Slony-I would accomplish that without technology changes. - If the goal is to somehow get better performance by partitioning work across multiple servers, then this can be attained by having those several "hot replicas," and directing as many read-only operations to replicas as possible. Updates have to go to the "master;" by dropping out query load, that lets the "master" be occupied primarily with write operations. - System reliability does NOT improve if write operations are spread across several servers. We have been setting up extra replicas of some databases on some new servers lately, and people that _haven't_ thought it out have briefly imagined it a good idea to spread the 'masters' across more servers, which would _hurt_ reliability, in fact. The situation we have is that our app needs access to two databases: one that stores "state," and another that logs activity. The theory that people come up with is that when we have 3 main "big servers," we should write the "state" to one, and logs to another. Reality rears its ugly head: Doing that makes the system more vulnerable, as if _either_ of those servers goes down, that will cause the application to go into convulsions. Putting "state" and "logs" on the same server, and replicating everywhere else is, in fact, the more reliable choice. If the ONE server that's "master" goes down, the application won't be happy, but that was always going to be the case. The point: Distributing writes across many hosts makes the system vulnerable to the possibility of _any_ of them going down. Furthermore, it is not evident that distributing writes will be able to actually improve performance, because it introduces substantial additional communications overhead. At some point, there will be parts of the write activities that have to be handled in a serial manner, in one place. For instance, updating and reporting account balances must ultimately be thus handled. If there are 10 customers fighting over who gets to order your last 20 pallets of "Grapple Grommets," the handling of who actually gets that inventory has _got_ to be serialized in one place if you don't want to run the risk of the over-commitment of perhaps promising 200 pallets worth when you only have 20. 2PC provides a way of managing that serialization across multiple databases; it has two inherent problems relating to what is already said above: 1. The communications overhead involved in coordinating requests across multiple databases means that it's unlikely to be a "performance win"; 2. There are deadlock situations that 2PC is vulnerable to that make it _less_ reliable than having just one database, when there is the possibility of hardware or communications failure. Note: These aren't PostgreSQL-specific issues; they apply to the various classes of "clustering solutions." It may be that Oracle or DB2 or Informix have some particular features that partially relieve some of the performance problems with "partitioning," but that still leaves a big bill to pay, and you'd better be sure you're getting some actual value for the extra coin... -- let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];; http://www3.sympatico.ca/cbbrowne/spiritual.html Signs of a Klingon Programmer #8: "What is this talk of 'release'? Klingons do not make software 'releases.' Our software 'escapes' leaving a bloody trail of designers and quality assurance people in its wake."
pgsql-general by date: