Re: Few questions on postgresql (dblink, 2pc, clustering) - Mailing list pgsql-general

From Christopher Browne
Subject Re: Few questions on postgresql (dblink, 2pc, clustering)
Date
Msg-id m3vffbqoby.fsf@wolfe.cbbrowne.com
Whole thread Raw
In response to Re: Few questions on postgresql (dblink, 2pc, clustering)  (Jim Worke <jimworke@inbox.lv>)
Responses Re: Few questions on postgresql (dblink, 2pc, clustering)
List pgsql-general
After a long battle with technology, mendola@bigfoot.com (Gaetano Mendola), an earthling, wrote:
> | MySQL does not allow cross-server database connection such as dblink.  So,
> | we're thinking of 3 alternatives:
> |
> | 1) Wait for MySQL clustering to be stable and put all our databases in the
> | cluster
> | 2) Migrate to PostgreSQL and use dblink to solve the referential integrity
> | 3) Migrate to PostgreSQL clustering solution
>
> May I know why are you sticky on the idea of spread your database
> among various servers ? Free your mysql-minded. If you idea is an
> horizontal scale solution then open your wallet and buy Oracle.
> Postgresql scale very well vertically.

Indeed.

It seems quite unclear what the proposed merit of "clustering" is
_supposed_ to be, and I think it bears asking Jim Worke to explain in
more detail what they're trying to accomplish by it.

- If the goal is improved redundancy, then I'd argue that doing all
  the updates on the "master" and having several "hot replicas" being
  maintained using Slony-I would accomplish that without technology
  changes.

- If the goal is to somehow get better performance by partitioning
  work across multiple servers, then this can be attained by having
  those several "hot replicas," and directing as many read-only
  operations to replicas as possible.  Updates have to go to the
  "master;" by dropping out query load, that lets the "master" be
  occupied primarily with write operations.

- System reliability does NOT improve if write operations are
  spread across several servers.  We have been setting up extra
  replicas of some databases on some new servers lately, and people
  that _haven't_ thought it out have briefly imagined it a good idea
  to spread the 'masters' across more servers, which would _hurt_
  reliability, in fact.

  The situation we have is that our app needs access to two
  databases: one that stores "state," and another that logs activity.

  The theory that people come up with is that when we have 3 main "big
  servers," we should write the "state" to one, and logs to another.

  Reality rears its ugly head: Doing that makes the system more
  vulnerable, as if _either_ of those servers goes down, that will
  cause the application to go into convulsions.  Putting "state" and
  "logs" on the same server, and replicating everywhere else is, in
  fact, the more reliable choice.  If the ONE server that's "master"
  goes down, the application won't be happy, but that was always going
  to be the case.

The point: Distributing writes across many hosts makes the system
vulnerable to the possibility of _any_ of them going down.

Furthermore, it is not evident that distributing writes will be able
to actually improve performance, because it introduces substantial
additional communications overhead.

At some point, there will be parts of the write activities that have
to be handled in a serial manner, in one place.  For instance,
updating and reporting account balances must ultimately be thus
handled.  If there are 10 customers fighting over who gets to order
your last 20 pallets of "Grapple Grommets," the handling of who
actually gets that inventory has _got_ to be serialized in one place
if you don't want to run the risk of the over-commitment of perhaps
promising 200 pallets worth when you only have 20.

2PC provides a way of managing that serialization across multiple
databases; it has two inherent problems relating to what is already
said above:

 1.  The communications overhead involved in coordinating requests
     across multiple databases means that it's unlikely to be a
     "performance win";

 2.  There are deadlock situations that 2PC is vulnerable to that
     make it _less_ reliable than having just one database, when
     there is the possibility of hardware or communications
     failure.

Note: These aren't PostgreSQL-specific issues; they apply to the
various classes of "clustering solutions."  It may be that Oracle or
DB2 or Informix have some particular features that partially relieve
some of the performance problems with "partitioning," but that still
leaves a big bill to pay, and you'd better be sure you're getting
some actual value for the extra coin...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/spiritual.html
Signs of a  Klingon Programmer #8: "What is   this talk of  'release'?
Klingons do not make software 'releases.'  Our software 'escapes'
leaving a bloody  trail of designers and quality  assurance people in
its wake."

pgsql-general by date:

Previous
From: Harald Fuchs
Date:
Subject: Re: How to setup default value "0000-00-00" for "date"
Next
From: Thomas Hallgren
Date:
Subject: Re: Unsupported 3rd-party solutions (Was: Few questions on postgresql