Re: Parallel databases? - Mailing list pgsql-general

From Jurgen Defurne
Subject Re: Parallel databases?
Msg-id 3972B5C4.9B33A111@glo.be
In response to Parallel databases?  (A James Lewis <james@fsck.co.uk>)
List pgsql-general
A James Lewis wrote:

> How would you describe Oracle Parallel Server, then?

I know Oracle and PG. The big difference is that PG is an application that
runs on top of an OS, while Oracle bypasses much of the OS and replaces it
with functions of its own.

This means that these parallel systems need a product called SQL*Net, which
provides the functionality needed to channel data over a network. Oracle
Parallel Server is implemented only on top of that.

The database administrator sets up the replication etc., but to the
programmer this is all COMPLETELY TRANSPARENT!!

>
> I am well aware of RAID/Mirroring etc... but I want the ability for one
> database to seamlessly take over from the other (OR even with the
> application getting thrown off and having to re-connect) but the DATA must
> be kept in sync across both machines...
>
> What I want is some sort of replication, bi-directional would be nice but
> not vital...
>

Referring to the part above, this would mean digging very deep into the
code, down to the part where data is physically written to the data files,
and adding code there that also writes this data across the network to the
data files of the replica database. (Hey, guys, what would you think of
that?)
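
To make the idea a bit more concrete, here is a toy sketch in Python. It is
not PostgreSQL code and says nothing about how the backend is really
organised; DataFile, ReplicatingWriter and replica_link are names I made up,
and the "network" is just a second in-memory object. It only shows the
principle of repeating every physical block write on a second set of data
files.

```python
# Toy model only: every name here is hypothetical, nothing below is PostgreSQL code.
class DataFile:
    """Stands in for a physical data file made of numbered blocks."""

    def __init__(self):
        self.blocks = {}

    def write_block(self, block_no, data):
        self.blocks[block_no] = bytes(data)


class ReplicatingWriter:
    """Performs each physical write locally, then repeats it on the replica."""

    def __init__(self, local, replica_link):
        self.local = local
        # In a real system this would be a network connection to the
        # second machine; here it is simply another DataFile.
        self.replica_link = replica_link

    def write_block(self, block_no, data):
        self.local.write_block(block_no, data)
        self.replica_link.write_block(block_no, data)


primary = DataFile()
replica = DataFile()  # would live on the second machine
writer = ReplicatingWriter(primary, replica)

writer.write_block(0, b"row 1")
writer.write_block(1, b"row 2")
writer.write_block(0, b"row 1, updated")
```

The sketch deliberately ignores the hard parts: what to do when the replica
is unreachable, and how to keep the order of writes consistent.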

>
> Even if the machine has mirrored disks its CPU can fail and a secondary
> machine is useless unless it has the identical data...

The hardware way of doing this (also see the High-Availability HOWTO):
- Build a RAID (mirroring, etc.): something that makes your data survive a
  disk crash.
- SHARE this RAID between two CPUs (for sharing strategies, also see the HA
  HOWTO).

When a CPU fails, the other one can take over with the exact same data.
When a disk fails, your data is still safe, and the system can continue
running until the defective part is replaced.
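
As a rough illustration of the CPU failover case, here is a small Python
simulation. SharedStorage, Node and active_node are invented names, and a
real cluster would need heartbeats and fencing that are left out here. The
only point is that the surviving machine sees the same storage, so nothing
has to be copied at takeover time.

```python
# Toy simulation; SharedStorage, Node and active_node are hypothetical names.
class SharedStorage:
    """Stands in for the RAID array that both machines can reach."""

    def __init__(self):
        self.records = []


class Node:
    """One machine (CPU) attached to the shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.alive = True

    def write(self, record):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.storage.records.append(record)


def active_node(nodes):
    """Return the first node that is still alive."""
    for node in nodes:
        if node.alive:
            return node
    raise RuntimeError("no node available")


storage = SharedStorage()
nodes = [Node("cpu-a", storage), Node("cpu-b", storage)]

active_node(nodes).write("order 1")
nodes[0].alive = False            # the first CPU fails
survivor = active_node(nodes)     # the second one takes over
survivor.write("order 2")
```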

> The only way round this would be to re-write the application to write to
> BOTH databases and this could be a problem with some pre-existing software
> packages and give the potential to get the DB's out of sync.
>
> Writing the application from scratch would mean that it was not so much of
> a problem.
>

If you do such a thing at the application level, it will always be a burden.
The goal of a database programmer is to write solutions to problems, not to
be a systems programmer. If every problem you solve must also take into
account that data has to be replicated, you will double the time needed to
write every application. That is why I put the emphasis for replication on
the system level.
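
A small sketch of what the application-level approach looks like, and how it
goes wrong. I use two in-memory SQLite databases from Python's standard
sqlite3 module purely as stand-ins for the two database servers; the
accounts table, dual_insert and the fail_second switch are made up for the
example.

```python
import sqlite3


def open_db():
    """Create one stand-in database with a single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    return conn


def dual_insert(db_a, db_b, row, fail_second=False):
    """Write the same row to both databases, as the application would have to."""
    db_a.execute("INSERT INTO accounts VALUES (?, ?)", row)
    db_a.commit()
    if fail_second:
        # Simulates the second machine becoming unreachable mid-operation.
        raise ConnectionError("second database unreachable")
    db_b.execute("INSERT INTO accounts VALUES (?, ?)", row)
    db_b.commit()


def count(db):
    return db.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]


db_a, db_b = open_db(), open_db()
dual_insert(db_a, db_b, (1, 100))
try:
    dual_insert(db_a, db_b, (2, 250), fail_second=True)
except ConnectionError:
    pass  # without extra repair logic, the two databases now disagree

counts = (count(db_a), count(db_b))
```

After the simulated failure the first database holds two rows and the second
only one. Every application written this way has to carry its own logic to
detect and repair that, which is exactly the burden I mean.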

>
> James Lewis wrote:
>
> > Does anyone have any suggestions for a way to keep 2 databases in sync?
> >
> > Ideally updates need to be made to both... this can't be too uncommon a
> > requirement..... any kind of HA would need it....
>
> No. The way HA works is that the system is made in such a way that you
> can't lose data, or that you don't lose CPU cycles. HA does not make any
> assumptions on the kind of applications that are running on the system.
>
> If you want to experiment with HA, start with building a mirroring disk on
> your Linux system to get the feel of it.
>
> Then try to assess what you really want : low down-time or 7x24 operation.
> This is what determines your HA system.
>
> If you don't want to lose CPU cycles, then you have to build a cluster
> with e.g. two CPUs. These should share their mass storage. This mass
> storage should be organised as RAID. With hot-plug capabilities, it is
> possible to keep the system running either if a CPU goes down or if a
> drive fails.
>
> The worst thing that you can do is to base the implementation of your
> application upon the fact that the system should have high-availability
> requirements. That is not a database issue, but an operating system issue.
>

Jurgen Defurne


