Thread: Enterprise readiness - mirroring / incremental backup solutions?

From

Kieran

Date:

19 November 2002, 08:21:57

I'm currently starting to evaluate Open Source RDBMSs for use in a
high-volume, high-availability environment.

My main requirements are:

1. Ability to store approx 200Gb of data, with about 5Gb of data
changing per day.

2. Support for high number of concurrent short transactions under
REPEATABLE READ transaction isolation with row-level locking (or
equivalent optimistic concurrency control).

3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
database server.

4. Ability to perform continuous network backups such that in the event
of both the primary database server and mirrored database server
suffering total failure, no more than 1 hour of data is lost.

First impressions are that PostgreSQL (and SAP DB, but definitely not
MySQL) appears to meet requirements 1 & 2, but I'm not sure whether it
(or any Open Source db) can currently meet requirements 3 & 4.

My understanding is that while PostgreSQL offers hot backups "out of the
box", it only offers full backups and does not have built in support for
mirroring. Clearly, backing up 200Gb of data hourly is not feasible.
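
To put rough numbers on that, here is a minimal back-of-the-envelope
sketch. It assumes decimal units (1 GB = 1000 MB) and that the 5Gb of
daily changes is spread evenly over 24 hours; neither assumption is a
measured figure, and the function name is made up for illustration.

```python
# Rough throughput needed for the two backup strategies discussed above.
# Assumptions (illustrative only): decimal units (1 GB = 1000 MB) and
# changes spread evenly over the day.

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def required_mb_per_sec(gigabytes: float, window_seconds: float) -> float:
    """Sustained rate in MB/s needed to copy `gigabytes` within the window."""
    return gigabytes * 1000 / window_seconds


# Copying the whole 200 GB database once an hour.
full_hourly = required_mb_per_sec(200, SECONDS_PER_HOUR)

# Copying only one hour's share of the 5 GB that changes per day.
incremental_hourly = required_mb_per_sec(5 / HOURS_PER_DAY, SECONDS_PER_HOUR)

print(f"full backup every hour:   {full_hourly:.1f} MB/s")
print(f"changes only, every hour: {incremental_hourly:.3f} MB/s")
```

Under those assumptions a full hourly backup needs roughly 56 MB/s
sustained around the clock, while shipping only the changed data needs
well under 1 MB/s, which is why I'm asking about incremental options.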

Are there any third-party solutions capable of making PostgreSQL meet
requirements 3 & 4?

I'd imagine it may be possible to satisfy 3. using file system level
mirroring, but I'd appreciate it if someone could confirm this.

My last question is somewhat pie-in-the-sky, but assuming that
PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
solutions, what are people's gut reactions to whether a small team (e.g.
5-6) of experienced, full-time paid developers could add mirroring and
incremental backup support to PostgreSQL within 18 months?

Cheers,
Kieran Elby

Re: Enterprise readiness - mirroring / incremental backup

From

Bruce Momjian

Date:

19 November 2002, 09:10:04

Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> My main requirements are:
>
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

OK.

> 2. Support for high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

We don't have REPEATABLE READ, as far as I know.  We have READ COMMITTED
and SERIALIZABLE.

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

No mirroring.  We are working on replication.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

Nope.  Point-in-time recovery will be in 7.4.

> First impressions are that PostgreSQL (and SAP DB, but definitely not
> MySQL) appears to meet requirements 1 & 2, but I'm not sure whether it
> (or any Open Source db) can currently meet requirements 3 & 4.

Right.

> My understanding is that while PostgreSQL offers hot backups "out of the
> box", it only offers full backups and does not have built in support for
> mirroring. Clearly, backing up 200Gb of data hourly is not feasible.


Right.

> Are there any third-party solutions capable of making PostgreSQL meet
> requirements 3 & 4?

There are master-slave replication tools in /contrib, specifically rserv
and dbmirror.

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

Uh, yes, you can use RAID.

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

Easily done. We have a point-in-time recovery patch ready for 7.4
already.  Full multi-master replication is being worked on at:

    http://gborg.postgresql.org/project/pgreplication/projdisplay.php

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Richard Huxton

Date:

19 November 2002, 09:15:28

On Tuesday 19 Nov 2002 11:42 am, Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> My main requirements are:
>
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

There are people with databases larger than this. Read up on VACUUM as regards
changing data.

> 2. Support for high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

See the section: Multiversion Concurrency Control in the online manuals for
details of PG's transaction levels.

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.
>
> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

There are some replication projects and one commercial addon that I know of.
Search the mailing list archives for mention of the options, and perhaps
check the techdocs.postgresql.org site.

> Are there any third-party solutions capable of making PostgreSQL meet
> requirements 3 & 4?
>
> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.
>
> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

Check out gborg.postgresql.org for a replication project that is, I believe,
intended to be merged into PostgreSQL at some point. The people on that
project would be able to tell you more about timescales and whether an offer
of help could accelerate this.

--
  Richard Huxton

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Bruno Wolff III

Date:

19 November 2002, 09:24:26

On Tue, Nov 19, 2002 at 11:42:30 +0000,
  Kieran <kieran@dunelm.org.uk> wrote:
>
> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

There was some discussion about this recently on the list, and my reading
of the responses is that it will NOT work.

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Tom Lane

Date:

19 November 2002, 09:46:50

Kieran <kieran@dunelm.org.uk> writes:
> My main requirements are:
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

Given sufficient iron, no problem.

> 2. Support for high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

What do you consider a "high number"?  I think we'd max out somewhere
on the order of a thousand simultaneous transactions (again, given
respectable iron).

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

People are doing this today using rserv (or better, the commercial
version available from PostgreSQL Inc).  It's a bit of a pain in the
neck to work with, IMHO.  Check the pgreplication mailing list for
ongoing work on better solutions.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

The only tool we have for this today is pg_dump, and as you say backing
up 200Gb every hour doesn't seem real promising.  I do wonder though
why you don't just redefine the problem: why not mirror to two slaves
at dispersed locations?

There is work being done on point-in-time recovery (ie, beefing up the
WAL facility to the point where WAL logs could usefully be archived).
That will eventually provide a more direct answer to your concern.
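
As a toy illustration of the idea (restore a base backup, then replay
archived log records up to the moment you want), here is a minimal
sketch. It is purely conceptual: the record layout, the names and the
sample data are made up for the example and bear no relation to the
actual WAL format or to the patch under development.

```python
# Toy model of point-in-time recovery: restore a base backup, then replay
# archived log records up to a chosen moment. Illustrative only; this is
# not how PostgreSQL's WAL is structured.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    timestamp: int  # seconds since the base backup (hypothetical unit)
    key: str
    value: str


def recover_to(base_backup: Dict[str, str],
               archived_log: List[LogRecord],
               target_time: int) -> Dict[str, str]:
    """Return the state as of target_time: base backup plus replayed changes."""
    state = dict(base_backup)  # work on a copy; never mutate the backup
    for record in sorted(archived_log, key=lambda r: r.timestamp):
        if record.timestamp > target_time:
            break  # everything later happened after the recovery point
        state[record.key] = record.value
    return state


# Made-up sample data: one base backup and three archived changes.
base = {"balance:alice": "100"}
log = [
    LogRecord(600, "balance:alice", "80"),
    LogRecord(1800, "balance:bob", "20"),
    LogRecord(4000, "balance:alice", "0"),
]

# Recover to one hour after the base backup; the change at 4000s is skipped.
print(recover_to(base, log, target_time=3600))
```

The attraction for your requirement 4 is that only the log records need
to be shipped off-site continuously, not the whole 200Gb.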

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

I wouldn't trust such an approach...

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

If you're thinking of bringing in people with no prior experience with
Postgres, I'd counsel not.  The learning curve is too long.  If you're
thinking of paying existing developers to work on this, I can name
several people who'd love to take your money ;-).

            regards, tom lane

Re: Enterprise readiness - mirroring / incremental backup

From

Robert Treat

Date:

19 November 2002, 09:55:57

On Tue, 2002-11-19 at 09:08, Bruce Momjian wrote:
> Kieran wrote:
>
> > My last question is somewhat pie-in-the-sky, but assuming that
> > PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> > solutions, what are people's gut reactions to whether a small team (e.g.
> > 5-6) of experienced, full-time paid developers could add mirroring and
> > incremental backup support to PostgreSQL within 18 months?
>
> Easily done. We have a point-in-time recovery patch ready for 7.4
> already.  Full multi-master replication is being worked on at:
>
>     http://gborg.postgresql.org/project/pgreplication/projdisplay.php
>

Maybe I am overly optimistic, but I would think you could do this within
6 months without too much trouble with a team of 5-6 developers working
on it. You could probably cut back on developers and hire consulting
services from one of the postgresql support companies that already have
a replication solution.

Robert Treat

Re: Enterprise readiness - mirroring / incremental backup

From

"Charles H. Woloszynski"

Date:

19 November 2002, 20:22:34

Kieran:

I am also looking for incremental backups for PostgreSQL.  We have been
looking at eRServer as a replication engine to address failover.  It
looks like the replication can be done with eRServer (or, if you only
need a small replica, I think rserv might be sufficient), with Linux-HA
to support the failover.  We are looking at the scripts needed for
failover, converting the slave into the master (and vice versa).
Incremental backup/restore is still something we have on our list to
research, but we have not yet had time to tackle it.

If you start down the path of using PostgreSQL, please let me know.  Perhaps our
teams can work together.

Charlie





--


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Andrew Sullivan

Date:

21 November 2002, 06:28:44

On Tue, Nov 19, 2002 at 11:42:30AM +0000, Kieran wrote:
> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

We currently do this with the commercial version of rserv; it's
available from PostgreSQL, Inc.  They're working on making it a
little less hairy to use; but even though it is a little awkward, it
works for us, and I suspect would handle your problem.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

I'd do this with multiple mirrors.  eRServer supports multiple
slaves.  Point-in-time recovery, which is I guess what you want, is
high on our list of desired features, too, and I gather it will be in
7.4.

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

As I understand it, this is pretty risky.

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

I have heard reasonably well-informed estimates that the multi-master
replication system could be completed in 18 months by a sufficiently
familiar developer.  You'll probably want to look to the replication
project list for more details.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Kieran

Date:

26 November 2002, 14:18:25

Kieran wrote:

> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> <rest snipped>

OK, thanks to all who replied.

In case anyone's curious as to the motivation of my question, I've been
asked to look into rough first year hardware + software costs required
for a 'web services' offering, assuming it'll go live Q4 2004. (And how
to minimise them!).

So far, the only major non in-house software component where an open
source product is not quite ready for what we want to do is the RDBMS.
I'm guessing that an RDBMS with full transaction + recovery support is
comparable in complexity to (if not more complex than) a POSIX kernel,
and far more complex than a web server. But I'm sure PostgreSQL
developers already know that....

Now I've got some pointers, I'm going to have to do some more research
myself, but from what I gather, it looks like there's a good chance that
by version 7.4 PostgreSQL will be able to compete with Oracle / Informix
for the particular application domain I'm interested in.

Certainly, the pgreplication project looks very encouraging, as does the
Point-in-time recovery mentioned. I'll also look into the (non-free)
eRServer, as well as rserv and dbmirror. As Tom Lane pointed out,
focusing on mirroring to a remote slave rather than requiring
incremental backup does make more sense (at least to me).

Unfortunately I'm not in a position to pay external developers even
though I suspect doing so would be cheaper than, say, a 16-CPU Oracle
licence :-). Hopefully, we'll eventually be able to contribute something
back either to PostgreSQL or the open source community in general.

Thanks once again,
Regards,
Kieran Elby

Re: Enterprise readiness - mirroring / incremental backup solutions?

From

Andrew Sullivan

Date:

28 November 2002, 11:16:47

On Tue, Nov 19, 2002 at 04:18:35PM +0000, Kieran wrote:

> Unfortunately I'm not in a position to pay external developers even
> though I suspect doing so would be cheaper than, say, a 16-CPU Oracle
> licence :-).

For what it's worth, I can tell you for sure that consulting fees for
PostgreSQL projects from at least one company are _way_ below what
you'd need to spend for even a modest apportionment of Oracle
licenses.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110