Thread: Enterprise readiness - mirroring / incremental backup solutions?
I'm currently starting to evaluate Open Source RDBMSs for use in a high-volume, high-availability environment.

My main requirements are:

1. Ability to store approx 200Gb of data, with about 5Gb of data changing per day.

2. Support for a high number of concurrent short transactions under REPEATABLE READ transaction isolation with row-level locking (or equivalent optimistic concurrency control).

3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary database server.

4. Ability to perform continuous network backups such that in the event of both the primary database server and mirrored database server suffering total failure, no more than 1 hour of data is lost.

First impressions are that PostgreSQL (and SAP DB, but definitely not MySQL) appears to meet requirements 1 & 2, but I'm not sure whether it (or any Open Source db) can currently meet requirements 3 & 4.

My understanding is that while PostgreSQL offers hot backups "out of the box", it only offers full backups and does not have built-in support for mirroring. Clearly, backing up 200Gb of data hourly is not feasible.

Are there any third-party solutions capable of making PostgreSQL meet requirements 3 & 4?

I'd imagine it may be possible to satisfy 3. using file system level mirroring, but I'd appreciate it if someone could confirm this.

My last question is somewhat pie-in-the-sky, but assuming that PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party solutions, what are people's gut reactions to whether a small team (e.g. 5-6) of experienced, full-time paid developers could add mirroring and incremental backup support to PostgreSQL within 18 months?

Cheers,
Kieran Elby
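To put rough numbers on why an hourly full backup looks infeasible while shipping only the changes does not, here is a back-of-the-envelope sketch in Python. It assumes that "Gb" in the post means gigabytes (taken as 10**9 bytes) and that the 5Gb of daily changes arrive evenly through the day; the function name is made up for illustration, and real log or replication traffic may well be larger than the changed data itself.

```python
# Back-of-the-envelope throughput for requirement 4, using the figures from the post.
# Assumption: "Gb" means gigabytes, taken here as 10**9 bytes.
GB = 10**9


def required_rate_mb_per_s(bytes_to_copy: int, window_seconds: int) -> float:
    """Sustained rate in MB/s needed to copy bytes_to_copy within window_seconds."""
    return bytes_to_copy / window_seconds / 10**6


# A full 200Gb copy completed inside every one-hour window.
full_hourly = required_rate_mb_per_s(200 * GB, 3600)

# Only the 5Gb that changes per day, assumed to be spread evenly over 24 hours.
changes_only = required_rate_mb_per_s(5 * GB, 24 * 3600)

print(f"full backup every hour: {full_hourly:.1f} MB/s sustained")
print(f"changed data only:      {changes_only:.3f} MB/s on average")
print(f"ratio:                  {full_hourly / changes_only:.0f}x")
```

Under these assumptions the full-backup approach needs roughly three orders of magnitude more sustained bandwidth than moving only the changed data, which is the gap that replication or log archiving would have to close.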
Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> My main requirements are:
>
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

OK.

> 2. Support for a high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

We don't have REPEATABLE READ, as far as I know. We have READ COMMITTED and SERIALIZABLE.

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

No mirroring. We are working on replication.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

Nope. Point-in-time recovery will be in 7.4.

> First impressions are that PostgreSQL (and SAP DB, but definitely not
> MySQL) appears to meet requirements 1 & 2, but I'm not sure whether it
> (or any Open Source db) can currently meet requirements 3 & 4.

Right.

> My understanding is that while PostgreSQL offers hot backups "out of the
> box", it only offers full backups and does not have built-in support for
> mirroring. Clearly, backing up 200Gb of data hourly is not feasible.

Right.

> Are there any third-party solutions capable of making PostgreSQL meet
> requirements 3 & 4?

There are master-slave replication solutions in /contrib, specifically rserv and dbmirror.

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

Uh, yes, you can use RAID.

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

Easily done. We have a point-in-time recovery patch ready for 7.4 already. Full multi-master replication is being worked on at:

http://gborg.postgresql.org/project/pgreplication/projdisplay.php

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Tuesday 19 Nov 2002 11:42 am, Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> My main requirements are:
>
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

There are people with databases larger than this. Read up on VACUUM with regard to changing data.

> 2. Support for a high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

See the section "Multiversion Concurrency Control" in the online manuals for details of PG's transaction isolation levels.

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.
>
> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

There are some replication projects and one commercial add-on that I know of. Search the mailing list archives for mention of the options, and perhaps check the techdocs.postgresql.org site.

> Are there any third-party solutions capable of making PostgreSQL meet
> requirements 3 & 4?
>
> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.
>
> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

Check out gborg.postgresql.org for a replication project that is, I believe, intended to be merged into PostgreSQL at some point. The people on that project would be able to tell you more about timescales and whether an offer of help could accelerate this.

--
  Richard Huxton
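One practical consequence for requirement 2: with only READ COMMITTED and SERIALIZABLE available, an application that wants snapshot-style behaviour would run under SERIALIZABLE, where a transaction that conflicts with a concurrent update is rejected and has to be retried by the client. The following is a minimal Python sketch of that retry pattern and is not tied to any database driver. `SerializationFailure`, `run_with_retry` and `fake_txn` are hypothetical names for illustration; a real driver will raise its own error type, and the back-off values are arbitrary.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error raised on a serialization conflict."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01):
    """Call txn() until it succeeds or max_attempts is reached.

    txn is expected to begin, execute and commit one short transaction,
    raising SerializationFailure when the database rejects it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the conflict
            # Back off briefly, with jitter, so competing clients do not retry in lockstep.
            time.sleep(base_delay * attempt * random.random())


# Demonstration with a fake transaction that conflicts twice, then succeeds.
attempts = []


def fake_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("simulated conflict with a concurrent update")
    return "committed"


result = run_with_retry(fake_txn)
print(result, "after", len(attempts), "attempts")
```

Keeping transactions short, as the original post intends, keeps such conflicts and retries cheap.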
On Tue, Nov 19, 2002 at 11:42:30 +0000, Kieran <kieran@dunelm.org.uk> wrote:
>
> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

There was some discussion about this recently on the list, and my reading of the responses is that it will NOT work.
Kieran <kieran@dunelm.org.uk> writes:
> My main requirements are:
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.

Given sufficient iron, no problem.

> 2. Support for a high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).

What do you consider a "high number"? I think we'd max out somewhere on the order of a thousand simultaneous transactions (again, given respectable iron).

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

People are doing this today using rserv (or better, the commercial version available from PostgreSQL Inc). It's a bit of a pain in the neck to work with, IMHO. Check the pgreplication mailing list for ongoing work on better solutions.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

The only tool we have for this today is pg_dump, and as you say, backing up 200Gb every hour doesn't seem real promising. I do wonder though why you don't just redefine the problem: why not mirror to two slaves at dispersed locations?

There is work being done on point-in-time recovery (i.e., beefing up the WAL facility to the point where WAL logs could usefully be archived). That will eventually provide a more direct answer to your concern.

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

I wouldn't trust such an approach...

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

If you're thinking of bringing in people with no prior experience with Postgres, I'd counsel not. The learning curve is too long. If you're thinking of paying existing developers to work on this, I can name several people who'd love to take your money ;-).

regards, tom lane
On Tue, 2002-11-19 at 09:08, Bruce Momjian wrote:
> Kieran wrote:
>
> > My last question is somewhat pie-in-the-sky, but assuming that
> > PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> > solutions, what are people's gut reactions to whether a small team (e.g.
> > 5-6) of experienced, full-time paid developers could add mirroring and
> > incremental backup support to PostgreSQL within 18 months?
>
> Easily done. We have a point-in-time recovery patch ready for 7.4
> already. Full multi-master replication is being worked on at:
>
> http://gborg.postgresql.org/project/pgreplication/projdisplay.php
>

Maybe I am overly optimistic, but I would think you could do this within 6 months without too much trouble with a team of 5-6 developers working on it. You could probably cut back on developers and hire consulting services from one of the PostgreSQL support companies that already have a replication solution.

Robert Treat
Kieran:

I am also looking for incremental backups for PostgreSQL. We have been looking at eRServer as a replication engine to address failover. It looks like the replication can be done with eRServer (or, if you only need a small replica, I think rserv might be sufficient), with Linux-HA to support the failover. We are looking at the scripts needed to support the failover by converting the slave into the master (and vice versa).

Incremental backup/restore is still something we have on our list to research, but we have not yet had time to tackle it.

If you start down the path of using PostgreSQL, please let me know. Perhaps our teams can work together.

Charlie

Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> My main requirements are:
>
> 1. Ability to store approx 200Gb of data, with about 5Gb of data
> changing per day.
>
> 2. Support for a high number of concurrent short transactions under
> REPEATABLE READ transaction isolation with row-level locking (or
> equivalent optimistic concurrency control).
>
> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored
> secondary database server.
>
> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.
>
> First impressions are that PostgreSQL (and SAP DB, but definitely not
> MySQL) appears to meet requirements 1 & 2, but I'm not sure whether it
> (or any Open Source db) can currently meet requirements 3 & 4.
>
> My understanding is that while PostgreSQL offers hot backups "out of
> the box", it only offers full backups and does not have built-in
> support for mirroring. Clearly, backing up 200Gb of data hourly is not
> feasible.
>
> Are there any third-party solutions capable of making PostgreSQL meet
> requirements 3 & 4?
>
> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.
>
> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd
> party solutions, what are people's gut reactions to whether a small
> team (e.g. 5-6) of experienced, full-time paid developers could add
> mirroring and incremental backup support to PostgreSQL within 18 months?
>
> Cheers,
> Kieran Elby

--
Charles H. Woloszynski
ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015
tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com
On Tue, Nov 19, 2002 at 11:42:30AM +0000, Kieran wrote:

> 3. Fast (i.e. < 5 mins) failover time to a constantly mirrored secondary
> database server.

We currently do this with the commercial version of rserv; it's available from PostgreSQL, Inc. They're working on making it a little less hairy to use; but even though it is a little awkward, it works for us, and I suspect it would handle your problem.

> 4. Ability to perform continuous network backups such that in the event
> of both the primary database server and mirrored database server
> suffering total failure, no more than 1 hour of data is lost.

I'd do this with multiple mirrors. eRServer supports multiple slaves. Point-in-time recovery, which I guess is what you want, is high on our list of desired features, too, and I gather it will be in 7.4.

> I'd imagine it may be possible to satisfy 3. using file system level
> mirroring, but I'd appreciate it if someone could confirm this.

As I understand it, this is pretty risky.

> My last question is somewhat pie-in-the-sky, but assuming that
> PostgreSQL cannot currently meet requirements 3 & 4 even with 3rd party
> solutions, what are people's gut reactions to whether a small team (e.g.
> 5-6) of experienced, full-time paid developers could add mirroring and
> incremental backup support to PostgreSQL within 18 months?

I have heard reasonably well-informed estimates that the multi-master replication system could be completed in 18 months by a sufficiently familiar developer. You'll probably want to look to the replication project list for more details.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
Kieran wrote:
> I'm currently starting to evaluate Open Source RDBMSs for use in a
> high-volume, high-availability environment.
>
> <rest snipped>

OK, thanks to all who replied.

In case anyone's curious as to the motivation for my question, I've been asked to look into the rough first-year hardware + software costs required for a 'web services' offering, assuming it'll go live Q4 2004. (And how to minimise them!)

So far, the only major non-in-house software component where an open source product is not quite ready for what we want to do is the RDBMS. I'm guessing that an RDBMS with full transaction + recovery support is of similar (if not greater) complexity to a POSIX kernel, and far more complex than a web server. But I'm sure PostgreSQL developers already know that....

Now that I've got some pointers, I'm going to have to do some more research myself, but from what I gather, it looks like there's a good chance that by version 7.4 PostgreSQL will be able to compete with Oracle / Informix for the particular application domain I'm interested in.

Certainly, the pgreplication project looks very encouraging, as does the point-in-time recovery mentioned. I'll also look into the (non-free) eRServer, as well as rserv and dbmirror.

As Tom Lane pointed out, focusing on mirroring to a remote slave rather than requiring incremental backup does make more sense (at least to me).

Unfortunately I'm not in a position to pay external developers even though I suspect doing so would be cheaper than, say, a 16-CPU Oracle licence :-).

Hopefully, we'll eventually be able to contribute something back either to PostgreSQL or the open source community in general.

Thanks once again,

Regards,
Kieran Elby
On Tue, Nov 19, 2002 at 04:18:35PM +0000, Kieran wrote:

> Unfortunately I'm not in a position to pay external developers even
> though I suspect doing so would be cheaper than, say, a 16-CPU Oracle
> licence :-).

For what it's worth, I can tell you for sure that consulting fees for PostgreSQL projects from at least one company are _way_ below what you'd need to spend for even a modest apportionment of Oracle licenses.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110