Thread: replication/redundancy
Some searches on this have produced mixed results...

Do we have a stable means to replicate transactions between two physical
servers, preferably in a master-master configuration where updates/inserts
can be done on either database and the results replicated to the other
master?

Dave
On Fri, Jun 27, 2003 at 08:05:02PM -0400, Dave [Hawk-Systems] wrote:
> Some searches on this have produced mixed results...
>
> Do we have a stable means to replicate transactions between two physical
> servers, preferably in a master-master configuration where updates/inserts
> can be done on either database and the results replicated to the other
> master?

I've integrated replication support into my object abstraction layer in PHP
(everything runs through this layer - there are no direct queries in the
application).

In short: each table has the fields inode_id (int4) and mtime (timestamp),
and the mtime must be updated on every write operation. From time to time
(e.g. once per minute) a script runs over the database, fetches all new
records, and posts them to the other nodes. This way every node can write at
any time (masterless). But this brings some problems on heavily written
databases - conflicts may occur. I haven't found a better solution yet, but
it still works for all of my apps, even running over email :)

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS
 Web hosting from 5 EUR/month. UUCP, raw IP and much more.
 phone: +49 36207 519931    www:       http://www.metux.de/
 fax:   +49 36207 519932    email:     contact@metux.de
 cellphone: +49 174 7066481 smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 This mail was sent via UUCP. http://www.metux.de/uucp/
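To make the scheme Enrico describes concrete, here is a minimal sketch of
what the table layout and the periodic sync-out query might look like. The
"customer" table, the "last_sync" bookkeeping table, and all columns other
than inode_id and mtime are assumptions for illustration, not his actual
schema:

    -- every replicated table carries the two bookkeeping columns
    CREATE TABLE customer (
        inode_id  int4 PRIMARY KEY,    -- globally unique record id
        mtime     timestamp NOT NULL,  -- must be set on every write
        name      text,
        email     text
    );

    -- one row per peer node, remembering how far we have already synced
    CREATE TABLE last_sync (
        node         text PRIMARY KEY,
        synced_upto  timestamp NOT NULL
    );

    -- the periodic script asks each table for records changed since the
    -- last run and ships them to the other node(s)
    SELECT *
      FROM customer
     WHERE mtime > (SELECT synced_upto FROM last_sync WHERE node = 'node-b');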
weigelt@metux.de wrote:
> On Fri, Jun 27, 2003 at 08:05:02PM -0400, Dave [Hawk-Systems] wrote:
>> Some searches on this have produced mixed results...
>>
>> Do we have a stable means to replicate transactions between two physical
>> servers, preferably in a master-master configuration where updates/inserts
>> can be done on either database and the results replicated to the other
>> master?

I fear that, to date, no such facility is available. As far as I understand
your question, you intend to copy all transactions from database A to
database B and continuously roll database B forward, or something similar.

It seems pgreplication will be the tool you are looking for. It is under way
but not here yet. It looks as if pgreplication will do much more than just
enable a hot standby; what I have read so far looks more like a quorum
facility. I hope I haven't got this wrong, because I would have use for
something like this.

Ciao,
Toni
On Sunday 29 June 2003 01:28, Toni Schlichting wrote:
> It seems pgreplication will be the tool you are looking for. It is under
> way but not here yet. It looks as if pgreplication will do much more than
> just enable a hot standby; what I have read so far looks more like a
> quorum facility. I hope I haven't got this wrong, because I would have
> use for something like this.

Yes, replication is going to be the next BIG thing in the database world.
Right now no one (not even Oracle, and I have experience with their system)
has a good solution out there. What pgreplication is doing is
earth-shatteringly huge. Read through the research paper and see for
yourself what I mean:

http://www.cs.mcgill.ca/~kemme/papers/vldb00.html

What the original poster is looking for is called "peer-to-peer eager
replication": "peer-to-peer" because there is no master server, "eager"
because the two databases are always exactly the same.

The really cool thing about pgreplication will be that it is hot-swappable.
If one of the servers goes down, the whole system will still work. You can
also add databases to the system on the fly, so you can always keep one (or
two or three) extra stand-by databases.

--
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!
On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:
<snip>
> The really cool thing about pgreplication will be that it is
> hot-swappable. If one of the servers goes down, the whole system will
> still work. You can also add databases to the system on the fly, so you
> can always keep one (or two or three) extra stand-by databases.

This is also provided by my system, but with some strict limitations:

* all tables must have the same first three fields: inode_id, mtime, attr
  (I'm using inherited tables; perhaps that isn't needed)
* on each write operation the mtime field must be set accurately (I'm
  currently not using a rule, since I don't know whether there's a way to
  disable it within a query -- perhaps someone can help here; see the
  trigger sketch below)
* currently only an explicit sync-out is supported - from time to time
  every table has to be scanned for new records
* currently no real conflict handling

Perhaps we can improve this a little bit.

cu
--
 Enrico Weigelt    ==   metux ITS
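Regarding the mtime question above: one common alternative to a rule is a
BEFORE trigger that stamps the column on every write, so the application
never has to set it explicitly. A minimal sketch, assuming the hypothetical
"customer" table from the earlier sketch; the function and trigger names are
made up for illustration:

    -- set mtime automatically on every INSERT or UPDATE
    CREATE FUNCTION touch_mtime() RETURNS trigger AS '
    BEGIN
        NEW.mtime := now();
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER customer_touch_mtime
        BEFORE INSERT OR UPDATE ON customer
        FOR EACH ROW EXECUTE PROCEDURE touch_mtime();

This does not answer whether a rule can be disabled per query, but with the
trigger the question may not arise, since the stamping happens regardless of
how the row is written.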
On Tue, Jul 01, 2003 at 08:22:37AM -0700, Jonathan Gardner wrote:
<snip>
> So you are using "lazy" rather than "eager" replication.

Correct.

<snip>
> 1) The data is not consistent. This means if you run the same select
> query at the same time on the two databases, you may get different
> results. For some situations that is okay (like Usenet). For others it
> is not (like registrations -- you'll sign up on one database, but you
> won't appear on the other).

Yes. For those cases an explicit sync-out should be triggered as quickly as
possible. What you of course cannot do is store session data in the
clustered DB: all requests for the same session must go to the same host. In
web applications this can be done by redirecting the client to a per-server
vhost before the session is created (e.g. from www.metux.de to
s[1..n].metux.de), so the session data is always available to the
application. In the case of a failure only those sessions are lost that were
running on the failing server. Another solution could be an NFS-shared
session directory or an RPC-based update notification system. Perhaps we
could implement some remote locking features directly into the postmaster.

<snip>
> 3) These two factors above make using the second database as a
> hot-swappable backup risky at best. You will lose some data when you
> switch to the backup, unless changes to the database are so rare that
> the backup is usually up to date. If that were the case, you probably
> don't need the backup in the first place, because databases that don't
> do much tend not to be very important.

For my current applications it seems acceptable that some databases are one
or two minutes behind.

<snip>
> The most obvious problem with this comes from incrementing a column.

I had such a case. If I want to count objects, I normally put them into a
table and count the rows (e.g. from a view). If both servers add a new
object, there is no conflict.

<snip>
> This would be bad for things like PayPal, where your account would only
> increase by one or the other transfer, rather than both, if two occurred
> at the same time.

Well, for such applications the actual account value must be calculated from
a checkpoint plus the transactions made since then (see the sketch below).
Checkpoints may only be made by a master (which may change from time to
time).

cu
--
 Enrico Weigelt    ==   metux ITS
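As a concrete illustration of the checkpoint idea, the balance would be
derived rather than stored, so concurrently inserted transactions on
different nodes add up instead of overwriting each other. The table and
column names here are invented for the example, not part of Enrico's schema,
and one (latest) checkpoint per account is assumed:

    -- the current master writes a checkpoint row from time to time
    CREATE TABLE account_checkpoint (
        account_id  int4 PRIMARY KEY,
        cp_time     timestamp NOT NULL,
        balance     numeric NOT NULL
    );

    -- every node may append transactions at any time
    CREATE TABLE account_tx (
        account_id  int4,
        tx_time     timestamp NOT NULL,
        amount      numeric NOT NULL
    );

    -- balance = checkpoint + everything booked since the checkpoint
    SELECT c.balance + COALESCE(SUM(t.amount), 0) AS current_balance
      FROM account_checkpoint c
      LEFT JOIN account_tx t
        ON t.account_id = c.account_id AND t.tx_time > c.cp_time
     WHERE c.account_id = 42
     GROUP BY c.balance;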
On Monday 30 June 2003 09:17, weigelt@metux.de wrote:
> * currently only an explicit sync-out is supported - from time to time
> every table has to be scanned for new records

So you are using "lazy" rather than "eager" replication. I am sure you know
the limitations of lazy replication, but let me enumerate them here for
those of you who aren't familiar with it:

1) The data is not consistent. This means if you run the same select query
at the same time on the two databases, you may get different results. For
some situations that is okay (like Usenet). For others it is not (like
registrations -- you'll sign up on one database, but you won't appear on the
other).

2) The "other" process that does the synchronization is serial in nature,
while the processes that change the database are parallel in nature. It is
very possible to have changes happening to the database faster than you can
replicate them. This was a real problem at a web company I recently worked
for that used lazy replication: their backup database fell weeks behind the
live database. It almost got to the point where recreating the entire
database would have been faster than waiting for the replication process to
catch up.

3) These two factors make using the second database as a hot-swappable
backup risky at best. You will lose some data when you switch to the backup,
unless changes to the database are so rare that the backup is usually up to
date. If that were the case, you probably don't need the backup in the first
place, because databases that don't do much tend not to be very important.

> * currently no real conflict handling

What he is talking about here is what happens when two separate processes
work on the same rows. PostgreSQL uses transactions and locking, so two
processes on the same system cannot conflict this way. However, his system
cannot handle this at all when the two processes are on separate machines.

The most obvious problem comes from incrementing a column. If both processes
try to increment the same column, the column ends up incremented by one or
the other, but not both (see the sketch below). This would be bad for things
like PayPal, where your account would only increase by one or the other
transfer, rather than both, if two occurred at the same time.

> perhaps we can improve this a little bit.

I would hope you spend some time researching what others have done.
Relational databases are an area where a tremendous amount of solid research
has already occurred. Applying yourself to understanding the research and
projects that have gone before you will save you a lot of time reproducing
their work. In other words, "If I have seen farther, it is because I have
stood on the shoulders of giants," to (mis?)quote Newton.

Again, to re-emphasize why pgreplication is so cool and why everyone should
be excited about it:

1) Database theory says that scalable, eager replication is impossible, and
this is true in practice.

2) The Postgres-R team discovered a way to make scalable, eager replication
work. The restriction is that locks, once granted, may be aborted or
revoked.

3) This means you will one day be able to set up a Beowulf-type cluster of
PostgreSQL databases that will rival the most powerful databases on earth
today.

--
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!
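To spell out the increment problem Jonathan describes, here is a sketch of
how a lazy, whole-row sync like the one discussed earlier in the thread
loses an update. The "accounts" table and the values are invented for the
example:

    -- both nodes start from the same replicated row: balance = 100

    -- node A, at 10:00:01
    UPDATE accounts SET balance = balance + 10, mtime = now() WHERE id = 1;
    -- node A now has balance = 110

    -- node B, at 10:00:02, has not yet seen node A's change
    UPDATE accounts SET balance = balance + 25, mtime = now() WHERE id = 1;
    -- node B now has balance = 125

    -- when the sync copies whole rows and the newest mtime wins, node B's
    -- row (125) overwrites node A's row (110) on both nodes: the +10 is
    -- lost. Applying both increments should have given 135.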