Re: Replication options? - Mailing list pgsql-general
From | Andrew Sullivan |
---|---|
Subject | Re: Replication options? |
Date | |
Msg-id | 20040812102504.GA7952@libertyrms.info |
In response to | Re: Replication options? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
I'll try this again, since it doesn't seem to have made it to the list.

On Wed, Aug 11, 2004 at 12:02:07PM -0400, Tom Lane wrote:
> Now erServer did work for them, but it required significant amounts of
> tuning and constant babysitting by the DBA. (If Andrew Sullivan is
> paying attention to this thread, he can offer lots of gory details.)
> I can also personally testify that getting erServer set up is a major
> pain in the rear. I haven't messed with Slony, but all reports are that
> it's a substantially better piece of code.

I can indeed provide gory details. Erserver worked for us, and was able to handle the load we gave it (at times pretty substantial). But it had a number of flaws. Some of these were mere matters of implementation, and some were (in my view) fundamental.

Since I've been observing radio silence on the list lately, I feel entitled to blather on at length now. So, below is the gore, and the reasons we finally decided to abandon erserver. This is very similar to the negative part of what I had to say at OSCON, so if you were bored by me there, you'll find this equally boring.

A. First, the implementation faults. As Vivek Khera pointed out, the failover and set up support is not strong.

1. Setting up erserver on a system which is not already replicated is a major pain. (We didn't have this problem because we always launched with erserver support in place.) On a database of a few gig, you could easily have to take 24 hours downtime to get it set up. Some of that was just faulty implementation, and if you have a single not null unique column on every table, the problem is more to do with poorly conceived setup scripts. But finding this out turns out to depend on having available an expert in the system (and as far as I know, almost all the experts on it actually work here at Afilias. I did put together some notes on this topic for the BSD version of erserver.
They're at <http://gborg.postgresql.org/pipermail/erserver-general/2003-October/000169.html> or <http://tinyurl.com/66b89>.)

2. Switchover is also a pain (we don't like to talk about failover: erserver is, like Slony-I, async, and failover more or less automatically risks stranding data on the dead master). There are some automation scripts which make it a little easier, but the basic problem is getting your slave into a condition where it can actually take over from the master. The slaves in erserver really don't know enough about the master to be in a position to do this. It _is_ possible: I've done it. It's not fun. (Failing back is even less fun, and essentially requires you to build a new slave. See A1, above. If you're going to use erserver as a disaster-avoidance system, you need two identical servers, so that any one can play the role of master.)

3. The engine was written in Java. Java is a nice language, but the JDK from Sun imposes a 3 G limit on the size of the JVM. If you get far enough behind, the VM just blows up, and then you have no hope of recovering. This is a _very_ serious limitation for high traffic sites. It also turned out to be completely fatal for certain users who wanted to replicate large objects: one object would be enough to make the system fail (for reasons that are too incredible to go into, the process actually has two copies of the data at one time during a part of processing. This is just a bug, though a dangerous one).

4. The logging code was deliberately obfuscatory. For some reason, the person who originally wrote the Java code (note _not_ the original code from PostgreSQL, Inc.) decided to wrap all the error handling in an outer layer which returned the line number of the error handler every time it threw an exception. This meant that, from looking at the logs, every case of a bug looks like it happened at the same place. You can imagine how much fun it was to fix things.
Every person I've ever known who looked at the logging code suffered retinal damage -- it was that bad. (This is actually fixed in the PostgreSQL-commercial version of the software, BTW.)

B. Second, the fundamental errors.

1. The first big problem came from something we thought was an advantage: erserver replicates only the latest version of the row. This reduces the replication overhead considerably, and for a long time I was a great proponent of this approach. I was wrong, because the performance overhead that it imposes under certain perverse kinds of loads is well and truly awful. Even in the normal circumstance, the performance penalty is noticeable; but it's not a problem if you have enough excess capacity. When that capacity is squeezed, you run into a lot of pain. In such cases, the replication application starts to slow down. Get under really heavy load, and you start to have to worry about the JVM limits outlined in A3. This can be dealt with, but you absolutely need to hold its hand when things are bad. Your DBAs have better things to do, I assume. The decision to send only the last row also cost some functionality, because you can't build an historical-database slave with erserver, unfortunately (if a row gets updated twice in the space of one transaction, you won't see two changes on the slave, but only the final state of the changed row).

2. Finally, there is the problem that the snapshot applications occasionally could get into the situation where applying rows to the slave would result either in bad data (bad) or errors on unique indexes (also bad). You had to choose between making your slave even more unlike your master or potentially getting called in the middle of the night to hand-fix the deadlock condition. (Some further discussion of this feature of the software is at <http://gborg.postgresql.org/pipermail/erserver-general/2003-October/000185.html> or <http://tinyurl.com/5erj7>.)
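
The functionality cost mentioned in B1 can be illustrated with a minimal sketch. This is not erserver's code (erserver's engine was Java, and its internals are not shown here); the names `Change`, `latest_only`, and `replay_all` are invented for illustration, and a plain list stands in for the set of changes made between two replication passes. It only shows why a feed that carries the latest version of each row cannot rebuild history on the slave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One modification made on the master (hypothetical structure)."""
    row_id: int
    value: str


def latest_only(changes):
    """Collapse a batch to the final state of each row, as B1 describes."""
    final = {}
    for change in changes:
        # A later change to the same row overwrites the earlier one.
        final[change.row_id] = change.value
    return [Change(row_id, value) for row_id, value in final.items()]


def replay_all(changes):
    """Ship every change in order, so a slave could keep full history."""
    return list(changes)


# Row 1 is updated twice; row 2 once.
batch = [Change(1, "a"), Change(2, "x"), Change(1, "b")]

shipped_latest = latest_only(batch)
shipped_all = replay_all(batch)

print(len(shipped_latest), "rows shipped when only the latest version is sent")
print(len(shipped_all), "changes shipped when every change is sent")
```

In this sketch the intermediate value "a" for row 1 never reaches the slave under `latest_only`, which is the reason an historical-database slave is not possible with that design; the smaller shipped batch is the overhead saving that made the approach look attractive in the first place.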

It is really the items in B that finally convinced us that we had to give up on the erserver code and work on a fresh system. I think Jan will confirm that his Slony-I work drew some useful inspiration from the erserver code (in particular, the magic that Vadim performed). But ultimately, erserver taught us as much about what _else_ you needed before you got a real replication system. In particular, we felt that you needed more knowledge at all the nodes than erserver was able to provide. By contrast, you can usefully think of Slony-I as a cluster-communication system which happens to specialise in keeping the data the same on all subscribing nodes.

This isn't to say that erserver is not undergoing development. I understand from Geoff Davidson of PostgreSQL, Inc., that they are continuing work on the product, with an eye to a multi-master distributed system and automatic failover. I think such developments would be welcomed by PostgreSQL users.

A

--
----
Andrew Sullivan
Afilias Canada
204-4141 Yonge Street
Toronto, Ontario Canada M2P 2A8
<andrew@ca.afilias.info>
+1 416 646 3304 x4110