Re: Sync Rep Design - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Sync Rep Design |
Msg-id | 1293796137.1892.37313.camel@ebony |
In response to | Re: Sync Rep Design (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Sync Rep Design; Re: Sync Rep Design |
List | pgsql-hackers |
On Fri, 2010-12-31 at 12:06 +0200, Heikki Linnakangas wrote:
> Regarding the rest of the proposal, I would still prefer the UI
> discussed here:
>
> http://archives.postgresql.org/message-id/4CAE030A.2060701@enterprisedb.com
>
> It ought to be the same amount of work to implement, and provides the
> same feature set, but makes administration a bit easier by being able to
> name the standbys. Also, I dislike the idea of having the standby
> specify that it's a synchronous standby that the master has to wait for.
> Behavior on the master should be configured on the master.

Good point; I've added the people on the copy list from that post.

This question is the key, so please respond after careful thought on my points below.

There are ways to blend the two approaches together, discussed later, though first we need to look at the reasons behind my proposals.

I see significant real-world issues with configuring replication using multiple named servers, as described in the link above:

1. Syncing to multiple standbys does not guarantee that the updates to the standbys are in any way coordinated. You can run a query on one standby and get one answer and, at the exact same time, run the same query on another standby and get a different answer (slightly ahead or behind). That also means that if the master crashes, one of the servers can still be ahead or behind, even though you asked them to be the same. So you don't actually get what you think you're getting.

2. Availability of the cluster just went down. If *any* of the "important nodes" goes down, then everything just freezes. (I accept that you want that, and have provided it as an option.)

3. Administrative complexity just jumped a huge amount.

(a) If you add servers to or remove servers from the config, you need to respecify all the parameters, which need to be specific to the exact set of servers.
There is no way to test that you have configured the parameters correctly without a testbed that exactly mirrors production, with the same names etc., or by trying it directly in production. So availability takes another potential hit because of user error.

(b) After failover, the list of synchronous_standbys needs to be re-specified, yet what is the correct list of servers? The only way to make that config work is with complex middleware that automatically generates new config files.

I don't think that is "the same amount of work to implement"; it's an order of magnitude harder overall.

4. As a result of the administrative complexity, testing the full range of functionality will take significantly longer, and that is likely to have a direct impact on the robustness of PostgreSQL 9.1.

5. Requesting sync from more than one server performs poorly, since you must wait for additional servers. If there are sporadic or systemic network performance issues, you will be badly hit by them. Monitoring also just got harder. First-response-wins is more robust in the case of volatile resources, since it implies responsiveness to changing conditions.

6. You just lost the ability to control performance on the master with a USERSET parameter. Performance is a huge issue with sync rep. If you can't control it, you'll simply turn it off. Having a feature that we daren't ever use because it performs poorly helps nobody. This is not a tick-box in our marketing checklist; I want it to be genuinely usable in the real world.

I understand very well that Oracle provides that level of configuration, though I think it is undesirable in 90% of real-world use cases. I also understand how sexy that level of configuration *sounds*, but I genuinely believe trying to deliver it would be a mistake for PostgreSQL. IMHO we should take the same road here as we do in other things: simplicity encouraged, complexity allowed.
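Point 5's latency argument can be made concrete with a toy model. The sketch below is an illustration only, not PostgreSQL code: it assumes each standby acknowledges a commit after an independent random delay (a 2 ms base round trip, exponential jitter, and a 5% chance of a 50 ms stall, all figures invented for the example). Waiting for every listed standby means the commit blocks for the *slowest* ack (the maximum), while first-response-wins blocks only for the *fastest* (the minimum). The function names `commit_wait` and `simulate` are hypothetical.

```python
import random
import statistics


def commit_wait(ack_latencies_ms, mode):
    """Return how long (ms) the master blocks before a commit is reported durable.

    mode == "all":   wait for every listed standby to acknowledge (the maximum).
    mode == "first": first-response-wins, return on the earliest ack (the minimum).
    """
    if not ack_latencies_ms:
        raise ValueError("need at least one standby")
    if mode == "all":
        return max(ack_latencies_ms)
    if mode == "first":
        return min(ack_latencies_ms)
    raise ValueError(f"unknown mode: {mode!r}")


def simulate(n_standbys, n_commits, seed=42):
    """Mean commit wait per mode under an invented latency model."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    waits = {"all": [], "first": []}
    for _ in range(n_commits):
        # Assumed model: 2 ms base round trip, exponential jitter (mean 1 ms),
        # and a 5% chance that any given standby suffers a 50 ms stall.
        acks = [
            2.0 + rng.expovariate(1.0) + (50.0 if rng.random() < 0.05 else 0.0)
            for _ in range(n_standbys)
        ]
        for mode, samples in waits.items():
            samples.append(commit_wait(acks, mode))
    return {mode: statistics.mean(samples) for mode, samples in waits.items()}


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        r = simulate(n, 10_000)
        print(f"{n} standby(s): wait-for-all {r['all']:5.1f} ms, "
              f"first-response-wins {r['first']:5.1f} ms")
```

With one standby the two modes coincide; as standbys are added, the wait-for-all mean climbs (any single stall delays the commit) while first-response-wins gets slightly faster. The model deliberately ignores a standby being down entirely, which under wait-for-all would block commits indefinitely (point 2).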
So I don't have any objection to supporting that functionality in the future, but I believe it is not something we should be encouraging (ever), nor is it something we need for this release.

I suppose we might regard the feature set I am proposing as being the same as making synchronous_standbys a USERSET parameter and allowing just two options:

  "none": allowing the user to specify async if they wish it
  "*": allowing people to specify that syncing to *any* standby is acceptable

We can blend the two approaches together, if we wish, by having two parameters (plus server naming):

  synchronous_replication = on | off   (USERSET)
  synchronous_standbys = '...'

If synchronous_standbys is not set and synchronous_replication = on, then we sync to any standby.
If synchronous_replication = off, then we use async replication, whatever synchronous_standbys is set to.
If synchronous_standbys is set, then we use sync rep to all listed servers.

My proposal amounts to "let's add synchronous_standbys as a parameter in 9.2". If you really think that we need that functionality in this release, let's get the basic stuff added now and then fold in those ideas on top afterwards. If we do that, I will help. However, my one insistence is that we explain the above points very clearly in the docs, to specifically dissuade people from using those features for typical cases.

If you wondered why I ignored your post previously, it's because I understood that Fujii's post of 15 Oct, one week later, effectively accepted my approach, albeit with two additional parameters. That is the UI that I had been following.

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services