Re: max_standby_delay considered harmful - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: max_standby_delay considered harmful |
Date | |
Msg-id | 4BE5EDF7.6030305@2ndquadrant.com |
In response to | Re: max_standby_delay considered harmful (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: max_standby_delay considered harmful |
List | pgsql-hackers |
Bruce Momjian wrote:
> I think the big question is whether this issue is significant enough
> that we should ignore our policy of no feature design during beta

The idea that you're considering removal of a feature that we already have people using in beta and making plans around is a policy violation too, you know. A freeze should include not cutting things just because their UI or implementation is not ideal yet. And you've been using the word "consensus" here when there is no such thing. At best there's barely a majority here among people who have stated an opinion, and consensus means something much stronger even than that; that means something closer to unanimity.

I thought the summary of where the project is at that Josh wrote at http://archives.postgresql.org/message-id/4BE31279.7040002@agliodbs.com was excellent, both from a technical and a process commentary standpoint. I'd be completely happy to follow that plan, and then we'd be at a consensus--with no one left arguing.

It was very clear back in February that if SR didn't hit the feature set to make HS less troublesome out of the box, there would be some limitations here, and that set of concerns hasn't changed much since then. I thought the backup plan if we didn't get things like xid feedback was to keep the capability as written anyway, knowing that it's still much better than no control over cancellation timing available at all. Keep improving documentation around its issues, and continue to hack away at them in user space and in the field. Then we do better for 9.1. You seem bent on removing the feedback part of that cycle.

The full statement of the ESR bit Josh was quoting is "Release early. Release often. And listen to your customers."[1] My customers include some who believed in the PostgreSQL community process enough to contribute toward the HS development that's been completed and donated to the project. They have a pretty clear view on this that I'm relaying when I talk about what I'd like to see happen. They are saying they cannot completely ignore their requirements for HA failover, but would be willing to loosen them just a bit (increasing failover time slightly) if it reduces the odds of query cancellation, and therefore improves how much load they can expect to push toward the standby. max_standby_delay is a currently available mechanism that does that. I'm not going to be their nanny and say "no, that's not perfectly predictable, you might get a query canceled sometimes when you don't expect it anyway".

Instead, I was hoping to let them deploy 9.0 with this option available (but certainly not the default), informed of the potential risks, and see how that goes. We can confirm whether the userland workarounds we believe will be effective here really are. If so, then we can soldier forward directly incorporating them into the server code, knowing that works. If not, switch to one of the safer modes, see if there's something better to use altogether in 9.1, and perhaps this whole approach gets removed. That's healthy development progress either way.

Upthread Bruce expressed some concern that this was going to live forever once deployed. There is no way I'm going to let this behavior continue to be available in 9.1 if field tests say the workarounds aren't good enough. That's going to torture all of us who do customer deployments of this technology every day if that turns out to be the case, and nobody is going to feel the heat from that worse than 2ndQuadrant. I did a round once of removing GUCs that didn't do what they were expected to in the field before, based on real-world tests showing regular misuse, and I'll do it again if this falls into that same category.

We've already exposed this release to a whole stack of risk from work during its development cycle, risk that doesn't really drop much just from cutting this one bit. I'd at least like to get all the reward possible from that risk, which I expected to include feedback in this area. Circumventing the planned development process by dropping this now will ruin how I expected the project to feel out the right thing on the user side, and we'll all be left with little more insight into what to do in 9.1 than we have now. And I'm not looking forward to explaining to people why a feature they've been seeing and planning to deploy for months has now been cut only after what was supposed to be a freeze for beta.

[1] http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html , and this particular bit is quite relevant here:

"Linus was keeping his hacker/users constantly stimulated and rewarded—stimulated by the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work. Linus was directly aiming to maximize the number of person-hours thrown at debugging and development, even at the possible cost of instability in the code and user-base burnout if any serious bug proved intractable."

I continue to be disappointed at how contributing code to PostgreSQL is far more likely to come with a dose of argument and frustration than with reward, and this discussion is a perfect example of such.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us