Re: max_standby_delay considered harmful - Mailing list pgsql-hackers

From: Greg Smith
Subject: Re: max_standby_delay considered harmful
Msg-id: 4BE5EDF7.6030305@2ndquadrant.com
In response to: Re: max_standby_delay considered harmful (Bruce Momjian <bruce@momjian.us>)
Responses: Re: max_standby_delay considered harmful
List: pgsql-hackers
Bruce Momjian wrote:
> I think the big question is whether this issue is significant enough
> that we should ignore our policy of no feature design during beta

The idea that you're considering removal of a feature that we already 
have people using in beta and making plans around is a policy violation 
too, you know. A freeze should include not cutting things just because 
their UI or implementation is not ideal yet. And you've been using the 
word "consensus" here when there is no such thing. At best there's 
barely a majority here among people who have stated an opinion, and 
consensus means something much stronger than even that: something 
closer to unanimity. I thought the summary Josh wrote of where the 
project stands, at 
http://archives.postgresql.org/message-id/4BE31279.7040002@agliodbs.com , 
was excellent, from both a technical and a process commentary 
standpoint. I'd be completely happy to follow that plan, and then we'd 
be at a consensus, with no one left arguing.

It was very clear back in February that if Streaming Replication (SR) 
didn't hit the feature set needed to make Hot Standby (HS) less 
troublesome out of the box, there would be some limitations here, and 
that set of concerns hasn't changed much since then. I thought the 
backup plan, if we didn't get things like xid feedback, was to keep the 
capability as written anyway, knowing that it's still much better than 
having no control over cancellation timing at all: keep improving the 
documentation around its issues, and continue to hack away at them in 
user space and in the field. Then we do better for 9.1. You seem bent 
on removing the feedback part of that cycle.

The full statement of the ESR bit Josh was quoting is "Release early. 
Release often. And listen to your customers."[1] My customers include 
some who believed in the PostgreSQL community process enough to 
contribute toward the HS development that's been completed and donated 
to the project. They have a pretty clear view on this, which I'm 
relaying when I talk about what I'd like to see happen. They are saying 
they cannot completely ignore their requirements for HA failover, but 
would be willing to loosen them just a bit (increasing failover time 
slightly) if it reduces the odds of query cancellation, and therefore 
increases how much load they can expect to push toward the standby. 
max_standby_delay is a currently available mechanism that does that. 
I'm not going to be their nanny and say "no, that's not perfectly 
predictable, you might get a query canceled sometimes when you don't 
expect it anyway".

Instead, I was hoping to let them deploy 9.0 with this option available 
(but certainly not the default), informed of the potential risks, and 
see how that goes. We can confirm whether the userland workarounds we 
believe will be effective here really are. If so, then we can soldier 
forward, incorporating them directly into the server code, knowing that 
they work. If not, switch to one of the safer modes, see if there's 
something better to use altogether in 9.1, and perhaps remove this whole 
approach. That's healthy development progress either way.

Upthread, Bruce expressed some concern that this was going to live 
forever once deployed. There is no way I'm going to let this behavior 
continue to be available in 9.1 if field tests say the workarounds 
aren't good enough. If that turns out to be the case, it's going to 
torture all of us who do customer deployments of this technology every 
day, and nobody is going to feel the heat from that worse than 
2ndQuadrant. I've done a round of removing GUCs that didn't do what 
they were expected to in the field before, based on real-world tests 
showing regular misuse, and I'll do it again if this falls into that 
same category. We've already exposed this release to a whole stack of 
risk from work during its development cycle, risk that doesn't really 
drop much just from cutting this one bit. I'd at least like to get all 
the reward possible from that risk, which I expected to include feedback 
in this area.

Circumventing the planned development process by dropping this now will 
ruin how I expected the project to feel out the right thing on the user 
side, and we'll all be left with little more insight into what to do in 
9.1 than we have now. And I'm not looking forward to explaining to 
people why a feature they've been seeing and planning to deploy for 
months has now been cut, after what was supposed to be a freeze for 
beta.

[1] http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html
This particular bit is quite relevant here: "Linus was keeping his 
hacker/users constantly stimulated and rewarded—stimulated by the 
prospect of having an ego-satisfying piece of the action, rewarded by 
the sight of constant (even daily) improvement in their work. Linus was 
directly aiming to maximize the number of person-hours thrown at 
debugging and development, even at the possible cost of instability in 
the code and user-base burnout if any serious bug proved intractable." I 
continue to be disappointed that contributing code to PostgreSQL is far 
more likely to bring a dose of argument and frustration than reward, 
and this discussion is a perfect example of that.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
