Re: Restore-reliability mode - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Restore-reliability mode
Date
Msg-id CANP8+jJgKpBac77b77xLWTxxBeq9hG3a0uY-JUv6F4k=2=Nsgw@mail.gmail.com
In response to Restore-reliability mode  (Noah Misch <noah@leadboat.com>)
Responses Re: Restore-reliability mode  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 3 June 2015 at 14:50, Noah Misch <noah@leadboat.com> wrote:
> Subject changed from "Re: [CORE] postpone next week's release".
>
> On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:
> > Well, I think we stop what we are doing, focus on restructuring,
> > testing, and reviewing areas that historically have had problems, and
> > when we are done, we can look to go to 9.5 beta.  What we don't want to
> > do is to push out more code and get back into a
> > wack-a-bug-as-they-are-found mode, which obviously did not serve us well
> > for multi-xact, and which is what releasing a beta will do, and of
> > course, more commit-fests, and more features.
> >
> > If we have to totally stop feature development until we are all happy
> > with the code we have, so be it.  If people feel they have to get into
> > cleanup mode or they will never get to add a feature to Postgres again,
> > so be it.  If people say, heh, I am not going to do anything and just
> > come back when cleanup is done (by someone else), then we will end up
> > with a smaller but more dedicated development team, and I am fine with
> > that too.  I am suggesting that until everyone is happy with the code we
> > have, we should not move forward.
>
> I like the essence of this proposal.  Two suggestions.  We can't achieve or
> even robustly measure "everyone is happy with the code," so let's pick
> concrete exit criteria.  Given criteria framed like "Files A,B,C and patches
> X,Y,Z have a sign-off from a committer other than their original committer."
> anyone can monitor progress and find specific ways to contribute.

I don't like the proposal, nor do I like the follow-on comments made.

This whole idea of "feature development" vs reliability is bogus. It implies that people who work on features don't care about reliability. Given that many of the features are actually about increasing database reliability in the event of crashes and corruptions, it just makes no sense.

How will we participate in cleanup efforts? How do we know when something has been "cleaned up", and how will we measure our success or failure? I think we should be clear that wasting N months on cleanup can *fail* to achieve a useful objective. Without a clear plan it almost certainly will do so. The flip side is that wasting N months will cause great amusement and dancing amongst those people who wish to pull ahead of our open source project, and we should take care not to hand them a victory through an overreaction.

Lastly, the idea that we allow developers to drift away and we're OK with that is just plain mad. I've spent a decade trying to grow the pool of skilled developers who can assist the project. Acting against that, in deed or just word, is highly counterproductive for the project.

Let's just take a breath and think about this.

It is normal for us to spend a month or so consolidating our work. It is also normal for people who see major problems to call them out, effectively using the "Stop The Line" technique: https://leanbuilds.wordpress.com/tag/stop-the-line/

So let's do our normal things, not do a "total stop" for an indefinite period. If someone has specific things that in their opinion need to be addressed, list them and we can talk about doing them, together. I thought that was what the Open Items list was for. Let's use it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
