Thread: postpone next week's release
Hi,

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 29, 2015 at 02:02:43PM -0400, Robert Haas wrote:
> Hi,
>
> I think we should postpone next week's release. I have been hard at
> work on the multixact-related bugs that were reported in 9.4.2 and
> 9.3.7, and the subsequent bugs found by code-reading, but getting them
> all fixed by Monday doesn't seem realistic. Such fixes should have
> careful review, and not be dashed into the tree under time pressure.
>
> We could do the release anyway to relieve the pain caused by the
> fsync-pgdata hard-failure problem, but it seems to me that if we do
> that, we're just going to end up having to do yet another release
> almost right away. I think it would be better to wait and do one
> release that fixes both sets of issues.

It does seem wise to make sure we have all these items fixed. We have PR'ed the recovery failure issue so I think we are good at this point. I see having to put out another multi-xact-only fix release the week after as being a bigger negative.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
* Robert Haas (robertmhaas@gmail.com) wrote:
> I think we should postpone next week's release. I have been hard at
> work on the multixact-related bugs that were reported in 9.4.2 and
> 9.3.7, and the subsequent bugs found by code-reading, but getting them
> all fixed by Monday doesn't seem realistic. Such fixes should have
> careful review, and not be dashed into the tree under time pressure.
>
> We could do the release anyway to relieve the pain caused by the
> fsync-pgdata hard-failure problem, but it seems to me that if we do
> that, we're just going to end up having to do yet another release
> almost right away. I think it would be better to wait and do one
> release that fixes both sets of issues.

Agreed.

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

Thanks!

Stephen
On Fri, May 29, 2015 at 02:54:31PM -0400, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > I think we should postpone next week's release. I have been hard at
> > work on the multixact-related bugs that were reported in 9.4.2 and
> > 9.3.7, and the subsequent bugs found by code-reading, but getting them
> > all fixed by Monday doesn't seem realistic. Such fixes should have
> > careful review, and not be dashed into the tree under time pressure.
> >
> > We could do the release anyway to relieve the pain caused by the
> > fsync-pgdata hard-failure problem, but it seems to me that if we do
> > that, we're just going to end up having to do yet another release
> > almost right away. I think it would be better to wait and do one
> > release that fixes both sets of issues.
>
> Agreed.
>
> I just caution that we appreciate PGCon coming up and that we do our
> best to avoid running into a case where we have to push it further due
> to everyone being at the conference.

This brings up the issue of when we want to do 9.5 beta. Ideas?

--
Bruce Momjian <bruce@momjian.us>
On Fri, May 29, 2015 at 8:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hi,
>
> I think we should postpone next week's release. I have been hard at
> work on the multixact-related bugs that were reported in 9.4.2 and
> 9.3.7, and the subsequent bugs found by code-reading, but getting them
> all fixed by Monday doesn't seem realistic. Such fixes should have
> careful review, and not be dashed into the tree under time pressure.
>
> We could do the release anyway to relieve the pain caused by the
> fsync-pgdata hard-failure problem, but it seems to me that if we do
> that, we're just going to end up having to do yet another release
> almost right away. I think it would be better to wait and do one
> release that fixes both sets of issues.
>
> Thoughts?
I'm a bit split on this.
We *definitely* don't want to release the multixact fix without it being carefully reviewed, that's the part I'm not split about :) And I fully appreciate we can't have that done by Monday.
However, the file-permission thing seems to hit quite a few people (have we ever had this many bug reports after a minor release?), which means we'd really want to get that out quickly.
Do you have any feeling of how likely people are to actually hit the multixact one? I've followed some of that impressive debugging you guys did, and I know it's a pretty critical bug if you hit it, but how wide-spread will it be?
I guess one option we could do is encourage packagers to push updated packages (-2 versions) basically. But if we do that, perhaps we might as well release anyway?
AIUI, the permission thing won't actually be very likely to affect Windows users. And Windows packages are the ones that take by far the most work to make. Perhaps we should consider skipping making packages of that version on Windows, and then plan to push yet another minor one or two weeks later, that goes out on all platforms?
On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > I think we should postpone next week's release. I have been hard at
> > work on the multixact-related bugs that were reported in 9.4.2 and
> > 9.3.7, and the subsequent bugs found by code-reading, but getting them
> > all fixed by Monday doesn't seem realistic. Such fixes should have
> > careful review, and not be dashed into the tree under time pressure.
> >
> > We could do the release anyway to relieve the pain caused by the
> > fsync-pgdata hard-failure problem, but it seems to me that if we do
> > that, we're just going to end up having to do yet another release
> > almost right away. I think it would be better to wait and do one
> > release that fixes both sets of issues.
>
> Agreed.
>
> I just caution that we appreciate PGCon coming up and that we do our
> best to avoid running into a case where we have to push it further due
> to everyone being at the conference.
If we plan it, we certainly *can* make a release during pgcon. If that's what the reasonable timing comes down to, I think getting these fixes out definitely has to be considered more important than the conference, so a few of us will just have to take a break...
On Fri, May 29, 2015 at 3:09 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Do you have any feeling of how likely people are to actually hit the
> multixact one? I've followed some of that impressive debugging you guys did,
> and I know it's a pretty critical bug if you hit it, but how wide-spread
> will it be?

That precise problem has been reported a few times, but it may not be widespread. I don't know. My bigger concern is that, at present, taking a base backup is broken. I haven't figured out the exact reproduction scenario, but I think it's something like this:

- begin base backup
- checkpoint happens, truncating pg_multixact
- at this point pg_multixact gets copied
- end base backup

I think what will happen on replay is that replaying the checkpoint, it will try to reference pg_multixact files that don't exist any more and die with a fatal error.

--
Robert Haas
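[Editorial sketch, not part of the original message.] The ordering suspected above can be illustrated with a toy model. Everything here is hypothetical: the function names, the integer "segments", and the idea that the checkpoint record refers to the oldest segment known when the backup began are assumptions made for illustration only. This is not PostgreSQL code, and the message itself says the exact reproduction scenario had not been worked out.

```python
class MissingSegmentError(Exception):
    """Replay needed a pg_multixact segment that the backup does not contain."""


def take_base_backup(live_segments, truncate_below, copy_first=False):
    """Toy model of a base backup; returns (backup_segments, wal_records).

    copy_first=False follows the suspected bad ordering: the checkpoint
    truncates pg_multixact before the directory is copied.
    """
    live = set(live_segments)
    # Step 1: begin base backup. Assumption: replay will later refer to the
    # oldest segment that existed at this point.
    oldest_at_start = min(live)
    backup = set(live) if copy_first else None
    # Step 2: a checkpoint happens and truncates pg_multixact.
    live = {seg for seg in live if seg >= truncate_below}
    wal_records = [("checkpoint", oldest_at_start)]
    # Step 3: in the bad ordering, pg_multixact is copied only now.
    if backup is None:
        backup = set(live)
    # Step 4: end base backup.
    return backup, wal_records


def replay(backup_segments, wal_records):
    """Replay the toy WAL; fail hard if a referenced segment is missing."""
    for record_type, needed_segment in wal_records:
        if record_type == "checkpoint" and needed_segment not in backup_segments:
            raise MissingSegmentError(f"segment {needed_segment:04X} not found")
    return True
```

In this model, copying after the truncation leaves the backup without a segment that the replayed checkpoint record still points at, so replay fails; copying before the truncation (`copy_first=True`) does not hit the problem.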
* Magnus Hagander (magnus@hagander.net) wrote:
> On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> > > I think we should postpone next week's release. I have been hard at
> > > work on the multixact-related bugs that were reported in 9.4.2 and
> > > 9.3.7, and the subsequent bugs found by code-reading, but getting them
> > > all fixed by Monday doesn't seem realistic. Such fixes should have
> > > careful review, and not be dashed into the tree under time pressure.
> > >
> > > We could do the release anyway to relieve the pain caused by the
> > > fsync-pgdata hard-failure problem, but it seems to me that if we do
> > > that, we're just going to end up having to do yet another release
> > > almost right away. I think it would be better to wait and do one
> > > release that fixes both sets of issues.
> >
> > Agreed.
> >
> > I just caution that we appreciate PGCon coming up and that we do our
> > best to avoid running into a case where we have to push it further due
> > to everyone being at the conference.
>
> If we plan it, we certainly *can* make a release during pgcon. If that's
> what the reasonable timing comes down to, I think getting these fixes out
> definitely has to be considered more important than the conference, so a
> few of us will just have to take a break...

I don't disagree with you about any of that, just wanted to make mention of the timing.

Thanks!

Stephen
On 05/29/2015 12:18 PM, Robert Haas wrote:
>
> On Fri, May 29, 2015 at 3:09 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Do you have any feeling of how likely people are to actually hit the
>> multixact one? I've followed some of that impressive debugging you guys did,
>> and I know it's a pretty critical bug if you hit it, but how wide-spread
>> will it be?
>
> That precise problem has been reported a few times, but it may not be
> widespread. I don't know. My bigger concern is that, at present,
> taking a base backup is broken.

This I think is the bigger issue. They both are horrible but basebackup being broken is rather... egregious.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> I just caution that we appreciate PGCon coming up and that we do our
>> best to avoid running into a case where we have to push it further due
>> to everyone being at the conference.

> If we plan it, we certainly *can* make a release during pgcon. If that's
> what the reasonable timing comes down to, I think getting these fixes out
> definitely has to be considered more important than the conference, so a
> few of us will just have to take a break...

I think there's no way that we wait more than one additional week to push the fsync fix. So the problem is not with scheduling the update releases, it's with whether we can also fit in a 9.5 beta release before PGCon.

(I can't see doing a beta *during* PGCon week. I for one am going to be on an airplane at the time I'd normally have to be Doing Release Stuff.)

I know Josh doesn't like to do beta1 releases concurrently with back branches because it confuses the PR messaging. But we could make an exception perhaps; or do all those releases the same week but announce the beta the day after the bugfix releases.

Or we just let the beta slide till after PGCon, but then I think we're missing some excitement factor.

regards, tom lane
On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >> I just caution that we appreciate PGCon coming up and that we do our
> >> best to avoid running into a case where we have to push it further due
> >> to everyone being at the conference.
> > If we plan it, we certainly *can* make a release during pgcon. If that's
> > what the reasonable timing comes down to, I think getting these fixes out
> > definitely has to be considered more important than the conference, so a
> > few of us will just have to take a break...
>
> I think there's no way that we wait more than one additional week to push
> the fsync fix. So the problem is not with scheduling the update releases,
> it's with whether we can also fit in a 9.5 beta release before PGCon.
I think 9.5 beta has to stand back. The question is what we do with the potentially two minor releases. Then we can slot in the beta whenever.
If we do the minor as currently planned, can we do another one the week after to deal with the multixact issues? (scheduling wise we're going to have to do one the week after *regardless*, the question is if we can make two different ones, or if we need to fold them into one)
> (I can't see doing a beta *during* PGCon week. I for one am going to be
> on an airplane at the time I'd normally have to be Doing Release Stuff.)
Agreed. We can push a *minor* during pgcon, but not beta.
> I know Josh doesn't like to do beta1 releases concurrently with back
> branches because it confuses the PR messaging. But we could make an
> exception perhaps; or do all those releases the same week but announce
> the beta the day after the bugfix releases.
I can't comment on the PR parts, I'll leave that to Josh.
> Or we just let the beta slide till after PGCon, but then I think we're
> missing some excitement factor.
Well, most of the people going to pgcon know it already. And most of the excitement affects people who are not at pgcon (simply based on that most of our users are not at pgcon). If doing it the week after pgcon is what ends up making sense once we've figured out what to do with the minors, then so be it, IMNSHO.
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> (I can't see doing a beta *during* PGCon week. I for one am going to be
> on an airplane at the time I'd normally have to be Doing Release Stuff.)
[...]
> Or we just let the beta slide till after PGCon, but then I think we're
> missing some excitement factor.

Personally, I'd be all for a "watch Tom do the 9.5 beta release!" Unconference slot... :)

(mostly kidding, but I'm 100% sure it'd draw a huge crowd..)

Thanks!

Stephen
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think there's no way that we wait more than one additional week to push
>> the fsync fix. So the problem is not with scheduling the update releases,
>> it's with whether we can also fit in a 9.5 beta release before PGCon.

> I think 9.5 beta has to stand back. The question is what we do with the
> potentially two minor releases. Then we can slot in the beta whenever.

> If we do the minor as currently planned, can we do another one the week
> after to deal with the multixact issues? (scheduling wise we're going to
> have to do one the week after *regardless*, the question is if we can make
> two different ones, or if we need to fold them into one)

I suppose we could, but it doubles the amount of release gruntwork involved, and it doesn't exactly make us look good to our users either.

I believe Christoph indicated that he was going to cherry-pick the fsync patch and push out an intermediate Debian package with that fix, so at least for that community there is not an urgent reason to get out a set of releases with only the fsync fixes and not the multixact fixes. I'm not clear though on how many of the other reports we heard came from Debian users. (Some of them did, but maybe not all.)

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think there's no way that we wait more than one additional week to push
> >> the fsync fix. So the problem is not with scheduling the update releases,
> >> it's with whether we can also fit in a 9.5 beta release before PGCon.
>
> > I think 9.5 beta has to stand back. The question is what we do with the
> > potentially two minor releases. Then we can slot in the beta whenever.
>
> > If we do the minor as currently planned, can we do another one the week
> > after to deal with the multixact issues? (scheduling wise we're going to
> > have to do one the week after *regardless*, the question is if we can make
> > two different ones, or if we need to fold them into one)
>
> I suppose we could, but it doubles the amount of release gruntwork
> involved, and it doesn't exactly make us look good to our users either.

Agreed. Makes it look like we can't manage to figure out our bugs and put fixes for them together in sensible releases..

Thanks!

Stephen
On Fri, May 29, 2015 at 9:46 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Magnus Hagander <magnus@hagander.net> writes:
> > > On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> I think there's no way that we wait more than one additional week to push
> > >> the fsync fix. So the problem is not with scheduling the update releases,
> > >> it's with whether we can also fit in a 9.5 beta release before PGCon.
> >
> > > I think 9.5 beta has to stand back. The question is what we do with the
> > > potentially two minor releases. Then we can slot in the beta whenever.
> >
> > > If we do the minor as currently planned, can we do another one the week
> > > after to deal with the multixact issues? (scheduling wise we're going to
> > > have to do one the week after *regardless*, the question is if we can make
> > > two different ones, or if we need to fold them into one)
> >
> > I suppose we could, but it doubles the amount of release gruntwork
> > involved, and it doesn't exactly make us look good to our users either.
>
> Agreed. Makes it look like we can't manage to figure out our bugs and
> put fixes for them together in sensible releases..
The flipside of that is that we have a bug fix that's preventing people's databases from starting, and we're intentionally delaying the shipment of it. Though I guess a mitigating fact there is that it is very easy to manually recover from that. But it's painful if your db server restarts when you're not around...
* Magnus Hagander (magnus@hagander.net) wrote:
> On Fri, May 29, 2015 at 9:46 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > Magnus Hagander <magnus@hagander.net> writes:
> > > > On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> I think there's no way that we wait more than one additional week to push
> > > >> the fsync fix. So the problem is not with scheduling the update releases,
> > > >> it's with whether we can also fit in a 9.5 beta release before PGCon.
> > >
> > > > I think 9.5 beta has to stand back. The question is what we do with the
> > > > potentially two minor releases. Then we can slot in the beta whenever.
> > >
> > > > If we do the minor as currently planned, can we do another one the week
> > > > after to deal with the multixact issues? (scheduling wise we're going to
> > > > have to do one the week after *regardless*, the question is if we can make
> > > > two different ones, or if we need to fold them into one)
> > >
> > > I suppose we could, but it doubles the amount of release gruntwork
> > > involved, and it doesn't exactly make us look good to our users either.
> >
> > Agreed. Makes it look like we can't manage to figure out our bugs and
> > put fixes for them together in sensible releases..
>
> The flipside of that is that we have a bug fix that's preventing people's
> databases from starting, and we're intentionally delaying the shipment
> of it. Though I guess a mitigating fact there is that it is very easy to
> manually recover from that. But it's painful if your db server restarts
> when you're not around...

And we have *another* fix for a *data corruption* bug which is coming in the following *week*. Yes, I think delaying a week to get both in is better than putting out a fix for one bug when we *know* there's a data corruption bug sitting in that code, and we're putting out a fix for it the following week.

If we were talking about a month-long delay, that'd be one thing, but that isn't the impression I've got about what we're talking about.

Thanks!

Stephen
On Fri, May 29, 2015 at 03:32:57PM -0400, Tom Lane wrote:
> I know Josh doesn't like to do beta1 releases concurrently with back
> branches because it confuses the PR messaging. But we could make an
> exception perhaps; or do all those releases the same week but announce
> the beta the day after the bugfix releases.
>
> Or we just let the beta slide till after PGCon, but then I think we're
> missing some excitement factor.

I am unclear if we are anywhere near ready for beta1 even in June. Are we?

--
Bruce Momjian <bruce@momjian.us>
* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, May 29, 2015 at 03:32:57PM -0400, Tom Lane wrote:
> > I know Josh doesn't like to do beta1 releases concurrently with back
> > branches because it confuses the PR messaging. But we could make an
> > exception perhaps; or do all those releases the same week but announce
> > the beta the day after the bugfix releases.
> >
> > Or we just let the beta slide till after PGCon, but then I think we're
> > missing some excitement factor.
>
> I am unclear if we are anywhere near ready for beta1 even in June. Are
> we?

I'm all about having that discussion... but can we do it on another thread or at least wait til we've decided about the back-branch releases? They are clearly the more important issue to consider.

Thanks!

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Bruce Momjian (bruce@momjian.us) wrote:
>> I am unclear if we are anywhere near ready for beta1 even in June. Are
>> we?

> I'm all about having that discussion... but can we do it on another
> thread or at least wait til we've decided about the back-branch
> releases? They are clearly the more important issue to consider.

It's the same discussion though, ie what releases are we expecting to get out in the next couple of weeks.

It's possible that we ought to give up on a pre-conference beta. Certainly a whole lot of time that I'd hoped would go into reviewing 9.5 feature commits has instead gone into back-branch bug chasing this week.

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> It's possible that we ought to give up on a pre-conference beta.
> Certainly a whole lot of time that I'd hoped would go into reviewing
> 9.5 feature commits has instead gone into back-branch bug chasing this
> week.

I guess that's what I'm getting at. We need to take care of the back-branches and that means pushing beta back. I fully expect a good discussion on when to release beta when we get closer on that, but we're not going to be close while we have outstanding big back-branch bugs.

Thanks!

Stephen
On Fri, May 29, 2015 at 04:01:00PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> >> I am unclear if we are anywhere near ready for beta1 even in June. Are
> >> we?
>
> > I'm all about having that discussion... but can we do it on another
> > thread or at least wait til we've decided about the back-branch
> > releases? They are clearly the more important issue to consider.
>
> It's the same discussion though, ie what releases are we expecting to
> get out in the next couple of weeks.

Agreed. If we want to put out beta1 before PGCon, I need to start on the release notes on Monday.

> It's possible that we ought to give up on a pre-conference beta.
> Certainly a whole lot of time that I'd hoped would go into reviewing
> 9.5 feature commits has instead gone into back-branch bug chasing this
> week.

Based on what has transpired in the past two weeks, I am thinking we need to move _slower_, not faster. I am concerned we have focused so much on new features that we have taken our eye off of reliability.

--
Bruce Momjian <bruce@momjian.us>
On 05/29/2015 01:03 PM, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> It's possible that we ought to give up on a pre-conference beta.
>> Certainly a whole lot of time that I'd hoped would go into reviewing
>> 9.5 feature commits has instead gone into back-branch bug chasing this
>> week.
>
> I guess that's what I'm getting at. We need to take care of the
> back-branches and that means pushing beta back.

+1

JD

--
The most kicking donkey PostgreSQL Infrastructure company in existence.
The oldest, the most experienced, the consulting company to the stars.
Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 -
24x7 - 365 - Proactive and Managed Professional Services!
On Fri, May 29, 2015 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's possible that we ought to give up on a pre-conference beta.
> Certainly a whole lot of time that I'd hoped would go into reviewing
> 9.5 feature commits has instead gone into back-branch bug chasing this
> week.

I'm personally kind of astonished that we're even thinking about beta so soon. I mean, we at least need to go through the stuff listed here, I think:

https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

The bigger issue is: what's NOT on that list that should be? I think we need to devote some cycles to figuring that out, and I sure haven't had any this week.

In any case, I think the negative PR that we're going to get from not getting this multixact stuff taken care of is going to far outweigh any positive PR from getting 9.5beta1 out a little sooner, especially if 9.5beta1 is bug-ridden because we gave it no time to settle.

--
Robert Haas
Robert Haas <robertmhaas@gmail.com> writes:
> I'm personally kind of astonished that we're even thinking about beta
> so soon. I mean, we at least need to go through the stuff listed
> here, I think:
> https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

Well, maybe we ought to call it an alpha not a beta, but I think we ought to put out some kind of release that we can encourage people to test. What you are suggesting is that we serialize resolution of the known issues with discovery of new issues, and that's not an efficient use of time. Especially seeing that we're approaching the summer season where we won't get much input at all.

regards, tom lane
On 2015-05-29 16:37:00 -0400, Tom Lane wrote:
> Well, maybe we ought to call it an alpha not a beta, but I think we ought
> to put out some kind of release that we can encourage people to test.

I also do think it's important that we put out a beta (or alpha) relatively soon. Both because we actually need input to find out what works and what doesn't and also because it pushes us to tie up loose ends.

A beta with open items isn't that bad a thing? There are many bigger projects doing 4-8 beta releases before a major one; and most of them have open items at the individual beta's release times.

I think we should define/document it so that there's no hard goal of being compatible for beta releases and that the compatibility goal starts with the first release candidate, and not the betas.
On Fri, May 29, 2015 at 11:04:59PM +0200, Andres Freund wrote:
> On 2015-05-29 16:37:00 -0400, Tom Lane wrote:
> > Well, maybe we ought to call it an alpha not a beta, but I think we ought
> > to put out some kind of release that we can encourage people to test.
>
> I also do think it's important that we put out a beta (or alpha)
> relatively soon. Both because we actually need input to find out what
> works and what doesn't and also because it pushes us to tie up loose
> ends.
>
> A beta with open items isn't that bad a thing? There are many bigger
> projects doing 4-8 beta releases before a major one; and most of them
> have open items at the individual beta's release times.
>
> I think we should define/document it so that there's no hard goal of
> being compatible for beta releases and that the compatibility goal
> starts with the first release candidate, and not the betas.

Do we need release notes for an alpha? Once I do the release notes, it is possible to miss subtle changes in the code that aren't mentioned in commit messages.

--
Bruce Momjian <bruce@momjian.us>
On May 29, 2015 2:12:24 PM PDT, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May 29, 2015 at 11:04:59PM +0200, Andres Freund wrote:
> > On 2015-05-29 16:37:00 -0400, Tom Lane wrote:
> > > Well, maybe we ought to call it an alpha not a beta, but I think we ought
> > > to put out some kind of release that we can encourage people to test.
> >
> > I also do think it's important that we put out a beta (or alpha)
> > relatively soon. Both because we actually need input to find out what
> > works and what doesn't and also because it pushes us to tie up loose
> > ends.
> >
> > A beta with open items isn't that bad a thing? There are many bigger
> > projects doing 4-8 beta releases before a major one; and most of them
> > have open items at the individual beta's release times.
> >
> > I think we should define/document it so that there's no hard goal of
> > being compatible for beta releases and that the compatibility goal
> > starts with the first release candidate, and not the betas.
>
> Do we need release notes for an alpha? Once I do the release notes, it
> is possible to miss subtle changes in the code that aren't mentioned in
> commit messages.

Yes, I think so. Otherwise it's pretty useless for people not following closely. I see little point in explicitly delaying release note work any further.

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Bruce Momjian <bruce@momjian.us> writes:
> Do we need release notes for an alpha? Once I do the release notes, it
> is possible to miss subtle changes in the code that aren't mentioned in
> commit messages.

If the commit message isn't clear about something, you'd likely miss the issue anyway, no? Anyway, once the release notes are in the tree, we could expect that anyone committing a user-visible semantics change should update the release notes themselves.

regards, tom lane
On Fri, May 29, 2015 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm personally kind of astonished that we're even thinking about beta
>> so soon. I mean, we at least need to go through the stuff listed
>> here, I think:
>> https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items
>
> Well, maybe we ought to call it an alpha not a beta, but I think we ought
> to put out some kind of release that we can encourage people to test.
> What you are suggesting is that we serialize resolution of the known
> issues with discovery of new issues, and that's not an efficient use of
> time. Especially seeing that we're approaching the summer season where
> we won't get much input at all.

Well, I think we ought to take at least a few weeks to try to do a bit of code review and clean up what we can from the open items list.

--
Robert Haas
On Fri, May 29, 2015 at 05:37:13PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Do we need release notes for an alpha? Once I do the release notes, it
> > is possible to miss subtle changes in the code that aren't mentioned in
> > commit messages.
>
> If the commit message isn't clear about something, you'd likely miss the
> issue anyway, no? Anyway, once the release notes are in the tree, we

I often do research in the git tree to get details on the feature beyond just looking at the commit or the patch.

> could expect that anyone committing a user-visible semantics change should
> update the release notes themselves.

Yes, that would be nice.

--
Bruce Momjian <bruce@momjian.us>
On 2015-05-29 18:02:36 -0400, Robert Haas wrote: > Well, I think we ought to take at least a few weeks to try to do a bit > of code review and clean up what we can from the open items list. Why? A large portion of the input required to go from beta towards a release is from actual users. To see when things break, what confuses them and such. I don't see why that requires that there are no minor entries in the open items list - and that's what currently is on it. Neither does it seem to be a problem to do code review concurrently to user beta testing. We obviously can't start a beta if things crash left and right, but I don't think that's the situation right now?
* Andres Freund (andres@anarazel.de) wrote: > On 2015-05-29 18:02:36 -0400, Robert Haas wrote: > > Well, I think we ought to take at least a few weeks to try to do a bit > > of code review and clean up what we can from the open items list. > > Why? A large portion of the input required to go from beta towards a > release is from actual users. To see when things break, what confuses > them and such. > > I don't see why that requires that there are no minor entries in the > open items list - and that's what currently is on it. Neither does it > seem to be a problem to do code review concurrently to user beta > testing. We obviously can't start a beta if things crash left and > right, but I don't think that's the situation right now? Agreed. Thanks! Stephen
On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-05-29 18:02:36 -0400, Robert Haas wrote: >> Well, I think we ought to take at least a few weeks to try to do a bit >> of code review and clean up what we can from the open items list. > > Why? A large portion of the input required to go from beta towards a > release is from actual users. To see when things break, what confuses > them and such. I have two concerns: 1. I'm concerned that once we release beta, any idea about reverting a feature or fixing something that is broken will get harder, because people will say "well, we can't do that after we've released a beta". I confess to particularly wanting a solution to the item listed as "custom-join has no way to construct Plan nodes of child Path nodes", the history of which I'll avoid recapitulating until I'm sure I can do it while maintaining my blood pressure at safe levels. 2. Also, if we're going to make significant multixact-related changes to 9.5 to try to improve reliability, as you proposed on the other thread, then it would be nice to do that before beta, so that it gets tested. Of course, someone is bound to point out that we could make those changes in time for beta2, and people could test that. But in practice I think that'll just mean that stuff is only out there for let's say 2 months before we put it in a major release, which ain't much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote: >> Why? A large portion of the input required to go from beta towards a >> release is from actual users. To see when things break, what confuses >> them and such. > I have two concerns: > 1. I'm concerned that once we release beta, any idea about reverting a > feature or fixing something that is broken will get harder, because > people will say "well, we can't do that after we've released a beta". > I confess to particularly wanting a solution to the item listed as > "custom-join has no way to construct Plan nodes of child Path nodes", > the history of which I'll avoid recapitulating until I'm sure I can do > it while maintaining my blood pressure at safe levels. > 2. Also, if we're going to make significant multixact-related changes > to 9.5 to try to improve reliability, as you proposed on the other > thread, then it would be nice to do that before beta, so that it gets > tested. Of course, someone is bound to point out that we could make > those changes in time for beta2, and people could test that. But in > practice I think that'll just mean that stuff is only out there for > let's say 2 months before we put it in a major release, which ain't > much. I think your position is completely nuts. The GROUPING SETS code is desperately in need of testing. The custom-plan code is desperately in need of fixing and testing. The multixact code is desperately in need of testing. The open-items list has several other problems besides those. All of those problems are independent. If we insist on tackling them serially rather than in parallel, 9.5 might not come out till 2017. I agree that we are not in a position to promise features won't change. So let's call it an alpha not a beta --- but for heaven's sake let's try to move forward on all these issues, not just some of them. regards, tom lane
On May 29, 2015 8:56:40 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote: >On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> >wrote: >> On 2015-05-29 18:02:36 -0400, Robert Haas wrote: >>> Well, I think we ought to take at least a few weeks to try to do a >bit >>> of code review and clean up what we can from the open items list. >> >> Why? A large portion of the input required to go from beta towards a >> release is from actual users. To see when things break, what confuses >> them and such. > >I have two concerns: > >1. I'm concerned that once we release beta, any idea about reverting a >feature or fixing something that is broken will get harder, because >people will say "well, we can't do that after we've released a beta". >I confess to particularly wanting a solution to the item listed as >"custom-join has no way to construct Plan nodes of child Path nodes", >the history of which I'll avoid recapitulating until I'm sure I can do >it while maintaining my blood pressure at safe levels. I think we should just document that this is a beta and that changes are to be expected. And have a release candidate once that's not the case. I agree that it'd be very good if the custom join issue gets fixed. But I don't see a beta prohibiting it. Independently from that, I'm going to ask a Citus colleague to make sure that pg-shard can use this. >2. Also, if we're going to make significant multixact-related changes >to 9.5 to try to improve reliability, as you proposed on the other >thread, then it would be nice to do that before beta, so that it gets >tested. Of course, someone is bound to point out that we could make >those changes in time for beta2, and people could test that. But in >practice I think that'll just mean that stuff is only out there for >let's say 2 months before we put it in a major release, which ain't >much.
There seems to be enough other stuff in dire need of testing that I don't think that's sufficient cause, even though I understand the sentiment. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
On May 29, 2015 9:08:07 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >I think your position is completely nuts. Yeehaa. > The GROUPING SETS code is >desperately in need of testing. The custom-plan code is desperately >in need of fixing and testing. The multixact code is desperately >in need of testing. And the array/plpgsql changes and upsert, and... Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
On Fri, May 29, 2015 at 04:01:00PM -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Bruce Momjian (bruce@momjian.us) wrote: > >> I am unclear if we are anywhere near ready for beta1 even in June. Are > >> we? > > > I'm all about having that discussion... but can we do it on another > > thread or at least wait til we've decided about the back-branch > > releases? They are clearly the more important issue to consider. > > It's the same discussion though, ie what releases are we expecting to > get out in the next couple of weeks. +1 for Stephen's thought to decide about back-branch releases first and to Magnus's sentiment upthread that beta has to stand back while we schedule them. In other words, the feedback between these two scheduling decisions ought to be one-way: bringing today's supported branches to a state we can be content about deserves first pick from the calendar.
On Sat, May 30, 2015 at 12:08:07AM -0400, Tom Lane wrote: > desperately in need of testing. The custom-plan code is desperately > in need of fixing and testing. The multixact code is desperately > in need of testing. The open-items list has several other problems > besides those. All of those problems are independent. If we insist > on tackling them serially rather than in parallel, 9.5 might not come > out till 2017. 2017? Really? Is there any need for that hyperbole? Frankly, based on how I feel now, I would have no problem doing 9.5 in 2016 and saying we have a lot of retooling to do. We could say we have gotten too far out ahead of ourselves and we need to regroup and restructure the code. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, May 30, 2015 at 08:56:53AM -0400, Bruce Momjian wrote: > On Sat, May 30, 2015 at 12:08:07AM -0400, Tom Lane wrote: > > desperately in need of testing. The custom-plan code is desperately > > in need of fixing and testing. The multixact code is desperately > > in need of testing. The open-items list has several other problems > > besides those. All of those problems are independent. If we insist > > on tackling them serially rather than in parallel, 9.5 might not come > > out till 2017. > > 2017? Really? Is there any need for that hyperbole? > > Frankly, based on how I feel now, I would have no problem doing 9.5 in > 2016 and saying we have a lot of retooling to do. We could say we have > gotten too far out ahead of ourselves and we need to regroup and > restructure the code. Actually, barrelling ahead to get releases out is how we got into this mess in the first place. I would vote we put the 9.5 release on hold while we do an honest assessment of where we are. In hindsight, we should have known to do this even before 9.4 was released. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, May 30, 2015 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think your position is completely nuts. The GROUPING SETS code is > desperately in need of testing. The custom-plan code is desperately > in need of fixing and testing. The multixact code is desperately > in need of testing. The open-items list has several other problems > besides those. All of those problems are independent. If we insist > on tackling them serially rather than in parallel, 9.5 might not come > out till 2017. If that means it's stable, +1 from me. I dispute, on every level, the notion that not releasing a beta means that we can't work on things in parallel. We can work on all of the things on the open items list in parallel right now. We can also test. And in fact, we should test. It's entirely appropriate to test our own stuff before we ask other people to test it. It's also appropriate to fix the things that we already know are broken before we ask other people to test it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/30/2015 06:11 AM, Bruce Momjian wrote: >> 2017? Really? Is there any need for that hyperbole? >> >> Frankly, based on how I feel now, I would have no problem doing 9.5 in >> 2016 and saying we have a lot of retooling to do. We could say we have >> gotten too far out ahead of ourselves and we need to regroup and >> restructure the code. > > Actually, barrelling ahead to get releases out is how we got into this > mess in the first place. I would vote we put the 9.5 release on hold > while we do an honest assessment of where we are. In hindsight, we > should have known to do this even before 9.4 was released. > It seems that we all are forgetting one of the fundamental concepts of open source development: Q. When will release X be? A. When it is done. A delay because of quality concerns shows the integrity of the project. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
On Sat, May 30, 2015 at 10:06:52AM -0400, Robert Haas wrote: > If that means it's stable, +1 from me. > > I dispute, on every level, the notion that not releasing a beta means > that we can't work on things in parallel. We can work on all of the > things on the open items list in parallel right now. We can also > test. And in fact, we should test. It's entirely appropriate to test > our own stuff before we ask other people to test it. It's also > appropriate to fix the things that we already know are broken before > we ask other people to test it. Let me share something that people have told me privately but don't want to state publicly (at least with attribution), and that is that we have seen great increases in feature development (often funded), without a corresponding increase development efforts focused on stability. The fact Alvaro has had to almost single-handedly fix multi-xact bug until very recently is testament to that. The bottom line is that we just can't keep going on like this. The fact we put out a release two weeks ago, then need to put out a fix release for that, but we have more multi-xact bugs to fix and can't decide if we should do one or two minor releases, and are pushing out an alpha of 9.5 because we know we aren't ready for a beta, just confirms my analysis. I hate to be the bearer of bad news, but I think bad news is what we must face. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, May 30, 2015 at 11:45 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, May 30, 2015 at 10:06:52AM -0400, Robert Haas wrote: >> If that means it's stable, +1 from me. >> >> I dispute, on every level, the notion that not releasing a beta means >> that we can't work on things in parallel. We can work on all of the >> things on the open items list in parallel right now. We can also >> test. And in fact, we should test. It's entirely appropriate to test >> our own stuff before we ask other people to test it. It's also >> appropriate to fix the things that we already know are broken before >> we ask other people to test it. > > Let me share something that people have told me privately but don't want > to state publicly (at least with attribution), and that is that we have > seen great increases in feature development (often funded), without a > corresponding increase development efforts focused on stability. The > fact Alvaro has had to almost single-handedly fix multi-xact bug until > very recently is testament to that. It's clear - at least to me - that we need to put more resources into stabilizing the new multixact system. This is killing us. If we can't stabilize this, people will go use some other database. Equally importantly, we need to make sure that we never release something comparably broken ever again. And that's why I'm not sanguine about shipping what we've got without adequate reflection. What, in this release, could break things badly? RLS? Grouping sets? Heikki's WAL format changes? That last one sounds really scary to me; it's painful if not impossible to fix the WAL format in a minor release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote: > Frankly, based on how I feel now, I would have no problem doing 9.5 in > 2016 and saying we have a lot of retooling to do. We could say we have > gotten too far out ahead of ourselves and we need to regroup and > restructure the code. I wouldn't mind doing that, but I think it's premature to conclude that it's necessary to wait quite that long to release. -- Peter Geoghegan
On Sat, May 30, 2015 at 11:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Let me share something that people have told me privately but don't want >> to state publicly (at least with attribution), and that is that we have >> seen great increases in feature development (often funded), without a >> corresponding increase development efforts focused on stability. The >> fact Alvaro has had to almost single-handedly fix multi-xact bug until >> very recently is testament to that. > > It's clear - at least to me - that we need to put more resources into > stabilizing the new multixact system. This is killing us. If we can't > stabilize this, people will go use some other database. +1. I don't grok the MultiXact code as some people do, but even still, I think problems have been ongoing for so long now that we must change course. FWIW, my perception from afar is that the problems haven't really tapered off, and we'd be better off taking a fresh approach. > Equally importantly, we need to make sure that we never release > something comparably broken ever again. And that's why I'm not > sanguine about shipping what we've got without adequate reflection. As you said, there was a failure to appreciate the interactions with VACUUM. That should have made us more introspective about what we didn't know and couldn't know during 9.3 development, but it didn't. > What, in this release, could break things badly? RLS? Grouping sets? > Heikki's WAL format changes? That last one sounds really scary to me; > it's painful if not impossible to fix the WAL format in a minor > release. I think we actually have learned some lessons here. MultiXacts were a somewhat unusual case for a couple of reasons that I need not rehash. In contrast, Heikki's WAL format changes (just for example) are fundamentally just a restructuring to the existing format.
Sure, there could be bugs, but I think that it's fundamentally different to the 9.3 MultiXact stuff, in that the MultiXact stuff appears to be stubbornly difficult to stabilize over months and years. That feels like something that is unlikely to be true for anything that made it into 9.5. -- Peter Geoghegan
Hi Bruce, Everyone, On 2015-05-30 11:45:59 -0400, Bruce Momjian wrote: > Let me share something that people have told me privately but don't want > to state publicly (at least with attribution), and that is that we have > seen great increases in feature development (often funded), without a > corresponding increase development efforts focused on stability. Yes, I have seen and heard that too. What I think is also important is that, in turn, our adoption has outpaced feature development (and thus transitively stability work). > The bottom line is that we just can't keep going on like this. The fact > we put out a release two weeks ago, then need to put out a fix release > for that, but we have more multi-xact bugs to fix and can't decide if we > should do one or two minor releases, and are pushing out an alpha of 9.5 > because we know we aren't ready for a beta, just confirms my analysis. I don't think that alone confirms very much. > I hate to be the bearer of bad news, but I think bad news is what we > must face. Well, the question is what we do with that observation. Personally I think it's not a new one. This point has been made repeatedly, including at most if not all developer meetings I attended. I definitely had conversations around it both in person, on IM and on list. I don't think it's primarily a problem of lack of review; although that is a large problem. I think the biggest systematic problem is that the compound complexity of postgres has increased dramatically over the years. Features have added complexity little by little, each incrementally not looking that bad. But very little has been done to manage complexity. Since 8.0 the codesize has roughly doubled, but little has been done to manage the increased complexity. Few new abstractions have been introduced and the structure of the code is largely the same. As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it was ~500 LOC, in master it's ~1500.
The interactions in 8.0 were complex, they have gotten much more complex since. It fulfills lots of different roles, all in one function: (roughly in the order things happen, but simplified)

* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
  * Including pgstat, unlogged table, exported snapshot handling
  * iff hot standby, some more subsystems are initialized here
  * hot standby state handling
  * replay process initialization
  * crash replay itself, including
    * progress tracking
    * recovery pause handling
    * nextxid tracking
    * timeline increase handling
    * hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions

Yes, that's one routine. And, to make things even funnier, half of that routine isn't exercised by our tests. You can argue that this is an outlier, but I don't think so. Heapam, the planner, etc. have similar cases.

And I think this, to some degree, explains a lot of the multixact problems. While there were a few "simple bugs", most of them were interactions between the various subsystems that are rather intricate.

So, I think we have built up a lot of technical debt. And very little effort has been made to fix that; and in the cases where people have, the reception has often been cool, because refactoring things obviously will destabilize in the short term, even if it fixes problems in the long term. I don't think that's sustainable.
We can't improve the situation by just delaying the 9.5 release or something like that. We need to actively work on making the codebase easier to understand and better tested. But that is actual development work, and shouldn't happen at the tail end of a release. Regards, Andres
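To illustrate the kind of restructuring being argued for here, the following is a minimal sketch in Python rather than PostgreSQL's C. It is not how StartupXLOG() is or should be written; the phase names, the `RecoveryState` class, and `startup()` are all hypothetical. It only shows the general idea of turning one long routine into a sequence of small, named phases that can each be exercised by tests on their own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryState:
    """Hypothetical state passed between phases (not a PostgreSQL structure)."""
    crashed: bool = False
    archive_recovery: bool = False
    log: List[str] = field(default_factory=list)


def read_control_file(state: RecoveryState) -> None:
    # A real implementation would decide here whether the server crashed.
    state.log.append("read control file")


def replay_wal(state: RecoveryState) -> None:
    # Replay is only needed after a crash or during archive recovery.
    if state.crashed or state.archive_recovery:
        state.log.append("replay WAL")


def end_of_recovery(state: RecoveryState) -> None:
    state.log.append("end-of-recovery actions")


# The order of the phases is explicit and lives in one place.
PHASES: List[Callable[[RecoveryState], None]] = [
    read_control_file,
    replay_wal,
    end_of_recovery,
]


def startup(state: RecoveryState) -> RecoveryState:
    """Run every phase in order against the shared state."""
    for phase in PHASES:
        phase(state)
    return state
```

Because each phase is an ordinary function, a test can call `replay_wal()` directly with a prepared state instead of having to drive the whole startup sequence.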
On 2015-05-30 14:10:36 -0400, Robert Haas wrote: > It's clear - at least to me - that we need to put more resources into > stabilizing the new multixact system. This is killing us. If we can't > stabilize this, people will go use some other database. I agree. Perhaps I don't see things quite as direly, but then I didn't just spend weeks on the issue. I remember that I was incredibly frustrated around 9.3.2 because I'd spent weeks on fixing issues around this and it just never seemed to stop. > Equally importantly, we need to make sure that we never release > something comparably broken ever again. And that's why I'm not > sanguine about shipping what we've got without adequate reflection. I think you're inferring something wrong here. A beta/alpha *is* getting feedback on how good/bad things are. It's just one source of such information, but we don't have that many others. As explained in the email I sent before this, I think a lot of the problems come from too complex code (with barely any testing). But we're not going to be able to clean this up in 9.5. This will be a longer term effort. If we, without further changes, decide to let the release slip to, say, Q1 2016, the only thing that'll happen is that 9.6 will have larger, more complex features. With barely any additional review and testing done. There was very little, if any, additional testing/review outside jsonb due to the 9.4 slippage. I don't think the problems have much to do with the release schedule. > What, in this release, could break things badly? > RLS? Mostly localized to users of the feature. Niche use case. > Grouping sets? Few changes to code unless grouping sets are used. > Heikki's WAL format changes? Yes, that's quite invasive. On the other hand, I can't think of another feature that had as much invested in tooling to detect problems. What's more: * Upsert - it's probably the most complex feature in 9.5. It's quite localized though.
* The locking changes, a good amount of potential for subtle problems * The signal handling, sinval, client communication changes. Little to none problems so far, but it's complex stuff. These changes are an example of potential for problems due to changes to reduce complexity... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > * The signal handling, sinval, client communication changes. Little to > none problems so far, but it's complex stuff. These changes are an > example of potential for problems due to changes to reduce > complexity... As far as that goes, it's quite clear from the buildfarm that the atomics stuff is not very stable on non-mainstream architectures. regards, tom lane
On May 30, 2015 2:19:00 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> * The signal handling, sinval, client communication changes. Little >to >> none problems so far, but it's complex stuff. These changes are an >> example of potential for problems due to changes to reduce >> complexity... > >As far as that goes, it's quite clear from the buildfarm that the >atomics stuff is not very stable on non-mainstream architectures. Is that the case? So far it seems to primarily be a problem of the old barrier emulation being buggy (non-reentrant). And that being visible due to the new barrier in the latch code. I'd not be surprised if there were more bugs, don't get me wrong, this is highly platform-dependent stuff. --- Please excuse brevity and formatting - I am writing this on my mobile phone.
On 5/30/15 2:10 PM, Robert Haas wrote: > What, in this release, could break things badly? RLS? Grouping sets? > Heikki's WAL format changes? That last one sounds really scary to me; > it's painful if not impossible to fix the WAL format in a minor > release. I would argue Heikki's WAL stuff is a perfect case for releasing a public alpha/beta soon. I'd love to test PgBackRest with an "official" 9.5dev build. The PgBackRest test suite has lots of tests that run on versions 8.3+ and might well shake out any bugs that are lying around. In fact, I've added a new feature based on monitoring the thread and I'm interested to see how that pans out. -- - David Steele david@pgmasters.net
On 05/30/2015 03:48 PM, David Steele wrote: > On 5/30/15 2:10 PM, Robert Haas wrote: >> What, in this release, could break things badly? RLS? Grouping sets? >> Heikki's WAL format changes? That last one sounds really scary to me; >> it's painful if not impossible to fix the WAL format in a minor >> release. > > I would argue Heikki's WAL stuff is a perfect case for releasing a > public alpha/beta soon. I'd love to test PgBackRest with an "official" > 9.5dev build. The PgBackRest test suite has lots of tests that run on > versions 8.3+ and might well shake out any bugs that are lying around. You are right. Clone git, run it nightly automated and please, please report anything you find. There is no reason for a tagged release for that. Consider it a custom, purpose built, build-test farm. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
On 5/30/15 8:38 PM, Joshua D. Drake wrote: > > On 05/30/2015 03:48 PM, David Steele wrote: >> On 5/30/15 2:10 PM, Robert Haas wrote: >>> What, in this release, could break things badly? RLS? Grouping sets? >>> Heikki's WAL format changes? That last one sounds really scary to me; >>> it's painful if not impossible to fix the WAL format in a minor >>> release. >> >> I would argue Heikki's WAL stuff is a perfect case for releasing a >> public alpha/beta soon. I'd love to test PgBackRest with an "official" >> 9.5dev build. The PgBackRest test suite has lots of tests that run on >> versions 8.3+ and might well shake out any bugs that are lying around. > > You are right. Clone git, run it nightly automated and please, please > report anything you find. There is no reason for a tagged release for > that. Consider it a custom, purpose built, build-test farm. Sure - I can write code to do that. But then why release a beta at all? -- - David Steele david@pgmasters.net
On Sat, May 30, 2015 at 12:26:11PM -0700, Peter Geoghegan wrote: > On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Frankly, based on how I feel now, I would have no problem doing 9.5 in > > 2016 and saying we have a lot of retooling to do. We could say we have > > gotten too far out ahead of ourselves and we need to regroup and > > restructure the code. > > I wouldn't mind doing that, but I think it's premature to conclude > that it's necessary to wait quite that long to release. I agree it probably wouldn't take until 2016, but if it does take until 2016, we have to be fine with that. What I am saying is we can't just continue to focus on hitting target dates and assume everything will be fine, because it isn't. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 05/30/2015 06:51 PM, David Steele wrote: > On 5/30/15 8:38 PM, Joshua D. Drake wrote: >> >> On 05/30/2015 03:48 PM, David Steele wrote: >>> On 5/30/15 2:10 PM, Robert Haas wrote: >>>> What, in this release, could break things badly? RLS? Grouping sets? >>>> Heikki's WAL format changes? That last one sounds really scary to me; >>>> it's painful if not impossible to fix the WAL format in a minor >>>> release. >>> >>> I would argue Heikki's WAL stuff is a perfect case for releasing a >>> public alpha/beta soon. I'd love to test PgBackRest with an "official" >>> 9.5dev build. The PgBackRest test suite has lots of tests that run on >>> versions 8.3+ and might well shake out any bugs that are lying around. >> >> You are right. Clone git, run it nightly automated and please, please >> report anything you find. There is no reason for a tagged release for >> that. Consider it a custom, purpose built, build-test farm. > > Sure - I can write code to do that. But then why release a beta at all? 1. Continuous testing (especially automated) is a great thing (see Buildfarm) 2. The rules for patches change a bit when we move to Beta 3. We may be able to fix a problem now (or soon) that you might catch before Beta. Sincerely, J -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
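The "clone git, run it nightly" suggestion above could be sketched roughly as follows. This is a hedged illustration only: the function names, the repository path, and the install prefix are assumptions, and the step list simply uses the ordinary source-build commands (`git pull`, `./configure`, `make`, `make check`). A downstream project would add its own test suite as a further step.

```python
import subprocess
from typing import Callable, List, Optional


def nightly_steps(prefix: str) -> List[List[str]]:
    """Commands for one nightly run, in order (hypothetical helper)."""
    return [
        ["git", "pull", "--ff-only"],           # update the existing clone
        ["./configure", f"--prefix={prefix}"],  # configure a private install
        ["make", "-s"],                         # build
        ["make", "check"],                      # run the regression tests
        ["make", "install"],                    # install for downstream tests
    ]


def run_nightly(repo_dir: str, prefix: str,
                runner: Callable = subprocess.run) -> Optional[List[str]]:
    """Run each step inside repo_dir; return the first failing command, or None."""
    for cmd in nightly_steps(prefix):
        result = runner(cmd, cwd=repo_dir)
        if result.returncode != 0:
            return cmd  # stop at the first failure so it can be reported
    return None
```

Scheduled from cron or a similar tool, a non-`None` return value is the signal to send a report to the list. The `runner` parameter exists only so the sequence can be tested without actually building anything.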
On Saturday, May 30, 2015, Bruce Momjian <bruce@momjian.us> wrote:
On Sat, May 30, 2015 at 12:26:11PM -0700, Peter Geoghegan wrote:
> On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Frankly, based on how I feel now, I would have no problem doing 9.5 in
> > 2016 and saying we have a lot of retooling to do. We could say we have
> > gotten too far out ahead of ourselves and we need to regroup and
> > restructure the code.
>
> I wouldn't mind doing that, but I think it's premature to conclude
> that it's necessary to wait quite that long to release.
I agree it probably wouldn't take until 2016, but if does take until
2016, we have to be fine with that. What I am saying is we can't just
continue to focus on hitting target dates and assume everything will be
fine, because it isn't.
On a slightly tangential note: I'm not prepared to defend doing so but it seems worth at least considering whether we should continue supporting 9.0 beyond this October.
I don't think it should be de-supported until at least a couple of 9.5 point releases have been found to be stable.
David J.
On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote: > > The bottom line is that we just can't keep going on like this. The fact > > we put out a release two weeks ago, then need to put out a fix release > > for that, but we have more multi-xact bugs to fix and can't decide if we > > should do one or two minor releases, and are pushing out an alpha of 9.5 > > because we know we aren't ready for a beta, just confirms my analysis. > > I don't think that alone confirms very much. Huh? In what world is that release timeline ever reasonable? It points to a serious problem. > > I hate to be the bearer of bad news, but I think bad news is what we > > must face. > > Well, the question is what we do with that observation. Personally I > think it's not a new one. This point has been made repeatedly, including > at most if not all developer meetings I attended. I definitely had > conversations around it both in person, on IM and on list. Well, I think we stop what we are doing, focus on restructuring, testing, and reviewing areas that historically have had problems, and when we are done, we can look to go to 9.5 beta. What we don't want to do is to push out more code and get back into a wack-a-bug-as-they-are-found mode, which obviously did not serve us well for multi-xact, and which is what releasing a beta will do, and of course, more commit-fests, and more features. If we have to totally stop feature development until we are all happy with the code we have, so be it. If people feel they have to get into cleanup mode or they will never get to add a feature to Postgres again, so be it. If people say, heh, I am not going to do anything and just come back when cleanup is done (by someone else), then we will end up with a smaller but more dedicated development team, and I am fine with that too. I am suggesting that until everyone is happy with the code we have, we should not move forward. 
Forget 9.5 feature testing --- we don't even have 9.3 and 9.4 working to my satisfaction yet, and I bet others share my opinion. We do not want to look back on this period and say _this_ is when Postgres lost its reputation for reliability, and when other databases took that reputation from us. > I don't think it's primarily a problem of lack of review; although that > is a large problem. I think the biggest systematic problem is that the > compound complexity of postgres has increased dramatically over the > years. Features have added complexity little by little, each not > incrementally not looking that bad. But very little has been done to > manage complexity. Since 8.0 the codesize has roughly doubled, but > little has been done to manage the increased complexity. Few new > abstractions have been introduced and the structure of the code is > largely the same. > > As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it > was ~500 LOC, in master it's ~1500. The interactions in 8.0 were > complex, they have gotten much more complex since. It fullfills lots of > different roles, all in one function: Yep, great place to start our work. > So, I think we have built up a lot of technical debt. And very little > effort has been made to fix that; and in the cases where people have the > reception has often been cool, because refactoring things obviously will > destabilize in the short term, even if it fixes problems in the long > term. I don't think that's sustainable. Agreed. > We can't improve the situation by just delaying the 9.5 release or > something like that. We need to actively work on making the codebase > easier to understand and better tested. But that is actual development > work, and shouldn't happen at the tail end of a release. It should start right now, and then, once we are happy with our code, we can take periodic breaks to revisit the exact issues you describe. 
What I am saying is that we shouldn't wait until after 9.5 beta or after 9.5 final, or after the next commitfest or whatever. We have already waited too long to do this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, May 31, 2015 at 11:48 AM, Bruce Momjian wrote: > On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote: >> So, I think we have built up a lot of technical debt. And very little >> effort has been made to fix that; and in the cases where people have the >> reception has often been cool, because refactoring things obviously will >> destabilize in the short term, even if it fixes problems in the long >> term. I don't think that's sustainable. > > Agreed. +1. Complexity has increased, and we are actually never at 100% sure that a given bug fix does not have side effects on other things, hence I think that a portion of this technical debt is the lack of regression test coverage, for both existing features and platforms (like Windows). The thing is that complexity has increased, but for example for many features we lack test coverage, thinking mainly replication-related stuff here. Of course we will never get to a level of 100% of confidence with just the test coverage and the buildfarm, but we should at least try to get closer to such a goal. Those are things I am really willing to work on in the very short term for what it's worth (of course not only that as reviewing/refactoring/testing existing things is as well damn important). Now improving the test coverage requires new infrastructure, so those are new features, and that's perhaps not dedicated to 9.5, except if we consider that this is part of this technical debt accumulated among the years. Honestly I think it is. >> We can't improve the situation by just delaying the 9.5 release or >> something like that. We need to actively work on making the codebase >> easier to understand and better tested. But that is actual development >> work, and shouldn't happen at the tail end of a release. > > It should start right now, and then, once we are happy with our code, we > can take periodic breaks to revisit the exact issues you describe. 
What > I am saying is that we shouldn't wait until after 9.5 beta or after 9.5 > final, or after the next commitfest or whatever. We have already waited > too long to do this. Definitely. -- Michael
On Sat, May 30, 2015 at 3:46 PM, Peter Geoghegan <pg@heroku.com> wrote: >> What, in this release, could break things badly? RLS? Grouping sets? >> Heikki's WAL format changes? That last one sounds really scary to me; >> it's painful if not impossible to fix the WAL format in a minor >> release. > > I think we actually have learned some lessons here. MultiXacts were a > somewhat unusual case for a couple of reasons that I need not rehash. > > In contrast, Heikki's WAL format changes (just for example) are > fundamentally just a restructuring to the existing format. Sure, there > could be bugs, but I think that it's fundamentally different to the > 9.3 MultiXact stuff, in that the MultiXact stuff appears to be > stubbornly difficult to stabilize over months and years. That feels > like something that is unlikely to be true for anything that made it > into 9.5. I hope you're right. But I don't think any of us foresaw just how bad the MultiXact thing was likely to be either. In fact, I think to some extent we may STILL be in denial about how bad it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, May 31, 2015 at 08:15:38PM +0900, Michael Paquier wrote: > On Sun, May 31, 2015 at 11:48 AM, Bruce Momjian wrote: > > On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote: > >> So, I think we have built up a lot of technical debt. And very little > >> effort has been made to fix that; and in the cases where people have the > >> reception has often been cool, because refactoring things obviously will > >> destabilize in the short term, even if it fixes problems in the long > >> term. I don't think that's sustainable. > > > > Agreed. > > +1. Complexity has increased, and we are actually never at 100% sure > that a given bug fix does not have side effects on other things, hence > I think that a portion of this technical debt is the lack of > regression test coverage, for both existing features and platforms > (like Windows). The thing is that complexity has increased, but for > example for many features we lack test coverage, thinking mainly > replication-related stuff here. Of course we will never get to a level > of 100% of confidence with just the test coverage and the buildfarm, > but we should at least try to get closer to such a goal. FYI, I realize that one additional thing that has discouraged code reorganization is the additional backpatch overhead. I think we now need to accept that our reorganization-adverse approach might have cost us some reliability, and that reorganization is going to add work to backpatching. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, May 31, 2015 at 09:50:25AM -0400, Bruce Momjian wrote: > > +1. Complexity has increased, and we are actually never at 100% sure > > that a given bug fix does not have side effects on other things, hence > > I think that a portion of this technical debt is the lack of > > regression test coverage, for both existing features and platforms > > (like Windows). The thing is that complexity has increased, but for > > example for many features we lack test coverage, thinking mainly > > replication-related stuff here. Of course we will never get to a level > > of 100% of confidence with just the test coverage and the buildfarm, > > but we should at least try to get closer to such a goal. > > FYI, I realize that one additional thing that has discouraged code > reorganization is the additional backpatch overhead. I think we now > need to accept that our reorganization-adverse approach might have cost > us some reliability, and that reorganization is going to add work to > backpatching. Actually, code reorganization in HEAD might cause backpatching to be more buggy, reducing reliability --- obviously we need to have a discussion about that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote: > On 5/30/15 8:38 PM, Joshua D. Drake wrote: > > On 05/30/2015 03:48 PM, David Steele wrote: > >> I would argue Heikki's WAL stuff is a perfect case for releasing a > >> public alpha/beta soon. I'd love to test PgBackRest with an "official" > >> 9.5dev build. The PgBackRest test suite has lots of tests that run on > >> versions 8.3+ and might well shake out any bugs that are lying around. > > > > You are right. Clone git, run it nightly automated and please, please > > report anything you find. There is no reason for a tagged release for > > that. Consider it a custom, purpose built, build-test farm. > > Sure - I can write code to do that. But then why release a beta at all? It's largely for the benefit of folks planning manual, or otherwise high-cost, testing. If you budget for just one big test per year, make it a test of beta1. For inexpensive testing, you may as well ignore beta and test git master daily or weekly.
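The "clone git, run it nightly" workflow discussed above could be sketched roughly as follows. This is a minimal sketch and not something posted in the thread: the work directory, the choice of configure flags, the job count, and the helper names plan_commands and run_nightly are all assumptions made for illustration. The repository URL is the project's public git repository.

```python
"""Nightly build-and-test of PostgreSQL git master (illustrative sketch)."""
import subprocess
import sys
from pathlib import Path

# Public PostgreSQL git repository.
REPO_URL = "https://git.postgresql.org/git/postgresql.git"


def plan_commands(src, prefix, jobs=4):
    """Return the ordered (command, working_directory) steps for one run.

    `src` is the checkout directory and `prefix` the install prefix; both
    are hypothetical locations chosen by whoever runs the script.
    """
    src, prefix = Path(src), Path(prefix)
    if (src / ".git").is_dir():
        # Existing checkout: just fast-forward to the latest master.
        fetch = [(["git", "pull", "--ff-only"], src)]
    else:
        # First run: shallow clone into `src`, executed from its parent.
        fetch = [(["git", "clone", "--depth", "1", REPO_URL, str(src)], src.parent)]
    build = [
        # Assertion-enabled debug build, so failures are as loud as possible.
        (["./configure", f"--prefix={prefix}", "--enable-cassert", "--enable-debug"], src),
        (["make", f"-j{jobs}"], src),
        # Core regression tests against a temporary installation.
        (["make", "check"], src),
    ]
    return fetch + build


def run_nightly(src, prefix):
    """Run every step in order; stop and report at the first failure."""
    Path(src).parent.mkdir(parents=True, exist_ok=True)
    for cmd, cwd in plan_commands(src, prefix):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            print("FAILED:", " ".join(cmd), file=sys.stderr)
            return False
    return True
```

A scheduler such as cron could call run_nightly("/tmp/pg-nightly/src", "/tmp/pg-nightly/inst") once a day and follow it with an external suite (the PgBackRest tests, say), reporting any failure to the list.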
Bruce Momjian <bruce@momjian.us> writes: >> FYI, I realize that one additional thing that has discouraged code >> reorganization is the additional backpatch overhead. I think we now >> need to accept that our reorganization-adverse approach might have cost >> us some reliability, and that reorganization is going to add work to >> backpatching. > Actually, code reorganization in HEAD might cause backpatching to be > more buggy, reducing reliability --- obviously we need to have a > discussion about that. Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example. Not that that should dissuade us from ever doing any reorganizations, but it's foolish to discount back-patching costs. regards, tom lane
On 5/31/15 11:49 AM, Noah Misch wrote: > On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote: >> On 5/30/15 8:38 PM, Joshua D. Drake wrote: >>> On 05/30/2015 03:48 PM, David Steele wrote: >>>> I would argue Heikki's WAL stuff is a perfect case for releasing a >>>> public alpha/beta soon. I'd love to test PgBackRest with an "official" >>>> 9.5dev build. The PgBackRest test suite has lots of tests that run on >>>> versions 8.3+ and might well shake out any bugs that are lying around. >>> >>> You are right. Clone git, run it nightly automated and please, please >>> report anything you find. There is no reason for a tagged release for >>> that. Consider it a custom, purpose built, build-test farm. >> >> Sure - I can write code to do that. But then why release a beta at all? > > It's largely for the benefit of folks planning manual, or otherwise high-cost, > testing. If you budget for just one big test per year, make it a test of > beta1. For inexpensive testing, you may as well ignore beta and test git > master daily or weekly. I've gotten to the point of (relatively) high-cost coding/testing. The removal of checkpoint_segments and pause_on_recovery are leading to refactoring of not only the regression tests but the actual backup code. 9.5 and 8.3 are the only versions that require exceptions in the code base. I've already done basic testing against 9.5 by disabling certain tests. Now I'm at the point where I need to start modifying code to take new 9.5 features/changes into account and make sure the regression tests work for 8.3-9.5 with the fewest number of exceptions possible. From the perspective of backup/restore testing, 9.5 has the most changes since 9.0. I'd like to know that the API at least is stable before investing the time in new development. Perhaps I'm just misunderstanding the nature of the discussion. -- - David Steele david@pgmasters.net
On Sun, May 31, 2015 at 11:55:44AM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > >> FYI, I realize that one additional thing that has discouraged code > >> reorganization is the additional backpatch overhead. I think we now > >> need to accept that our reorganization-adverse approach might have cost > >> us some reliability, and that reorganization is going to add work to > >> backpatching. > > > Actually, code reorganization in HEAD might cause backpatching to be > > more buggy, reducing reliability --- obviously we need to have a > > discussion about that. > > Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example. > Not that that should dissuade us from ever doing any reorganizations, > but it's foolish to discount back-patching costs. Yep. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 2015-05-31 11:55:44 -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > >> FYI, I realize that one additional thing that has discouraged code > >> reorganization is the additional backpatch overhead. I think we now > >> need to accept that our reorganization-adverse approach might have cost > >> us some reliability, and that reorganization is going to add work to > >> backpatching. > > > Actually, code reorganization in HEAD might cause backpatching to be > > more buggy, reducing reliability --- obviously we need to have a > > discussion about that. > > Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example. > Not that that should dissuade us from ever doing any reorganizations, > but it's foolish to discount back-patching costs. On the other hand, that code is a complete maintenance nightmare. If there weren't literally dozens of places that needed to be touched to add a single parameter, it'd be far less likely for such a mistake to be made. Right now significant portions of the file differ between the branches, despite primarily minor feature additions...
On Sun, May 31, 2015 at 11:03 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Sun, May 31, 2015 at 09:50:25AM -0400, Bruce Momjian wrote: >> > +1. Complexity has increased, and we are actually never at 100% sure >> > that a given bug fix does not have side effects on other things, hence >> > I think that a portion of this technical debt is the lack of >> > regression test coverage, for both existing features and platforms >> > (like Windows). The thing is that complexity has increased, but for >> > example for many features we lack test coverage, thinking mainly >> > replication-related stuff here. Of course we will never get to a level >> > of 100% of confidence with just the test coverage and the buildfarm, >> > but we should at least try to get closer to such a goal. >> >> FYI, I realize that one additional thing that has discouraged code >> reorganization is the additional backpatch overhead. I think we now >> need to accept that our reorganization-adverse approach might have cost >> us some reliability, and that reorganization is going to add work to >> backpatching. > > Actually, code reorganization in HEAD might cause backpatching to be > more buggy, reducing reliability --- obviously we need to have a > discussion about that. As a result, IMO all the folks gathering to PGCon (won't be there sorry, but I read the MLs) should have a talk about that and define a clear list of items to tackle in terms of reorganization for 9.5, and then update this page: https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items This does not prevent to move on with all the current items and continue reviewing existing features that have been pushed of course. -- Michael
Magnus Hagander <magnus@hagander.net> writes: > On Fri, May 29, 2015 at 8:02 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think we should postpone next week's release. > I'm a bit split on this. > We *definitely* don't want to release the multixact fix without it being > carefully reviewed, that's the part I'm not split about :) And I fully > appreciate we can't have that done by monday. > However, the file-permission thing seems to hit quite a few people (have we > ever had this many bug reports after a minor release), which means we'd > really want to get that out quickly. After dithering over the weekend, the majority view on -core seems to be that we should go ahead with making a release today for the fsync issue. We'll plan another release next week, or whenever the dust seems to have settled on the multixact issue(s). regards, tom lane
On 5/29/15 5:28 PM, Bruce Momjian wrote: >> could expect that anyone committing a user-visible semantics change should >> update the release notes themselves. > Yes, that would be nice. FWIW, I've always wondered why we don't create an empty next-version release notes as part of stamping a major release and expect patch authors to add to it. I realize that likely creates merge conflicts, but that seems less work than doing it all at the end. (Or maybe each patch just creates a file and the final process is pulling all the files together.) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > FWIW, I've always wondered why we don't create an empty next-version > release notes as part of stamping a major release and expect patch > authors to add to it. I realize that likely creates merge conflicts, but > that seems less work than doing it all at the end. (Or maybe each patch > just creates a file and the final process is pulling all the files > together.) There are good reasons to write the release notes all in one batch: otherwise you don't get any uniformity of editorial style. regards, tom lane
On 2015-06-01 12:32:21 -0400, Tom Lane wrote: > There are good reasons to write the release notes all in one batch: > otherwise you don't get any uniformity of editorial style. I agree that that's a good reason for major releases, I do however wonder if it'd not be a good idea to do differently for backpatched bugfixes. It's imo a good thing to force committers to write a release notice at the same time they're backpatching. The memory is fresh, and the commit message is more likely to contain pertinent details.
Andres Freund <andres@anarazel.de> writes: > On 2015-06-01 12:32:21 -0400, Tom Lane wrote: >> There are good reasons to write the release notes all in one batch: >> otherwise you don't get any uniformity of editorial style. > I agree that that's a good reason for major releases, I do however > wonder if it'd not be a good idea to do differently for backpatched > bugfixes. It's imo a good thing to force committers to write a release > notice at the same time they're backpatching. The memory is fresh, and > the commit message is more likely to contain pertinent details. We do expect committers to write commit log messages that contain appropriate raw material for the release notes. That's not the same as expecting them to prepare an actual, sgml-marked-up, release note entry that's in good English and occupies a reasonable amount of space relative to other items. Jim's point about merge problems is very pertinent as well. In the first place, if we had running release notes like that, they'd often differ from one branch to the next, making back-patching rather annoying. In the second place, SGML is so bulky that the patch context you'd be working with would frequently look like not much more than </para> </listitem> <listitem> <para> making it very easy for the hunks to be misapplied. Lastly, we have recently adopted a practice of labeling release note entries with the associated commit hashes. I dunno how much value that really has, but it would be entirely impossible to write such labels in advance of pushing the fixes. regards, tom lane
All, Just my $0.02 on PR: it has never been a PR problem to do multiple update releases, as long as we could provide a good reason for doing so (like: fix A is available now and we didn't want to hold it back waiting for fix B). It's always a practical question of (a) packaging and (b) deployment. That is, we can get packager fatigue where some updates don't get packaged, and we can get user fatigue where they start ignoring updates. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sun, May 31, 2015 at 12:09:16PM -0400, David Steele wrote: > On 5/31/15 11:49 AM, Noah Misch wrote: > > On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote: > >> Sure - I can write code to do that. But then why release a beta at all? > > > > It's largely for the benefit of folks planning manual, or otherwise high-cost, > > testing. If you budget for just one big test per year, make it a test of > > beta1. For inexpensive testing, you may as well ignore beta and test git > > master daily or weekly. > > I've gotten to the point of (relatively) high-cost coding/testing. The > removal of checkpoint_segments and pause_on_recovery are leading to > refactoring of not only the regressions tests but the actual backup > code. 9.5 and 8.3 are the only versions that require exceptions in the > code base. > > I've already done basic testing against 9.5 by disabling certain tests. > Now I'm at the point where I need to start modifying code to take new > 9.5 features/changes into account and make sure the regression tests > work for 8.3-9.5 with the fewest number of exceptions possible. Release of beta1 is the cue for that sort of work. > From the perspective of backup/restore testing, 9.5 has the most changes > since 9.0. I'd like to know that the API at least is stable before > investing the time in new development. Its API will be as good as pgsql-hackers could make it; beta1 is also a call for help discovering API problems we overlooked. Subsequent API changes are usually reactions to beta test reports.
Subject changed from "Re: [CORE] postpone next week's release". On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote: > Well, I think we stop what we are doing, focus on restructuring, > testing, and reviewing areas that historically have had problems, and > when we are done, we can look to go to 9.5 beta. What we don't want to > do is to push out more code and get back into a > wack-a-bug-as-they-are-found mode, which obviously did not serve us well > for multi-xact, and which is what releasing a beta will do, and of > course, more commit-fests, and more features. > > If we have to totally stop feature development until we are all happy > with the code we have, so be it. If people feel they have to get into > cleanup mode or they will never get to add a feature to Postgres again, > so be it. If people say, heh, I am not going to do anything and just > come back when cleanup is done (by someone else), then we will end up > with a smaller but more dedicated development team, and I am fine with > that too. I am suggesting that until everyone is happy with the code we > have, we should not move forward. I like the essence of this proposal. Two suggestions. We can't achieve or even robustly measure "everyone is happy with the code," so let's pick concrete exit criteria. Given criteria framed like "Files A,B,C and patches X,Y,Z have a sign-off from a committer other than their original committer." anyone can monitor progress and find specific ways to contribute. Second, I would define the subject matter as "bug fixes, testing and review", not "restructuring, testing and review." Different code structures are clearest to different hackers. Restructuring, on average, adds bugs even more quickly than feature development adds them. Thanks, nm
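An exit criterion of the form "have a sign-off from a committer other than their original committer" is mechanical enough to track with a few lines of code. The following is only a sketch of that idea: the ReviewItem structure, the helper names, and every file, patch, and committer name in the usage note are hypothetical, not taken from the thread or from any project tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    """One file or patch on the (hypothetical) review checklist."""
    name: str                                   # e.g. a file name or patch title
    committer: str                              # the original committer
    signoffs: set = field(default_factory=set)  # committers who signed off


def is_done(item):
    """True once someone other than the original committer has signed off."""
    return bool(item.signoffs - {item.committer})


def remaining(items):
    """Names of the items that still block the exit criterion, in input order."""
    return [item.name for item in items if not is_done(item)]
```

For example, given ReviewItem("file_a.c", "alice", {"bob"}), ReviewItem("patch_x", "carol", {"carol"}) and ReviewItem("file_b.c", "dave"), remaining() returns ["patch_x", "file_b.c"]: a self-sign-off does not count, and neither does an empty list of sign-offs.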
On 3 June 2015 at 14:50, Noah Misch <noah@leadboat.com> wrote: > I would define the subject matter as "bug fixes, testing and review", not > "restructuring, testing and review." Different code structures are clearest > to different hackers. Restructuring, on average, adds bugs even more quickly > than feature development adds them. +1 to this. Rewriting or restructuring code because you don't trust it (even though you have no reported real-world bugs) is a terrible idea. Stopping all feature development to do it is even worse. I know you're not talking about rewriting, but I think http://www.joelonsoftware.com/articles/fog0000000069.html is always worth a re-read, if only because it's funny :) I would always 100% support a decision to push back new releases because of bugfixes for *known* issues, but if you think you *might* be able to find bugs in code you don't like, you should do that on your own time. Iff you find actual bugs, *then* you talk about halting new releases. Geoff
On 2015-06-03 09:50:49 -0400, Noah Misch wrote: > Second, I would define the subject matter as "bug fixes, testing and > review", not "restructuring, testing and review." Different code > structures are clearest to different hackers. Restructuring, on > average, adds bugs even more quickly than feature development adds > them. I can't agree with this. While I agree with not doing large restructuring for 9.5, I think we can't afford not to refactor for clarity, even if that introduces bugs. Noticeable parts of our code have to frequently be modified for new features and are badly structured at the same time. While restructuring may temporarily increase the number of bugs in the short term, it'll decrease the number of bugs long term while increasing the number of potential contributors and new features. That's obviously not to say we should just refactor for the sake of it.
On 06/03/2015 07:18 AM, Andres Freund wrote: > > On 2015-06-03 09:50:49 -0400, Noah Misch wrote: >> Second, I would define the subject matter as "bug fixes, testing and >> review", not "restructuring, testing and review." Different code >> structures are clearest to different hackers. Restructuring, on >> average, adds bugs even more quickly than feature development adds >> them. > > I can't agree with this. While I agree with not doing large > restructuring for 9.5, I think we can't affort not to refactor for > clarity, even if that introduces bugs. Noticeable parts of our code have > to frequently be modified for new features and are badly structured at > the same time. While restructuring will may temporarily increase the > number of bugs in the short term, it'll decrease the number of bugs long > term while increasing the number of potential contributors and new > features. That's obviously not to say we should just refactor for the > sake of it. > Our project has been continuing to increase momentum over the last few years and our adoption has increased at an amazing rate. It is important to remember that we have users. These users have needs that must be met else those users will move on to a different technology. I agree that we need to postpone this release. I also agree that there is likely re-factoring to be done. I have also never met a programmer who doesn't think something needs to be re-factored. The majority of programmers I know all suffer from NIH and want to change how things are implemented. If we are going to re-factor, it should not be considered global and should be attacked with specific goals in mind. If those goals are not specifically defined and agreed on, we will get very pretty code with very little use for our users. Then our users will leave because they are busy waiting on us to re-factor. In short, we must balance this effort with the needs of the code versus the needs of our users. 
Sincerely, JD -- The most kicking donkey PostgreSQL Infrastructure company in existence. The oldest, the most experienced, the consulting company to the stars. Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 - 24x7 - 365 - Proactive and Managed Professional Services!
On 06/03/2015 06:50 AM, Noah Misch wrote: > Subject changed from "Re: [CORE] postpone next week's release". > > On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote: >> If we have to totally stop feature development until we are all happy >> with the code we have, so be it. If people feel they have to get into >> cleanup mode or they will never get to add a feature to Postgres again, >> so be it. If people say, heh, I am not going to do anything and just >> come back when cleanup is done (by someone else), then we will end up >> with a smaller but more dedicated development team, and I am fine with >> that too. I am suggesting that until everyone is happy with the code we >> have, we should not move forward. > > I like the essence of this proposal. Two suggestions. We can't achieve or > even robustly measure "everyone is happy with the code," so let's pick > concrete exit criteria. Given criteria framed like "Files A,B,C and patches > X,Y,Z have a sign-off from a committer other than their original committer." > anyone can monitor progress and find specific ways to contribute. Second, I > would define the subject matter as "bug fixes, testing and review", not > "restructuring, testing and review." Different code structures are clearest > to different hackers. Restructuring, on average, adds bugs even more quickly > than feature development adds them. So, historically, this is what the period between feature freeze and beta1 was for; the "consolidation" phase was supposed to deal with this. The problem over the last few years, by my observation, has been that consolidation has been left to just a few people (usually just Bruce & Tom or Tom & Robert) and our code base is now much too large for that. The way other projects deal with this is having continuous testing as stuff comes in, and *more* testing than just our regression tests (e.g. acceptance tests, integration tests, performance tests, etc.). 
So our other issue has been that our code complexity has been growing faster than our test suite. Part of that is that this community has never placed much value in automated testing or testers, so people who are interested in it find other projects to contribute to. I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. I will also point out that there is a major adoption cost to delaying 9.5. Right now users are excited about UPSERT, big data, and extra JSON features. If they have to wait another 7 months, they'll be a lot less excited, and we'll lose more potential users to the new databases and the MySQL forks. It could also delay the BDR project (Simon/Craig can speak to this) which would suck. Reliability of having a release every year is important as well as database reliability ... and for a lot of the new webdev generation, PostgreSQL is already the most reliable piece of software infrastructure they use. So if we're going to have a cleanup delay, then let's please make it an *intensive* cleanup delay, with specific goals, milestones, and a schedule. Otherwise, don't bother. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2015-06-03 10:21:28 -0700, Josh Berkus wrote: > So, historically, this is what the period between feature freeze and > beta1 was for; the "consolidation" phase was supposed to deal with this. > The problem over the last few years, by my observation, has been that > consolidation has been left to just a few people (usually just Bruce & > Tom or Tom & Robert) and our code base is now much to large for that. > > The way other projects deal with this is having continuous testing as > stuff comes in, and *more* testing that just our regression tests (e.g. > acceptance tests, integration tests, performance tests, etc.). So our > other issue has been that our code complexity has been growing faster > than our test suite. Part of that is that this community has never > placed much value in automated testing or testers, so people who are > interested in it find other projects to contribute to. > > I would argue that if we delay 9.5 in order to do a 100% manual review > of code, without adding any new automated tests or other non-manual > tools for improving stability, then it's a waste of time; we might as > well just release the beta, and our users will find more issues than we > will. I am concerned that if we declare a cleanup period, especially in > the middle of the summer, all that will happen is that the project will > go to sleep for an extra three months. > > I will also point out that there is a major adoption cost to delaying > 9.5. Right now users are excited about UPSERT, big data, and extra > JSON features. If they have to wait another 7 months, they'll be a lot > less excited, and we'll lose more potential users to the new databases > and the MySQL forks. It could also delay the BDR project (Simon/Craig > can speak to this) which would suck. > > Reliability of having a release every year is important as well as > database reliability ... 
and for a lot of the new webdev generation, > PostgreSQL is already the most reliable piece of software infrastructure > they use. So if we're going to have a cleanup delay, then let's please > make it an *intensive* cleanup delay, with specific goals, milestones, > and a schedule. Otherwise, don't bother. +very many
On 05/31/2015 03:51 AM, David Steele wrote: > On 5/30/15 8:38 PM, Joshua D. Drake wrote: >> >> On 05/30/2015 03:48 PM, David Steele wrote: >>> On 5/30/15 2:10 PM, Robert Haas wrote: >>>> What, in this release, could break things badly? RLS? Grouping sets? >>>> Heikki's WAL format changes? That last one sounds really scary to me; >>>> it's painful if not impossible to fix the WAL format in a minor >>>> release. >>> >>> I would argue Heikki's WAL stuff is a perfect case for releasing a >>> public alpha/beta soon. I'd love to test PgBackRest with an "official" >>> 9.5dev build. The PgBackRest test suite has lots of tests that run on >>> versions 8.3+ and might well shake out any bugs that are lying around. >> >> You are right. Clone git, run it nightly automated and please, please >> report anything you find. There is no reason for a tagged release for >> that. Consider it a custom, purpose built, build-test farm. > > Sure - I can write code to do that. But then why release a beta at all? FWIW: we also carry "official" snapshots on the download site (https://ftp.postgresql.org/pub/snapshot/dev/) that you could use if you don't want git directly - those even receive some form of QA (for a snapshot to be posted it is required to pass a full buildfarm run on the buildbox). Stefan
On 05/30/2015 11:47 PM, Andres Freund wrote: > I don't think it's primarily a problem of lack of review; although that > is a large problem. I think the biggest systematic problem is that the > compound complexity of postgres has increased dramatically over the > years. Features have added complexity little by little, each not > incrementally not looking that bad. But very little has been done to > manage complexity. Since 8.0 the codesize has roughly doubled, but > little has been done to manage the increased complexity. Few new > abstractions have been introduced and the structure of the code is > largely the same. > > As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it > was ~500 LOC, in master it's ~1500. The interactions in 8.0 were > complex, they have gotten much more complex since. It fullfills lots of > different roles, all in one function: > > (roughly in the order things happen, but simplified) > * Read the control file/determine whether we crashed > * recovery.conf handling > * backup label handling > * tablespace map handling (huh, I missed that this was added directly to > StartupXLOG. 
What a bad idea) > * Determine whether we're doing archive recovery, read the relevant > checkpoint if so > * relcache init file removal > * timeline switch handling > * Loading the checkpoint we're starting from > * Initialization of a lot of subsystems > * crash recovery/replay > * Including pgstat, unlogged table, exported snapshot handling > * iff hot standby, some more subsystems are initialized here > * hot standby state handling > * replay process intialization > * crash replay itself, including > * progress tracking > * recovery pause handling > * nextxid tracking > * timeline increase handling > * hot standby state handling > * unlogged relations handling > * archive recovery handling > * creation/initialization of the end of recovery checkpoint > * timeline increment if failover > * subsystem initialization iff !hot_standby > * end of recovery actions > > Yes. that's one routine. And, to make things even funnier, half of that > routine isn't exercised by our tests. > > You can argue that this is an outlier, but I don't think so. Heapam, the > planner, etc. have similar cases. > > And I think this, to some degree, explains a lot of the multixact > problems. While there were a few "simple bugs", most of them were > interactions between the various subsystems that are rather intricate. I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring, it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. 
Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything in Java, with the same design. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. - Heikki
On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: > I think this explanation is wrong. I agree that there are many places that > would be good to refactor - like StartupXLOG() - but the multixact code was > not too bad in that regard. IIRC the patch included some refactoring, it > added some new helper functions in heapam.c, for example. You can argue that > it didn't do enough of it, but that was not the big issue. Yea, but the bugs were more around the interactions to other parts of the system. Like e.g. crash recovery, which now is about bug 7 or so. And those are the ones that are hard to understand. > The big issue was at the architecture level. Basically, we liked vacuuming > of XIDs and clog so much that we decided that it'd be nice if you had to > vacuum multixids too, in order to not lose data. Many of the bugs and issues > were not new - we had multixids before - but we upped the ante and turned > minor locking bugs into data loss. And that had nothing to do with the code > structure - we'd have similar issues if we had rewritten everything java, > with the same design. I think we're probably just using slightly different terms here - for me one very good way of fixing some structurally bad things *is* improving the design. If you look at the bugs around multixacts: The first few were around ctid-chaining, hard to find and fix because there's about 8-10 places implementing it with slight differences. The next bunch were around vacuuming, some of them oversights, a good bunch of them more fundamental. Crash recovery wasn't thought about (lack of testing/review), and more generally the new code tripped over bad old decisions (hey, wraparound is ok!). Then there were a bunch of stupid bugs in crash-recovery (testing mainly), and larger scale bugs (hey, let's access stuff during recovery). 
Then there's the whole row level locking code - which is by now among the hardest to understand code in postgres - and voila it contained a bunch of oversights that were hard to spot. So yes, I think nicer code to work with would have prevented us from making a significant portion of these. It might have also made us realize earlier how significant the increase in complexity was. > So, I'm all for refactoring and adding abstractions where it makes sense, > but it's not going to solve design problems. I personally don't really see the multixact changes being that bad on the overall design. It pretty much just extended an earlier design. Now that wasn't great, but I don't think too many people had realized that at that point. The biggest problem was underestimating the complexity. Greetings, Andres Freund
On 06/04/2015 12:17 PM, Andres Freund wrote: > On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: >> So, I'm all for refactoring and adding abstractions where it makes sense, >> but it's not going to solve design problems. > > I personally don't really see the multixact changes being that bad on > the overall design. It pretty much just extended an earlier design. Now > that wasn't great, but I don't think too many people had realized that > at that point. The biggest problem was underestimating the complexity. Yeah, many of the issues were pre-existing, and would've been good to fix anyway. The multixact issues remind me of another similar thing we did: the visibility map. It too was non-critical when it was first introduced, but later we started using it for index-only-scans, and it suddenly became important that it's up-to-date and crash-safe. We did uncover some bugs in that area when index-only-scans were introduced, similar to the multixact bugs, only not as bad because it didn't lead to data loss. I don't have any point to make with that comparison, but it was similar in many ways. - Heikki
On 30 May 2015 at 05:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:
>> Why? A large portion of the input required to go from beta towards a
>> release is from actual users. To see when things break, what confuses
>> them and such.
> I have two concerns:
> 1. I'm concerned that once we release beta, any idea about reverting a
> feature or fixing something that is broken will get harder, because
> people will say "well, we can't do that after we've released a beta".
> I confess to particularly wanting a solution to the item listed as
> "custom-join has no way to construct Plan nodes of child Path nodes",
> the history of which I'll avoid recapitulating until I'm sure I can do
> it while maintaining my blood pressure at safe levels.
> 2. Also, if we're going to make significant multixact-related changes
> to 9.5 to try to improve reliability, as you proposed on the other
> thread, then it would be nice to do that before beta, so that it gets
> tested. Of course, someone is bound to point out that we could make
> those changes in time for beta2, and people could test that. But in
> practice I think that'll just mean that stuff is only out there for
> let's say 2 months before we put it in a major release, which ain't
> much.
I think your position is completely nuts. The GROUPING SETS code is
desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.
I agree that we are not in a position to promise features won't change.
So let's call it an alpha not a beta --- but for heaven's sake let's
try to move forward on all these issues, not just some of them.
I think releasing 9.5 in some form NOW will aid its software quality.
We've never linked Beta release date to final release date, so if the quality proves to be as poor as some people think then the list of bugs will show that and we release later.
AFAIK the beta period is exactly the time when we are allowed to pull features from the release. I welcome the idea that we test it; if it's stable and it works, we release it. If it doesn't, we pull it.
Not releasing our software yet making a list of our fears doesn't work towards a solution. Our fears will make us shout at each other too, so I for one would rather skip that part and do some practical actions.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Josh, * Josh Berkus (josh@agliodbs.com) wrote: > I would argue that if we delay 9.5 in order to do a 100% manual review > of code, without adding any new automated tests or other non-manual > tools for improving stability, then it's a waste of time; we might as > well just release the beta, and our users will find more issues than we > will. I am concerned that if we declare a cleanup period, especially in > the middle of the summer, all that will happen is that the project will > go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too- there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. Thanks! Stephen
On 4 June 2015 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
Josh,
* Josh Berkus (josh@agliodbs.com) wrote:
> I would argue that if we delay 9.5 in order to do a 100% manual review
> of code, without adding any new automated tests or other non-manual
> tools for improving stability, then it's a waste of time; we might as
> well just release the beta, and our users will find more issues than we
> will. I am concerned that if we declare a cleanup period, especially in
> the middle of the summer, all that will happen is that the project will
> go to sleep for an extra three months.
This is the exact same concern that I have. A delay just to have a
delay is not useful. I completely agree that we need more automated
testing, etc, though getting all of that set up and running could be
done at any time too- there's no reason to wait, nor do I believe
delaying 9.5 would make such automated testing appear.
In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are:
* pg_dump and pg_restore testing (because it's scary we don't do this)
* WAL archiving based warm standby testing with promotion
* Two node streaming replication with promotion, both with a slot and with archive fallback
* Three node cascading streaming replication with middle node promotion then tail end node promotion
* Logical decoding streaming testing, comparing to expected decoded output
* DDL deparse test coverage for all operations
* pg_basebackup + start up from backup
* hard-kill the postmaster, start up from crashed datadir
* pg_start_backup, rsync, pg_stop_backup, start up in hot standby
* disk exhaustion tests both for pg_xlog and for the main datadir, showing we can recover OK when disk is filled then space is freed
* Tests of crash recovery during various DDL operations
Obviously some of these overlap, so one test can cover more than one item.
Implementing these requires stepping outside the comfortable zone of pg_regress and the isolationtester and having something that can manage multiple data directories. It's also hard to be sure you're testing the same thing each time - for example, when using streaming replication with archive fallback, it might be tricky to ensure that your replica falls behind and falls back to WAL archive each time. There's always SIGSTOP I guess.
While these are multi-node tests, at least in PostgreSQL we can just run on different ports, so there's no need to muck about with containers or VMs.
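To make the "different ports, multiple data directories" idea concrete, here is a minimal sketch of how a harness might lay out a crash-and-restart scenario. It only builds the command lines rather than running them. The function names, node names and the base path are hypothetical. It uses `pg_ctl -m immediate stop` as a stand-in for a hard kill, since that mode is documented to force crash recovery on the next start; a true hard kill would send SIGKILL to the PID on the first line of `postmaster.pid`, which `read_postmaster_pid` looks up.

```python
from pathlib import Path


def initdb_cmd(datadir):
    # Create a fresh cluster in its own data directory.
    return ["initdb", "-D", str(datadir)]


def start_cmd(datadir, port):
    # -w waits for startup to finish; -o passes options through to the server,
    # here the port, so several nodes can share one host.
    return ["pg_ctl", "-D", str(datadir), "-o", f"-p {port}", "-w", "start"]


def immediate_stop_cmd(datadir):
    # Immediate mode aborts without a clean shutdown, so the next start
    # has to go through crash recovery.
    return ["pg_ctl", "-D", str(datadir), "-m", "immediate", "stop"]


def read_postmaster_pid(datadir):
    # The first line of postmaster.pid holds the postmaster's PID; a harness
    # could send SIGKILL (or SIGSTOP, to make a replica fall behind) to it.
    first_line = (Path(datadir) / "postmaster.pid").read_text().splitlines()[0]
    return int(first_line)


def crash_restart_plan(base, nodes):
    """Return the ordered commands for init, start, crash and restart per node.

    `nodes` maps a (hypothetical) node name to the port it should listen on.
    """
    plan = []
    for name, port in nodes.items():
        datadir = Path(base) / name
        plan.append(initdb_cmd(datadir))
        plan.append(start_cmd(datadir, port))
        plan.append(immediate_stop_cmd(datadir))
        plan.append(start_cmd(datadir, port))  # this start runs crash recovery
    return plan


if __name__ == "__main__":
    for cmd in crash_restart_plan("/tmp/pgtest", {"primary": 5433, "replica": 5434}):
        print(" ".join(cmd))
```

A real harness would run each command with `subprocess.run(cmd, check=True)` and then issue queries against the restarted node to check that the data survived.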
I already run some of these tests using Ansible for BDR, but I don't imagine that'd be acceptable in core. It's Python, and it's not especially well suited to use as a regression testing framework, it's just what I had to hand and already needed for other automation tasks.
Is pg_tap a reasonable starting point for this sort of testing?
Am I missing obvious and important tests?
How would a test that would've caught the multixact issues look?
On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > > > On 4 June 2015 at 22:43, Stephen Frost <sfrost@snowman.net> wrote: >> >> Josh, >> >> * Josh Berkus (josh@agliodbs.com) wrote: >> > I would argue that if we delay 9.5 in order to do a 100% manual review >> > of code, without adding any new automated tests or other non-manual >> > tools for improving stability, then it's a waste of time; we might as >> > well just release the beta, and our users will find more issues than we >> > will. I am concerned that if we declare a cleanup period, especially in >> > the middle of the summer, all that will happen is that the project will >> > go to sleep for an extra three months. >> >> This is the exact same concern that I have. A delay just to have a >> delay is not useful. I completely agree that we need more automated >> testing, etc, though getting all of that set up and running could be >> done at any time too- there's no reason to wait, nor do I believe >> delaying 9.5 would make such automated testing appear. >> > > In terms of specific testing improvements, things I think we need to have > covered and runnable on the buildfarm are: > > * pg_dump and pg_restore testing (because it's scary we don't do this) We do test it in some way with pg_upgrade, using a set of objects that are not removed by the regression test suite. Extension dumps are not covered yet, though.
> * WAL archiving based warm standby testing with promotion > * Two node streaming replication with promotion, both with a slot and with > archive fallback > * Three node cascading streaming replication with middle node promotion then > tail end node promotion > * Logical decoding streaming testing, comparing to expected decoded output > * hard-kill the postmaster, start up from crashed datadir > * pg_basebackup + start up from backup > * pg_start_backup, rsync, pg_stop_backup, start up in hot standby > * Tests of crash recovery during various DDL operations Well, steps in this direction are the point of this patch, the replication test suite: https://commitfest.postgresql.org/5/197/ And this one, addition of Windows support for TAP tests: https://commitfest.postgresql.org/5/207/ > * DDL deparse test coverage for all operations What do you have in mind except what is already in objectaddress.sql and src/test/modules/test_ddl_deparse/? > * disk exhaustion tests both for pg_xlog and for the main datadir, showing > we can recover OK when disk is filled then space is freed This may be tricky. How would you emulate that? > Is pg_tap a reasonable starting point for this sort of testing? IMO, using the TAP machinery would be a good base for that. What is lacking is a basic set of Perl routines that one can easily use to set up test scenarios. > How would a test that would've caught the multixact issues look? I have not followed those discussions closely, so I am not sure about that. Regards, -- Michael
On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote:
I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.
Agreed. Cleanup can occur while we release code for public testing.
Many eyeballs of Beta beats anything we can throw at it thru manual inspection. The whole problem of bugs is that they are mostly found by people trying to use the software.
I will also point out that there is a major adoption cost to delaying
9.5. Right now users are excited about UPSERT, big data, and extra
JSON features. If they have to wait another 7 months, they'll be a lot
less excited, and we'll lose more potential users to the new databases
and the MySQL forks.
Reliability of having a release every year is important as well as
database reliability ... and for a lot of the new webdev generation,
PostgreSQL is already the most reliable piece of software infrastructure
they use. So if we're going to have a cleanup delay, then let's please
make it an *intensive* cleanup delay, with specific goals, milestones,
and a schedule. Otherwise, don't bother.
We've decided previously that having a fixed annual schedule was a good thing for the project. Getting the features that work into the hands of the people that want them is very important.
Discussing halting the development schedule publicly is very damaging.
If there are features in doubt, let's do more work on them or just pull them now and return to the schedule. I don't really care which ones get canned as long as we return to the schedule.
Whatever we do must be exact and measurable. If it's not, it means we haven't assembled enough evidence for action that is sufficiently directed to achieve the desired goal.
On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote:
It could also delay the BDR project (Simon/Craig
can speak to this) which would suck.
Nothing being discussed here can/will slow down the BDR project since it is already a different thread of development. More so, 2ndQuadrant has zero income tied to the release of 9.5 or the commit of any feature, so as far as that company is concerned, the release could wait for 10 years.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 June 2015 at 14:50, Noah Misch <noah@leadboat.com> wrote:
Subject changed from "Re: [CORE] postpone next week's release".
On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:
> Well, I think we stop what we are doing, focus on restructuring,
> testing, and reviewing areas that historically have had problems, and
> when we are done, we can look to go to 9.5 beta. What we don't want to
> do is to push out more code and get back into a
> wack-a-bug-as-they-are-found mode, which obviously did not serve us well
> for multi-xact, and which is what releasing a beta will do, and of
> course, more commit-fests, and more features.
>
> If we have to totally stop feature development until we are all happy
> with the code we have, so be it. If people feel they have to get into
> cleanup mode or they will never get to add a feature to Postgres again,
> so be it. If people say, heh, I am not going to do anything and just
> come back when cleanup is done (by someone else), then we will end up
> with a smaller but more dedicated development team, and I am fine with
> that too. I am suggesting that until everyone is happy with the code we
> have, we should not move forward.
I like the essence of this proposal. Two suggestions. We can't achieve or
even robustly measure "everyone is happy with the code," so let's pick
concrete exit criteria. Given criteria framed like "Files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer."
anyone can monitor progress and find specific ways to contribute.
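As a rough illustration of what a measurable exit criterion of this shape could look like, here is a small sketch that tracks which items still lack a sign-off from someone other than their original committer. The item names and committer names are made up for the example, and nothing like this tool exists in the project; it only shows that the criterion, as worded, can be checked mechanically.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    # A file or patch under review (names here are purely illustrative).
    name: str
    original_committer: str
    signoffs: set = field(default_factory=set)


def is_done(item):
    # Done means at least one sign-off from a committer other than
    # the one who originally committed the item.
    return any(s != item.original_committer for s in item.signoffs)


def outstanding(items):
    # Names of the items that still block the exit criterion.
    return [item.name for item in items if not is_done(item)]


if __name__ == "__main__":
    items = [
        ReviewItem("file_A", "alice", {"bob"}),
        ReviewItem("file_B", "bob", {"bob"}),   # self sign-off does not count
        ReviewItem("patch_X", "carol"),          # nobody has signed off yet
    ]
    print(outstanding(items))
```

With a list like this, "are we done?" reduces to checking whether `outstanding(items)` is empty.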
I don't like the proposal, nor do I like the follow on comments made.
This whole idea of "feature development" vs reliability is bogus. It implies people that work on features don't care about reliability. Given the fact that many of the features are actually about increasing database reliability in the event of crashes and corruptions it just makes no sense.
How will we participate in cleanup efforts? How do we know when something has been "cleaned up", how will we measure our success or failure? I think we should be clear that wasting N months on cleanup can *fail* to achieve a useful objective. Without a clear plan it almost certainly will do so. The flip side is that wasting N months will cause great amusement and dancing amongst those people who wish to pull ahead of our open source project and we should take care not to hand them a victory from an overreaction.
Lastly, the idea that we allow developers to drift away and we're OK with that is just plain mad. I've spent a decade trying to grow the pool of skilled developers who can assist the project. Acting against that, in deed or just word, is highly counterproductive for the project.
Let's just take a breath and think about this.
It is normal for us to spend a month or so consolidating our work. It is also normal for people that see major problems to call them out, effectively using the "Stop The Line" technique. https://leanbuilds.wordpress.com/tag/stop-the-line/
So let's do our normal things, not do a "total stop" for an indefinite period. If someone has specific things that in their opinion need to be addressed, list them and we can talk about doing them, together. I thought that was what the Open Items list was for. Let's use it.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Agreed. Cleanup can occur while we release code for public testing. The code is available for public testing right now. Stamping it a beta implies that we think it's something fairly stable that we'd be pretty happy to release if things go well, which is a higher bar to clear. I can't help noticing for all the drumbeat of "let's release 9.5 beta now", activity to clean up the items on this list seems quite sluggish: https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items I've seen Tom and a few other people doing some work that I would describe as useful pre-beta stabilization, but I think there is a good bit more that could be done, and that list is a good starting point. I hope to have time to do some myself, but right now I am busy trying to stabilize 9.3, along with Alvaro, Noah, Andres, and Thomas Munro, and PGCon is coming up in just over a week. I think we could afford to give ourselves at least until a few weeks following PGCon to tidy up. I do agree that an indefinite development freeze with unclear parameters for resuming development and unclear goals is a bad plan. But I think giving ourselves a little more time to, say, turn the buildfarm consistently green, and, say, fix the known but currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5 features is a good plan, and I hope you and others will support it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Agreed. Cleanup can occur while we release code for public testing. > The code is available for public testing right now. Only to people who have the time and ability to pull the code from git and build from source. I don't know exactly what fraction of interested testers that excludes, but I bet it's significant. The point of producing packages would be to remove that barrier to testing. > Stamping it a > beta implies that we think it's something fairly stable that we'd be > pretty happy to release if things go well, which is a higher bar to > clear. So let's call it an alpha, or some other way of setting expectations appropriately. But I think it's silly to maintain that the code is not in a state where end-user testing is useful. They just have to understand that they can't trust it with production data. > I can't help noticing for all the drumbeat of "let's release 9.5 beta > now", activity to clean up the items on this list seems quite > sluggish: > https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items While we need to work on those items, I do not agree that getting that list to empty has to happen before we release a test version. I think serializing effort in that way is simply bad project management. And it's not how we've operated in the past either: getting the open items list to empty has always been understood as a prerequisite to RC versions, not to betas. To get to specifics instead of generalities: exactly which of the current open items do you think is so bad that it precludes user testing? I do not see a beta-blocker in the lot. regards, tom lane
On Fri, Jun 5, 2015 at 07:50:31AM +0100, Simon Riggs wrote: > On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote: > > > I would argue that if we delay 9.5 in order to do a 100% manual review > of code, without adding any new automated tests or other non-manual > tools for improving stability, then it's a waste of time; we might as > well just release the beta, and our users will find more issues than we > will. I am concerned that if we declare a cleanup period, especially in > the middle of the summer, all that will happen is that the project will > go to sleep for an extra three months. > > > Agreed. Cleanup can occur while we release code for public testing. > > Many eyeballs of Beta beats anything we can throw at it thru manual inspection. > The whole problem of bugs is that they are mostly found by people trying to use > the software. Please address some of the specific issues I mentioned. The problem with the multi-xact case is that we just kept fixing bugs as people found them, and did not do a holistic review of the code. I am saying let's not _keep_ doing that and let's make sure we don't have any systematic problems in our code where we just keep fixing things without doing a thorough analysis. To release 9.5 beta would be to get back into that cycle, and I am not sure we are ready for that. I think the fact we have multiple people all reviewing the multi-xact code now (and not dealing with 9.5) is a good sign. If we were focused on 9.5 beta, I doubt this would have happened. I am saying let's make sure we are not deficient in other areas, then let's move forward again. I would love to think we can do multiple things at once, but for multi-xact, serious review didn't happen for 18 months, so if slowing release development is what is required, I support it. > We've decided previously that having a fixed annual schedule was a good thing > for the project. 
Getting the features that work into the hands of the people > that want them is very important. Yes, but let's not be a slave to the schedule if our reliability is suffering, which it clearly has in the past 18 months. > Discussing halting the development schedule publicly is very damaging. Agreed. > If there are features in doubt, lets do more work on them or just pull them now > and return to the schedule. I don't really care which ones get canned as long > as we return to the schedule. Again, please address my concerns above. This is not about 9.5 features, but rather our overall focus on schedule vs. reliability, and your arguments are reinforcing my idea that we do not have the proper balance here. > Whatever we do must be exact and measurable. If its not, it means we haven't > assembled enough evidence for action that is sufficiently directed to achieve > the desired goal. Sure. I think everyone agrees the multi-xact work is all good, so I am asking what else needs this kind of research. If there is nothing else, we can move forward again --- I am just saying we need to ask the reliability question _first_. Let me restate something that has appeared in many replies to my ideas --- I am not asking for infinite or unbounded review, but I am asking that we make sure reliability gets the proper focus in relation to our time pressures. Our balance was so off a month ago that I feel only a full stop on time pressure would allow us to refocus because people are not good at focusing on multiple things. It is sometimes necessary to stop everything to get people's attention, and to help them remember that without reliability, a database is useless. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:
We don't have a clear definition of what Beta means. For me, Beta has always meant "trial software, please test".
--
On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Agreed. Cleanup can occur while we release code for public testing.
The code is available for public testing right now.
People test when they get the signal from us, not before. While what you say is literally correct, that is not the point.
Stamping it a
beta implies that we think it's something fairly stable that we'd be
pretty happy to release if things go well, which is a higher bar to
clear.
I don't think anybody will say anything bad about us if we release a beta and then later pull some of the features because, AFTER testing, they are shown to be below our normal standard and we are not confident in them; that will bring us credit, I feel. It is extremely common in software development to defer some of the features if their goals aren't met, or to change APIs and interfaces based upon user feedback.
Making decisions on what will definitely be in a release BEFORE testing and feedback seems foolhardy and certainly not scientific.
None of this means I disagree with assessments of the current state of the software, I'm saying that we should simply follow the normal process and stick to the schedule we have previously agreed, for all of the reasons cited when we agreed it.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 5, 2015 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Agreed. Cleanup can occur while we release code for public testing.
>> The code is available for public testing right now.
> Only to people who have the time and ability to pull the code from git and build from source. I don't know exactly what fraction of interested testers that excludes, but I bet it's significant. The point of producing packages would be to remove that barrier to testing.
Sure, I agree with that.
>> Stamping it a beta implies that we think it's something fairly stable that we'd be pretty happy to release if things go well, which is a higher bar to clear.
> So let's call it an alpha, or some other way of setting expectations appropriately. But I think it's silly to maintain that the code is not in a state where end-user testing is useful. They just have to understand that they can't trust it with production data.
I don't maintain that end-user testing is unuseful at this point. I do maintain that it would be better to (1) finish fixing the known multixact bugs and (2) clean up some of the open items before we make a big push in that direction. For example, consider this item from the open items list:
http://www.postgresql.org/message-id/CAHGQGwEqWD=yNQE+ZojbpoxyWT3xLK52-V_q9S+XOfCKJd5egA@mail.gmail.com
Now this is a fundamental definitional issue about how RLS is supposed to work. I'm not going to deny that we COULD ship a release without deciding what the behavior should be there, but I don't think it's a good idea. I am fine with the possibility that one of our new features may, say, dump core someplace due to a NULL pointer dereference we haven't found yet. Such bugs can always exist, but they are easily fixed once found. But if we're not clear on how a feature is supposed to behave, which seems to be the case here, I favor trying to resolve that issue before shipping anything. Otherwise, we're saying "test this, even though the final version will likely work differently". That's not really helpful for us and will discourage testers from doing anything at all.
Going through the open items, the other ones that seem to involve definitional changes are:
1. FPW compression leaks information - The usefulness of the GUC may depend on its PGC_*-ness. We should decide what we want to do before asking people to test it.
2. custom-join has no way to construct Plan nodes of child Path nodes - The entire feature is a C API, and the API needs to be changed. We should finalize the API before asking people to test whether they can use it for interesting things.
3. recovery_target_action = pause & hot_standby = off - Rumor has it we replaced one surprising behavior with a different but equally-surprising behavior. We should decide what the right thing is and make sure the code is doing that before calling it a release.
4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems like another question of what the expectations around RLS actually are.
I would also argue that we really ought to make a decision about "basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe" before we get too close to final release. Maybe it's not a beta-blocker, exactly, but it doesn't seem like the sort of change that should be rushed in too close to the end, because it looks sorta complicated and scary. (Those are the technical terms.)
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:
--
I do agree that an indefinite development freeze with unclear
parameters for resuming development and unclear goals is a bad plan.
But I think giving ourselves a little more time to, say, turn the
buildfarm consistently green, and, say, fix the known but
currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5
features is a good plan, and I hope you and others will support it.
Yes, it's a good plan and I support that. That's just normal process.
If you mean we should allow that to stall the release of Beta then I disagree. The presence of bugs clearly has nothing to do with the discovery of new ones and we should be looking to discover as many as possible as quickly as possible.
I can understand the argument to avoid releasing Beta because of Dev Meeting, so we should aim for June 25th Beta 1.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-06-05 11:05:14 -0400, Bruce Momjian wrote:
> To release 9.5 beta would be to get back into that cycle, and I am not sure we are ready for that. I think the fact we have multiple people all reviewing the multi-xact code now (and not dealing with 9.5) is a good sign. If we were focused on 9.5 beta, I doubt this would have happened.
At least for me, the fact that I'm working on multixacts right now has nothing to do with whether to beta or not to beta. And I don't understand why releasing an alpha or beta would detract from that right now. We need more people doing crazy shit with our codebase, not fewer. None of the master-only issues is a blocker for an alpha, so besides some release work within the next two weeks I don't see what would distract us that much.
> I am saying let's make sure we are not deficient in other areas, then let's move forward again.
I don't think we actually can do that. The problem of the multixact stuff is precisely that it looked so innocent that a bunch of experienced people just didn't see the problem. Omniscience is easy in hindsight.
> I would love to think we can do multiple things at once, but for multi-xact, serious review didn't happen for 18 months, so if slowing release development is what is required, I support it.
FWIW, I can stomach a week or four of doing bugfix only stuff. After that I'm simply not going to be efficient at that anymore. And I seriously doubt that I'm the only one like that. Doing the same thing for weeks makes you miss obvious stuff.
I don't think anything as localized as 'do nothing but bugfixes for a while and then carry on' actually will solve the problem. We need to find and reallocate resources to put more emphasis on review, robustness and refactoring in the long term, not do panicky stuff short term. This isn't a problem that can be solved by focusing on bugfixing for a week or four.
That means we have to convince employers to actually *pay* us (people experienced with the codebase) to do work on these kind of things instead of much-easier-to-market new features. A lot of review/robustness work has been essentially done in our spare time, after long days. Which means the employers need to get more people.
> Sure. I think everyone agrees the multi-xact work is all good, so I am asking what else needs this kind of research. If there is nothing else, we can move forward again --- I am just saying we need to ask the reliability question _first_.
I'm starting to get grumpy here. You've called for review in lots of emails now. Let's get going then?
Greetings, Andres Freund
Robert Haas <robertmhaas@gmail.com> writes: > I don't maintain that end-user testing is unuseful at this point. I > do maintain that it would be better to (1) finish fixing the known > multixact bugs and (2) clean up some of the open items before we make > a big push in that direction. For example, consider this item from > the open items list: > http://www.postgresql.org/message-id/CAHGQGwEqWD=yNQE+ZojbpoxyWT3xLK52-V_q9S+XOfCKJd5egA@mail.gmail.com > Now this is a fundamental definitional issue about how RLS is supposed > to work. I'm not going to deny that we COULD ship a release without > deciding what the behavior should be there, but I don't think it's a > good idea. I am fine with the possibility that one of our new > features may, say, dump core someplace due to a NULL pointer deference > we haven't found yet. Such bugs can always exist, but they are easily > fixed once found. But if we're not clear on how a feature is supposed > to behave, which seems to be the case here, I favor trying to resolve > that issue before shipping anything. Otherwise, we're saying "test > this, even though the final version will likely work differently". > That's not really helpful for us and will discourage testers from > doing anything at all. The other side of that coin is that we might get useful comments from testers on how the feature ought to work. I don't agree with the notion that all feature details must be graven on stone tablets before we start trying to get feedback from people outside the core development community. The same point applies to the FDW C API questions, or to RLS, or to the "expanded objects" work that I did. (I'd really love it if the PostGIS folk would try to use that sometime before it's too late to adjust the definition...) Now, you could argue that people likely to have useful input on those issues are fully capable of working with git tip, and you'd probably be right, but would they do so? 
As Simon says nearby, publishing an alpha/beta/whatever is our signal to the wider community that it's time for them to start paying attention. I do not think they will look at 9.5 until we do that; and I think it'll be our loss if they don't start looking at these things soon. regards, tom lane
Michael Paquier wrote:
> On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are:
> > * pg_dump and pg_restore testing (because it's scary we don't do this)
> We do test it in some way with pg_upgrade using a set of objects that are not removed by the regression test suite. Extension dumps are not covered yet, though.
We could put more emphasis on having objects of all kinds remain in the regression database, so that the pg_upgrade test covers more of this. What happened with the extension tests patches you submitted? They seemed valuable to me, but I lost track.
> > * DDL deparse test coverage for all operations
> What do you have in mind except what is already in objectaddress.sql and src/test/modules/test_ddl_deparse/?
The current SQL scripts in that test do not cover all possible object types, so there's a lot of the decoding capabilities that are currently not exercised. So one way to attack this would be to add more object types to those files. However, a completely different way is to have the test process serial_schedule from src/test/regress and run everything in there under deparse. That would be even more useful, because whenever some future DDL is added, we will automatically get coverage.
> > How would a test that would've caught the multixact issues look?
> I have not followed closely those discussions, not sure about that.
One issue with these bugs is that unless you use things such as pg_burn_multixact, producing large enough numbers of multixacts takes a long time. I've wondered if we could somehow make those easier to reproduce by lowering the range, and thus doing thousands of wraparounds, freezing and truncations in reasonable time. (For example, change the typedefs to uint16 rather than uint32.) But then the issue becomes that the test code is not exactly equivalent to the production code, which could cause its own bugs.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
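The narrowed-counter idea in the message above can be illustrated with a small sketch. This is only a model of the counter arithmetic, written in Python rather than the server's C; it is not PostgreSQL's multixact code, it ignores any reserved counter values, and the names BITS, advance, precedes and count_wraparounds are hypothetical. It assumes a circular "older than" comparison based on the sign of the wrapped difference.

```python
# Hypothetical model of a narrowed ID counter; not PostgreSQL code.
BITS = 16                 # assumption: counter narrowed from 32 to 16 bits
MASK = (1 << BITS) - 1    # 0xFFFF
HALF = 1 << (BITS - 1)    # half of the circular ID space


def advance(counter, n=1):
    """Advance the counter by n, wrapping modulo 2**BITS."""
    return (counter + n) & MASK


def precedes(a, b):
    """True if a is logically older than b in circular ID space.

    Equivalent to reinterpreting (a - b) as a signed BITS-wide integer
    and checking whether it is negative.
    """
    return ((a - b) & MASK) >= HALF


def count_wraparounds(steps, start=0):
    """Count how often the counter wraps while allocating `steps` IDs."""
    current, wraps = start, 0
    for _ in range(steps):
        nxt = advance(current)
        if nxt < current:     # numeric value dropped, so we wrapped
            wraps += 1
        current = nxt
    return wraps
```

With a 16-bit counter, a million allocations already cross the wrap point more than a dozen times, whereas a 32-bit counter would need billions of allocations to wrap once. That is the speed-up the message is after, at the cost it also names: the narrowed build is no longer identical to production code.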
On 2015-06-05 11:20:52 -0400, Robert Haas wrote:
> I don't maintain that end-user testing is unuseful at this point.
Unless I misunderstand you, and you're not saying that user level testing wouldn't be helpful right now, I'm utterly baffled. There's loads of user-exposed features that desperately need exposure. Looking at https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL I don't see a single item that correlates with the ones on the open items list. Sure, it's incomplete. But that's a lot of stuff to test already. And the authors of those features can work on fixing the issues coming up. Lots of those features have barely got any testing at this point.
> do maintain that it would be better to (1) finish fixing the known multixact bugs and (2) clean up some of the open items before we make a big push in that direction.
There's maybe 3-4 people that can actually do something about the existing issues on that list. The community is far bigger than that. Right now everyone is sitting on the sidelines and twiddling their thumbs or developing new stuff. At least that's my impression.
> 2. custom-join has no way to construct Plan nodes of child Path nodes - The entire feature is a C API, and the API needs to be changed. We should finalize the API before asking people to test whether they can use it for interesting things.
I think any real world exposure of that API will result in much larger changes than that.
> 3. recovery_target_action = pause & hot_standby = off - Rumor has it we replaced one surprising behavior with a different but equally-surprising behavior. We should decide what the right thing is and make sure the code is doing that before calling it a release.
Fujii pushed the bugfix, restoring the old behaviour afaics. It's imo still crazy, but at this point it doesn't look like a 9.5 discussion.
> 4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems like another question of what the expectations around RLS actually are.
In the end that's minor from the end user's perspective.
> I would also argue that we really ought to make a decision about "basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe" before we get too close to final release. Maybe it's not a beta-blocker, exactly, but it doesn't seem like the sort of change that should be rushed in too close to the end, because it looks sorta complicated and scary. (Those are the technical terms.)
Yea, I'd really like to get that in at some point. I'll work on it as soon as I've finished the multixact truncation thingy.
Greetings, Andres Freund
On Fri, Jun 5, 2015 at 05:36:41PM +0200, Andres Freund wrote:
> I don't think anything as localized as 'do nothing but bugfixes for a while and then carry on' actually will solve the problem. We need to find and reallocate resources to put more emphasis on review, robustness and refactoring in the long term, not do panicky stuff short term. This isn't a problem that can be solved by focusing on bugfixing for a week or four.
Fine. We just need that refocus, and people usually can't refocus while they are worried about other pressures, e.g. time --- it's like trying to adjust the GPS while driving --- not easy.
> That means we have to convince employers to actually *pay* us (people experienced with the codebase) to do work on these kind of things instead of much-easier-to-market new features. A lot of review/robustness work has been essentially done in our spare time, after long days. Which means the employers need to get more people.
Agreed --- that is a serious long-term need.
> > Sure. I think everyone agrees the multi-xact work is all good, so I am asking what else needs this kind of research. If there is nothing else, we can move forward again --- I am just saying we need to ask the reliability question _first_.
> I'm starting to get grumpy here. You've called for review in lots of emails now. Let's get going then?
I really don't know. If people say we don't have anything like multi-xact that we have avoided, then I have no further concerns. I am asking that such decisions be made independent of external time pressures.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 5 June 2015 at 16:05, Bruce Momjian <bruce@momjian.us> wrote:
--
Please address some of the specific issues I mentioned.
I can discuss them but not because I am involved directly. I take responsibility as a committer and have an interest from that perspective.
In my role at 2ndQuadrant, I approved all of the time Alvaro and Andres spent on submitting, reviewing and fixing bugs - at this point that has cost something close to fifty thousand dollars just on this feature and subsequent actions. (I believe the feature was originally funded, but we never saw a penny of that, though others did.)
The problem
with the multi-xact case is that we just kept fixing bugs as people
found them, and did not do a holistic review of the code.
I observed much discussion and review. The bugs we've had have all been fairly straightforwardly fixed. There haven't been any design-level oversights or head-palm moments. It's complex software that had complex behaviour that caused problems. The problem has been that anything on-disk causes more problems when errors occur. We should review carefully anything that alters the way on-disk structures work, like the WAL changes, UPSERT's new mechanism, etc.
From my side, it is only recently I got some clear answers to my questions about how it worked. I think it is very important that major features have extensive README type documentation with them so the underlying principles used in the development are clear. I would define the measure of a good feature as whether another committer can read the code comments and get a good feel. A bad feature is one where committers walk away from it, saying I don't really get it and I can't read an explanation of why it does that. Tom's most significant contribution is his long descriptive comments on what the problem is that needs to be solved, the options and the method chosen. Clarity of thought is what solves bugs.
Overall, I don't see the need to stop the normal release process and do a holistic review. But I do think we should check each feature to see whether it is fully documented or whether we are simply trusting one of us to be around to fix it.
I am just saying we need to ask the
reliability question _first_.
Agreed
Let me restate something that has appeared in many replies to my ideas
--- I am not asking for infinite or unbounded review, but I am asking
that we make sure reliability gets the proper focus in relation to our
time pressures. Our balance was so off a month ago that I feel only a
full stop on time pressure would allow us to refocus because people are
not good at focusing on multiple things. It is sometimes necessary to
stop everything to get people's attention, and to help them remember
that without reliability, a database is useless.
Here, I think we are talking about different types of reliability. PostgreSQL software is well ahead of most industry measures of quality; these recent bugs have done nothing to damage that, other than a few people woke up and said "Wow! Postgres had a bug??!?!?". The presence of bugs is common and if we have grown unused to them, we should be wary of that, though not tolerant.
PostgreSQL is now reliable in the sense that we have many features that ensure availability even in the face of software problems and bug induced corruption. Those have helped us get out of the current situations, giving users a workaround while bugs are fixed. So the impact of database software bugs is not what it once was.
Reliable delivery of new versions of software is important too. New versions often contain new features that fix real world problems, just as much as bug fixes do, hence why I don't wish to divert from the normal process and schedule.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/4/15 11:28 PM, Michael Paquier wrote: <list of things to test> * More configuration variations; ./configure, initdb options, and *.conf * More edge-case testing. (ie: what happens to each varlena as it approaches 1GB? 1B tables test. Etc.) * More race-condition testing, like the tool Peter used heavily during ON CONFLICT development (written by Jeff Janes?) * More non-SQL testing. For example, the logic in HeapTupleSatisfies* is quite complicated yet there's no tests dedicated to ensuring the logic is correct because it'd be extremely difficult (if not impossible) to construct those tests at a SQL level. Testing them with direct test calls to HeapTupleSatisfies* wouldn't be difficult, but we have no machinery to do C level testing. >> Is pg_tap a reasonable starting point for this sort of testing? > IMO, using the TAP machinery would be a good base for that. What lacks > is a basic set of perl routines that one can easily use to set of test > scenarios. I think Stephen was referring specifically to pgTap (http://pgtap.org/). Isn't our TAP framework just different output from pg_regress? Is there documentation on our TAP stuff? >> >How would a test that would've caught the multixact issues look? > I have not followed closely those discussions, not sure about that. I've thought about this and unfortunately I think this may be a scenario that's just too complex to completely protect against with a test. What might help though is having better testing of edge cases (such as MXID wrap) and then combining that with other forms of testing, such as pg_upgrade and streaming rep. testing. Test things like "What happens if we pg_upgrade a cluster that's in danger of wraparound?" -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
On 6/5/15 10:39 AM, Tom Lane wrote: > The other side of that coin is that we might get useful comments from > testers on how the feature ought to work. I don't agree with the notion > that all feature details must be graven on stone tablets before we start > trying to get feedback from people outside the core development community. +1 > The same point applies to the FDW C API questions, or to RLS, or to the > "expanded objects" work that I did. (I'd really love it if the PostGIS > folk would try to use that sometime before it's too late to adjust the > definition...) Now, you could argue that people likely to have useful > input on those issues are fully capable of working with git tip, and you'd > probably be right, but would they do so? As Simon says nearby, publishing > an alpha/beta/whatever is our signal to the wider community that it's time > for them to start paying attention. I do not think they will look at 9.5 > until we do that; and I think it'll be our loss if they don't start > looking at these things soon. +1, but I also think we should have a better mechanism for soliciting user input on these things while design discussions are happening. ISTM that there's a lot of hand-waving that happens around use cases that could probably be clarified with end user input. FWIW, I don't think the blocker here is git or building from source. If someone has that amount of time to invest it's not much different than grabbing a tarball. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Simon Riggs wrote:
> On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:
> > Stamping it a beta implies that we think it's something fairly stable that we'd be pretty happy to release if things go well, which is a higher bar to clear.
> We don't have a clear definition of what Beta means. For me, Beta has always meant "trial software, please test".
I think that definition *is* the problem, actually. To me, "beta" means "trial software, please test, but final product will be very similar to what you see here". What we need to convey at this point is what you said, but I think a better word for that is "alpha". There may be more mobility in there than in a beta, in users' perception, which is the right impression we want to convey.
Another point is that historically, once we've released a beta, we're pretty reluctant to bump catversion. We're not ready for that at this stage, which is one criterion that suggests to me that we're not ready for beta.
So I think the right thing to do at this point is to get an alpha out, shortly after releasing upcoming minors.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 5, 2015 at 11:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > We don't have a clear definition of what Beta means. For me, Beta has always > meant "trial software, please test". > > I don't think anybody will say anything bad about us if we release a beta > and then later pull some of the features because we are not confident with > them when AFTER testing the feature is shown to be below our normal > standard; that will bring us credit, I feel. It is extremely common in > software development to defer some of the features if their goals aren't > met, or to change APIs and interfaces based upon user feedback. Yeah, but we usually haven't. Tom, for example, has previously not wanted to even bump catversion after beta1, which rules out a huge variety of possible fixes and interface changes. If we want to make a policy decision to change our approach, we should be up-front about that. > None of this means I disagree with assessments of the current state of the > software, I'm saying that we should simply follow the normal process and > stick to the schedule we have previously agreed, for all of the reasons > cited when we agreed it. Well, to my way of looking at it, our feature freeze was later this year than it has been in the past, so our beta will be later, too. If we want to stick with the schedule, we have to do that throughout. Our typical schedule has been a two-month final CommitFest starting on January 15th. This year we had a three month final CommitFest starting on February 15th. So we finished the last CommitFest two months later than has been typical. Typically our beta has been in early May, 1-2 months after the end of the last CommitFest. 
If you add the same two months to that, you get early July, which sounds reasonable, rather than early June, which sounds rushed, especially since we have an urgent need to get minor releases out the door to fix critical stability bugs right now, and then we have PGCon, during which nobody's going to be looking at anything. It sounds to me like the original plan was to put out a beta in early June, which would have been fine if we'd stuck to the traditional 2-month final CommitFest. But we didn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/05/2015 07:23 AM, Tom Lane wrote: > So let's call it an alpha, or some other way of setting expectations > appropriately. But I think it's silly to maintain that the code is not in > a state where end-user testing is useful. They just have to understand > that they can't trust it with production data. Yes ... that seems like a good compromise. Frankly, I'm testing 9.5 already; having alpha packages would make that testing easier for me, and maybe possible for others. We'd need to take into account that our packagers are a bit overworked this month due to update releases ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Jun 5, 2015 at 8:51 AM, Andres Freund <andres@anarazel.de> wrote: >> 4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems >> like another question of what the expectations around RLS actually >> are. > > In the end that's minor from the end user's perspective. I think that depends on what we ultimately decide to do about it, which is something that I have yet to form an opinion on (although I know we need to document the issue, at the very least). For example, one idea that Stephen and I discussed privately was making security barrier quals referencing other relations lock the referenced rows. This was an informal throwing around of ideas, but it's possible that something like that could end up happening. -- Peter Geoghegan
On Fri, Jun 5, 2015 at 7:00 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I do agree that an indefinite development freeze with unclear > parameters for resuming development and unclear goals is a bad plan. > But I think giving ourselves a little more time to, say, turn the > buildfarm consistently green, and, say, fix the known but > currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5 > features is a good plan, and I hope you and others will support it. FWIW, I have 3 pending bug fixes for UPSERT. While those are pretty benign issues, I'd be annoyed if they didn't get into the first 9.5 beta (or alpha, even). -- Peter Geoghegan
On Fri, Jun 5, 2015 at 04:54:56PM +0100, Simon Riggs wrote: > On 5 June 2015 at 16:05, Bruce Momjian <bruce@momjian.us> wrote: > > > Please address some of the specific issues I mentioned. > > > I can discuss them but not because I am involved directly. I take > responsibility as a committer and have an interest from that perspective. > > In my role at 2ndQuadrant, I approved all of the time Alvaro and Andres spent > on submitting, reviewing and fixing bugs - at this point that has cost > something close to fifty thousand dollars just on this feature and subsequent > actions. (I believe the feature was originally funded, but we never saw a penny > of that, though others did.) Yes, the burden has fallen heavily on Alvaro. I personally am concerned that many people were focusing on 9.5 rather than helping him. I think that was a mistake on our part and we need to take reliability problems more seriously. What has also concerned me is that there are so many 9.3/9.4 bugs in this area that few of us can even understand what was fixed when, and we are then having problems figuring out what bugs were present when analyzing bug reports. pg_upgrade has made this worse by allowing multi-xact bugs to propagate across major versions, and pg_upgrade had some multi-xact bugs of its own in early 9.3 releases. :-( > The problem > with the multi-xact case is that we just kept fixing bugs as people > found them, and did not do a holistic review of the code. > > > I observed much discussion and review. The bugs we've had have all been fairly > straightforwardly fixed. There haven't been any design-level oversights or > head-palm moments. It's complex software that had complex behaviour that caused > problems. The problem has been that anything on-disk causes more problems when > errors occur. We should review carefully anything that alters the way on-disk > structures work, like the WAL changes, UPSERTs new mechanism etc.. Agreed. 
However, I think a thorough review early on could have caught many of these bugs before they were reported by users. As proof, even in the past few weeks, review is finding bugs before they are found by users. > From my side, it is only recently I got some clear answers to my questions > about how it worked. I think it is very important that major features have > extensive README type documentation with them so the underlying principles used > in the development are clear. I would define the measure of a good feature as > whether another committer can read the code comments and get a good feel. A bad > feature is one where committers walk away from it, saying I don't really get it > and I can't read an explanation of why it does that. Tom's most significant > contribution is his long descriptive comments on what the problem is that need > to be solved, the options and the method chosen. Clarity of thought is what > solves bugs. Yes, I think we should have done that early-on for multi-xact, and I am hopeful we will learn to do that more often when complex features are implemented, or when we identify areas that are more complex than we thought. > Overall, I don't see the need to stop the normal release process and do a > holistic review. But I do think we should check each feature to see whether it > is fully documented or whether we are simply trusting one of us to be around to > fix it. Agreed. We just need to be honest that we are doing what we need for reliability and not allow schedule and feature pressure to cause us to skimp in this area. > I am just saying we need to ask the > reliability question _first_. > > > Agreed > > > Let me restate something that has appeared in many replies to my ideas > --- I am not asking for infinite or unbounded review, but I am asking > that we make sure reliability gets the proper focus in relation to our > time pressures. 
Our balance was so off a month ago that I feel only a > full stop on time pressure would allow us to refocus because people are > not good at focusing on multiple things. It is sometimes necessary to > stop everything to get people's attention, and to help them remember > that without reliability, a database is useless. > > > Here, I think we are talking about different types of reliability. PostgreSQL > software is well ahead of most industry measures of quality; these recent bugs > have done nothing to damage that, other than a few people woke up and said > "Wow! Postgres had a bug??!?!?". The presence of bugs is common and if we have > grown unused to them, we should be wary of that, though not tolerant. In going over the 9.5 commits, I was struck by a high volume of cleanups and fixes, which is good. > PostgreSQL is now reliable in the sense that we have many features that ensure > availability even in the face of software problems and bug induced corruption. > Those have helped us get out of the current situations, giving users a > workaround while bugs are fixed. So the impact of database software bugs is not > what it once was. Uh, yes, we have avoided the worst of the impact from these bugs. In my understanding, each bug has X% chance of being serious, and you might go for a long time before a serious bug is created, but the more bugs we have, the more likely it is that one will be serious. The _volume_ of multi-xact bugs should have triggered a review much sooner. People think I want to stop feature development to review. What I am saying is that we need to stop development so we can be honest about whether we need review, and where. It is hard to be honest when time and feature pressure are on you. It shouldn't take long to make that decision as a group. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, Jun 6, 2015 at 12:05 AM, Alvaro Herrera wrote: > Michael Paquier wrote: > What happened with the extension tests patches you submitted? They > seemed valuable to me, but I lost track. Those ones are registered in the queue of 9.6: https://commitfest.postgresql.org/5/187/ And this is the latest patch: http://www.postgresql.org/message-id/CAB7nPqSQr1UjZ1h8=be1wBq3mMdmM38nrjBKvBJuM--tTTY=EA@mail.gmail.com This patch extends prove_check by giving the possibility for a given utility using t/ to add extra modules in t/extra that will be installed and usable for its regression tests. This becomes more interesting considering as well that pg_upgrade could be switched to use the TAP infrastructure, where we could have modules dedicated to only the tests of pg_upgrade (supporting TAP tests on Windows is a necessary condition though before switching pg_upgrade). -- Michael
On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
--
Simon Riggs wrote:
> On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:
> > Stamping it a beta implies that we think it's something fairly
> > stable that we'd be pretty happy to release if things go well, which
> > is a higher bar to clear.
>
> We don't have a clear definition of what Beta means. For me, Beta has
> always meant "trial software, please test".
I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.
Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.
So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.
OK, I can get behind that.
My only additional point is that it is a good idea to release an Alpha every time, not just this release.
And if it's called Alpha, let's release it immediately. We can allow Alpha1, Alpha2 as needed, plus we allow catversion and file format changes between Alpha versions.
Proposed definitions
Alpha: This is trial software; please actively test and report bugs. Your feedback is sought on usability and performance, which may result in changes to the features included here. Not all known issues have been resolved but work continues on resolving them. Multiple Alpha versions may be released before we move to Beta. We reserve the right to change internal API definitions, file formats and increment the catalog version between Alpha versions and Beta, so we do not guarantee an easy upgrade path from this version to later versions of this release.
Beta: This is trial software; please actively test and report bugs and performance issues. Multiple Beta versions may be released before we move to Release Candidate. We will attempt to maintain APIs, file formats and catversions.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/06/15 21:07, Simon Riggs wrote: > On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com > <mailto:alvherre@2ndquadrant.com>> wrote: > > Simon Riggs wrote: > > On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com > <mailto:robertmhaas@gmail.com>> wrote: > > > > Stamping it a beta implies that we think it's something fairly > > > stable that we'd be pretty happy to release if things go well, > which > > > is a higher bar to clear. > > > > We don't have a clear definition of what Beta means. For me, > Beta has > > always meant "trial software, please test". > > I think that definition *is* the problem, actually. To me, "beta" > means > "trial software, please test, but final product will be very > similar to > what you see here". What we need to convey at this point is what you > said, but I think a better word for that is "alpha". There may be more > mobility in there than in a beta, in users's perception, which is the > right impression we want to convey. > > Another point is that historically, once we've released a beta, we're > pretty reluctant to bump catversion. We're not ready for that at this > stage, which is one criteria that suggests to me that we're not ready > for beta. > > So I think the right thing to do at this point is to get an alpha out, > shortly after releasing upcoming minors. > > > OK, I can get behind that. > > My only additional point is that it is a good idea to release an Alpha > every time, not just this release. > > And if its called Alpha, lets release it immediately. We can allow > Alpha1, Alpha2 as needed, plus we allow catversion and file format > changes between Alpha versions. > > Proposed definitions > > Alpha: This is trial software please actively test and report bugs. > Your feedback is sought on usability and performance, which may result > in changes to the features included here. Not all known issues have > been resolved but work continues on resolving them. 
Multiple Alpha > versions may be released before we move to Beta. We reserve the right > to change internal API definitions, file formats and increment the > catalog version between Alpha versions and Beta, so we do not > guarantee and easy upgrade path from this version to later versions of > this release. > > Beta: This is trial software please actively test and report bugs and > performance issues. Multiple Beta versions may be released before we > move to Release Candidate. We will attempt to maintain APIs, file > formats and catversions. > > -- > Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services As a 'user' I am very happy with the idea of having Alpha's, gives me a feeling that there will be less chance of problems being released in the final version. Because not only does it give more chances to test, but might encourage more people to get involved in contributing, either ideas for minor tweaks or simple patches etc. (as being not quite finished, and an expectation that minor functional changes have a possibility of being accepted for the version, if there is sufficient merit). Cheers, Gavin
On Sat, Jun 6, 2015 at 11:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Simon Riggs wrote:
> On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:
> > Stamping it a beta implies that we think it's something fairly
> > stable that we'd be pretty happy to release if things go well, which
> > is a higher bar to clear.
>
> We don't have a clear definition of what Beta means. For me, Beta has
> always meant "trial software, please test".
I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.
Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.
So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.
OK, I can get behind that.
My only additional point is that it is a good idea to release an Alpha every time, not just this release.
And if it's called Alpha, let's release it immediately. We can allow Alpha1, Alpha2 as needed, plus we allow catversion and file format changes between Alpha versions.
If I'm not mistaken, we (Simon and I) actually discussed something else along this line a while ago that might be worth considering. That is, maybe we should consider time-based alpha releases. That is, we can just decide "we wrap an alpha every other Monday until we think we are good to go with beta". The reason for that is to get much quicker iteration on bugfixes, which would encourage people to use and test these versions. Report a bug and if it was easy enough to fix, you have a wrapped release with the fix in 2 weeks tops.
This would require that we can (at least mostly) automate the wrapping of an alpha release, but I'm pretty sure we can solve that problem. We can also, I think, get away with doing the release notes for an alpha just as a wiki page and a lot less formal than others, meaning we don't need to hold up any process for that.
Package availability would depend on platform. For those platforms where package building is more or less entirely automatic already, this could probably also be easily automated. And for those that take a lot more work, such as the Windows installers, we could just go with wrapping every other or every third alpha. As this is not a production release, I don't see why we'd need to hold some back to cover for the rest.
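To make the "every other Monday" cadence concrete, here is a minimal sketch of how such wrap dates could be computed. The function name `alpha_wrap_dates` and the two-week default are illustrative assumptions, not part of any existing release tooling.

```python
from datetime import date, timedelta


def alpha_wrap_dates(start, count, interval_weeks=2):
    """Return `count` wrap dates: the first Monday on or after `start`,
    then one every `interval_weeks` weeks (hypothetical helper)."""
    # date.weekday() returns 0 for Monday, so this is the distance to
    # the next Monday (0 if `start` already is one).
    days_until_monday = (0 - start.weekday()) % 7
    first = start + timedelta(days=days_until_monday)
    return [first + timedelta(weeks=interval_weeks * i) for i in range(count)]


if __name__ == "__main__":
    # Starting from the date of this message, list four candidate wraps.
    for n, wrap in enumerate(alpha_wrap_dates(date(2015, 6, 6), 4), start=1):
        print(f"alpha{n}: {wrap.isoformat()}")
```

A fixed cadence like this only fixes the calendar; deciding when to stop wrapping alphas and move to beta would remain a judgment call.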
Proposed definitions
Alpha: This is trial software; please actively test and report bugs. Your feedback is sought on usability and performance, which may result in changes to the features included here. Not all known issues have been resolved but work continues on resolving them. Multiple Alpha versions may be released before we move to Beta. We reserve the right to change internal API definitions, file formats and increment the catalog version between Alpha versions and Beta, so we do not guarantee an easy upgrade path from this version to later versions of this release.
Beta: This is trial software; please actively test and report bugs and performance issues. Multiple Beta versions may be released before we move to Release Candidate. We will attempt to maintain APIs, file formats and catversions.
These sound like good definitions. Might add to the beta one something like "whilst we will try to avoid it, pg_upgrade may be required between betas and from beta to rc versions".
Hi, On Sat, 2015-06-06 at 12:15 +0200, Magnus Hagander wrote: > If I'm not mistaken, we (Simon and me) actually discussed something > else along this line a while ago that might be worth considering. That > is, maybe we should consider time-based alpha releases. That is, we > can just decide "we wrap an alpha every other Monday until we think we > are good to go with beta". The reason for that is to get much quicker > iteration on bugfixes, which would encourage people to use and test > these versions. Report a bug and if it was easy enough to fix, you > have a wrapped release with the fix in 2 weeks top. +1. > Package availability would depend on platform. For those platforms > where package building is more or less entirely automatic already, > this could probably also be easily automated. When we used to release more alphas years ago, I was releasing Alpha RPMs for many platforms. I'll do it again if we keep doing it. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
To play devil's advocate for a moment, is there anyone who would genuinely be prepared to download and install an alpha release who would not already have downloaded one of the nightlies? I only ask because I assume that releasing an alpha is not zero-developer-cost and I don't believe that there's a large number of people who would be happy to install something that's described as being buggy and subject to change but are put off by having to type "configure" and "make".
Further, it seems to me that the number of people who won't roll their own who are useful as bug-finders is even smaller.
I get the feeling that the argument appears to be "Bruce doesn't want to release a beta, Simon wants to release something. Let's release an alpha because it's sort-of half way in between" as a consensus compromise (I'm not deliberately picking on specific people, I'm aware you're not the only two involved and arguing for either side, but you do seem to be fairly polar opposite sides of the argument :) ); I don't really believe that releasing an alpha moves anything further forward from a testing point of view, and I'm fairly sure that it will have just as deleterious effect on bugfixing as would a beta, with the added disadvantage of the extra developer cost.
Geoff
On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > To play devil's advocate for a moment, is there anyone who would genuinely be prepared to download > and install an alpha release who would not already have downloaded one of the nightlies? I only ask > because I assume that releasing an alpha is not zero-developer-cost and I don't believe that > there's a large number of people who would be happy to install something that's described as being > buggy and subject to change but are put off by having to type "configure" and "make". I fit into that category and I would guess there would be others as well. Having system packages available via an "apt-get install ..." lowers the bar significantly to try things out. As an example, I installed the 9.4 beta as soon as it was available to run a smoke test and try out some of the new jsonb features. I'll be doing the same with a 9.5 alpha/beta (or whatever it's called), for both similar testing and to try out UPSERT. It's much easier to work into dev/test setups if there are system packages as it's just a config change to an existing script. Building from source would require a whole new workflow that I don't have time to incorporate. > Further, it seems to me that the number of people who won't roll their own who are useful as bug-finders is even smaller. That's probably true but they definitely won't find any bugs if they don't test at all. If it's possible to have automated packaging, even for just a subset of platforms, I think that'd be useful. Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > To play devil's advocate for a moment, is there anyone who would genuinely > be prepared to download and install an alpha release who would not already > have downloaded one of the nightlies? I only ask because I assume that > releasing > an alpha is not zero-developer-cost and I don't believe > that > there's a large > number of people who would be happy to install something that's described as > being buggy and subject to change but are put off by having to type > "configure" and "make". This is pretty much why Peter Eisentraut gave up on doing alphas after the 9.1 cycle. Admittedly, what is being proposed here is somewhat different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> To play devil's advocate for a moment, is there anyone who would genuinely be prepared to download
> and install an alpha release who would not already have downloaded one of the nightlies? I only ask
> because I assume that releasing an alpha is not zero-developer-cost and I don't believe that
> there's a large number of people who would be happy to install something that's described as being
> buggy and subject to change but are put off by having to type "configure" and "make".
I fit into that category and I would guess there would be others as
well. Having system packages available via an "apt-get install ..."
lowers the bar significantly to try things out.
But it also lowers the bar to the extent that you get the people who won't read the todo list and end up complaining about the things that everyone already knows about.
It's much easier to work into dev/test setups if there are system
packages as it's just a config change to an existing script. Building
from source would require a whole new workflow that I don't have time
to incorporate.
Really? You genuinely don't have time to paste, say:
mkdir -p ~/src/pgdevel
cd ~/src/pgdevel
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar xjf postgresql-snapshot.tar.bz2
mkdir bld
cd bld
../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g')
make world
make check
make world-install
and yet you think you have enough time to provide more than a "looks like it's working" report to the developers?
(NB the sed for the pg_config line will probably need work, it looks like it should work on the two types of system I have here but I have to admit I changed the config line manually when I built it)
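Since the sed expression above is flagged as possibly needing work, here is a small sketch of the same rewrite in Python, which makes it easier to try against sample `pg_config --configure` output before trusting it. The function name and the sample argument strings are assumptions for illustration; real configure arguments vary by distribution.

```python
import re

# Same pattern as the sed expression, written in Python regex syntax:
# an optional "doc-" and a 9.x version following "pg" or "postgresql-"/"postgresql/".
VERSION_RE = re.compile(r"(pg|postgresql[-/])(doc-)?9\.[0-9]*(dev)?")


def retarget_configure_args(args, new_version="9.5dev"):
    """Rewrite versioned paths in a configure argument string (hypothetical helper)."""
    def repl(match):
        # Keep the prefix and the optional "doc-" part, swap only the version.
        return match.group(1) + (match.group(2) or "") + new_version
    return VERSION_RE.sub(repl, args)


if __name__ == "__main__":
    sample = "'--prefix=/usr/lib/postgresql/9.4' '--docdir=/usr/share/doc/postgresql-doc-9.4'"
    print(retarget_configure_args(sample))
```

Arguments that carry no version number pass through unchanged, so the rewritten string can be handed to configure as a whole.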
> Further, it seems to me that the number of people who won't roll their own who are useful as bug-finders is even smaller.
That's probably true but they definitely won't find any bugs if they
don't test at all.
If it's possible to have automated packaging, even for just a subset
of platforms, I think that'd be useful.
Well yes, automated packaging of the nightly build, that doesn't involve the developers having to stop what they're doing to write official alpha release docs or any of the other stuff that goes along with doing a release, would be zero-impact on development (assuming the developers didn't have to build or maintain the auto-packager) and therefore any return (however small) would make it worthwhile.
Fancy building (and maintaining) the auto-packaging system, and managing a mailing list for its users?
Geoff
On Sat, Jun 6, 2015 at 10:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > Really? You genuinely don't have time to paste, say: > > mkdir -p ~/src/pgdevel > cd ~/src/pgdevel > wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2 > tar xjf postgresql-snapshot.tar.bz2 > mkdir bld > cd bld > ../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g') > make world > make check > make world-install > > and yet you think you have enough time to provide more than a "looks like it's working" report to the developers? Adding steps to an existing process to fetch and build from source is significantly more complicated than flipping a version number. And I'm not trying to run PG's built in tests on my machine. I want to run the tests for my applications, and ideally, my applications themselves. If doing so leads me to find that something doesn't work then of course I would research and report the cause. At that point it's something that I know will directly affect me if it's not fixed! > Well yes, automated packaging of the nightly build, that doesn't involve the developers having to stop what they're doing to write official alpha release docs or any of the other stuff that goes along with doing a release, would be zero-impact on development (assuming the developers didn't have to build or maintain the auto-packager) and therefore any return (however small) would make it worthwhile. > Fancy building (and maintaining) the auto-packaging system, and managing a mailing list for its users? I don't have much experience in setting things like this up so I'm not one to estimate the work load involved. If it existed though, I'd use it. Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. 
| https://www.jackdb.com/
Robert Haas <robertmhaas@gmail.com> wrote: > Tom, for example, has previously not wanted to even bump > catversion after beta1, which rules out a huge variety of > possible fixes and interface changes. If we want to make a > policy decision to change our approach, we should be up-front > about that. What?!? There have been catversion bumps between the REL?_?_BETA1 tag and the REL?_?_0 tag for 8.2, 8.3, 9.0, 9.1, 9.3, and 9.4. (That is, it has happened on 6 of the last 8 releases.) I don't think we're talking about any policy change here. We try to avoid a catversion bump after beta if we can; we're not that reluctant to do so if needed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/05/2015 08:07 PM, Bruce Momjian wrote: >> From my side, it is only recently I got some clear answers to my questions >> about how it worked. I think it is very important that major features have >> extensive README type documentation with them so the underlying principles used >> in the development are clear. I would define the measure of a good feature as >> whether another committer can read the code comments and get a good feel. A bad >> feature is one where committers walk away from it, saying I don't really get it >> and I can't read an explanation of why it does that. Tom's most significant >> contribution is his long descriptive comments on what the problem is that need >> to be solved, the options and the method chosen. Clarity of thought is what >> solves bugs. > > Yes, I think we should have done that early-on for multi-xact, and I am > hopeful we will learn to do that more often when complex features are > implemented, or when we identify areas that are more complex than we > thought. > I see this idea of the README as very useful. There are far more people like me in this community than Simon or Alvaro. I can test, I can break things, I can script up a harness but I need to understand HOW and the README would help allow for that. > > People think I want to stop feature development to review. What I am > saying is that we need to stop development so we can be honest about > whether we need review, and where. It is hard to be honest when time > and feature pressure are on you. It shouldn't take long to make that > decision as a group. > Right. This is all about taking a step back, a deep breath, an objective look and then digging in, in a more productive and reliable manner. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. 
Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
On 06/06/2015 07:33 AM, Robert Haas wrote: > > On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: >> To play devil's advocate for a moment, is there anyone who would genuinely >> be prepared to download and install an alpha release who would not already >> have downloaded one of the nightlies? I only ask because I assume that >> releasing >> an alpha is not zero-developer-cost and I don't believe >> that >> there's a large >> number of people who would be happy to install something that's described as >> being buggy and subject to change but are put off by having to type >> "configure" and "make". Yes, me and everyone like me in feature set. Compiling takes time, time that does not need to be spent. If I can push an alpha into a container and start testing, I will do so. If I have to: git pull; configure --prefix; make -j8 install Then I will likely move on to other things because my time (like anyone else's on this list) is not free. If you add into this a test harness that I can execute from the alpha release (or another package) that allows me to instantly report via buildfarm or just email a tarball to -hackers that is even better. I know that I am not taking everything into account here but remember that most of our users are not -hackers. They are practitioners and a lot of them would love to help but just can't because a lot of the infrastructure has never been built and -hackers think like -hackers. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote: > This whole idea of "feature development" vs reliability is bogus. It > implies people that work on features don't care about reliability. Given > the fact that many of the features are actually about increasing database > reliability in the event of crashes and corruptions it just makes no sense. I'm contrasting work that helps to keep our existing promises ("reliability") with work that makes new promises ("features"). In software development, we invariably hazard old promises to make new promises; our success hinges on electing neither too little nor too much risk. Two years ago, PostgreSQL's track record had placed it in a good position to invest in new, high-risk, high-reward promises. We did that, and we emerged solvent yet carrying an elevated debt service ratio. It's time to reduce risk somewhat. You write about a different sense of "reliability." (Had I anticipated this misunderstanding, I might have written "Restore-probity mode.") None of this was about classifying people, most of whom allocate substantial time to each kind of work. > How will we participate in cleanup efforts? How do we know when something > has been "cleaned up", how will we measure our success or failure? I think > we should be clear that wasting N months on cleanup can *fail* to achieve a > useful objective. Without a clear plan it almost certainly will do so. The > flip side is that wasting N months will cause great amusement and dancing > amongst those people who wish to pull ahead of our open source project and > we should take care not to hand them a victory from an overreaction. I agree with all that. We should likewise take care not to become insolvent from an underreaction. > So lets do our normal things, not do a "total stop" for an indefinite > period. If someone has specific things that in their opinion need to be > addressed, list them and we can talk about doing them, together. 
I recommend these four exit criteria:
1. Non-author committer review of foreign keys locks/multixact durability. Done when that committer certifies, as if he were committing the patch himself today, that the code will not eat data.
2. Non-author committer review of row-level security. Done when that committer certifies that the code keeps its promises and that the documentation bounds those promises accurately.
3. Second committer review of the src/backend/access changes for INSERT ... ON CONFLICT DO NOTHING/UPDATE. (Bugs affecting folks who don't use the new syntax are most likely to fall in that portion.) Unlike the previous two criteria, a review without certification is sufficient.
4. Non-author committer certifying that the 9.5 WAL format changes will not eat your data. The patch lists Andres and Alvaro as reviewers; if they already reviewed it enough to make that certification, this one is easy.
That ties up four people. For everyone else:
- Fix bugs those reviews find. This will start slow but will grow to keep everyone busy. Committers won't certify code, and thus we can't declare victory, until these bugs are fixed. The rest of this list, in contrast, calls out topics to sample from, not topics to exhaust.
- Turn current buildfarm members green.
- Write, review and commit more automated test machinery to PostgreSQL. Test whatever excites you. If you need ideas, Craig posted some good ones upthread. Here are a few more:
  - Add a debug mode that calls sched_yield() in SpinLockRelease(); see 6322.1406219591@sss.pgh.pa.us.
  - Improve TAP suite (src/test/perl/TestLib.pm) logging. Currently, these suites redirect much output to /dev/null. Instead, log that output and teach the buildfarm to capture the log.
  - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its global pin count falls to zero.
  - With assertions enabled, or perhaps in a new debug mode, have pg_do_encoding_conversion() and pg_server_to_any() check the data for a no-op conversion instead of assuming the data is valid.
- Add buildfarm members. This entails reporting any bugs that prevent an initial passing run. Once you have a passing run, schedule regular runs. Examples of useful additions:
  - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no..." to enable all the replacement code regardless of the current platform's need for it. This helps distinguish "Windows bug" from "replacement code bug."
  - --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval, --disable-spinlocks, --disable-atomics, --disable-thread-safety, --disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY
  - Any OS or CPU architecture other than x86 GNU/Linux, even ones already represented.
- Write, review and commit fixes for the bugs that come to light by way of these new automated tests.
- Anything else targeted to make PostgreSQL keep the promises it has already made to our users.
On Sun, Jun 7, 2015 at 4:58 AM, Noah Misch <noah@leadboat.com> wrote: > - Write, review and commit more automated test machinery to PostgreSQL. Test > whatever excites you. If you need ideas, Craig posted some good ones > upthread. Here are a few more: > - Improve TAP suite (src/test/perl/TestLib.pm) logging. Currently, these > suites redirect much output to /dev/null. Instead, log that output and > teach the buildfarm to capture the log. We can capture the logs and redirect them by replacing system_or_bail() with more calls to IPC::run. That would be a patch simple enough. pg_rewind's tests should be switched to use that as well. -- Michael
On Sat, Jun 6, 2015 at 12:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Tom, for example, has previously not wanted to even bump >> catversion after beta1, which rules out a huge variety of >> possible fixes and interface changes. If we want to make a >> policy decision to change our approach, we should be up-front >> about that. > > What?!? There have been catversion bumps between the REL?_?_BETA1 > tag and the REL?_?_0 tag for 8.2, 8.3, 9.0, 9.1, 9.3, and 9.4. > (That is, it has happened on 6 of the last 8 releases.) I don't > think we're talking about any policy change here. We try to avoid > a catversion bump after beta if we can; we're not that reluctant to > do so if needed. Perhaps we're honoring this more in the breach than in the observance, but I'm not making up what Tom has said about this: http://www.postgresql.org/message-id/27310.1251410965@sss.pgh.pa.us http://www.postgresql.org/message-id/19174.1299782543@sss.pgh.pa.us http://www.postgresql.org/message-id/3413.1301154369@sss.pgh.pa.us http://www.postgresql.org/message-id/3261.1401915832@sss.pgh.pa.us -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Perhaps we're honoring this more in the breech than in the observance, > but I'm not making up what Tom has said about this: > > http://www.postgresql.org/message-id/27310.1251410965@sss.pgh.pa.us > http://www.postgresql.org/message-id/19174.1299782543@sss.pgh.pa.us > http://www.postgresql.org/message-id/3413.1301154369@sss.pgh.pa.us > http://www.postgresql.org/message-id/3261.1401915832@sss.pgh.pa.us Of course, not doing a catversion bump after beta1 doesn't necessarily have much value in and of itself. *Promising* to not do a catversion bump, and then usually keeping that promise definitely has a certain value, but clearly we are incapable of that. -- Peter Geoghegan
On 06/06/2015 07:14 PM, Peter Geoghegan wrote: > > On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Perhaps we're honoring this more in the breach than in the observance, >> but I'm not making up what Tom has said about this: >> >> http://www.postgresql.org/message-id/27310.1251410965@sss.pgh.pa.us >> http://www.postgresql.org/message-id/19174.1299782543@sss.pgh.pa.us >> http://www.postgresql.org/message-id/3413.1301154369@sss.pgh.pa.us >> http://www.postgresql.org/message-id/3261.1401915832@sss.pgh.pa.us > > Of course, not doing a catversion bump after beta1 doesn't necessarily > have much value in and of itself. *Promising* to not do a catversion > bump, and then usually keeping that promise definitely has a certain > value, but clearly we are incapable of that. > It seems to me that a cat bump during Alpha or Beta should be absolutely fine and reservedly fine respectively. Where we should absolutely not cat bump unless there is absolutely no other choice is during an RC. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Joshua D. Drake <jd@commandprompt.com> wrote: > On 06/06/2015 07:14 PM, Peter Geoghegan wrote: >> On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> Perhaps we're honoring this more in the breech than in the >>> observance, but I'm not making up what Tom has said about this: >>> >>> http://www.postgresql.org/message-id/27310.1251410965@sss.pgh.pa.us That's 9.0 release discussion: | I think that the traditional criterion is that we don't release beta1 | as long as there are any known issues that might force an initdb. | We were successful in avoiding a post-beta initdb this time, although | IIRC the majority of release cycles have had one --- so maybe you | could argue that that's not so important. It would certainly be | less important if we had working pg_migrator functionality to ease | the pain of going from beta to final. >>> http://www.postgresql.org/message-id/19174.1299782543@sss.pgh.pa.us That's 9.1 release discussion: | Historically we've declared it beta once we think we are done with | initdb-forcing problems. | In any case, the existence of pg_upgrade means that "might we need | another initdb?" is not as strong a consideration as it once was, so | I'm not sure if we should still use that as a criterion. I don't know | quite what "ready for beta" should mean otherwise, though. >>> http://www.postgresql.org/message-id/3413.1301154369@sss.pgh.pa.us Also 9.1, it is listed as one criterion: | * No open issues that are expected to result in a catversion bump. | (With pg_upgrade, this is not as critical as it used to be, but | I still think catalog stability is a good indicator of a release's | maturity) >>> http://www.postgresql.org/message-id/3261.1401915832@sss.pgh.pa.us Here we jump to 9.4 discussion: | > Agreed. Additionally I also agree with Stefan that the price of a initdb | > during beta isn't that high these days. | | Yeah, if nothing else it gives testers another opportunity to exercise | pg_upgrade ;-). 
The policy about post-beta1 initdb is "avoid if | practical", not "avoid at all costs". So I think these examples show that the policy has shifted from a pretty strong requirement to "it's probably nice if" status, with some benefits seen in pg_upgrade testing to actually having a bump. >> Of course, not doing a catversion bump after beta1 doesn't necessarily >> have much value in and of itself. *Promising* to not do a catversion >> bump, and then usually keeping that promise definitely has a certain >> value, but clearly we are incapable of that. As someone who was able to bring up a new production application on 8.2 because it was all redundant data and not yet mission-critical, I appreciate that in very rare circumstances that combination could have benefit. But really, how often are people in that position? > It seems to me that a cat bump during Alpha or Beta should be absolutely > fine and reservedly fine respectively. Where we should absolutely not > cat bump unless there is absolutely no other choice is during and RC. +1 on all of that. And for a while now we've been talking about an alpha test release, right? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jun 6, 2015 at 7:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> > It's much easier to work into dev/test setups if there are system
> > packages as it's just a config change to an existing script. Building
> > from source would require a whole new workflow that I don't have time
> > to incorporate.
>
> Really? You genuinely don't have time to paste, say:
>
> mkdir -p ~/src/pgdevel
> cd ~/src/pgdevel
> tar xjf postgresql-snapshot.tar.bz2
> mkdir bld
> cd bld
> ../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g')
> make world
> make check
> make world-install

I think this is rather uncharitable. You don't include yum, zypper, or apt-get anywhere in there, and I vaguely recall it took me quite a bit of time to install all the prereqs in order to get it to compile several years ago when I first started trying to compile it. And then even more time to get it to compile with several of the config flags I wanted. And then even more time to get the docs to compile.
And now after I got all of that, when I try your code, it still doesn't work. $(pg_config ....) seems to not quote things the way that configure wants them quoted, or something. And the package from which I was using pg_config uses more config options than I was set up for when compiling myself, so I either have to manually remove the flags, or find more dependencies (pam, xslt, ossp-uuid, tcl, tcl-dev, and counting). This is not very fun, and I didn't even need to get bureaucratic approval to do any of this stuff, like a lot of people do.
And then when I try to install it, it tries to overwrite some of the files which were initially installed by the package manager in /usr/lib. That doesn't seem good.
And how do I, as a hypothetical package manager user, start this puppy up? Where is pg_ctlcluster? How does one do pg_upgrade between a package-controlled data directory and this new binary?
And then when I find a bug, how do I know it is a bug and not me doing something wrong in the build process, and getting the wrong .so to load with the wrong binary or something like that?
> and yet you think you have enough time to provide more than a "looks like it's working" report to the developers?
If it isn't working, reports that it isn't working, with error messages, are pretty helpful and don't take a whole lot of time. If it is working, reports of that probably aren't terribly helpful without putting a lot more work into it. But people might be willing to put a lot of work into, say, performance regression testing, if that is their area of expertise, provided they don't also have to put a lot of work into getting the software to compile in the first place, which is not their area.
Now I don't see a lot of evidence of beta testing from the public (i.e. unfamiliar names) on -hackers and -bugs lists. But a lot of hackers report things that "a client" or "someone on IRC" reported to them, so I'm willing to believe that a lot of useful beta testing does go on, even though I don't directly see it, and if there were alpha releases, why wouldn't it extend to that?
> (NB the sed for the pg_config line will probably need work, it looks like it should work on the two types of system I have here but I have to admit I changed the config line manually when I built it)
Right, and are the people who use apt-get to install everything likely to know how to do that work?
Cheers,
Jeff
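
The quoting failure reported above with $(pg_config ...) is consistent with how the shell treats command substitution: the result is split on whitespace, and any quote characters inside it stay literal. A minimal sketch of that effect, assuming pg_config --configure prints each option wrapped in single quotes. fake_pg_config and count_args are hypothetical stand-ins, not PostgreSQL tools, and the option values are invented for illustration:

```shell
# Hypothetical stand-in for `pg_config --configure`; real output varies by package.
fake_pg_config() {
    printf '%s\n' "'--prefix=/usr' 'CFLAGS=-O2 -g' '--with-pam'"
}

# Hypothetical stand-in for ./configure: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Plain command substitution splits on whitespace and keeps the quote
# characters as literal text, so 'CFLAGS=-O2 -g' arrives as two broken words.
naive=$(count_args $(fake_pg_config))

# eval re-parses the string as shell input, so the single quotes group
# each option into one argument again.
reparsed=$(eval "count_args $(fake_pg_config)")

echo "naive=$naive reparsed=$reparsed"   # prints: naive=4 reparsed=3
```

If the assumption holds, wrapping the real configure invocation in eval would be one way to pass the options through intact. eval executes whatever the string contains, so it only suits output you trust.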
Joshua D. Drake wrote: > > On 06/05/2015 08:07 PM, Bruce Momjian wrote: > > >> From my side, it is only recently I got some clear answers to my questions > >>about how it worked. I think it is very important that major features have > >>extensive README type documentation with them so the underlying principles used > >>in the development are clear. I would define the measure of a good feature as > >>whether another committer can read the code comments and get a good feel. A bad > >>feature is one where committers walk away from it, saying I don't really get it > >>and I can't read an explanation of why it does that. Tom's most significant > >>contribution is his long descriptive comments on what the problem is that need > >>to be solved, the options and the method chosen. Clarity of thought is what > >>solves bugs. > > > >Yes, I think we should have done that early-on for multi-xact, and I am > >hopeful we will learn to do that more often when complex features are > >implemented, or when we identify areas that are more complex than we > >thought. > > I see this idea of the README as very useful. There are far more people like > me in this community than Simon or Alvaro. I can test, I can break things, I > can script up a harness but I need to be understand HOW and the README would > help allow for that. There is a src/backend/access/README.tuplock that attempts to describe multixacts. Is that not sufficient? Now that I think about it, this file hasn't been updated with the latest changes, so it's probably a bit outdated now. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote: > - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin > count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer > when its global pin count falls to zero. Did a patch for this ever materialize? -- Peter Geoghegan
I think Alphas are valuable and useful and even more so if they have release notes. For example, some of my clients are capable of fetching sources and building from scratch and filing bug reports and are often interested in particular new features. They even have staging infrastructure that could test new postgres releases with real applications. But they don't do it. They also don't follow -hackers, they don't track git, and they don't have any easy way to tell if the new feature they are interested in is actually complete and ready to test at any particular time. A lot of features are developed in multiple commits over a period of time and they see no point in testing until at least most of the feature is complete and expected to work. But it is not obvious from outside when that happens for any given feature. For my clients the value of Alpha releases would mainly be the release notes, or some other mark in the sand that says "As of Alpha-3 feature X is included and expected to mostly work." -dg -- David Gould daveg@sonic.net If simplicity worked, the world would be overrun with insects.
Among several others, on 8 June 2015 at 13:59, David Gould <daveg@sonic.net> wrote:
> I think Alphas are valuable and useful and even more so if they have release
> notes. For example, some of my clients are capable of fetching sources and
> building from scratch and filing bug reports and are often interested in
> particular new features. They even have staging infrastructure that could
> test new postgres releases with real applications. But they don't do it.
> They also don't follow -hackers, they don't track git, and they don't have
> any easy way to tell if the new feature they are interested in is
> actually complete and ready to test at any particular time. A lot of
> features are developed in multiple commits over a period of time and they
> see no point in testing until at least most of the feature is complete and
> expected to work. But it is not obvious from outside when that happens for
> any given feature. For my clients the value of Alpha releases would
> mainly be the release notes, or some other mark in the sand that says "As of
> Alpha-3 feature X is included and expected to mostly work."

Wow! I never knew there were all these people out there who would be rushing to help test if only the PG developers released alpha versions. It's funny how they never used to do it when those alphas were done.

I say again: in my experience you don't get useful test reports from people who aren't able or prepared to compile software; what you do get is lots of unrelated and/or unhelpful noise in the mailing list. That may be harsh or unfair or whatever, it's just my experience.

I guess the only thing we can do is see who's right. I'm simply trying to point out that it's not the zero-cost exercise that everyone appears to think that it is.

Geoff
On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > Wow! I never knew there were all these people out there who would be rushing > to help test if only the PG developers released alpha versions. It's funny > how they never used to do it when those alphas were done. That's probably overplaying your hand a little bit (and it sounds a bit catty, too). Some testing got done and it had some value. It just wasn't enough to make Peter feel like it was worthwhile. That doesn't mean that no testing got done and that it had no value, or that the same thing would happen this time. I'm as skeptical about this whole rush-out-an-alpha business as anyone, but I think that skepticism has to yield to contrary evidence, and people saying "I would test if..." is legitimate contrary evidence. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/08/2015 06:21 AM, Geoff Winkless wrote: > > Wow! I never knew there were all these people out there who would be > rushing to help test if only the PG developers released alpha versions. > It's funny how they never used to do it when those alphas were done. The type of responses you are providing on this thread are not warranted. JD -- The most kicking donkey PostgreSQL Infrastructure company in existence. The oldest, the most experienced, the consulting company to the stars. Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 - 24x7 - 365 - Proactive and Managed Professional Services!
On Mon, Jun 8, 2015 at 5:01 , Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj> > wrote: >> Wow! I never knew there were all these people out there who would >> be rushing >> to help test if only the PG developers released alpha versions. >> It's funny >> how they never used to do it when those alphas were done. > > That's probably overplaying your hand a little bit (and it sounds a > bit catty, too). Some testing got done and it had some value. It > just wasn't enough to make Peter feel like it was worthwhile. That > doesn't mean that no testing got done and that it had no value, or > that the same thing would happen this time. I'm as skeptical about > this whole rush-out-an-alpha business as anyone, but I think that > skepticism has to yield to contrary evidence, and people saying "I > would test if..." is legitimate contrary evidence. Agreed. To get back to the point, I think the problem with the original alphas was that they were snapshots taken after each CF, not something that represented the final release. I do think that a proper alpha/beta release is a signal for several companies (I do know some that do testing once beta gets out) to do testing, as it does indeed say that we are releasing something that is close in functionality to the final release. Also the packages are really important; there are enough companies that don't install development packages on servers at all, so it's not just compile and run for them: they have to move it over to other machines, etc. We should be lowering the barrier to user-based testing as much as possible, and doing an alpha with packages is exactly how we do that. IMHO the only real discussion here is whether current 9.5 is ready for user testing, and FWIW I think it is. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 8 June 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> > Wow! I never knew there were all these people out there who would be rushing
> > to help test if only the PG developers released alpha versions. It's funny
> > how they never used to do it when those alphas were done.
>
> That's probably overplaying your hand a little bit (and it sounds a
> bit catty, too).

I agree. The responses I had written yesterday but didn't send were much worse.
Mainly because I think it's quite an attitude to take that open-source developers should put extra time into building RPMs of development versions rather than testers waiting 5 minutes while their machines compile. Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive hardship.
> The type of responses you are providing on this thread are not warranted.
I got people appearing completely insulted at my remarks and telling me that if only they could run the alpha they would provide testing, so I pointed out how easy it is to install the nightly from source and then they tell me that actually compiling is far too difficult and complicated, and that there are loads of clients who would run these nightlies if they had RPMS...
If I truly believed that such an RPM would produce useful testing, I would spend some of my own time building a setup to produce those RPMs myself and post here publicising them, at which point we would have a huge number of useful and productive test reports. Any one of the people telling me that I'm wrong could easily do the same, but so far none has.
I'm not harping on because I want to make people feel bad, I'm harping on because I don't want to see beta (and final) releases pushed back further because of a bad compromise, and I believe that that will happen. I apologise that I've clearly upset some people but they all have a very easy route to prove me wrong, and I'll be happy to admit my error.
Geoff
On Mon, Jun 8, 2015 at 12:22 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > On 8 June 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj> >> wrote: >> > Wow! I never knew there were all these people out there who would be >> > rushing >> > to help test if only the PG developers released alpha versions. It's >> > funny >> > how they never used to do it when those alphas were done. >> >> That's probably overplaying your hand a little bit (and it sounds a >> bit catty, too). > > > I agree. The responses I had written yesterday but didn't send were much > worse. > > Mainly because I think it's quite an attitude to take that open-source > developers should put extra time into building RPMs of development versions > rather than testers waiting 5 minutes while their machines compile. > Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive > hardship. It's not about the 5 minutes of compile time, it's about the signalling. Just *when* is git ready for testing? You don't know from the outside. I do lurk here a lot and still am unsure quite often. Even simply releasing an alpha *tarball* would be useful enough. What is needed is the signal to test, rather than a fully-built package.
On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote:
> It's not about the 5 minutes of compile time, it's about the signalling.
>
> Just *when* is git ready for testing? You don't know from the outside.
>
> I do lurk here a lot and still am unsure quite often.
>
> Even simply releasing an alpha *tarball* would be useful enough. What
> is needed is the signal to test, rather than a fully-built package.

I can see that, and can absolutely get behind the idea of a nightly being flagged as an alpha, since it should involve next to no developer time.

I may be overestimating the amount of time that goes towards producing a release; the fact that the full-on alpha releases were stopped did imply to me that it's not insignificant.

Geoff
On Mon, Jun 8, 2015 at 12:03 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Just *when* is git ready for testing? You don't know from the outside.
> I do lurk here a lot and still am unsure quite often.
> Even simply releasing an alpha *tarball* would be useful enough. What
> is needed is the signal to test, rather than a fully-built package.

IIUC the master branch is always ready for testing.
I do not think the project cares whether everyone is testing the exact same codebase; as long as test findings include the relevant commit hash the results will be informative.
David J.
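
The suggestion above, that test findings carry the relevant commit hash, could look like the following in practice. This is only a sketch: report_header is a hypothetical helper, the hash and version string in the example call are placeholders, and in a real checkout the values would presumably come from git rev-parse HEAD and postgres --version.

```shell
# Hypothetical helper: format the identifying lines of a test report.
# $1 = commit hash of the tested tree, $2 = server version string.
report_header() {
    printf 'Tested commit: %s\n' "$1"
    printf 'Server version: %s\n' "$2"
}

# Placeholder values for illustration only. In a git checkout one might use:
#   report_header "$(git rev-parse HEAD)" "$(postgres --version)"
header=$(report_header "0123abc" "postgres (PostgreSQL) 9.5devel")
echo "$header"
```

Putting these two lines at the top of a report lets a reader map the finding back to an exact tree even when no alpha tag exists.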
On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote:
> > It's not about the 5 minutes of compile time, it's about the signalling.
> > Just *when* is git ready for testing? You don't know from the outside.
> > I do lurk here a lot and still am unsure quite often.
> > Even simply releasing an alpha *tarball* would be useful enough. What
> > is needed is the signal to test, rather than a fully-built package.
>
> I can see that, and can absolutely get behind the idea of a nightly being
> flagged as an alpha, since it should involve next to no developer time.

Nightly where? This is an international community.
The tip of the master branch is the current "alpha" - so the question is whether a tar bundle should be provided instead of asking people to simply keep their Git clone up-to-date. These both have the flaw of excluding people who would test the application if it could simply be installed like any other package on their system. But I'm not seeing where there would be a huge group of people who would test an automatically generated source tar-ball but would not be willing to use Git. Or are we talking about a non-source tar-ball?
Maybe packagers could be convinced to bundle up the master branch on a monthly basis and simply call it Master-SNAPSHOT. No alpha, no beta, no version number. I've never packaged before, so I don't know; but while the project should encourage this, as things currently stand the core project is doing its job by ensuring that the tip of master is always in a usable state.
Or, whenever a new patch release goes out packagers can also bundle up the current master at the same time.
David J.
On 2015-06-08 12:16:34 -0400, David G. Johnston wrote: > IIUC the master branch is always ready for testing. I don't really think so. For one we often find bugs ourselves quite quickly. But more importantly, I'd much rather have users use their precious (and thus limited!) time to test when the set of features (not every detail of a feature) is mostly set in stone. There's not much point in doing in-depth testing before that. Similarly it's not particularly worthwhile to test while the buildfarm still shows failures on common platforms. Andres
David, * David G. Johnston (david.g.johnston@gmail.com) wrote: > On Mon, Jun 8, 2015 at 12:03 PM, Claudio Freire <klaussfreire@gmail.com> > wrote: > > Just *when* is git ready for testing? You don't know from the outside. > > > > I do lurk here a lot and still am unsure quite often. > > > > Even simply releasing an alpha *tarball* would be useful enough. What > > is needed is the signal to test, rather than a fully-built package. > > > > > IIUC the master branch is always ready for testing. > > I do not think the project cares whether everyone is testing the exact > same codebase; as long as test findings include the relevant commit hash > the results will be informative. For my 2c, I do believe it's useful for projects which are based on PG or which work with PG to have a 'alpha1' tag to refer to. Asking users to test with git hash XYZABC isn't great. Getting more users of applications which use PG to do testing is, in my view at least, a great way to improve our test coverage and I do think having an alpha will help with that. That said, I'm not pushing to have one released this week or before PGCon or any such- let's get the back-branch releases dealt with and then we can get an alpha out. Thanks! Stephen
David G. Johnston wrote: > On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > > I can see that, and can absolutely get behind the idea of a nightly being > > flagged as an alpha, since it should involve next to no developer time. > > > Nightly where? This is an international community. A "nightly" refers to our development snapshots, which are uploaded to the ftp servers every "night" (according to some timezone). You can find them in pub/snapshot/ for each branch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jun 8, 2015 at 12:32:45PM -0400, David G. Johnston wrote: > On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > > On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote: > > It's not about the 5 minutes of compile time, it's about the > signalling. > > Just *when* is git ready for testing? You don't know from the outside. > > I do lurk here a lot and still am unsure quite often. > > Even simply releasing an alpha *tarball* would be useful enough. What > is needed is the signal to test, rather than a fully-built package. > > > I can see that, and can absolutely get behind the idea of a nightly being > flagged as an alpha, since it should involve next to no developer time. > > > > Nightly where? This is an international community. The daily snapshot tarballs are built in a way to minimize the number of development tools required: http://www.postgresql.org/ftp/snapshot/dev/ These would be easier to use than pulling from git. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Jun 8, 2015 at 7:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> David G. Johnston wrote:
> > On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> > > I can see that, and can absolutely get behind the idea of a nightly being
> > > flagged as an alpha, since it should involve next to no developer time.
> > >
> > Nightly where? This is an international community.
>
> A "nightly" refers to our development snapshots, which are uploaded to
> the ftp servers every "night" (according to some timezone). You can
> find them in pub/snapshot/ for each branch.

Snapshots are actually not nightly anymore, and haven't been for some time. They are currently run every 4 hours, and are uploaded to the ftp server once a buildfarm run (on debian x64) finishes.
On Sat, Jun 6, 2015 at 03:58:05PM -0400, Noah Misch wrote: > On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote: > > This whole idea of "feature development" vs reliability is bogus. It > > implies people that work on features don't care about reliability. Given > > the fact that many of the features are actually about increasing database > > reliability in the event of crashes and corruptions it just makes no sense. > > I'm contrasting work that helps to keep our existing promises ("reliability") > with work that makes new promises ("features"). In software development, we > invariably hazard old promises to make new promises; our success hinges on > electing neither too little nor too much risk. Two years ago, PostgreSQL's > track record had placed it in a good position to invest in new, high-risk, > high-reward promises. We did that, and we emerged solvent yet carrying an > elevated debt service ratio. It's time to reduce risk somewhat. > > You write about a different sense of "reliability." (Had I anticipated this > misunderstanding, I might have written "Restore-probity mode.") None of this > was about classifying people, most of whom allocate substantial time to each > kind of work. > > > How will we participate in cleanup efforts? How do we know when something > > has been "cleaned up", how will we measure our success or failure? I think > > we should be clear that wasting N months on cleanup can *fail* to achieve a > > useful objective. Without a clear plan it almost certainly will do so. The > > flip side is that wasting N months will cause great amusement and dancing > > amongst those people who wish to pull ahead of our open source project and > > we should take care not to hand them a victory from an overreaction. > > I agree with all that. We should likewise take care not to become insolvent > from an underreaction. I understand the overreaction/underreaction debate. Here were my goals in this discussion: 1. stop worrying about the 9.5 timeline so we could honestly assess our software - *done* 2. seriously address multi-xact issues without 9.5/commit-fest pressure - *in process* 3. identify any other areas in need of serious work While I like the list you provided, I don't think we can be effective in an environment where we assume every big new feature will have problems like multi-xact. For example, we have not seen destabilization from any major 9.4 features, that I can remember anyway. Unless there is consensus about new areas for #3, I am thinking we will continue looking at multi-xact until we are happy, then move ahead with 9.5 items in the way we have before. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote: > I understand the overreaction/underreaction debate. Here were my goals > in this discussion: > > 1. stop worrying about the 9.5 timeline so we could honestly assess our > software - *done* > 2. seriously address multi-xact issues without 9.5/commit-fest pressure - > *in process* > 3. identify any other areas in need of serious work > > While I like the list you provided, I don't think we can be effective in > an environment where we assume every big new feature will have problems > like multi-xact. For example, we have not seen destabilization from any > major 9.4 features, that I can remember anyway. > > Unless there is consensus about new areas for #3, I am thinking we will > continue looking at multi-xact until we are happy, then move ahead with > 9.5 items in the way we have before. I think one important part is that we (continue to?) regularly tell our employers that work on pre-commit, post-commit review, and refactoring are critical for their long term business prospects. My impression so far is that the employer side hasn't widely realized that fact, and that many contributors do the review etc. part in their spare time. Andres
On Mon, Jun 8, 2015 at 07:48:36PM +0200, Andres Freund wrote: > On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote: > > I understand the overreaction/underreaction debate. Here were my goals > > in this discussion: > > > > 1. stop worrying about the 9.5 timeline so we could honestly assess our > > software - *done* > > 2. seriously address multi-xact issues without 9.5/commit-fest pressure - > > *in process* > > 3. identify any other areas in need of serious work > > > > While I like the list you provided, I don't think we can be effective in > > an environment where we assume every big new feature will have problems > > like multi-xact. For example, we have not seen destabilization from any > > major 9.4 features, that I can remember anyway. > > > > Unless there is consensus about new areas for #3, I am thinking we will > > continue looking at multi-xact until we are happy, then move ahead with > > 9.5 items in the way we have before. > > I think one important part is that we (continue to?) regularly tell our > employers that work on pre-commit, post-commit review, and refactoring > are critical for their long term business prospects. My impression so > far is that the employer side hasn't widely realized that fact, and > that many contributors do the review etc. part in their spare time. Agreed. My bet is that more employers realize it now than they did a few months ago --- kind of hard to miss all those minor releases and customer complaints. :-( -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 09/06/15 00:59, David Gould wrote: > I think Alphas are valuable and useful and even more so if they have release > notes. For example, some of my clients are capable of fetching sources and > building from scratch and filing bug reports and are often interested in > particular new features. They even have staging infrastructure that could > test new postgres releases with real applications. But they don't do it. > They also don't follow -hackers, they don't track git, and they don't have > any easy way to tell if the new feature they are interested in is > actually complete and ready to test at any particular time. A lot of > features are developed in multiple commits over a period of time and they > see no point in testing until at least most of the feature is complete and > expected to work. But it is not obvious from outside when that happens for > any given feature. For my clients the value of Alpha releases would > mainly be the release notes, or some other mark in the sand that says "As of > Alpha-3 feature X is included and expected to mostly work." > > -dg > RELEASE NOTES I think that having: 1. release notes 2. an Alpha people can simply install without having to compile would encourage more people to get involved. Such people would be unlikely to have the time and inclination to use 'nightlies', even if compiling was not required. I have read other posts in this thread that support the above. Surely, it would be good for pg to have some more people checking quality at an earlier stage? So reducing barriers to do so is a good thing? Cheers, Gavin
On Mon, 8 Jun 2015 13:03:56 -0300 Claudio Freire <klaussfreire@gmail.com> wrote: > > Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive > > hardship. > > It's not about the 5 minutes of compile time, it's about the signalling. > > Just *when* is git ready for testing? You don't know from the outside. > > I do lurk here a lot and still am unsure quite often. > > Even simply releasing an alpha *tarball* would be useful enough. What > is needed is the signal to test, rather than a fully-built package. This. The clients I referred to earlier don't even use the rpm packages, they build from sources. They need to know when it is worthwhile to take a new set of sources and test. Some sort of labeling about what the contents are would enable them to do this. I don't think a monthly snapshot would work as well, because the requirement is knowing that "grouping sets are in", not that "it is July now". -dg -- David Gould daveg@sonic.net If simplicity worked, the world would be overrun with insects.
On Wed, Jun 03, 2015 at 04:18:37PM +0200, Andres Freund wrote: > On 2015-06-03 09:50:49 -0400, Noah Misch wrote: > > Second, I would define the subject matter as "bug fixes, testing and > > review", not "restructuring, testing and review." Different code > > structures are clearest to different hackers. Restructuring, on > > average, adds bugs even more quickly than feature development adds > > them. > > I can't agree with this. While I agree with not doing large > restructuring for 9.5, I think we can't afford not to refactor for > clarity, even if that introduces bugs. Noticeable parts of our code have > to frequently be modified for new features and are badly structured at > the same time. While restructuring may temporarily increase the > number of bugs in the short term, it'll decrease the number of bugs long > term while increasing the number of potential contributors and new > features. That's obviously not to say we should just refactor for the > sake of it. I think I agree with everything after your first sentence. I liked your specific proposal to split StartupXLOG(), but making broad-appeal restructuring proposals is hard. I doubt we would get good results by casting a wide net for restructuring ideas. Automated testing has a lower barrier to entry and is far less liable to make things worse instead of better. I can hope for good results from a TestSuiteFest, but not from a RestructureFest. That said, if folks initiate compelling restructure proposals, we should be willing to risk bugs from them like we risk bugs to acquire new features.
On 2015-06-10 01:57:22 -0400, Noah Misch wrote: > I think I agree with everything after your first sentence. I liked your > specific proposal to split StartupXLOG(), but making broad-appeal > restructuring proposals is hard. I doubt we would get good results by casting > a wide net for restructuring ideas. I don't mean that we should actively strive to find as many things to refactor as possible (yes, I over-emphasized a bit). But we shouldn't skip refactoring if we notice something structurally bad, just because it's been that way and we don't want to touch something "working". That argument has e.g. been made repeatedly for xlog.c contents. My feeling is that we're reaching the stage where a significant number of bugs are added because code is structured in a "needlessly" complicated and/or repetitive way. And better testing can only catch so much - often enough somebody has to think of all the possible corner cases. > Automated testing has a lower barrier to > entry and is far less liable to make things worse instead of better. I can > hope for good results from a TestSuiteFest, but not from a RestructureFest. > That said, if folks initiate compelling restructure proposals, we should be > willing to risk bugs from them like we risk bugs to acquire new > features. Sure, increasing testing and reviews are good independently. And especially testing actually makes refactoring much more realistic. Greetings, Andres Freund
Peter Geoghegan wrote: > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote: > > - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin > > count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer > > when its global pin count falls to zero. > > Did a patch for this ever materialize? I think the first part would be something like the attached. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Noah Misch wrote: > - Add buildfarm members. This entails reporting any bugs that prevent an > initial passing run. Once you have a passing run, schedule regular runs. > Examples of useful additions: > - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to > enable all the replacement code regardless of the current platform's need > for it. This helps distinguish "Windows bug" from "replacement code bug." > - --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval, > --disable-spinlocks, --disable-atomics, --disable-thread-safety, > --disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY #define RELCACHE_FORCE_RELEASE + #define CLOBBER_FREED_MEMORY -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
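For readers who want to turn the quoted list into concrete runs, here is a minimal sketch in Python that expands the options named in this message into one `./configure` command line per prospective buildfarm member. Assigning one option per member is an assumption made for illustration (options could just as well be combined), and the names `CACHE_OVERRIDES`, `DISABLE_FLAGS` and `configure_commands` are hypothetical. The cache-variable list is kept exactly as short as the message gives it, since the original elides the rest with "...". The flags are written in their usual `--disable-...` spelling, and the `#define` items are left out because they are not configure options.

```python
# Cache-variable overrides quoted in the message. The original list ends in
# "...", so only the two variables actually named there appear here.
CACHE_OVERRIDES = ["ac_cv_func_getopt_long=no", "ac_cv_func_snprintf=no"]

# Feature switches quoted in the message.
DISABLE_FLAGS = [
    "--disable-integer-datetimes",
    "--disable-float8-byval",
    "--disable-float4-byval",
    "--disable-spinlocks",
    "--disable-atomics",
    "--disable-thread-safety",
    "--disable-largefile",
]


def configure_commands():
    """Return one configure command line per hypothetical buildfarm member."""
    # One member forces the replacement code on via the cache overrides.
    commands = [" ".join(["./configure"] + CACHE_OVERRIDES)]
    # Assumption: every feature switch gets a member of its own.
    commands += ["./configure " + flag for flag in DISABLE_FLAGS]
    return commands


if __name__ == "__main__":
    for command in configure_commands():
        print(command)
```

Each printed line can be tried by hand first; as the message says, a configuration that cannot complete an initial passing run is itself something to report.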
On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote: > Peter Geoghegan wrote: > > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote: > > > - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin > > > count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer > > > when its global pin count falls to zero. > > > > Did a patch for this ever materialize? > > I think the first part would be something like the attached. Neat. Does it produce any new complaints during "make installcheck"?
Noah Misch wrote: > On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote: > > Peter Geoghegan wrote: > > > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote: > > > > - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin > > > > count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer > > > > when its global pin count falls to zero. > > > > > > Did a patch for this ever materialize? > > > > I think the first part would be something like the attached. > > Neat. Does it produce any new complaints during "make installcheck"? I only tried a few tests, for lack of time, and it didn't produce any. (To verify that the whole thing was working properly, I reduced the range of memory made available during PinBuffer and that resulted in a crash immediately). I am not really familiar with valgrind TBH and just copied a recipe to run postmaster under it, so if someone with more valgrind-fu could verify this, it would be great. This part: > > > > Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its > > > > global pin count falls to zero. can be done without any valgrind, I think. Any takers? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
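The second half of the proposal, wiping a buffer once its global pin count reaches zero, can be illustrated with a toy model. The sketch below is Python, not PostgreSQL's C buffer manager, and everything in it is an assumption made for illustration: the `SharedBuffer` class, the `BLOCK_SIZE` value, and the `0x7F` fill byte are hypothetical stand-ins. It also ignores how a real buffer manager would preserve or reload the page contents afterwards. What it shows is only the debugging idea: code that keeps using a page after dropping its pin reads obvious garbage instead of plausible stale data.

```python
CLOBBER_FREED_MEMORY = True  # stands in for the C compile-time #define
CLOBBER_BYTE = 0x7F          # assumed fill pattern, picked to look like garbage
BLOCK_SIZE = 8192            # assumed page size for this model


class SharedBuffer:
    """Toy buffer slot: one page of bytes plus a global pin count."""

    def __init__(self):
        self.page = bytearray(BLOCK_SIZE)
        self.pin_count = 0

    def pin(self):
        """Take a pin and return the page the caller may now access."""
        self.pin_count += 1
        return self.page

    def unpin(self):
        """Drop one pin; wipe the page when the last pin goes away."""
        if self.pin_count <= 0:
            raise RuntimeError("unpin without a matching pin")
        self.pin_count -= 1
        if CLOBBER_FREED_MEMORY and self.pin_count == 0:
            # Overwrite in place so that stale references see the garbage too.
            self.page[:] = bytes([CLOBBER_BYTE]) * BLOCK_SIZE


if __name__ == "__main__":
    buf = SharedBuffer()
    page = buf.pin()
    page[0:5] = b"tuple"
    buf.unpin()
    # Bug being simulated: the page is read after the pin was released.
    print(bytes(page[0:5]))
```

Because the wipe happens in place, the stale `page` reference in the usage example observes the fill pattern rather than the bytes written while the pin was held, which is the kind of loud failure the proposal is after.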