Thread: management of large patches

management of large patches

From: Robert Haas
We're coming to the end of the 9.1 development cycle, and I think that
there is a serious danger of insufficient bandwidth to handle the
large patches we have outstanding.  For my part, I am hoping to find
the bandwidth for two, MAYBE three major commits between now and the
end of 9.1CF4, but I am not positive that I will be able to find even
that much time, and the number of major patches vying for attention is
considerably greater than that.  Quick estimate:

- SQL/MED - probably needs at least ~3 large commits: foreign table scan, file
FDW, postgresql FDW, plus whatever else gets submitted in the next two
weeks
- MERGE
- checkpoint improvements
- SE-Linux integration
- extensions - may need 2 or more commits
- true serializability - not entirely sure of the status of this
- writeable CTEs (Tom has indicated he will look at this)
- PL/python patches (Peter has indicated he will look at this)
- snapshot taking inconsistencies (Tom has indicated he will look at this)
- per-column collation (Peter)
- synchronous replication (Simon, and, given the level of interest in
and complexity of this feature, probably others as well)

I guess my basic question is - is it realistic to think that we're
going to get all of the above done in the next 45 days?  Is there
anything we can do to make the process more efficient?  If a few more
large patches drop into the queue in the next two weeks, will we have
bandwidth for those as well?  If we don't think we can get everything
done in the time available, what's the best way to handle that?  I
would hate to discourage people from continuing to hack away, but I
think it would be even worse to give people the impression that
there's a chance of getting work reviewed and committed if there
really isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: management of large patches

From: Magnus Hagander
On Sun, Jan 2, 2011 at 06:32, Robert Haas <robertmhaas@gmail.com> wrote:
> We're coming to the end of the 9.1 development cycle, and I think that
> there is a serious danger of insufficient bandwidth to handle the
> large patches we have outstanding.  For my part, I am hoping to find
> the bandwidth for two, MAYBE three major commits between now and the
> end of 9.1CF4, but I am not positive that I will be able to find even
> that much time, and the number of major patches vying for attention is
> considerably greater than that.  Quick estimate:
>
> - SQL/MED - probably needs at least ~3 large commits: foreign table scan, file
> FDW, postgresql FDW, plus whatever else gets submitted in the next two
> weeks
> - MERGE
> - checkpoint improvements
> - SE-Linux integration
> - extensions - may need 2 or more commits
> - true serializability - not entirely sure of the status of this
> - writeable CTEs (Tom has indicated he will look at this)
> - PL/python patches (Peter has indicated he will look at this)
> - snapshot taking inconsistencies (Tom has indicated he will look at this)
> - per-column collation (Peter)
> - synchronous replication (Simon, and, given the level of interest in
> and complexity of this feature, probably others as well)
>
> I guess my basic question is - is it realistic to think that we're
> going to get all of the above done in the next 45 days?  Is there
> anything we can do to make the process more efficient?  If a few more
> large patches drop into the queue in the next two weeks, will we have
> bandwidth for those as well?  If we don't think we can get everything
> done in the time available, what's the best way to handle that?

Well, we've always (well, since we've had CFs) said that large patches
shouldn't be submitted for the last CF; they should be submitted for
one of the first ones.  So if something *new* gets dumped on us for the
last one, giving priority to the existing ones in the queue seems like
the only fair option.

As for priority between those that *were* submitted earlier and have
been reworked (which is how the system is supposed to work), it's a
lot harder.  And TBH, I think we're going to have a problem getting all
of those done.  But the question is - are they all ready enough, or are
a couple going to need the "returned with feedback" status *regardless*
of whether this is the last CF or not?


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: management of large patches

From: Robert Haas
On Sun, Jan 2, 2011 at 4:29 AM, Magnus Hagander <magnus@hagander.net> wrote:
> As for priority between those that *were* submitted earlier and have
> been reworked (which is how the system is supposed to work), it's a
> lot harder.  And TBH, I think we're going to have a problem getting all
> of those done.  But the question is - are they all ready enough, or are
> a couple going to need the "returned with feedback" status *regardless*
> of whether this is the last CF or not?

Well, that all depends on how much work people are willing to put into
reviewing and committing them, which I think is what we need to
determine.  None of those patches are going to be as simple as "patch
-p1 < $F && git commit -a && git push".  Having done a couple of these
now, I'd say that doing final review and commit of a patch of this
scope takes me ~20 hours of work, but it obviously varies a lot based
on how good the patch is to begin with and how much review has already
been done.  So I guess the question is - who is willing to step up to
the plate, either as reviewer or as final reviewer/committer?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: management of large patches

From: KaiGai Kohei
(2011/01/02 14:32), Robert Haas wrote:
> We're coming the end of the 9.1 development cycle, and I think that
> there is a serious danger of insufficient bandwidth to handle the
> large patches we have outstanding.  For my part, I am hoping to find
> the bandwidth to two, MAYBE three major commits between now and the
> end of 9.1CF4, but I am not positive that I will be able to find even
> that much time, and the number of major patches vying for attention is
> considerably greater than that.  Quick estimate:
>  :
> - SE-Linux integration

How feasible is it to commit this 3K-line patch in the next 45 days?

At least, the security provider idea enables us to maintain the set of
hooks and the access control decision logic independently.
I can provide the sources for this module at git.postgresql.org, so we
can always obtain a working module from there.
The worst scenario for us is that nothing progresses in spite of the
large amount of man-power spent on review and discussion.
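
To give a concrete picture of what I mean by a security provider, here
is a hypothetical sketch (all names are invented for illustration; this
is not the actual patch interface):

/*
 * The core exposes hook points, and a loadable module registers
 * callbacks that make the access control decisions, so the hooks and
 * the decision logic can be maintained independently.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct SecurityProvider
{
    const char *name;
    /* Return true if the current subject may perform the operation. */
    bool (*check_relation)(unsigned int relid, int required_mask);
    bool (*check_ddl)(const char *command_tag);
} SecurityProvider;

static const SecurityProvider *active_provider;

/* A module would call this at load time (from _PG_init() in real life). */
static void register_security_provider(const SecurityProvider *p)
{
    active_provider = p;
}

/* Core code calls this at each hook point; permissive when unset. */
static bool security_check_ddl(const char *command_tag)
{
    if (active_provider && active_provider->check_ddl)
        return active_provider->check_ddl(command_tag);
    return true;
}

/* A trivial provider that denies DROP TABLE, just to exercise the hook. */
static bool demo_check_ddl(const char *command_tag)
{
    return strcmp(command_tag, "DROP TABLE") != 0;
}

int main(void)
{
    static const SecurityProvider demo = { "demo", NULL, demo_check_ddl };

    register_security_provider(&demo);
    printf("CREATE TABLE allowed: %d\n", security_check_ddl("CREATE TABLE"));
    printf("DROP TABLE allowed:   %d\n", security_check_ddl("DROP TABLE"));
    return 0;
}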

It may be more productive to keep the features to be committed in the
last CF as small as possible, such as hooks to support a subset of the
DDL permissions, or a pg_regress enhancement to run the regression tests.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: management of large patches

From: Kevin Grittner
Robert Haas wrote:

> - true serializability - not entirely sure of the status of this

I try to keep the status section of the Wiki page up-to-date.  I have
just reviewed it and tweaked it for the latest events:

http://wiki.postgresql.org/wiki/Serializable#Current_Status

There are a number of pending R&D issues:

http://wiki.postgresql.org/wiki/Serializable#R.26D_Issues

Most of these can be deferred.  The ones which really need at least
some attention before release relate to how to deal with serializable
transactions on replication targets, and whether we've been properly
careful about using a coding style which is safe for machines with weak
memory ordering.  I've done my best to follow the discussions on that
topic and do the right thing, but someone with a deeper understanding
of the issues should probably take a look.
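
For anyone who wants a concrete picture of the latter hazard, it looks
roughly like this -- a generic C11 sketch, not PostgreSQL code (the
backend uses its own spinlock and barrier primitives):

/*
 * On machines with weak memory ordering (e.g. POWER, ARM), the two
 * stores in publish() can become visible to another CPU in the opposite
 * order unless a release/acquire pair (or an explicit barrier) enforces
 * the ordering, so a reader could see ready == true and still read a
 * stale payload.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static int payload;                 /* ordinary shared data */
static atomic_bool ready;           /* publication flag */

static void publish(void)
{
    payload = 42;
    /* Release store: the payload write cannot drift past this point. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

static int consume(void)
{
    /* Acquire load: if we observe ready, we also observe payload. */
    if (atomic_load_explicit(&ready, memory_order_acquire))
        return payload;
    return -1;                      /* not yet published */
}

int main(void)
{
    publish();
    printf("payload = %d\n", consume());
    return 0;
}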

Someone has joined the effort starting this weekend -- a consultant
who has done a lot of technical writing (John Okite) will be working
on doc changes related to the patch.  (I assume that would best be
submitted as a separate patch.)

If you want a shorter version of the patch status: we expect to have an
updated patch before the CF, including docs and incorporating feedback
from previous CFs and Heikki's comments on interim work.

-Kevin


Re: management of large patches

From: Greg Smith
Robert Haas wrote:
> - MERGE
> - checkpoint improvements
>   

As far as these two go, the state of MERGE is still rougher than I would 
like.  The code itself isn't too hard to read, and the fact that the 
errors that pop up tend to be caught by assertions (rather than 
manifesting as mysterious crashes) makes me feel a little better that 
there's some defensive coding in there.  It's still a 3648-line patch 
that touches grammar, planner, and executor bits though, and I've been 
doing mainly functional and coding style review so far.  I'm afraid 
there aren't too many committers in a good position to actually consume 
the whole scope of this thing for a commit-level review.  And the way 
larger patches tend to work here, I'd be surprised if it passed through 
such a review without some as-yet-unidentified major beef appearing.  
We'll see what we can do to help move this forward before the CF starts.

The checkpoint changes I'm reworking are not really large from a code 
complexity or size perspective--I estimate around 350 lines of diff, 
with the rough version I submitted to CF2010-11 at 258.  I suspect it 
will actually be the least complicated patch on that list to consume, 
from a committer's perspective.  The complexity there is mainly in the 
performance testing.  I've been gearing up infrastructure over the last 
couple of weeks to automate and easily publish all the results I collect 
there.  The main part that hasn't gone through any serious testing yet, 
auto-tuning the spread interval, will also be really easy to revert if a 
problem is found there.  With Simon and me both reviewing each other's 
work on this already, I hope we can keep this one from clogging the 
committer critical path you're worried about here.
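
The pacing idea itself is easy to describe.  Here is a toy sketch of it 
(invented constants and stub I/O, not the patch code): after each buffer 
write, compare the write progress against the elapsed-time schedule and 
sleep whenever we're ahead of it:

/*
 * Spread checkpoint writes so they finish at roughly target_fraction
 * of the checkpoint interval, instead of in one burst at the start.
 */
#include <stdio.h>
#include <time.h>

#define NUM_BUFFERS      1000    /* dirty buffers to write */
#define INTERVAL_SECS    10.0    /* like checkpoint_timeout, scaled down */
#define TARGET_FRACTION  0.5     /* like checkpoint_completion_target */

static void write_buffer(int i)
{
    (void) i;                    /* stand-in for flushing one buffer */
}

static double seconds_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) +
           (now.tv_nsec - start->tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < NUM_BUFFERS; i++)
    {
        double progress, deadline, elapsed;

        write_buffer(i);
        /* Fraction of the checkpoint's work completed so far. */
        progress = (double) (i + 1) / NUM_BUFFERS;
        /* Wall-clock time by which this much progress is "due". */
        deadline = progress * TARGET_FRACTION * INTERVAL_SECS;
        elapsed = seconds_since(&start);
        if (elapsed < deadline)
        {
            /* Ahead of schedule: sleep off the difference. */
            struct timespec nap;
            double remain = deadline - elapsed;

            nap.tv_sec = (time_t) remain;
            nap.tv_nsec = (long) ((remain - (double) nap.tv_sec) * 1e9);
            nanosleep(&nap, NULL);
        }
    }
    printf("writes spread over ~%.1f seconds\n",
           TARGET_FRACTION * INTERVAL_SECS);
    return 0;
}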

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: management of large patches

From: Dimitri Fontaine
Robert Haas <robertmhaas@gmail.com> writes:
> - extensions - may need 2 or more commits

I'm now basically done with the coding; I'm writing the docs for the
upgrade patch and preparing the upgrade SQL files for pre-9.1 to 9.1
upgrades of the contrib modules.

Doing that, I've been cleaning up or reorganising some code: I will
backport some of those changes to the main extension patch.  So I expect
to send both extension.v23.patch and extension-upgrade.v1.patch this
week.

As the main extension patch has received lots of detailed reviews (both
user-level and code-level) from committers already, I'm not expecting big
surprises for the last commitfest.  The upgrade patch design has been
discussed in detail on-list too.  The dust has settled here.

Meanwhile, there's this bugfix for HEAD that I've sent:
 http://archives.postgresql.org/pgsql-hackers/2011-01/msg00078.php

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support