Thread: antisocial things you can do in git (but not CVS)

antisocial things you can do in git (but not CVS)

From

Robert Haas

Date:

20 July 2010, 15:34:29

I have some concerns related to the upcoming conversion to git and how
we're going to avoid having things get messy as people start using the
new repository.  git has a lot more flexibility and power than CVS,
and I'm worried that it would be easy, even accidentally, to screw up
our history.

1. Inability to cleanly and easily (and programatically) identify who
committed what.  With CVS, the author of a revision is the person who
committed it, period.  With git, the author string can be set to
anything the person typing 'git commit' feels like.  I think there is
also a committer field, but that doesn't always appear and I'm not
clear on how it works.  Also, the author field defaults to something
dumb if you don't explicitly set it to a value.  So I'm worried we
could end up with stuff like this in the repository:

Author: <rhaas@rhaas-laptop>
Author: Robert Haas <robertmhaas@gmail.com>
Author: Robert Haas <rhaas@enterprisedb.com>
Author: Robert Haas
<rhaas@some-place-i-might-hypothetically-work-in-the-future.com>
Author: The Guy Who Wrote Some Patch Which Robert Haas Ended Up
Committing <somerandomemail@somerandomdomain.whatever>

Right now, it's easy to find all the commits by a particular
committer, and it's easy to see who committed a particular patch, and
the number of distinct committers is pretty small.  I'd hate to give
that up.

git log | grep '^Author' | sort | uniq -c | sort -n | less

My preference would be to stick to a style where we identify the
committer using the author tag and note the patch author, reviewers,
whether the committer made changes, etc. in the commit message.  A
single author field doesn't feel like enough for our workflow, and
having a mix of authors and committers in the author field seems like
a mess.

2. Branch and tag management.  In CVS, there are branches and tags in
only one place: on the server.  In git, you can have local branches
and tags and remote branches and tags, and you can pull and push tags
between servers.  If I'm working on a git repository that has branches
master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
numeric_2b, and temprelnames, I want to make sure that I don't
accidentally push the last three of those to the authoritative
server... but I do want to push all the others.  Similarly I want to
push only the corrects subset of tags (though that should be less of
an issue, at least for me, as I don't usually create local tags).  I'm
not sure how to set this up, though.

3. Merge commits.  I believe that we have consensus that commits
should always be done as a "squash", so that the history of all of our
branches is linear.  But it seems to me that someone could
accidentally push a merge commit, either because they forgot to squash
locally, or because of a conflict between their local git repo's
master branch and origin/master.  Can we forbid this?

4. History rewriting.  Under what circumstances, if any, are we OK
with rebasing the master?  For example, if we decide not to have merge
commits, and somebody does a merge commit anyway, are we going to
rebase to get rid of it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Magnus Hagander

Date:

20 July 2010, 15:43:09

On Tue, Jul 20, 2010 at 20:34, Robert Haas <robertmhaas@gmail.com> wrote:
> I have some concerns related to the upcoming conversion to git and how
> we're going to avoid having things get messy as people start using the
> new repository.  git has a lot more flexibility and power than CVS,
> and I'm worried that it would be easy, even accidentally, to screw up
> our history.
>
> 1. Inability to cleanly and easily (and programatically) identify who
> committed what.  With CVS, the author of a revision is the person who
> committed it, period.  With git, the author string can be set to
> anything the person typing 'git commit' feels like.  I think there is
> also a committer field, but that doesn't always appear and I'm not
> clear on how it works.  Also, the author field defaults to something
> dumb if you don't explicitly set it to a value.  So I'm worried we
> could end up with stuff like this in the repository:

I'm pretty sure we can enforce this on the server side, refusing
commits that don't follow our standard. I haven't done it myself
(yet), but I've read about it.


> My preference would be to stick to a style where we identify the
> committer using the author tag and note the patch author, reviewers,
> whether the committer made changes, etc. in the commit message.  A
> single author field doesn't feel like enough for our workflow, and
> having a mix of authors and committers in the author field seems like
> a mess.

+1.


> 2. Branch and tag management.  In CVS, there are branches and tags in
> only one place: on the server.  In git, you can have local branches
> and tags and remote branches and tags, and you can pull and push tags
> between servers.  If I'm working on a git repository that has branches
> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
> numeric_2b, and temprelnames, I want to make sure that I don't
> accidentally push the last three of those to the authoritative
> server... but I do want to push all the others.  Similarly I want to
> push only the corrects subset of tags (though that should be less of
> an issue, at least for me, as I don't usually create local tags).  I'm
> not sure how to set this up, though.

We could put a safeguard in place on the server that won't let you
push a branch and require that additions of new branches be done
manually on the server?


> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of our
> branches is linear.  But it seems to me that someone could
> accidentally push a merge commit, either because they forgot to squash
> locally, or because of a conflict between their local git repo's
> master branch and origin/master.  Can we forbid this?

Again, I haven't done it, but I've read about it, and I'm almost
certain we can enforce this, yes.


> 4. History rewriting.  Under what circumstances, if any, are we OK
> with rebasing the master?  For example, if we decide not to have merge
> commits, and somebody does a merge commit anyway, are we going to
> rebase to get rid of it?

That's something we need a good policy for. Merge commits are special.
For content commits, I think we should basically *never* do that. If
someone commits bad content, we should just make a revert commit which
keeps history linear and just removes the changes as a new commit.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: antisocial things you can do in git (but not CVS)

From

Peter Eisentraut

Date:

20 July 2010, 16:12:40

On tis, 2010-07-20 at 14:34 -0400, Robert Haas wrote:
> Right now, it's easy to find all the commits by a particular
> committer, and it's easy to see who committed a particular patch, and
> the number of distinct committers is pretty small.  I'd hate to give
> that up.
> 
> git log | grep '^Author' | sort | uniq -c | sort -n | less

git log --format=full | grep '^Commit' | sort | uniq -c | sort -n | less

> My preference would be to stick to a style where we identify the
> committer using the author tag and note the patch author, reviewers,
> whether the committer made changes, etc. in the commit message.  A
> single author field doesn't feel like enough for our workflow, and
> having a mix of authors and committers in the author field seems like
> a mess.

Well, I had looked forward to actually putting the real author into the
author field.

> 2. Branch and tag management.  In CVS, there are branches and tags in
> only one place: on the server.  In git, you can have local branches
> and tags and remote branches and tags, and you can pull and push tags
> between servers.  If I'm working on a git repository that has branches
> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
> numeric_2b, and temprelnames, I want to make sure that I don't
> accidentally push the last three of those to the authoritative
> server... but I do want to push all the others.  Similarly I want to
> push only the corrects subset of tags (though that should be less of
> an issue, at least for me, as I don't usually create local tags).  I'm
> not sure how to set this up, though.

I'm going to use one separate clone for my development and one
"pristine" one for the final commits and copy the patches over manually.
That also solves the next problem ...

> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of our
> branches is linear.  But it seems to me that someone could
> accidentally push a merge commit, either because they forgot to squash
> locally, or because of a conflict between their local git repo's
> master branch and origin/master.  Can we forbid this?

Re: antisocial things you can do in git (but not CVS)

From

Robert Haas

Date:

20 July 2010, 18:22:09

On Tue, Jul 20, 2010 at 2:42 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Jul 20, 2010 at 20:34, Robert Haas <robertmhaas@gmail.com> wrote:
>> I have some concerns related to the upcoming conversion to git and how
>> we're going to avoid having things get messy as people start using the
>> new repository.  git has a lot more flexibility and power than CVS,
>> and I'm worried that it would be easy, even accidentally, to screw up
>> our history.
>>
>> 1. Inability to cleanly and easily (and programatically) identify who
>> committed what.  With CVS, the author of a revision is the person who
>> committed it, period.  With git, the author string can be set to
>> anything the person typing 'git commit' feels like.  I think there is
>> also a committer field, but that doesn't always appear and I'm not
>> clear on how it works.  Also, the author field defaults to something
>> dumb if you don't explicitly set it to a value.  So I'm worried we
>> could end up with stuff like this in the repository:
>
> I'm pretty sure we can enforce this on the server side, refusing
> commits that don't follow our standard. I haven't done it myself
> (yet), but I've read about it.

+1, though I see downthread that Peter has a contrary opinion.

>> My preference would be to stick to a style where we identify the
>> committer using the author tag and note the patch author, reviewers,
>> whether the committer made changes, etc. in the commit message.  A
>> single author field doesn't feel like enough for our workflow, and
>> having a mix of authors and committers in the author field seems like
>> a mess.
>
> +1.
>
>
>> 2. Branch and tag management.  In CVS, there are branches and tags in
>> only one place: on the server.  In git, you can have local branches
>> and tags and remote branches and tags, and you can pull and push tags
>> between servers.  If I'm working on a git repository that has branches
>> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
>> numeric_2b, and temprelnames, I want to make sure that I don't
>> accidentally push the last three of those to the authoritative
>> server... but I do want to push all the others.  Similarly I want to
>> push only the corrects subset of tags (though that should be less of
>> an issue, at least for me, as I don't usually create local tags).  I'm
>> not sure how to set this up, though.
>
> We could put a safeguard in place on the server that won't let you
> push a branch and require that additions of new branches be done
> manually on the server?

On this one, I'd just like a way to prevent accidents.  Is there maybe
a config option I can set on my local repository?

>> 3. Merge commits.  I believe that we have consensus that commits
>> should always be done as a "squash", so that the history of all of our
>> branches is linear.  But it seems to me that someone could
>> accidentally push a merge commit, either because they forgot to squash
>> locally, or because of a conflict between their local git repo's
>> master branch and origin/master.  Can we forbid this?
>
> Again, I haven't done it, but I've read about it, and I'm almost
> certain we can enforce this, yes.

OK, that sounds good...

>> 4. History rewriting.  Under what circumstances, if any, are we OK
>> with rebasing the master?  For example, if we decide not to have merge
>> commits, and somebody does a merge commit anyway, are we going to
>> rebase to get rid of it?
>
> That's something we need a good policy for. Merge commits are special.
> For content commits, I think we should basically *never* do that. If
> someone commits bad content, we should just make a revert commit which
> keeps history linear and just removes the changes as a new commit.

Yeah, I agree.  I'm not sure that merge commits are the ONLY situation
where we'd want to do this, but it should be reserved for cases where
just reversing out the diff wouldn't be sufficient for some reason and
we need to make it as though it never happened.  I don't think it's
probably necessary to disallow this completely - the default setting
of allowing it only with + is probably enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Robert Haas

Date:

20 July 2010, 18:22:37

On Tue, Jul 20, 2010 at 3:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Well, I had looked forward to actually putting the real author into the
> author field.

What if there's more than one?  What if you make changes yourself?
How will you credit the reviewer?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Andrew Dunstan

Date:

20 July 2010, 19:31:50


Robert Haas wrote:
> I have some concerns related to the upcoming conversion to git and how
> we're going to avoid having things get messy as people start using the
> new repository.  git has a lot more flexibility and power than CVS,
> and I'm worried that it would be easy, even accidentally, to screw up
> our history.
>
> 1. Inability to cleanly and easily (and programatically) identify who
> committed what.  
>   

Each committer sets their name and email using git config. Doesn't look 
like a problem. We don't have such a large number of committers that 
this should be much of an issue. Maybe we can set a pre-receive hook to 
make sure that it's set appropriately?

> 2. Branch and tag management.  
>   
[snip]

I'm inclined to say that as now committers should not normally push 
tags. Marc or whoever is managing things should create the various tags. 
I think our current tagging policy is about right. "git push" doesn't 
push tags by default, so you'd have to be trying hard to mess this up.
> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of our
> branches is linear.  But it seems to me that someone could
> accidentally push a merge commit, either because they forgot to squash
> locally, or because of a conflict between their local git repo's
> master branch and origin/master.  Can we forbid this?
>   

Again, a pre-receive hook might be able to handle this. See 
<http://progit.org/book/ch7-4.html>
> 4. History rewriting.  Under what circumstances, if any, are we OK
> with rebasing the master?  For example, if we decide not to have merge
> commits, and somebody does a merge commit anyway, are we going to
> rebase to get rid of it?
>
>   

In the end, if we screw up badly enough we can just roll things back. It 
would be a pain, but not insurmountably so. I think we need to expect 
that there will be some teething issues. I keep 7 days worth of backups 
of the CVS repo constantly now, and I'll probably do the same with git, 
and I'm sure there will be other backups.

cheers

andrew

Re: antisocial things you can do in git (but not CVS)

From

Andrew Dunstan

Date:

20 July 2010, 21:29:07


Robert Haas wrote:
> On Tue, Jul 20, 2010 at 3:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>   
>> Well, I had looked forward to actually putting the real author into the
>> author field.
>>     
>
> What if there's more than one?  What if you make changes yourself?
> How will you credit the reviewer?
>
>   

I think our current practice is fine. Put it in the commit log.

cheers

andrew

Re: antisocial things you can do in git (but not CVS)

From

Dave Page

Date:

21 July 2010, 04:53:41

On Tue, Jul 20, 2010 at 8:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> My preference would be to stick to a style where we identify the
>> committer using the author tag and note the patch author, reviewers,
>> whether the committer made changes, etc. in the commit message.  A
>> single author field doesn't feel like enough for our workflow, and
>> having a mix of authors and committers in the author field seems like
>> a mess.
>
> Well, I had looked forward to actually putting the real author into the
> author field.

I hadn't realised that was possible until Guillaume did so on his
first commit to the new pgAdmin GIT repo. It seems to work nicely:

http://git.postgresql.org/gitweb?p=pgadmin3.git;a=commit;h=08e2826d90129bd4e4b3b7462bab682dd6a703e4

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Magnus Hagander

Date:

21 July 2010, 06:03:45

On Wed, Jul 21, 2010 at 02:28, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> Robert Haas wrote:
>>
>> On Tue, Jul 20, 2010 at 3:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>>>
>>> Well, I had looked forward to actually putting the real author into the
>>> author field.
>>>
>>
>> What if there's more than one?  What if you make changes yourself?
>> How will you credit the reviewer?
>>
>>
>
> I think our current practice is fine. Put it in the commit log.

If nothing else, I think this definitely falls under the "minimum
changes first" policy. Let's start by doing things exactly as we're
doing now. We can then consider changing this in the future, but let's
not change everything at once.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: antisocial things you can do in git (but not CVS)

From

Abhijit Menon-Sen

Date:

21 July 2010, 07:46:54

At 2010-07-20 14:34:20 -0400, robertmhaas@gmail.com wrote:
>
> I think there is also a committer field, but that doesn't always
> appear and I'm not clear on how it works.

There is always a committer field, and it is set sensibly as long as the
committer has user.name and user.email set correctly with git-config. It
is not displayed by git-log by default, unless it is different from the
"author". (As PeterE showed, it's easy to get the list of committers.)

> My preference would be to stick to a style where we identify the
> committer using the author tag and note the patch author, reviewers,
> whether the committer made changes, etc. in the commit message.

An aside: as a patch author (and elsewhere, as a committer), it's nice
when the log shows the author rather than the committer. Will we really
have so many patches with multiple authors or other complications that
we can't set the author by default and fall back to explanations in the
commit message (e.g. "applied with changes") for more complicated cases?

> I want to make sure that I don't accidentally push the last three of
> those to the authoritative server...

By default (at least with a recent git), "git push" will push branches
that are tracking remote branches, but new local branches have to be
pushed explicitly to create them on the remote.

So don't worry about that.

> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of
> our branches is linear. 

I admit I haven't been paying as much attention as I should, but I did
not know there was such a consensus. If anyone could explain the
rationale, I would be grateful.

> But it seems to me that someone could
> accidentally push a merge commit […]
> Can we forbid this?

Yes, I suppose it's possible, but personally I think it would be a waste
of time to try to ban merge commits.

> 4. History rewriting.  Under what circumstances, if any, are we OK
> with rebasing the master?

Please, let's never do that. The cure for pulling a rebased branch into
an existing clone may seem simple, but it's a huge pain in practice, and
it's never really worth it.

-- ams

Re: antisocial things you can do in git (but not CVS)

From

Robert Haas

Date:

21 July 2010, 07:50:03

On Wed, Jul 21, 2010 at 6:46 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>> My preference would be to stick to a style where we identify the
>> committer using the author tag and note the patch author, reviewers,
>> whether the committer made changes, etc. in the commit message.
>
> An aside: as a patch author (and elsewhere, as a committer), it's nice
> when the log shows the author rather than the committer. Will we really
> have so many patches with multiple authors or other complications that
> we can't set the author by default and fall back to explanations in the
> commit message (e.g. "applied with changes") for more complicated cases?

Tom Lane rewrites part of nearly every commit, and even I change maybe
30% of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Magnus Hagander

Date:

21 July 2010, 07:56:05

On Wed, Jul 21, 2010 at 12:46, Abhijit Menon-Sen <ams@toroid.org> wrote:
> At 2010-07-20 14:34:20 -0400, robertmhaas@gmail.com wrote:
>> I want to make sure that I don't accidentally push the last three of
>> those to the authoritative server...
>
> By default (at least with a recent git), "git push" will push branches
> that are tracking remote branches, but new local branches have to be
> pushed explicitly to create them on the remote.

Yeha, i agree this is probably not a big problem. Plus, if we
accidentally push a branch that shouldn't have been pushed, it can
easily be removed (as long as it's noticed before anybody relies on
it). To the suitable embarrassment of the committer who made the
incorrect push, which has a tendency to teach them not to do it next
time :-)

>> 3. Merge commits.  I believe that we have consensus that commits
>> should always be done as a "squash", so that the history of all of
>> our branches is linear.
>
> I admit I haven't been paying as much attention as I should, but I did
> not know there was such a consensus. If anyone could explain the
> rationale, I would be grateful.

We are not changing the workflow, just the tool.

We may consider changing the workflow sometime in the future (but
don't bet on it), but we're definitely not changing both at the same
time.

This has been discussed many times before, both here on list and in
person on at least two instances of the pgcon developer meeting. This
is not the time to re-open that discussion.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: antisocial things you can do in git (but not CVS)

From

Abhijit Menon-Sen

Date:

21 July 2010, 08:05:28

At 2010-07-21 12:55:55 +0200, magnus@hagander.net wrote:
>
> We are not changing the workflow, just the tool.

OK, but I don't see why accidental merge commits need to be considered
antisocial, and banned or rebased away. Who cares if they exist? They
don't change anything you need to do to pull, create, view, or push
changes.

> This is not the time to re-open that discussion.

Sure. I apologise for bringing it up.

-- ams

Re: antisocial things you can do in git (but not CVS)

From

Magnus Hagander

Date:

21 July 2010, 08:09:32

On Wed, Jul 21, 2010 at 13:05, Abhijit Menon-Sen <ams@toroid.org> wrote:
> At 2010-07-21 12:55:55 +0200, magnus@hagander.net wrote:
>>
>> We are not changing the workflow, just the tool.
>
> OK, but I don't see why accidental merge commits need to be considered
> antisocial, and banned or rebased away. Who cares if they exist? They
> don't change anything you need to do to pull, create, view, or push
> changes.

They makes it harder to track how the project has moved along for
people who don't really know about the concept.

I'm not sure, but I bet they may cause issues for those tracking the
project through git-cvs, or any other tool that doesn't deal with
nonlinear history.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: antisocial things you can do in git (but not CVS)

From

Jonathan Corbet

Date:

21 July 2010, 12:10:18

On Tue, 20 Jul 2010 14:34:20 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> I have some concerns related to the upcoming conversion to git and how
> we're going to avoid having things get messy as people start using the
> new repository.

Here's a few responses from the point of view of somebody who has been
working with git in the kernel community for some years now.  Hopefully
it's helpful...

> 1. Inability to cleanly and easily (and programatically) identify who
> committed what.  

No, git tracks committer information separately, and it's easily
accessible.  Dig into the grungy details of git-log and you'll see that you
can get out just about anything you need, in any format.

IMHO, vandalizing the author field would be a mistake; it's your best way
of tracking where the patch came from and for ensuring credit in your
changelogs.  Why throw away information?

> 2. Branch and tag management.  In CVS, there are branches and tags in
> only one place: on the server.  In git, you can have local branches
> and tags and remote branches and tags, and you can pull and push tags
> between servers.  If I'm working on a git repository that has branches
> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
> numeric_2b, and temprelnames, I want to make sure that I don't
> accidentally push the last three of those to the authoritative
> server... but I do want to push all the others.  Similarly I want to
> push only the corrects subset of tags (though that should be less of
> an issue, at least for me, as I don't usually create local tags).  I'm
> not sure how to set this up, though.

Branch push policy can be tweaked in your local config.  I'm less sure
about tags.  It's worth noting that the kernel community does very little
with push in general - things are much more often pulled.  That may not be
a workflow that's suitable for postgresql, though.

> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of our
> branches is linear.  But it seems to me that someone could
> accidentally push a merge commit, either because they forgot to squash
> locally, or because of a conflict between their local git repo's
> master branch and origin/master.  Can we forbid this?

That seems like a terrible idea to me - why would you destroy history?
Obviously I've missed a discussion here.  But, the first time somebody
wants to use bisect to pinpoint a regression-causing patch, you'll wish you
had that information there.

> 4. History rewriting.  Under what circumstances, if any, are we OK
> with rebasing the master?  For example, if we decide not to have merge
> commits, and somebody does a merge commit anyway, are we going to
> rebase to get rid of it?

A good general rule of thumb is to treat publicly-exposed history as
immutable.  As soon as you start rebasing trees you create misery for
anybody working with those trees.

If you're really set on avoiding things like merges, there are ways to set
up scripts on the server to enforce policies.

jon

Re: antisocial things you can do in git (but not CVS)

From

Robert Haas

Date:

21 July 2010, 13:11:44

On Wed, Jul 21, 2010 at 10:49 AM, Jonathan Corbet <corbet@lwn.net> wrote:
>> 1. Inability to cleanly and easily (and programatically) identify who
>> committed what.
>
> No, git tracks committer information separately, and it's easily
> accessible.  Dig into the grungy details of git-log and you'll see that you
> can get out just about anything you need, in any format.
>
> IMHO, vandalizing the author field would be a mistake; it's your best way
> of tracking where the patch came from and for ensuring credit in your
> changelogs.  Why throw away information?

If git had a place to store all the information we care about, that
would be fine, but it doesn't.  Here's a recent attribution line I
used:
   David Christensen. Reviewed by Steve Singer.  Some further changes by me.

There's no "reviewer" header, and there's no concept that a patch
might have come from the author (or perhaps multiple authors), but
then have been adjusted by one or more reviewers and then frobnicated
some more by the committer.  I'm not sure it's possible to create a
system that can effectively store all the ways we give credit and
attribution, but a single author line is definitely not it.  How much
do I have to change a patch before I would use my own name on the
author line rather than the patch author's?  A single byte?  A
non-whitespace change?  More than 15% of the patch?  And, oh by the
way, it's the committer who writes the commit message, not the patch
author, so at an *absolute minimum* that part of the commit object
isn't coming from the original author.

>> 2. Branch and tag management.  In CVS, there are branches and tags in
>> only one place: on the server.  In git, you can have local branches
>> and tags and remote branches and tags, and you can pull and push tags
>> between servers.  If I'm working on a git repository that has branches
>> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
>> numeric_2b, and temprelnames, I want to make sure that I don't
>> accidentally push the last three of those to the authoritative
>> server... but I do want to push all the others.  Similarly I want to
>> push only the corrects subset of tags (though that should be less of
>> an issue, at least for me, as I don't usually create local tags).  I'm
>> not sure how to set this up, though.
>
> Branch push policy can be tweaked in your local config.  I'm less sure
> about tags.  It's worth noting that the kernel community does very little
> with push in general - things are much more often pulled.  That may not be
> a workflow that's suitable for postgresql, though.

Seems like we've got this one worked out, per discussion upthread.

>> 3. Merge commits.  I believe that we have consensus that commits
>> should always be done as a "squash", so that the history of all of our
>> branches is linear.  But it seems to me that someone could
>> accidentally push a merge commit, either because they forgot to squash
>> locally, or because of a conflict between their local git repo's
>> master branch and origin/master.  Can we forbid this?
>
> That seems like a terrible idea to me - why would you destroy history?
> Obviously I've missed a discussion here.  But, the first time somebody
> wants to use bisect to pinpoint a regression-causing patch, you'll wish you
> had that information there.

In any commit pattern, if I use bisect to pinpoint a regression
causing patch, I will find the commit that broke it.  Whoever made
that particular commit is to blame.  Full stop.  I don't really care
where in the development of the patch that was eventually committed
the breakage happened, and I do not want to wade through 50 revs of
somebody's private development to find the particular place where they
made a thinko.  I only care that their patch *as committed* is broken.I don't think that non-linear history is an
advantagein any 
situation.  It may be an unavoidable necessity if you have lots of
cross-merging between different repositories, but we don't, so for us
it's just clutter.

>> 4. History rewriting.  Under what circumstances, if any, are we OK
>> with rebasing the master?  For example, if we decide not to have merge
>> commits, and somebody does a merge commit anyway, are we going to
>> rebase to get rid of it?
>
> A good general rule of thumb is to treat publicly-exposed history as
> immutable.  As soon as you start rebasing trees you create misery for
> anybody working with those trees.
>
> If you're really set on avoiding things like merges, there are ways to set
> up scripts on the server to enforce policies.

Yep, Magnus coded it up today.  It works great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: antisocial things you can do in git (but not CVS)

From

Andrew Dunstan

Date:

21 July 2010, 16:11:51

Jonathan Corbet wrote:
>> 3. Merge commits.  I believe that we have consensus that commits
>> should always be done as a "squash", so that the history of all of our
>> branches is linear.  But it seems to me that someone could
>> accidentally push a merge commit, either because they forgot to squash
>> locally, or because of a conflict between their local git repo's
>> master branch and origin/master.  Can we forbid this?
>>     
>
> That seems like a terrible idea to me - why would you destroy history?
> Obviously I've missed a discussion here.  But, the first time somebody
> wants to use bisect to pinpoint a regression-causing patch, you'll wish you
> had that information there.
>
>   

We have a clear idea of what should be part of the public history 
contained in the authoritative repo and what should be history that is 
private to the developer/tester/committer. We don't want to pollute the 
former with the latter. The level of granularity of our current CVS 
commits seems to us to be about right.

So when a committer pushes a patch it should add one fast-forward commit 
to the tree. We want to be able to bisect between these commit objects, 
but not between all the work product commits that led up to them. Of 
course, developers, committers and testers can keep what they like 
privately - we're only talking about what should go in the authoritative 
repo.

cheers

andrew

Re: antisocial things you can do in git (but not CVS)

From

Alvaro Herrera

Date:

21 July 2010, 16:34:50

Excerpts from Andrew Dunstan's message of mié jul 21 15:11:41 -0400 2010:
> 
> Jonathan Corbet wrote:

> > That seems like a terrible idea to me - why would you destroy history?
> > Obviously I've missed a discussion here.  But, the first time somebody
> > wants to use bisect to pinpoint a regression-causing patch, you'll wish you
> > had that information there.
> 

> So when a committer pushes a patch it should add one fast-forward commit 
> to the tree. We want to be able to bisect between these commit objects, 
> but not between all the work product commits that led up to them. Of 
> course, developers, committers and testers can keep what they like 
> privately - we're only talking about what should go in the authoritative 
> repo.

I don't disagree that we're going to squash commits, but I don't believe
that developers will be able to keep "what they like privately".  The
commit objects for the final patch are going to differ, if only because
they have different parents than the ones on the main branch.

Of course, they will be able to have a local branch with their local
patch, but to Git there will be no relationship between this branch and
the final, squashed patch in the authoritative repo.

Re: antisocial things you can do in git (but not CVS)

From

Jonathan Corbet

Date:

21 July 2010, 17:18:46

On Wed, 21 Jul 2010 15:11:41 -0400
Andrew Dunstan <andrew@dunslane.net> wrote:

> We have a clear idea of what should be part of the public history 
> contained in the authoritative repo and what should be history that is 
> private to the developer/tester/committer. We don't want to pollute the 
> former with the latter.

The thought makes me shudder...you lose the history, the reasons for
specific changes, the authorship of changes, and the ability of your
testers to pinpoint problematic changes.  But...your project, your
decision...we'll keep using PostgreSQL regardless...:)

Thanks,

jon

Re: antisocial things you can do in git (but not CVS)

From

Florian Weimer

Date:

23 July 2010, 06:28:43

* Robert Haas:

> 1. Inability to cleanly and easily (and programatically) identify who
> committed what.  With CVS, the author of a revision is the person who
> committed it, period.  With git, the author string can be set to
> anything the person typing 'git commit' feels like.

It's even more difficult than that.  Git does not record at all who
updated a particular branch to include a specific commit.  This
operation does not leave any metadata behind.  It is possible to write
a log file for audit purposes, but it's an out-of-band solution.

> My preference would be to stick to a style where we identify the
> committer using the author tag and note the patch author, reviewers,
> whether the committer made changes, etc. in the commit message.  A
> single author field doesn't feel like enough for our workflow, and
> having a mix of authors and committers in the author field seems like
> a mess.

It would be possible to enforce that on the server side, but it would
interfere with merges.

> 3. Merge commits.  I believe that we have consensus that commits
> should always be done as a "squash", so that the history of all of our
> branches is linear.  But it seems to me that someone could
> accidentally push a merge commit, either because they forgot to squash
> locally, or because of a conflict between their local git repo's
> master branch and origin/master.  Can we forbid this?

It's possible to do this with some scripting on the server side.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: antisocial things you can do in git (but not CVS)

From

Guillaume Lelarge

Date:

24 July 2010, 07:21:33

Le 21/07/2010 09:53, Dave Page a écrit :
> On Tue, Jul 20, 2010 at 8:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> My preference would be to stick to a style where we identify the
>>> committer using the author tag and note the patch author, reviewers,
>>> whether the committer made changes, etc. in the commit message.  A
>>> single author field doesn't feel like enough for our workflow, and
>>> having a mix of authors and committers in the author field seems like
>>> a mess.
>>
>> Well, I had looked forward to actually putting the real author into the
>> author field.
> 
> I hadn't realised that was possible until Guillaume did so on his
> first commit to the new pgAdmin GIT repo. It seems to work nicely:
> 
> http://git.postgresql.org/gitweb?p=pgadmin3.git;a=commit;h=08e2826d90129bd4e4b3b7462bab682dd6a703e4
> 

It's one of the nice things with git. So, I'm eager to use it with the
pgAdmin repo.


-- 
Guillaumehttp://www.postgresql.frhttp://dalibo.com

Re: antisocial things you can do in git (but not CVS)

From

Ron Mayer

Date:

24 July 2010, 11:09:42

Robert Haas wrote:
> 
> If git had a place to store all the information we care about, that
> would be fine...
> 
> There's no "reviewer" header, and there's no concept that a patch
> might have come from the author (or perhaps multiple authors), but
> then have been adjusted by one or more reviewers and then frobnicated
> some more by the committer
> ...
> I don't think that non-linear history is an advantage in any
> situation.

ISTM we could have the best of both those worlds - linear history
and author&reviewer&committer information.

Instead of squashing every patch into a single commit, what if it got
squashed into a perhaps 3 separate commits -- one as submitted, one
as reviewed, and one as re-written by the committer.  History stays
linear; and you keep the most relevant parts of the history,
while dropping all the development brainstorming commits.

And ISTM the patch reviewer could be responsible for this squashing
so it's not much more work for the committer.

For example, instead of
   commit 351c0b92eca40c1a36934cf83fe75db9dc458cde   Author: Robert Haas <robertmhaas@gmail.com>   Date:   Fri Jul 23
00:43:002010 +0000   Avoid deep recursion when assigning XIDs to multiple levels of subxacts.   Andres Freund, with
cleanupand adjustment for older branches by me.

we'd see
   Author: Andreas Freund   Date:   Fri Jul 23 00:43:00 2010 +0000   Avoid deep recursion when assigning XIDs to
multiplelevels of subxacts.   Path as originally submitted to commit fest

   Author: [Whomever the reviewer was]   Date:   Fri Jul 23 00:43:00 2010 +0000   Avoid deep recursion when assigning
XIDsto multiple levels of subxacts.   Patch marked read for committer by reviewer.

   Author: Robert Haas <robertmhaas@gmail.com>   Date:   Fri Jul 23 00:43:00 2010 +0000   Avoid deep recursion when
assigningXIDs to multiple levels of subxacts.   Patch as rewritten by committer.

For a complex enough patch with many authors, the reviewer could choose to
keep an extra author commit or two to credit the extra authors.

Re: antisocial things you can do in git (but not CVS)

From

Peter Eisentraut

Date:

24 July 2010, 14:28:47

On lör, 2010-07-24 at 07:02 -0700, Ron Mayer wrote:
> Instead of squashing every patch into a single commit, what if it got
> squashed into a perhaps 3 separate commits -- one as submitted, one
> as reviewed, and one as re-written by the committer.  History stays
> linear; and you keep the most relevant parts of the history,
> while dropping all the development brainstorming commits.

But then there would be commits in the main repository that were known
not good enough.

Re: antisocial things you can do in git (but not CVS)

From

Andrew Dunstan

Date:

24 July 2010, 14:49:02

Peter Eisentraut wrote:
> On lör, 2010-07-24 at 07:02 -0700, Ron Mayer wrote:
>   
>> Instead of squashing every patch into a single commit, what if it got
>> squashed into a perhaps 3 separate commits -- one as submitted, one
>> as reviewed, and one as re-written by the committer.  History stays
>> linear; and you keep the most relevant parts of the history,
>> while dropping all the development brainstorming commits.
>>     
>
> But then there would be commits in the main repository that were known
> not good enough.
>
>   

Yeah. Also, please bear in mind that our explicit aim here is to make 
this change with a minimal disruption to existing work flows. So to all 
those people who want to say "Look, you can now do all these cool 
things" my answer is "Maybe we'll do them later, but not right now."

cheers

andrew

Re: antisocial things you can do in git (but not CVS)

From

"Joshua D. Drake"

Date:

24 July 2010, 16:24:58

On Sat, 2010-07-24 at 13:48 -0400, Andrew Dunstan wrote:
>

> Yeah. Also, please bear in mind that our explicit aim here is to make
> this change with a minimal disruption to existing work flows. So to all
> those people who want to say "Look, you can now do all these cool
> things" my answer is "Maybe we'll do them later, but not right now."

Amen brother. Git is a universe different than CVS/SVN. Let's practice
KISS for a while shall we.

JD


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: antisocial things you can do in git (but not CVS)

From

"Joshua D. Drake"

Date:

24 July 2010, 23:24:35

On Sat, 2010-07-24 at 13:48 -0400, Andrew Dunstan wrote:
> 

> Yeah. Also, please bear in mind that our explicit aim here is to make 
> this change with a minimal disruption to existing work flows. So to all 
> those people who want to say "Look, you can now do all these cool 
> things" my answer is "Maybe we'll do them later, but not right now."

Amen brother. Git is a universe different than CVS/SVN. Let's practice
KISS for a while shall we.

JD


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt