Thread: Re: PostgreSQL Developer meeting minutes up

Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:
[moving this onto -hackers, where I think it belongs]

Tom Lane wrote:
> Huh?  The buildfarm will only prove that HEAD of the active branches
> builds.  What the concern was was whether we could correctly extract
> past states (particularly, but not solely, the tags corresponding to
> releases) from a converted git repository.  The testing I had in mind
> was to check out various tags and diff that tree against actual release
> tarballs.

It appears that our git repo is only picking up the branch tags (e.g.
REL8_0_STABLE), not all the release tags (e.g. REL8_0_5). That needs
to be fixed (if possible).
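
[Editorial sketch, not part of the original message.] The check Tom
describes above (check out a tag, then diff that tree against the
unpacked release tarball) could look roughly like the Python below. The
function names and the two directory arguments are hypothetical; it
assumes both trees already exist on disk, and it only reports
differences without trying to explain them.

```python
import filecmp
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata directories so they do not show up as differences.
        dirnames[:] = [d for d in dirnames if d not in (".git", "CVS")]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(checkout_dir, tarball_dir):
    """Compare a tag checkout against an unpacked release tarball."""
    in_checkout = list_files(checkout_dir)
    in_tarball = list_files(tarball_dir)
    # Byte-for-byte comparison of every file present in both trees.
    differing = sorted(
        path
        for path in in_checkout & in_tarball
        if not filecmp.cmp(
            os.path.join(checkout_dir, path),
            os.path.join(tarball_dir, path),
            shallow=False,
        )
    )
    return {
        "only_in_checkout": sorted(in_checkout - in_tarball),
        "only_in_tarball": sorted(in_tarball - in_checkout),
        "differing": differing,
    }
```

Files that appear only in the tarball are not necessarily a conversion
bug: a release tarball may ship generated files that were never in the
repository, so that list would need to be reviewed by hand.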

cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Magnus Hagander
Date:
Andrew Dunstan wrote:
> 
> [moving this onto -hackers, where I think it belongs]
> 
> Tom Lane wrote:
>> Huh?  The buildfarm will only prove that HEAD of the active branches
>> builds.  What the concern was was whether we could correctly extract
>> past states (particularly, but not solely, the tags corresponding to
>> releases) from a converted git repository.  The testing I had in mind
>> was to check out various tags and diff that tree against actual release
>> tarballs.
> 
> It appears that our git repo is only picking up the branch tags (e.g.
> REL8_0_STABLE) , not all the release tags (e.g. REL8_0_5) . That needs
> to be fixed (if possible).

Hmm. I looked through the source of the import script. It appears to
mention tags here and there, but doesn't seem to do it. There is a
comment that reads:
     # Previous CVS versions just added the tag to the current HEAD
     # revision and didn't insert a dead revision on the branch with
     # the same date, like it is happening now.
     # This means history is unclear as we can't reliably determine
     # if the tagging happened at the same time as the addition to
     # the branch.  For now, just assume it did.
     #
     # XXX can't reproduce for now, disabling, as it breaks some
     # things
     #


Basically, it comes down to CVS tags not being actual first-class
objects, but just metadata on files.

I'm sure we could script the creation of these tags fairly reliably on
*our* repository since we know which files are always updated when a tag
is added. I'm thinking we could just parse the log for configure.in and
grab the tags from there. Thoughts?
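
[Editorial sketch, not part of the original message.] One way to read
the tags out of a single file's log, as suggested above, is to parse
the "symbolic names:" section that `cvs log` prints in its header. The
Python below assumes the log text has already been captured into a
string; the function names and the revision numbers in the usage
example are made up for illustration. The branch test relies on CVS
"magic" branch numbers, which carry a 0 in the second-to-last position.

```python
def parse_symbolic_names(log_text):
    """Map each tag name to its revision from the header of a `cvs log`."""
    tags = {}
    in_section = False
    for line in log_text.splitlines():
        if line.startswith("symbolic names:"):
            in_section = True
            continue
        if in_section:
            # Entries are indented; the first unindented line ends the section.
            if not line[:1].isspace():
                break
            name, _, rev = line.strip().partition(":")
            tags[name] = rev.strip()
    return tags


def is_branch_tag(rev):
    """Heuristic: magic branch numbers look like 1.398.0.4 (0 before the last part)."""
    parts = rev.split(".")
    return len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0"


if __name__ == "__main__":
    # Invented sample header; real revision numbers will differ.
    sample = (
        "RCS file: /cvsroot/pgsql/configure.in,v\n"
        "symbolic names:\n"
        "\tREL8_0_5: 1.398.4.7\n"
        "\tREL8_0_STABLE: 1.398.0.4\n"
        "keyword substitution: kv\n"
    )
    for tag, rev in parse_symbolic_names(sample).items():
        print(tag, rev, "branch" if is_branch_tag(rev) else "release/other")
```

This only recovers the tag-to-revision mapping for one file; turning a
revision into the matching git commit (for example by commit date)
would still be a separate step.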

-- 
 Magnus Hagander
 Self: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
Again,

This has been raised and ignored many times before on -hackers... The
reason is that the tags in the CVS repository are "broken" (i.e. they
are such that it's impossible to actually create all the tags), so the
git "cvsimport" tools that try to import all the tags croak on the PG CVS
repository.

The tool which doesn't croak doesn't try and import all the tags, just
the sticky "branch tags"...

Scripts to "fix" (actually, remove) the broken tags have also been
posted, along with requests that if somebody is "mucking" with the
actual repository, to make sure it's known about, and access is "denied"
during the mucking period (access being any rsync/anoncvs/mirroring of
the cvs root).

As long as the tags are broken, you aren't going to get the tags
imported.  

If you're going to fix the tags, warn everybody (because most people
doing automatic conversions must know - they may need to be very
careful to avoid a full re-import), do it, and let us know when it's
done.

a.

* Andrew Dunstan <andrew@dunslane.net> [090526 10:41]:
>
> [moving this onto -hackers, where I think it belongs]
>
> Tom Lane wrote:
>> Huh?  The buildfarm will only prove that HEAD of the active branches
>> builds.  What the concern was was whether we could correctly extract
>> past states (particularly, but not solely, the tags corresponding to
>> releases) from a converted git repository.  The testing I had in mind
>> was to check out various tags and diff that tree against actual release
>> tarballs.
>
> It appears that our git repo is only picking up the branch tags (e.g.  
> REL8_0_STABLE) , not all the release tags (e.g. REL8_0_5) . That needs  
> to be fixed (if possible).
>
> cheers
>
> andrew
>

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Aidan Van Dyk <aidan@highrise.ca> writes:
> This has been raised and ignored many times before on -hackers... The
> reason is because the tags in the CVS repository are "broken" (i.e they
> are such that it's impossible to actually create all the tags), so the
> git "cvsimport" tools that try to tags all croak on the PG CVS repository.

> The tool which doesn't croak doesn't try and import all the tags, just
> the sticky "branch tags"...

> Scripts to "fix" (actually, remove) the broken tags have also been
> posted, along with requests that if somebody is "mucking" with the
> actual repository, to make sure it's known about, and access is "denied"
> during the mucking period (access being any rsync/anoncvs/mirroring of
> the cvs root).

Up to now I've always been of the opinion that fixing those tags wasn't
worth taking any risk for.  But if we are thinking of moving away from
CVS, then this clearly becomes one of the hurdles we have to jump on the
way.  Can you refresh our memory about which tags are problematic and
exactly what needs to be done about 'em?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [090526 11:20]:
> Aidan Van Dyk <aidan@highrise.ca> writes:
> > This has been raised and ignored many times before on -hackers... The
> > reason is because the tags in the CVS repository are "broken" (i.e they
> > are such that it's impossible to actually create all the tags), so the
> > git "cvsimport" tools that try to tags all croak on the PG CVS repository.
> 
> > The tool which doesn't croak doesn't try and import all the tags, just
> > the sticky "branch tags"...
> 
> > Scripts to "fix" (actually, remove) the broken tags have also been
> > posted, along with requests that if somebody is "mucking" with the
> > actual repository, to make sure it's known about, and access is "denied"
> > during the mucking period (access being any rsync/anoncvs/mirroring of
> > the cvs root).
> 
> Up to now I've always been of the opinion that fixing those tags wasn't
> worth taking any risk for.  But if we are thinking of moving away from
> CVS, then this clearly becomes one of the hurdles we have to jump on the
> way.  Can you refresh our memory about which tags are problematic and
> exactly what needs to be done about 'em?

Specifically, it's 2 tags, and I just remove them:
    REL7_1_BETA2
    REL7_1_BETA3

Previous threads:
    http://news.gmane.org/find-root.php?message_id=20080220225300.GE16099@yugib.highrise.ca
    http://news.gmane.org/find-root.php?message_id=20081229155140.GP12094@yugib.highrise.ca

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>> This has been raised and ignored many times before on -hackers... The
>> reason is because the tags in the CVS repository are "broken" (i.e they
>> are such that it's impossible to actually create all the tags), so the
>> git "cvsimport" tools that try to tags all croak on the PG CVS repository.
>
>> The tool which doesn't croak doesn't try and import all the tags, just
>> the sticky "branch tags"...
>
>> Scripts to "fix" (actually, remove) the broken tags have also been
>> posted, along with requests that if somebody is "mucking" with the
>> actual repository, to make sure it's known about, and access is "denied"
>> during the mucking period (access being any rsync/anoncvs/mirroring of
>> the cvs root).
>
> Up to now I've always been of the opinion that fixing those tags wasn't
> worth taking any risk for.  But if we are thinking of moving away from
> CVS, then this clearly becomes one of the hurdles we have to jump on the
> way.  Can you refresh our memory about which tags are problematic and
> exactly what needs to be done about 'em?

I think we just need to remove the two tags in question (they have long
been irrelevant). Prudence suggests that we should do that some time
(weeks, I think) after the 8.4 release, when reverting, if we find any
breakage, won't be too painful.

cheers

andrew



Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Up to now I've always been of the opinion that fixing those tags wasn't
>> worth taking any risk for.  But if we are thinking of moving away from
>> CVS, then this clearly becomes one of the hurdles we have to jump on the
>> way.

> I think we need just to remove the two tags in question (they have long 
> been irrelevant). Prudence suggests that we should do that some time 
> (weeks, I think) after the 8.4 release, when reverting ,if we find any 
> breakage, won't be too painful.

I don't see a lot of point in waiting till after 8.4.0.  There is no
time, ever, where we are sure there will be no release for weeks ---
a security or data-loss bug could crop up at any time.  And not messing
up back branch update releases is even more important than not messing
up 8.4.0, because the back branches are much more likely to get dropped
straight into production.

Obviously we want a solid backup of the pre-modification CVS repository,
and we have to follow Aidan's advice about synchronizing the change with
mirror repositories, but I don't see a strong argument for waiting weeks
to do this.  I think we should get it over with, so people can get on
with the work that it's blocking.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Aidan Van Dyk <aidan@highrise.ca> writes:
> Specifically, it's 2 tags, and I just remove them:
>     REL7_1_BETA2
>     REL7_1_BETA3

> Previous threads:
>     http://news.gmane.org/find-root.php?message_id=20080220225300.GE16099@yugib.highrise.ca
>     http://news.gmane.org/find-root.php?message_id=20081229155140.GP12094@yugib.highrise.ca

It looks like the ill-considered commit message mentioned in that first
thread hasn't been dealt with, either.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
"Marc G. Fournier"
Date:
On Tue, 26 May 2009, Aidan Van Dyk wrote:

> * Tom Lane <tgl@sss.pgh.pa.us> [090526 11:20]:
>> Aidan Van Dyk <aidan@highrise.ca> writes:
>>> This has been raised and ignored many times before on -hackers... The
>>> reason is because the tags in the CVS repository are "broken" (i.e they
>>> are such that it's impossible to actually create all the tags), so the
>>> git "cvsimport" tools that try to tags all croak on the PG CVS repository.
>>
>>> The tool which doesn't croak doesn't try and import all the tags, just
>>> the sticky "branch tags"...
>>
>>> Scripts to "fix" (actually, remove) the broken tags have also been
>>> posted, along with requests that if somebody is "mucking" with the
>>> actual repository, to make sure it's known about, and access is "denied"
>>> during the mucking period (access being any rsync/anoncvs/mirroring of
>>> the cvs root).
>>
>> Up to now I've always been of the opinion that fixing those tags wasn't
>> worth taking any risk for.  But if we are thinking of moving away from
>> CVS, then this clearly becomes one of the hurdles we have to jump on the
>> way.  Can you refresh our memory about which tags are problematic and
>> exactly what needs to be done about 'em?
>
> Specifically, it's 2 tags, and I just remove them:
>    REL7_1_BETA2
>    REL7_1_BETA3

So, you are suggesting:

cvs -q tag -d REL7_1_BETA2 .
cvs -q tag -d REL7_1_BETA3 .

correct?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: PostgreSQL Developer meeting minutes up

From
Peter Eisentraut
Date:
On Tuesday 26 May 2009 17:44:59 Magnus Hagander wrote:
> Hmm. I looked through the source of the import script. It appears to
> mention tags here and there, but doesn't seem to do it.

Which is part of the reason we use this script and not one of the other ones, 
because some of our tags are broken.  Any conversion will either have to drop 
or repair some tags; see my previous posts to hackers about this.


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
> This has been raised and ignored many times before on -hackers... The
> reason is because the tags in the CVS repository are "broken"

Please keep in mind that the amount of "brokenness" here depends a lot
on the tool used for the conversion to git. AFAIK 'git cvsimport' is
used for the conversion of the Postgres repository to git.
git-cvsimport uses cvsps, which is known for its deficiencies.

> (i.e they
> are such that it's impossible to actually create all the tags), so the
> git "cvsimport" tools that try to tags all croak on the PG CVS repository.
>
> The tool which doesn't croak doesn't try and import all the tags, just
> the sticky "branch tags"...

I cannot confirm that assertion. I've just tried with cvs2svn, which
converts the Postgres repository just fine, including 170 tags, among
them REL7_1_BETA2 and REL7_1_BETA3. A quick glance at the resulting
checkouts' diff looks pretty good as well.

I consider cvsps to be lacking rather than blaming the Postgres CVS
repository. (Of course that doesn't mean the Postgres CVS repository
is perfectly self-consistent - CVS repositories aren't by definition.
I'm just pointing out that there are tools with better heuristics than
those of cvsps.)

Has anybody ever tried using cvs2git? Being based on cvs2svn, it
should yield better results than cvsps. It's even recommended from the
issues section of the git-cvsimport man page [1]. And git-cvsimport
seems to be able to continue from an initial import with cvs2git.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Peter Eisentraut" <peter_e@gmx.net>:
> .. some of our tags are broken.  Any conversion will either have to drop
> or repair some tags; see my previous posts to hackers about this.

Can you please point me to that post? I didn't find it.

In what way do you consider the tags "broken"? (As CVS does not  
guarantee any inter-file consistency, I don't think one can speak of  
brokenness at all. IMO it's rather just a matter of how to convert to  
another VCS's representation. Certainly, it's not always possible to  
convert without any kind of information loss. However, as just pointed  
out, there are certainly differences between converters).

Regards

Markus Wanner




Re: PostgreSQL Developer meeting minutes up

From
Magnus Hagander
Date:
Markus Wanner wrote:
> Hi,
> 
> Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
>> This has been raised and ignored many times before on -hackers... The
>> reason is because the tags in the CVS repository are "broken"
> 
> Please keep in mind that the amount of "brokenness" here depends a lot
> on the tool used for the conversion to git. AFAIK 'git cvsimport' is
> used for the conversion of the Postgres repository to git. git-cvsimport
> uses cvsps, which is known for its deficiencies.

No, we use fromcvs, not "git cvsimport".

IIRC that was the only one people could make working with incremental
updates.


-- 
 Magnus Hagander
 Self: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: PostgreSQL Developer meeting minutes up

From
Heikki Linnakangas
Date:
Magnus Hagander wrote:
> Markus Wanner wrote:
>> Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
>>> This has been raised and ignored many times before on -hackers... The
>>> reason is because the tags in the CVS repository are "broken"
>> Please keep in mind that the amount of "brokenness" here depends a lot
>> on the tool used for the conversion to git. AFAIK 'git cvsimport' is
>> used for the conversion of the Postgres repository to git. git-cvsimport
>> uses cvsps, which is known for its deficiencies.
> 
> No, we use fromcvs, not "git cvsimport".
> 
> IIRC that was the only one people could make working with incremental
> updates.

Right. When I looked at the converters last time, there were others that
produced a better conversion, but they didn't work incrementally. If
we're going to switch over the main repository, we should look at the
alternatives.

OTOH, there's some value in staying with current GIT repository. In 
EnterpriseDB, we maintain all the Oracle-compatibility stuff in a GIT 
repository that's based on the PostgreSQL mirror. If PostgreSQL switches 
to a new GIT repository/mirror, I'll have to rebase all that, and I'm 
not sure how well that works with all the merges and stuff. I'm probably 
the one with most complex situation, but others who have 
work-in-progress patches in local repositories will face the same issue 
at a smaller scale.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Magnus Hagander" <magnus@hagander.net>:
> No, we use fromcvs, not "git cvsimport".

Oh, thanks for the correction.

I haven't heard of fromcvs before, but solely judging by lines of  
code, it's hardly as elaborate as cvs2svn. So my arguments hold true  
for it as well.

> IIRC that was the only one people could make working with incremental
> updates.

Understood.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Peter Eisentraut
Date:
On Wednesday 27 May 2009 13:25:14 Markus Wanner wrote:
> Hi,
>
> Quoting "Peter Eisentraut" <peter_e@gmx.net>:
> > .. some of our tags are broken.  Any conversion will either have to drop
> > or repair some tags; see my previous posts to hackers about this.
>
> Can you please point me to that post? I didn't find it.

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01879.php

> In what way do you consider the tags "broken"?

The tag applies to different commits on different files.

> (As CVS does not
> guarantee any inter-file consistency, I don't think one can speak of
> brokenness at all.

Just because CVS doesn't guarantee it, it doesn't mean it's not broken.  
Otherwise, any possible random permutation of files would be a non-broken 
checkout by your definition.



Re: PostgreSQL Developer meeting minutes up

From
Peter Eisentraut
Date:
On Wednesday 27 May 2009 00:54:52 Marc G. Fournier wrote:
> So, you are suggesting:
>
> cvs -q tag -d REL7_1_BETA2 .
> cvs -q tag -d REL7_1_BETA3 .

Note that there are actually two different issues related to tags:

One is, the tags REL7_1_BETA2 and REL7_1_BETA3 cannot be parsed by cvsps.  But 
no one has analyzed why that is.  Nor is there any proof that they are wrong 
or broken.

The other is, the tag REL7_1 produces different files than were actually in 
the release.  cvsps warns about this.  I had posted a patch to fix this.


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090527 06:20]:

> Has anybody ever tried using cvs2git? Being based on cvs2svn, it should 
> yield better results than cvsps. It's even recommended from the issues 
> section of the git-cvsimport man page [1]. And git-cvsimport seems to be 
> able to continue from an initial import with cvs2git.

cvs2svn (and hence cvs2git) certainly has *oodles* of code explicitly to
try and deal with "weird" (non-linear) cvs histories (of the type found in
the PG repo)...  I'm not sure I would take the reference to "the old cvs2git
tool" to be the cvs2git that's currently active and based on cvs2svn...
If anybody can confirm that the incremental git cvsimport can follow a
recent cvs2git conversion, that would definitely be awesome!  If I can
come across a few free hours some time, I might even try it myself!

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Marc G. Fournier <scrappy@hub.org> [090526 18:00]:

>> Specifically, it's 2 tags, and I just remove them:
>>    REL7_1_BETA2
>>    REL7_1_BETA3
>
> So, you are suggesting:
>
> cvs -q tag -d REL7_1_BETA2 .
> cvs -q tag -d REL7_1_BETA3 .
>
> correct?

Not directly, I claim *no* knowledge of the safety of any CVS commands
;-)

But whatever you do, please, lock us out of the repository first (by us,
I mean any public access to it, via anoncvs, rsync, mirror, anything),
notify us first (give us at *least* a day's warning so we can disable any
automatic conversions), and let us know when it's done, so we can watch
the first run after any conversion carefully.

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [090527 07:29]:

> OTOH, there's some value in staying with current GIT repository. In  
> EnterpriseDB, we maintain all the Oracle-compatibility stuff in a GIT  
> repository that's based on the PostgreSQL mirror. If PostgreSQL switches  
> to a new GIT repository/mirror, I'll have to rebase all that, and I'm  
> not sure how well that works with all the merges and stuff. I'm probably  
> the one with most complex situation, but others who have  
> work-in-progress patches in local repositories will face the same issue  
> at a smaller scale.

But there are oodles of options in git available to handle a cutover
like that:
- grafts
- filter-branch
- rebase (the new rebase toolset can even attempt to rebase a DAG onto an
  existing DAG, not just linear patches)

So I'm sure whatever becomes the "official" git repo can simply be "grafted"
into your history, or your new development grafted on to the
"official" history...

But, if there is nothing wrong with the current repo (except that it
doesn't have tags), then we can easily add tags to it...

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Heikki Linnakangas
Date:
Aidan Van Dyk wrote:
> * Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [090527 07:29]:
> 
>> OTOH, there's some value in staying with current GIT repository. In  
>> EnterpriseDB, we maintain all the Oracle-compatibility stuff in a GIT  
>> repository that's based on the PostgreSQL mirror. If PostgreSQL switches  
>> to a new GIT repository/mirror, I'll have to rebase all that, and I'm  
>> not sure how well that works with all the merges and stuff. I'm probably  
>> the one with most complex situation, but others who have  
>> work-in-progress patches in local repositories will face the same issue  
>> at a smaller scale.
> 
> But there are oodles of options in git available to handle a cutover
> like that:
> - grafts
> - filter-branch

Okay, your git-fu is stronger than mine, I had never heard of grafts 
before :-).

> - rebase (the new rebase toolset can even attempt to rebase a DAG onto an
>   existing DAG, not just linear patches))

That's interesting, I once tested git-rebase on the version I have 
installed on a similar scenario and it didn't handle merges. If it does 
now, that's great.

> But, if there is nothing wrong with the current repo (except that it
> doesn't have tags), than we can easily add tags to it...

Yep. There's not *that* many tags in the CVS repository, we could just 
add them manually.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Peter Eisentraut wrote:
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01879.php

Thanks for the link. I'm assuming you've adjusted the tags to fit a
single commit.

Out of curiosity: do you think (or have evidence that) this is the only
tag that spans multiple commits?

>> In what way do you consider the tags "broken"?
> 
> The tag applies to different commits on different files.

That's perfectly valid for CVS (and can be represented in subversion as
well). Such a tag cannot (easily) be converted to git, though (nor
mercurial or monotone), where tags are attached to a single commit.

>> (As CVS does not
>> guarantee any inter-file consistency, I don't think one can speak of
>> brokenness at all.
> 
> Just because CVS doesn't guarantee it, it doesn't mean it's not broken.

It depends on your understanding of what a tag is. CVS and subversion
certainly have a different understanding from yours (and sometimes tout
this as a feature): their tags can easily span multiple commits.

You (as well as myself, BTW) seem to think of a tag like something
that's attached to a single commit.

> Otherwise, any possible random permutation of files would be a non-broken 
> checkout by your definition.

Note that this is not necessarily my definition, rather CVS's (or that
of subversion). And yes, CVS repositories can be pretty badly screwed.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Wed, May 27, 2009 at 11:44 AM, Markus Wanner <markus@bluegap.ch> wrote:
> Peter Eisentraut wrote:
>>> In what way do you consider the tags "broken"?
>> The tag applies to different commits on different files.
> That's perfectly valid for CVS (and can be represented in subversion as
> well). Such a tag cannot (easily) be converted to git, though (nor
> mercurial or monotone), where tags are attached to a single commit.
[...]
> You (as well as myself, BTW) seem to think of a tag like something
> that's attached to a single commit.

I think this is a semantic argument.  The problem isn't that we don't
understand how CVS behaves; it's that we find that behavior
undesirable, aka broken.  If we really care about having a tag that
contains the exact files that are tagged in CVS, we can create a
branch from one of the commits involved, and then apply a commit to
that branch that places it in the state that matches the contents of
the CVS tag.  AIUI, this is not very different from what you'd have to
do in Subversion, where a tag is a branch is a copy.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [090527 07:29]:

> OTOH, there's some value in staying with current GIT repository. In
> EnterpriseDB, we maintain all the Oracle-compatibility stuff in a GIT
> repository that's based on the PostgreSQL mirror. If PostgreSQL switches
> to a new GIT repository/mirror, I'll have to rebase all that, and I'm
> not sure how well that works with all the merges and stuff. I'm probably
> the one with most complex situation, but others who have
> work-in-progress patches in local repositories will face the same issue
> at a smaller scale.

OK, so I took a gander at the git repositories out there, namely the
"official" one (I'll call gpo) and the "original" (I'll call git):
   gpo: from git://git.postgresql.org/git/postgresql.git
   git: from git://repo.or.cz/PostgreSQL.git

For each branch, I diffed them to a cvs export of that sticky tag they
are on.  Diffs are available here:
   http://people.ifax.com/~aidan/pg/pg-git-cvs/

Since I run the git version of the conversion, I took a close look at
that one... All the differences in it (except for the 7.3 ones I'll
mention in a minute) are $Keyword$ differences, stuff like:
-# My2Pg \$Revision: 1.27 $ \translated dump
+# My2Pg \$Revision$ \translated dump
and
-/*     $OpenBSD$       */
+/*     $OpenBSD: blf.c,v 1.3 2000/06/17 23:36:22 provos Exp $  */
Here, - is git and + is cvs, so you can see, my conversion has some even
backwards... And some are ugly:
-Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id BAA27265 for <pgman@candle.pha.pa.us>; Sat, 10 Jun 2000 01:16:07 -0400 (EDT)
+Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision$) with ESMTP id BAA27265 for <pgman@candle.pha.pa.us>; Sat, 10 Jun 2000 01:16:07 -0400 (EDT)
My conversion left the $Revision$ alone in the mail message, and the cvs
checkout munged it... (Actually, notice how stuff like
src/backend/optimizer/plan/README has keywords in it that CVS munches too,
ugh!)

And the $Log$ Keyword make for some huge differences ;-(
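
[Editorial sketch, not part of the original message.] Since almost all
of the differences reported here are keyword expansions, one option
when comparing the two trees is to collapse expanded keywords back to
their bare form on both sides before diffing. The Python below is a
minimal sketch under that assumption; the keyword list is the standard
RCS set plus the OpenBSD keyword seen in the example above, and the
function name is hypothetical. It does not handle $Log$, whose
expansion spills onto the following lines.

```python
import re

# Standard RCS/CVS keywords, plus "OpenBSD" because it shows up in the tree.
KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Name",
    "RCSfile", "Revision", "Source", "State", "OpenBSD",
)

# Matches both "$Revision$" and "$Revision: 1.27 $" on a single line.
_KEYWORD_RE = re.compile(
    r"\$(" + "|".join(re.escape(k) for k in KEYWORDS) + r")(?::[^$\n]*)?\$"
)


def collapse_keywords(text):
    """Replace every expanded keyword with its unexpanded form."""
    return _KEYWORD_RE.sub(r"$\1$", text)


if __name__ == "__main__":
    git_side = "# My2Pg \\$Revision: 1.27 $ \\translated dump"
    cvs_side = "# My2Pg \\$Revision$ \\translated dump"
    # After normalization the two sides compare equal.
    print(collapse_keywords(git_side) == collapse_keywords(cvs_side))
```

Normalizing both sides this way hides the keyword noise, so whatever
still differs afterwards (such as missing files) stands out.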

But there seem to be some files "missing" in my REL7_3_STABLE, namely:
pg-git-REL7_3_STABLE.diff-diff --git a/contrib/xml/README.pgxml b/contrib/xml/README.pgxml
pg-git-REL7_3_STABLE.diff-diff --git a/contrib/xml/pgxml.sql.in b/contrib/xml/pgxml.sql.in
pg-git-REL7_3_STABLE.diff-diff --git a/contrib/xml/pgxml_dom.sql.in b/contrib/xml/pgxml_dom.sql.in
pg-git-REL7_3_STABLE.diff-diff --git a/src/bin/pg_controldata/po/zh_CN.po b/src/bin/pg_controldata/po/zh_CN.po
pg-git-REL7_3_STABLE.diff-diff --git a/src/bin/pg_resetxlog/po/zh_CN.po b/src/bin/pg_resetxlog/po/zh_CN.po
pg-git-REL7_3_STABLE.diff-diff --git a/src/port/getopt.c b/src/port/getopt.c
pg-git-REL7_3_STABLE.diff-diff --git a/src/test/regress/expected/geometry-bsd-precision.out b/src/test/regress/expected/geometry-bsd-precision.out
I'm not really sure why, I can look into it if it's important...

But in all, the differences are small and otherwise all keyword related:
pg-git-MANUAL_DIST.diff        0 files changed
pg-git-PG95-DIST.diff          0 files changed
pg-git-PG95_DIST.diff          2 files changed, 2 insertions(+), 2 deletions(-)
pg-git-REL2_0B.diff            2 files changed, 2 insertions(+), 2 deletions(-)
pg-git-REL6_4.diff             14 files changed, 53 insertions(+), 13 deletions(-)
pg-git-REL6_5_PATCHES.diff     20 files changed, 67 insertions(+), 20 deletions(-)
pg-git-REL7_0_PATCHES.diff     7 files changed, 12 insertions(+), 7 deletions(-)
pg-git-REL7_1_STABLE.diff      24 files changed, 149 insertions(+), 146 deletions(-)
pg-git-REL7_2_STABLE.diff      30 files changed, 153 insertions(+), 150 deletions(-)
pg-git-REL7_3_STABLE.diff      59 files changed, 1457 insertions(+), 167 deletions(-)
pg-git-REL7_4_STABLE.diff      53 files changed, 172 insertions(+), 169 deletions(-)
pg-git-REL8_0_0.diff           26 files changed, 44 insertions(+), 28 deletions(-)
pg-git-REL8_0_STABLE.diff      26 files changed, 44 insertions(+), 28 deletions(-)
pg-git-REL8_1_STABLE.diff      25 files changed, 26 insertions(+), 26 deletions(-)
pg-git-REL8_2_STABLE.diff      24 files changed, 26 insertions(+), 26 deletions(-)
pg-git-REL8_3_STABLE.diff      23 files changed, 25 insertions(+), 25 deletions(-)
pg-git-Release_1_0_3.diff      2 files changed, 2 insertions(+), 2 deletions(-)
pg-git-WIN32_DEV.diff          53 files changed, 172 insertions(+), 169 deletions(-)
pg-git-ecpg_big_bison.diff     0 files changed
pg-git-master.diff             81 files changed, 85 insertions(+), 85 deletions(-)
       TOTAL                   234 files changed, 2491 insertions(+), 1065 deletions(-)


But the gpo conversion seems to be in pretty bad shape:
        aidan@db1-dapper:/tmp$ grep 'new file' pg-gpo-* | wc
            301    1245   14096

        pg-gpo-MANUAL_DIST.diff        0 files changed
        pg-gpo-PG95-DIST.diff          0 files changed
        pg-gpo-PG95_DIST.diff          2 files changed, 2 insertions(+), 2 deletions(-)
        pg-gpo-REL2_0B.diff            2 files changed, 2 insertions(+), 2 deletions(-)
        pg-gpo-REL6_4.diff             12 files changed, 51 insertions(+), 11 deletions(-)
        pg-gpo-REL6_5_PATCHES.diff     12 files changed, 59 insertions(+), 12 deletions(-)
        pg-gpo-REL7_0_PATCHES.diff     3 files changed, 8 insertions(+), 3 deletions(-)
        pg-gpo-REL7_1_STABLE.diff      22 files changed, 147 insertions(+), 144 deletions(-)
        pg-gpo-REL7_2_STABLE.diff      30 files changed, 153 insertions(+), 150 deletions(-)
        pg-gpo-REL7_3_STABLE.diff      58 files changed, 925 insertions(+), 167 deletions(-)
        pg-gpo-REL7_4_STABLE.diff      136 files changed, 43302 insertions(+), 169 deletions(-)
        pg-gpo-REL8_0_0.diff           31 files changed, 2288 insertions(+), 28 deletions(-)
        pg-gpo-REL8_0_STABLE.diff      206 files changed, 16289 insertions(+), 11798 deletions(-)
        pg-gpo-REL8_1_STABLE.diff      155 files changed, 24386 insertions(+), 26 deletions(-)
        pg-gpo-REL8_2_STABLE.diff      24 files changed, 26 insertions(+), 26 deletions(-)
        pg-gpo-REL8_3_STABLE.diff      23 files changed, 25 insertions(+), 25 deletions(-)
        pg-gpo-Release_1_0_3.diff      2 files changed, 2 insertions(+), 2 deletions(-)
        pg-gpo-WIN32_DEV.diff          119 files changed, 16637 insertions(+), 169 deletions(-)
        pg-gpo-ecpg_big_bison.diff     0 files changed
        pg-gpo-master.diff             81 files changed, 85 insertions(+), 85 deletions(-)
        TOTAL                          627 files changed, 104387 insertions(+), 12819 deletions(-)
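
Per-branch summaries like the ones in this mail can be screened mechanically to find the branches where a conversion has drifted furthest from CVS. The following is only a minimal Python sketch and was not posted in the thread: the function names and the 1000-insertion cutoff are assumptions chosen for illustration, and the input is assumed to be lines of the form "name  N files changed, X insertions(+), Y deletions(-)".

```python
import re

# Matches summary lines such as
#   "pg-gpo-REL7_4_STABLE.diff  136 files changed, 43302 insertions(+), 169 deletions(-)"
# and the shorter form "pg-gpo-MANUAL_DIST.diff  0 files changed".
SUMMARY = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<files>\d+) files? changed"
    r"(?:, (?P<ins>\d+) insertions?\(\+\))?"
    r"(?:, (?P<dels>\d+) deletions?\(-\))?\s*$"
)


def parse_summary(line):
    """Return (name, files, insertions, deletions), or None if the line does not match."""
    m = SUMMARY.match(line)
    if m is None:
        return None
    return (m["name"], int(m["files"]), int(m["ins"] or 0), int(m["dels"] or 0))


def suspicious(lines, max_insertions=1000):
    """Names of diffs whose insertion count exceeds the (arbitrary) cutoff."""
    rows = filter(None, map(parse_summary, lines))
    return [name for name, _files, ins, _dels in rows if ins > max_insertions]


# A few rows copied from the gpo summary in this mail.
report = [
    "pg-gpo-MANUAL_DIST.diff        0 files changed",
    "pg-gpo-REL7_3_STABLE.diff      58 files changed, 925 insertions(+), 167 deletions(-)",
    "pg-gpo-REL7_4_STABLE.diff      136 files changed, 43302 insertions(+), 169 deletions(-)",
    "pg-gpo-REL8_2_STABLE.diff      24 files changed, 26 insertions(+), 26 deletions(-)",
]
print(suspicious(report))  # ['pg-gpo-REL7_4_STABLE.diff']
```

A fixed cutoff is crude; it only separates the keyword-sized differences from the branches that gained whole files.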

And actually looking at the history of the gpo repo, the branches are all
messed up with "merges" and stuff that I'm not sure where they are coming
from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
back branches are very bad...


a.


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
"Marc G. Fournier"
Date:
--On Wednesday, May 27, 2009 16:33:28 +0300 Peter Eisentraut <peter_e@gmx.net>
wrote:

> On Wednesday 27 May 2009 00:54:52 Marc G. Fournier wrote:
>> So, you are suggesting:
>>
>> cvs -q tag -d REL7_1_BETA2 .
>> cvs -q tag -d REL7_1_BETA3 .
>
> Note that there are actually two different issues related to tags:
>
> One is, the tags REL7_1_BETA2 and REL7_1_BETA3 cannot be parsed by cvsps.
> But no one has analyzed why that is.  Nor is there any proof that they are
> wrong or broken.

I'm curious as to what is different about these vs all the other tags I've ever
done, both before, and after ...

> The other is, the tag REL7_1 produces different files than were actually in
> the release.  cvsps warns about this.  I had posted a patch to fix this.

Please repost ...

-- 
Marc G. Fournier        Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664



Re: PostgreSQL Developer meeting minutes up

From
"Kevin Grittner"
Date:
"Marc G. Fournier" <scrappy@hub.org> wrote:

> I'm curious as to what is different about these vs all the other
> tags I've ever done, both before, and after ...

Any chance you tagged, changes were committed, and you then tagged
files from such a later commit as part of the release, or moved the
tag to the later commit?  Those are perfectly reasonable things to do
under the CVS philosophy, and not in line with the philosophy of some
of the other products.

If there's a chance you did that on a couple beta releases in that
time frame, and on no others, that might explain it.

One product's flexibility is another product's "broken".

-Kevin


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Wed, May 27, 2009 at 5:22 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>    gpo: from git://git.postgresql.org/git/postgresql.git
>    git: from git://repo.or.cz/PostgreSQL.git
[...]
> But the gpo conversion seems to be in pretty bad shape:
>        aidan@db1-dapper:/tmp$ grep 'new file' pg-gpo-* | wc
>            301    1245   14096
>
>        pg-gpo-MANUAL_DIST.diff        0 files changed
>        pg-gpo-PG95-DIST.diff          0 files changed
>        pg-gpo-PG95_DIST.diff          2 files changed, 2 insertions(+), 2 deletions(-)
>        pg-gpo-REL2_0B.diff            2 files changed, 2 insertions(+), 2 deletions(-)
>        pg-gpo-REL6_4.diff             12 files changed, 51 insertions(+), 11 deletions(-)
>        pg-gpo-REL6_5_PATCHES.diff     12 files changed, 59 insertions(+), 12 deletions(-)
>        pg-gpo-REL7_0_PATCHES.diff     3 files changed, 8 insertions(+), 3 deletions(-)
>        pg-gpo-REL7_1_STABLE.diff      22 files changed, 147 insertions(+), 144 deletions(-)
>        pg-gpo-REL7_2_STABLE.diff      30 files changed, 153 insertions(+), 150 deletions(-)
>        pg-gpo-REL7_3_STABLE.diff      58 files changed, 925 insertions(+), 167 deletions(-)
>        pg-gpo-REL7_4_STABLE.diff      136 files changed, 43302 insertions(+), 169 deletions(-)
>        pg-gpo-REL8_0_0.diff           31 files changed, 2288 insertions(+), 28 deletions(-)
>        pg-gpo-REL8_0_STABLE.diff      206 files changed, 16289 insertions(+), 11798 deletions(-)
>        pg-gpo-REL8_1_STABLE.diff      155 files changed, 24386 insertions(+), 26 deletions(-)
>        pg-gpo-REL8_2_STABLE.diff      24 files changed, 26 insertions(+), 26 deletions(-)
>        pg-gpo-REL8_3_STABLE.diff      23 files changed, 25 insertions(+), 25 deletions(-)
>        pg-gpo-Release_1_0_3.diff      2 files changed, 2 insertions(+), 2 deletions(-)
>        pg-gpo-WIN32_DEV.diff          119 files changed, 16637 insertions(+), 169 deletions(-)
>        pg-gpo-ecpg_big_bison.diff     0 files changed
>        pg-gpo-master.diff             81 files changed, 85 insertions(+), 85 deletions(-)
>        TOTAL                          627 files changed, 104387 insertions(+), 12819 deletions(-)
>
> And actually looking at the history of the gpo repo, the branches are all
> messed up with "merges" and stuff that I'm not sure where they are coming
> from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
> back branches are very bad...

This is really quite horrible.  What is the best way forward here?

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Marc G. Fournier"
Date:
On Wed, 27 May 2009, Kevin Grittner wrote:

> "Marc G. Fournier" <scrappy@hub.org> wrote:
>
>> I'm curious as to what is different about these vs all the other
>> tags I've ever done, both before, and after ...
>
> Any chance you tagged, changes were committed, and you then tagged
> files from such a later commit as part of the release, or moved the
> tag to the later commit?  Those are perfectly reasonable things to do
> under the CVS philosophy, and not in line with the philosophy of some
> of the other products.
>
> If there's a chance you did that on a couple beta releases in that
> time frame, and on no others, that might explain it.

Actually, I have done that on at least one of the 8.x tags too, so if that 
is it, more than those two tags should be causing issues ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Robert Haas <robertmhaas@gmail.com> [090527 21:30]:

> > And actually looking at the history of the gpo repo, the branches are all
> > messed up with "merges" and stuff that I'm not sure where they are coming
> > from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
> > back branches are very bad...
>
> This is really quite horrible.  What is the best way forward here?

That depends entirely on what the project wants.

If you're just developing on HEAD with git to submit patches to official
CVS use, don't do anything...  HEAD is in good state in both repos.

If you're using git to rebase patches and changes between branches, in
order to submit patches for official CVS use, use the PostgreSQL.git I
publish on repo.or.cz.  Its back branches are in pretty good state too.

If the "project" wants to provide a canonical "git" repository as a
cut-over from CVS, and CVS isn't going to be used anymore (other than
being served as an anonymous "mirror" of the git repo by git cvsserver, or
having an old CVSROOT available as a public download for purely historical
inspection), then it's probably best to do a single CVS->git conversion
with one of the better tools (like parsecvs) that don't do incremental
imports, and publish a set of good "graft" points for those using gpo who
want to switch their current development onto the new git repo.

But, since git is a fully distributed VCS at its core, and was built by people
thoroughly immersed in the tool-needs of vastly distributed VCS stuff,
no matter what the project deems the "official" git repository, people
will be able to continue to keep their current investment in git
development with stuff like grafts, filter-branch, rebase, etc.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Wed, May 27, 2009 at 10:09 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Robert Haas <robertmhaas@gmail.com> [090527 21:30]:
>
>> > And actually looking at the history of the gpo repo, the branches are all
>> > messed up with "merges" and stuff that I'm not sure where they are coming
>> > from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
>> > back branches are very bad...
>>
>> This is really quite horrible.  What is the best way forward here?
>
> That depends entirely on what the project wants.

I can't speak for anyone else, but what I want is for the git tree on
git.postgresql.org to match CVS.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marc G. Fournier" <scrappy@hub.org>:
> Please repost ...

Peter referred to this message here:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01879.php

However, please be cautious before applying such a patch.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marc G. Fournier" <scrappy@hub.org>:
> Actually, I have done that on at least one of the 8.x tags too, so
> if that is it, more than those two tags should be causing issues ...

Not *every* such issue causes problems. An example that's perfectly fine:
  cvs commit -m "first commit" fileA
  cvs tag TEST fileA
  cvs commit -m "second commit" fileB
  cvs tag TEST fileB

In such a situation, a converter can easily "push-down" the tag TEST
to the second commit, because fileA is the same (in that revision) as
after the first commit. After all, the results in the RCS files are
exactly the same as if you did the following:
  cvs commit -m "first commit" fileA
  cvs commit -m "second commit" fileB
  cvs tag TEST fileA fileB

A converter can't possibly distinguish these two.

However, if both files get committed the second time, but only one
gets tagged, it gets problematic (always assuming the commit actually
changes the file):
  cvs commit -m "first commit" fileA
  cvs tag TEST fileA
  cvs commit -m "second commit" fileA fileB
  cvs tag TEST fileB

That's perfectly valid from CVS's point of view, unwanted for the
Postgres repository and hard to handle for a converter to git (or
mercurial, monotone, etc..), because the tag TEST is on the first
commit for fileA but on the second for fileB, while both of fileA and
fileB differ between the commits.
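
The distinction can be stated as a small check: a tag maps cleanly onto one commit only if some commit exists at which every tagged file is at exactly its tagged revision. The Python sketch below is a toy model of that rule, not the logic of any actual converter; the names `last_change` and `tag_commit` are hypothetical, commits are assumed to be already grouped (a real converter first has to infer them from the RCS files), and untagged files and branches are ignored.

```python
def last_change(commits, filename, upto):
    """Index of the latest commit <= upto that touched filename, or None."""
    for idx in range(upto, -1, -1):
        if filename in commits[idx]:
            return idx
    return None


def tag_commit(commits, tag):
    """Return the index of the first commit whose tree matches the tag, or None.

    commits: list of sets of file names, in commit order.
    tag:     dict mapping file name -> index of the commit that created
             the tagged revision of that file.
    """
    for k in range(len(commits)):
        if all(last_change(commits, f, k) == rev for f, rev in tag.items()):
            return k
    return None


# First example: fileA is unchanged by the second commit, so the tag can be
# "pushed down" onto it.
harmless = [{"fileA"}, {"fileB"}]
print(tag_commit(harmless, {"fileA": 0, "fileB": 1}))  # 1

# Problematic example: the second commit changes both files, but the tag
# points at fileA's first revision and fileB's second.
problem = [{"fileA"}, {"fileA", "fileB"}]
print(tag_commit(problem, {"fileA": 0, "fileB": 1}))  # None
```

Index 0 is a valid result, so callers should compare against `None` rather than test truthiness.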

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> I think this is a semantic argument.  The problem isn't that we don't
> understand how CVS behaves; it's that we find that behavior
> undesirable

I fully agree with that and find it undesirable as well.

> aka broken.

Well, for some it's a feature, for others a bug ;-)

My point was that other converters have better support for such
(undesirable, but still existent) tags that span multiple commits. If
that's unwanted anyway, it seems cleaner to fix the CVS repository,
yes. Has that been done now? Or is somebody going to do it? (See
Peter's patch he just linked again upthread).

> If we really care about having a tag that
> contains the exact files that are tagged in CVS, we can create a
> branch from one of the commits involved, and then apply a commit to
> that branch that places it in the state that matches the contents of
> the CVS tag.

Exactly (with the difference that with the branch you preserve the
history of changes, while the variant with the tag does not).

> AIUI, this is not very different from what you'd have to
> do in Subversion, where a tag is a branch is a copy.

I think so, too. I'd even state that subversion doesn't really support
tagging, instead it simulates tags with branches.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Robert Haas <robertmhaas@gmail.com> [090527 22:43]:
> On Wed, May 27, 2009 at 10:09 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> > * Robert Haas <robertmhaas@gmail.com> [090527 21:30]:
> >
> >> > And actually looking at the history of the gpo repo, the branches are all
> >> > messed up with "merges" and stuff that I'm not sure where they are coming
> >> > from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
> >> > back branches are very bad...
> >>
> >> This is really quite horrible.  What is the best way forward here?
> >
> > That depends entirely on what the project wants.
>
> I can't speak for anyone else, but what I want is for the git tree on
> git.postgresql.org to match CVS.

Well, sure, but I think the "way forward" part implied recognition that
the current tree at git.postgresql.org *doesn't* match CVS very closely
(for back branches), and that people currently rely on it and use it.

So, again, the answer to the question really does depend on what the
"canonical" VCS of the project is.  As of now, it's *still* CVS, and
those using either git repo can still develop and submit patches to CVS
easily.

When the project switches, there will probably need to be a more
canonical conversion, with one of the tools that doesn't support
incremental imports, and then people will have to adjust their current
repo with any of rebase/graft/filter-branch to adjust their work
history onto the "official" tree...

All that is based on the assumption that when the project switches to git,
they actually want all the CVS history in their official tree.  It's
certainly not necessary, and possibly not even desirable...  PostgreSQL
could just as easily do a "linus" style switch when they switch to git,
and just "import" the latest release in each branch as the starting
point for each branch.  The git repository will have no history, and
people can choose which history they want to graft in...  CVSROOT can be
made available as a historical download.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> All that is based on the assumption that when the project switches to git,
> they actually want all the CVS history in their official tree.  It's
> certainly not necessary, and possibly not even desirable...  PostgreSQL
> could just as easily do a "linus" style switch when they switch to git,
> and just "import" the latest release in each branch as the starting
> point for each branch.  The git repository will have no history, and
> people can choose which history they want to graft in...  CVSROOT can be
> made available as a historical download.

That would suck for me.  I use git log a lot to see how things have
changed over time.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Robert Haas <robertmhaas@gmail.com> [090528 09:49]:
> On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> > All that is based on the assumption that when the project switches to git,
> > they actually want all the CVS history in their official tree.  It's
> > certainly not necessary, and possibly not even desirable...  PostgreSQL
> > could just as easily do a "linus" style switch when they switch to git,
> > and just "import" the latest release in each branch as the starting
> > point for each branch.  The git repository will have no history, and
> > people can choose which history they want to graft in...  CVSROOT can be
> > made available as a historical download.
>
> That would suck for me.  I use git log a lot to see how things have
> changed over time.

No, the whole point is that you graft whatever history *you* want in...
So if PostgreSQL "official" git only starts when the official VCS was in
git, you graft on gpo, or git, or some personal one-time cvs2git or
parsecvs history you want in...

It would be the project's way of saying basically "None of the current
cvs imports are perfect and we recognize that.  So we're starting fresh,
use whatever historical cvs import *you* find best for your history and
graft it in".   Just as the linux kernel has a few "historical" repos
available for people to graft into linus's tree which only started in
2.6.12.
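
What "grafting" means here can be shown with a toy model: the official repository starts at a parentless import commit, and a locally configured graft tells history walks to treat some commit from a historical import as that commit's parent. The Python sketch below illustrates the idea only and is not git's implementation; the commit names and the `ancestry` function are hypothetical.

```python
def ancestry(commit, parents, grafts=None):
    """List the commits reachable from `commit`, honoring graft overrides.

    parents: dict mapping commit -> list of its recorded parents.
    grafts:  optional dict mapping commit -> replacement parent list; an
             entry here wins over the recorded parents for that commit.
    """
    grafts = grafts or {}
    seen, stack, order = set(), [commit], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(grafts.get(current, parents.get(current, [])))
    return order


# "new1" is the parentless import that starts the official tree;
# "old1".."old3" stand in for a separately published historical import.
parents = {
    "new2": ["new1"],
    "new1": [],
    "old3": ["old2"],
    "old2": ["old1"],
    "old1": [],
}
grafts = {"new1": ["old3"]}

print(ancestry("new2", parents))          # ['new2', 'new1']
print(ancestry("new2", parents, grafts))  # ['new2', 'new1', 'old3', 'old2', 'old1']
```

The recorded commits are never rewritten in this model; only the walk changes, which is why different users could graft on different histories.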

If you have work that requires the history of the current gpo repo, you
keep using it.  If you have work requiring the current git repo, you keep
using it.  If you have no work, but you're a stickler for perfect
imports, you start working on parsecvs and cvs2git, and make a new
history every time you find another quirk...

a.


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Robert Haas wrote:
> On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>   
>> All that is based on the assumption that when the project switches to git,
>> they actually want all the CVS history in their official tree.  It's
>> certainly not necessary, and possibly not even desirable...  PostgreSQL
>> could just as easily do a "linus" style switch when they switch to git,
>> and just "import" the latest release in each branch as the starting
>> point for each branch.  The git repository will have no history, and
>> people can choose which history they want to graft in...  CVSROOT can be
>> made available as a historical download.
>>     
>
> That would suck for me.  I use git log a lot to see how things have
> changed over time.
>
>
>   

Indeed. Losing the history is not an acceptable option.

cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 10:18 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Robert Haas <robertmhaas@gmail.com> [090528 09:49]:
>> On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>> > All that is based on the assumption that when the project switches to git,
>> > they actually want all the CVS history in their official tree.  It's
>> > certainly not necessary, and possibly not even desirable...  PostgreSQL
>> > could just as easily do a "linus" style switch when they switch to git,
>> > and just "import" the latest release in each branch as the starting
>> > point for each branch.  The git repository will have no history, and
>> > people can choose which history they want to graft in...  CVSROOT can be
>> > made available as a historical download.
>>
>> That would suck for me.  I use git log a lot to see how things have
>> changed over time.
>
> No, the whole point is that you graft whatever history *you* want in...
> So if PostgreSQL "official" git only starts when the official VCS was in
> git, you graft on gpo, or git, or some personal one-time cvs2git or
> parsecvs history you want in...

I want the project infrastructure to do this for me so I don't have to
do anything except git clone.  It's not a big deal for me to port my
WIP over to a new git repo if this one is busted, which it sounds like
it is.  But I'm not interested in rolling my own history.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Robert Haas wrote:
>> That would suck for me.  I use git log a lot to see how things have
>> changed over time.

> Indeed. Losing the history is not an acceptable option.

I think the same.  If git is not able to maintain our project history
then it is not mature enough to be considered as our official VCS.
This is not a negotiable requirement.

            regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On Thu, May 28, 2009 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Robert Haas wrote:
>>> That would suck for me.  I use git log a lot to see how things have
>>> changed over time.
>
>> Indeed. Losing the history is not an acceptable option.
>
> I think the same.  If git is not able to maintain our project history
> then it is not mature enough to be considered as our official VCS.
> This is not a negotiable requirement.

I think the idea is that you could choose, for example, the level of
granularity you want to keep. That could be interesting in the future
-- someone who submitted a patch (or anyone who was working in that
area) might want to keep all their intermediate commits and not just
the one big commit for the whole feature.

But it's not like we have a lot of choices for our history. Only a few
patches were maintained in a distributed vc system so far and I don't
think many people followed them. Also given the massive changes
patches have tended to get when being committed keeping the history of
the patch development seems kind of pointless.

--
greg


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Tom Lane" <tgl@sss.pgh.pa.us>:
> I think the same.  If git is not able to maintain our project history
> then it is not mature enough to be considered as our official VCS.

As Aidan pointed out, the question is not *if* git can represent it.
It's rather *how*. Especially WRT changes of historical information in
the CVS repository underneath.

Heikki is concerned about having to merge WIP branches in case the
(CVS and git repository) history changes, so he'd like to maintain the
old history as well as the changed one. OTOH Robert doesn't want to
fiddle with multiple histories and expects to have just exactly one
history. Obviously one can't have both. Either one has to rebase/merge
his changes onto the new history, or continue with multiple histories.

Being a monotone fan, I have to admit that git definitely provides the
most options on *how* to handle these cases, see Aidan's mail upthread.

Knowing most of the corruptions of CVS in use in the wild (by fiddling
with cvs_import for monotone) I now consider git (and svn, hg, bzr,
mtn..) to be more mature than CVS, certainly much more consistent. So
if maturity (not age) is your major concern, I'd rather flee from CVS
now than tomorrow.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 11:04 AM, Greg Stark <stark@enterprisedb.com> wrote:
> On Thu, May 28, 2009 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Robert Haas wrote:
>>>> That would suck for me.  I use git log a lot to see how things have
>>>> changed over time.
>>
>>> Indeed. Losing the history is not an acceptable option.
>>
>> I think the same.  If git is not able to maintain our project history
>> then it is not mature enough to be considered as our official VCS.
>> This is not a negotiable requirement.
>
> I think the idea is that you could choose, for example, the level of
> granularity you want to keep. That could be interesting in the future
> -- someone who submitted a patch (or anyone who was working in that
> area) might want to keep all their intermediate commits and not just
> the one big commit for the whole feature.

I don't think that was the idea - Aidan floated the idea of just
checking the current version of each branch into git, rather than
importing the full history from CVS (and letting individual cloners fix
their own history if they were so inclined).  I think that's a
non-starter.

I'm still not sure who is going to take responsibility for fixing the
git tree we have now.  I don't think it's going to work for us to
leave it broken until we're ready to do "the cutover", and then do one
monolithic move.  If the tools we're using to do the import now have
broken our tree, then we need to fix it, and them.  Ideally I'd like
to get a bi-directional conversion working, so that committers could
commit via either CVS or GIT during the transition, but I'm not sure
whether that's feasible.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 11:40 AM, Markus Wanner <markus@bluegap.ch> wrote:
> Quoting "Tom Lane" <tgl@sss.pgh.pa.us>:
>> I think the same.  If git is not able to maintain our project history
>> then it is not mature enough to be considered as our official VCS.
>
> As Aidan pointed out, the question is not *if* git can represent it. It's
> rather *how*. Especially WRT changes of historical information in the CVS
> repository underneath.
>
> Heikki is concerned about having to merge WIP branches in case the (CVS and
> git repository) history changes, so he'd like to maintain the old history as
> well as the changed one. OTOH Robert doesn't want to fiddle with multiple
> histories and expects to have just exactly one history. Obviously one can't
> have both. Either one has to rebase/merge his changes onto the new history,
> or continue with multiple histories.

My understanding is that the histories of some of the branches we have
now are flat-out wrong.  I don't have a problem keeping those
alongside the corrected history for ease of rebasing and porting
commits, but I don't want to punt the problem of figuring out what the
one, true, and correct history is to the user.  The canonical
repository needs to provide that, and if it provides other alternative
timelines (a la Star Trek) for the convenience of people in Heikki's
situation, that's OK too, as long as they are clearly labeled as such.
I think ideally we'd phase those out and garbage collect them
eventually, but we can certainly keep them for a while.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> I don't think that was the idea - Aidan floated the idea of just
> checking the current version of each branch into git, rather than
> importing the full history from CVS (and letting individual cloners fix
> their own history if they were so inclined).  I think that's a
> non-starter.

I'd say it depends on how hard it is to "fix" one's history. If it's
just a config option instructing git to fetch everything before
revision X from repository Y...

OTOH, it would certainly be nicer to have a "default" history, where
only people who require another history would need such a config
option. I'm not quite sure what's possible there.

> I don't think it's going to work for us to
> leave it broken until we're ready to do "the cutover", and then do one
> monolithic move.

Agreed. However, I'm pretty certain this won't be the last time we
have to "fix" the git repository. Conversion from a bunch of RCS files
is just way too ambiguous.

> If the tools we're using to do the import now have
> broken our tree, then we need to fix it, and them.

..and the CVS repository.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> My understanding is that the histories of some of the branches we have
> now are flat-out wrong.

AFAIU only the latest revisions of the branches have been compared.
Keeping history and future in mind, that's not telling much, IMO. In
my experience, there's much more wrong with converted CVS repositories
- the latest revisions are often just the tip of the iceberg.
Depending on your definition of "wrong", of course.

> I don't have a problem keeping those
> alongside the corrected history for ease of rebasing and porting
> commits, but I don't want to punt the problem of figuring out what the
> one, true, and correct history is to the user.

Understood and agreed. (In a distributed VCS, you cannot "delete"
history by definition, because every user is free to keep his version).

However, I'm pretty certain this is not the last "flat-out wrong"
thing we find in the CVS or in the converted git repository. Going to
fix and rebase every time might be pretty annoying and time consuming.
Thus alternatives like those mentioned by Aidan sound interesting to me.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm still not sure who is going to take responsibility for fixing the
> git tree we have now.  I don't think it's going to work for us to
> leave it broken until we're ready to do "the cutover", and then do one
> monolithic move.  If the tools we're using to do the import now have
> broken our tree, then we need to fix it, and them.  Ideally I'd like
> to get a bi-directional conversion working, so that committers could
> commit via either CVS or GIT during the transition, but I'm not sure
> whether that's feasible.

I fear the latter is probably pie in the sky, unfortunately --- to take
just one minor point, which commit timestamp is authoritative?  I think
we will have to make a clean cutover from "CVS is authoritative" to
"CVS is dead and git is authoritative", and do a fresh repository
conversion at that instant.  What we should be doing to get prepared for
that is testing various conversion tools to see which one gives us the
best conversion.  And fixing anything in the CVS repository that is
preventing getting a sane conversion.
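
The test proposed at the start of this thread, checking out a tag and diffing that tree against the release tarball, is one way to score candidate conversion tools. The Python sketch below is a minimal illustration under stated assumptions: both trees are already on disk (a checkout of the tag and an unpacked tarball), `tree_digest` and `compare_trees` are hypothetical names, and the `.git` and `CVS` metadata directories are skipped.

```python
import hashlib
import os


def tree_digest(root, skip=(".git", "CVS")):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digests[os.path.relpath(full, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare_trees(checkout, tarball):
    """Return (only_in_checkout, only_in_tarball, differing) as sorted lists."""
    left, right = tree_digest(checkout), tree_digest(tarball)
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    differing = sorted(p for p in set(left) & set(right) if left[p] != right[p])
    return only_left, only_right, differing


# Usage with hypothetical paths:
#   compare_trees("/tmp/checkout-REL8_0_5", "/tmp/postgresql-8.0.5")
```

A byte-for-byte comparison like this also reports files that differ only in expanded CVS keywords, the kind of difference noted upthread, so those would have to be filtered or normalized before treating a mismatch as a conversion error.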

The existing git mirror is an unofficial service and is not going to be
the basis of the future authoritative repository.  Folks who have cloned
it will have to re-clone.  Sorry about that, but maintaining continuity
with that repository is just too far down the list of priorities
... especially when we already know it's broken.

I am hoping that git's cvs server emulation is complete enough that you
can commit through it --- anybody know?  But that will be just a
stopgap.

BTW, can anyone comment on whether and how we can maintain the current
split between master repository (that's not even accessible to
non-committers) and a public mirror?  If only from a standpoint of
security paranoia, I'd rather like to preserve that split, but I don't
know how well git will play with it.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 12:10 PM, Markus Wanner <markus@bluegap.ch> wrote:
> Hi,
>
> Quoting "Robert Haas" <robertmhaas@gmail.com>:
>>
>> My understanding is that the histories of some of the branches we have
>> now are flat-out wrong.
>
> AFAIU only the latest revisions of the branches have been compared. Keeping
> history and future in mind, that's not telling much, IMO. In my experience,
> there's much more wrong with converted CVS repositories - the latest
> revisions are often just the tip of the iceberg. Depending on your
> definition of "wrong", of course.

That's not the best news I've had today...

>> I don't have a problem keeping those
>> alongside the corrected history for ease of rebasing and porting
>> commits, but I don't want to punt the problem of figuring out what the
>> one, true, and correct history is to the user.
>
> Understood and agreed. (In a distributed VCS, you cannot "delete" history by
> definition, because every user is free to keep his version).
>
> However, I'm pretty certain this is not the last "flat-out wrong" thing we
> find in the CVS or in the converted git repository. Going to fix and rebase
> every time might be pretty annoying and time consuming. Thus alternatives
> like those mentioned by Aidan sound interesting to me.

To me they sound complex and inconvenient.  I guess I'm kind of
mystified by why we can't make this work reliably.  Other than the
"broken tags" issue we've discussed, it seems like the only real issue
should be how to group changes to different files into a single
commit.  Once you do that, you should be able to construct a
well-defined, total function f : <cvs-file, cvs-revision> -> <git
commit> which is surjective on the space of git commits.  In fact it
might be a good idea to explicitly construct this mapping and drop it
into a database table somewhere so that people can sanity check it as
much as they wish.  Why is this harder than I think it is?
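
A minimal sketch of the sanity check suggested above, assuming the mapping has already been exported as a dictionary from (file, revision) pairs to commit ids. The function names and the sample data are hypothetical and do not come from any conversion tool; the point is only to show what "total" and "surjective" would mean as concrete checks.

```python
from collections import defaultdict


def check_mapping(cvs_revisions, git_commits, mapping):
    """Check a (file, revision) -> commit mapping for totality and surjectivity.

    cvs_revisions: iterable of (file, revision) pairs found in CVS
    git_commits:   set of commit ids found in the converted repository
    mapping:       dict from (file, revision) to a commit id
    """
    # Totality: every CVS file revision must map to some commit.
    unmapped = sorted(set(cvs_revisions) - set(mapping))
    # Well-definedness: every mapped commit must actually exist in git.
    unknown = sorted({c for c in mapping.values() if c not in git_commits})
    # Surjectivity: every git commit must be hit by at least one revision.
    orphaned = sorted(set(git_commits) - set(mapping.values()))
    return {
        "unmapped": unmapped,
        "unknown_commits": unknown,
        "orphaned_commits": orphaned,
    }


def group_by_commit(mapping):
    """Invert the mapping: which file revisions were grouped into each commit."""
    by_commit = defaultdict(list)
    for file_rev, commit in mapping.items():
        by_commit[commit].append(file_rev)
    return {commit: sorted(revs) for commit, revs in by_commit.items()}
```

An empty list under every key of the `check_mapping` result would mean the mapping is a total, surjective function in the sense described; the rows of `group_by_commit` are what one would load into a database table for others to inspect.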

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm still not sure who is going to take responsibility for fixing the
>> git tree we have now.  I don't think it's going to work for us to
>> leave it broken until we're ready to do "the cutover", and then do one
>> monolithic move.  If the tools we're using to do the import now have
>> broken our tree, then we need to fix it, and them.  Ideally I'd like
>> to get a bi-directional conversion working, so that committers could
>> commit via either CVS or GIT during the transition, but I'm not sure
>> whether that's feasible.
>
> I fear the latter is probably pie in the sky, unfortunately --- to take
> just one minor point, which commit timestamp is authoritative?

That's just a question of deciding on a date when git becomes
authoritative and CVS ceases to be.

> I think
> we will have to make a clean cutover from "CVS is authoritative" to
> "CVS is dead and git is authoritative", and do a fresh repository
> conversion at that instant.  What we should be doing to get prepared for
> that is testing various conversion tools to see which one gives us the
> best conversion.  And fixing anything in the CVS repository that is
> preventing getting a sane conversion.

That might work, but then we better be pretty darn confident that that
"fresh conversion" is actually correct.  I'd rather have them going
side-by-side so that we can verify everything before shutting the old
system off.

> The existing git mirror is an unofficial service and is not going to be
> the basis of the future authoritative repository.  Folks who have cloned
> it will have to re-clone.  Sorry about that, but maintaining continuity
> with that repository is just too far down the list of priorities
> ... especially when we already know it's broken.
>
> I am hoping that git's cvs server emulation is complete enough that you
> can commit through it --- anybody know?  But that will be just a
> stopgap.
>
> BTW, can anyone comment on whether and how we can maintain the current
> split between master repository (that's not even accessible to
> non-committers) and a public mirror?  If only from a standpoint of
> security paranoia, I'd rather like to preserve that split, but I don't
> know how well git will play with it.

You can set up one repository to mirror another.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Greg Smith
Date:
On Thu, 28 May 2009, Robert Haas wrote:

> My understanding is that the histories of some of the branches we have
> now are flat-out wrong.  I don't have a problem keeping those
> alongside the corrected history for ease of rebasing and porting
> commits, but I don't want to punt the problem of figuring out what the
> one, true, and correct history is to the user.

Right.  There has to be "one true repo" for the history here, and if it 
takes another repo conversion to do it that's unfortunate for people 
already using the existing repo, but as pointed out there are tools 
available to help them out.  You can't prioritize users of this early test 
repo ahead of the long-term goals here, and making it easier for new 
people to quickly start hacking on the codebase is very much a motivating 
factor behind the conversion.

Because the mapping of CVS commits into git ones has a bit of fuzziness to 
it, it's possible to turn fine-tuning the repo history into an endless 
project.  Wandering down that road helps no one.

The best way to control the scope creep here is to avoid doing that, and 
instead focus on what you really need from the repo conversion.  In this 
case, it's a hard requirement that current and back branches that are 
still maintained must produce a checked out result that is identical to if 
you were to check that version out of CVS.  There's been some spot
checking of that already; it may make sense to write up an official QA
spec here.

Reconversion of the old history needs to happen as many times as necessary 
until that goal is reached for git to be adopted by the project one day. 
Because I think that's going to require an iterative process 
(convert/test/fix/repeat) I'm not sure what value there is to the better 
conversion tools that can't be used incrementally here.

If the goalposts are moved to "every ancient tag/release ever must build 
perfectly and have sane history no matter how nasty its CVS history was", 
history conversion is doomed.  I don't think it's unrealistic to plan on
reaching a point where you can say "we've confirmed every release
from 7.4 forward builds identically from git; older releases, betas, and
similarly early builds should instead be built from the deprecated CVS 
repo".  If the scope of the conversion has higher standards than that, and 
I can't imagine why it should, there's going to be an enormous amount of 
time wasted playing around with tags that results in no benefit to users 
of the software.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 28, 2009 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think
>> we will have to make a clean cutover from "CVS is authoritative" to
>> "CVS is dead and git is authoritative", and do a fresh repository
>> conversion at that instant.  What we should be doing to get prepared for
>> that is testing various conversion tools to see which one gives us the
>> best conversion.  And fixing anything in the CVS repository that is
>> preventing getting a sane conversion.

> That might work, but then we better be pretty darn confident that that
> "fresh conversion" is actually correct.

Well, yeah, which is one of several reasons why this isn't happening
tomorrow ;-).  Whatever tool we use should have survived a good deal
of advance testing.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Robert Haas escribió:

> To me they sound complex and inconvenient.  I guess I'm kind of
> mystified by why we can't make this work reliably.  Other than the
> "broken tags" issue we've discussed, it seems like the only real issue
> should be how to group changes to different files into a single
> commit.

There's another issue which is that of the $Id$ and similar tags.  We
have to decide what we want to do with them.  If we're not going to have
them in the Git repository, then they are only causing trouble right now
and it would be better to get rid of them completely for the conversion,
to avoid the noise that they will invariably cause.

We could, for example, say that a conversion process is supposed to
un-expand them (say sed -e 's/\$Revision:[^$]*\$/$Revision$/' and so on;
obviously it's a lot more complex for $Log$) *before* attempting to
analyze any revision.  I think that would make further munging a lot
simpler.
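
A rough sketch of the un-expansion step, assuming it is applied to each file's text before any comparison. The keyword list is taken from standard RCS keywords plus the project's custom `$PostgreSQL$` tag; `$Log$` is deliberately left out because, as noted above, its expansion spans multiple lines and needs separate handling. The function name is hypothetical.

```python
import re

# Single-line RCS keywords, plus the project-specific PostgreSQL tag.
# $Log$ is excluded on purpose: its expansion is multi-line.
KEYWORDS = ("Author", "Date", "Header", "Id", "Locker", "Name",
            "RCSfile", "Revision", "Source", "State", "PostgreSQL")

# Matches an expanded keyword such as "$Revision: 1.35 $" on one line.
_EXPANDED = re.compile(r"\$(" + "|".join(KEYWORDS) + r"):[^$\n]*\$")


def unexpand_keywords(text):
    """Collapse every expanded keyword back to its bare "$Keyword$" form."""
    return _EXPANDED.sub(lambda m: "$" + m.group(1) + "$", text)
```

Already-collapsed keywords and unrelated dollar signs are left untouched, so running the function twice gives the same result as running it once.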

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> The best way to control the scope creep here is to avoid doing that, and 
> instead focus on what you really need from the repo conversion.  [...]
> If the goalposts are moved to "every ancient tag/release ever must build 
> perfectly and have sane history no matter how nasty its CVS history was", 
> history conversion is doomed.

Right.  Shall we try to spec out exactly what our conversion
requirements are?  Here's a shot:

* Head of each active branch must check out the same as it does from CVS
(modulo $PostgreSQL$ and similar tags, which we've already agreed we can
abandon).

* Each released minor version tag must check out the same as from CVS,
at least back to some specified point (perhaps 7.4.0).  I'd really
prefer to insist on that all the way back.

* Each commit message in the CVS history must be retrievable from the
git history, and should correspond to the same file changes.  However,
we are okay with git sometimes treating "one" CVS commit as two or more
events with similar messages.  (I'm basing this on the behavior of
cvs2cl, which sometimes does that depending on how time-extended the
individual file updates were.)  Also, we won't be too picky about
whether the "same" commits on different branches are treated as one
event or multiple events.
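
The first two requirements come down to comparing two checked-out trees while ignoring keyword expansion. Below is a minimal sketch of such a comparison, assuming one directory holds a CVS checkout of a tag and the other a git checkout of the same tag. The function names are hypothetical, and the keyword pattern is a deliberately loose assumption (any `$Word: ... $` on one line), so it may hide a few legitimate differences.

```python
import os
import re

# Loose assumption: anything shaped like "$Word: ... $" is a CVS keyword.
_KEYWORD = re.compile(rb"\$([A-Za-z]+):[^$\n]*\$")


def normalize(data):
    """Collapse expanded keywords so they do not show up as differences."""
    return _KEYWORD.sub(rb"$\1$", data)


def list_files(root, skip_dirs=("CVS", ".git")):
    """Map relative path -> full path for every file under root."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata directories in place.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found[os.path.relpath(full, root)] = full
    return found


def compare_trees(cvs_root, git_root):
    """Report files missing on either side and files whose contents differ."""
    cvs_files = list_files(cvs_root)
    git_files = list_files(git_root)
    report = {
        "only_in_cvs": sorted(set(cvs_files) - set(git_files)),
        "only_in_git": sorted(set(git_files) - set(cvs_files)),
        "differ": [],
    }
    for rel in sorted(set(cvs_files) & set(git_files)):
        with open(cvs_files[rel], "rb") as a, open(git_files[rel], "rb") as b:
            if normalize(a.read()) != normalize(b.read()):
                report["differ"].append(rel)
    return report
```

A tag would pass the "checks out the same" requirement when all three lists in the report are empty. The third requirement (commit messages) is not covered by this sketch.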

Comments?  Other considerations?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> There's another issue which is that of the $Id$ and similar tags.  We
> have to decide what we want to do with them.  If we're not going to have
> them in the Git repository, then they are only causing trouble right now
> and it would be better to get rid of them completely for the conversion,
> to avoid the noise that they will invariably cause.

What was in the back of my mind was that we'd go around and mass-remove
$PostgreSQL$ (and any other lurking tags), but only from HEAD and only
after the repo conversion.  Although just before it would be okay too.
The stickier part of this is what to do about back branches;
particularly whether we are okay with checked-out versions of past
releases not matching the actual shipped tarballs on this point.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Right.  Shall we try to spec out exactly what our conversion
> requirements are?  Here's a shot:
[...]
> Comments?  Other considerations?

Certainly sounds reasonable to me.  I'd be really surprised if that's
really all that hard to accomplish.  I'd be happy to help with some
testing too if we feel that the current git repo is in reasonable shape
to do that testing against (or someone has another).

+1
Thanks,
    Stephen

Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> There's another issue which is that of the $Id$ and similar tags.  We
>> have to decide what we want to do with them.  If we're not going to have
>> them in the Git repository, then they are only causing trouble right now
>> and it would be better to get rid of them completely for the conversion,
>> to avoid the noise that they will invariably cause.
>
> What was in the back of my mind was that we'd go around and mass-remove
> $PostgreSQL$ (and any other lurking tags), but only from HEAD and only
> after the repo conversion.  Although just before it would be okay too.
> The stickier part of this is what to do about back branches;
> particularly whether we are okay with checked-out versions of past
> releases not matching the actual shipped tarballs on this point.

Mass-deleting these tags from HEAD and the current head of each
back-branch seems like a good place to start.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > There's another issue which is that of the $Id$ and similar tags.  We
> > have to decide what we want to do with them.  If we're not going to have
> > them in the Git repository, then they are only causing trouble right now
> > and it would be better to get rid of them completely for the conversion,
> > to avoid the noise that they will invariably cause.
> 
> What was in the back of my mind was that we'd go around and mass-remove
> $PostgreSQL$ (and any other lurking tags), but only from HEAD and only
> after the repo conversion.  Although just before it would be okay too.
> The stickier part of this is what to do about back branches;
> particularly whether we are okay with checked-out versions of past
> releases not matching the actual shipped tarballs on this point.

You mean we would remove them from CVS?  I don't think that's
necessarily a good idea; it'd be massive changes for no good reason.  My
idea was to remove them from the repository that would be used for the
conversion (I think that means editing the ,v files), and not put that
change back to the "real" CVS repo.  Then the conversion to Git gets a
lot simpler; and the checking of this modified repo against copies
checked out from Git would be simpler.

Since this change is supposed to be scriptable, the script should be
available so potential testers of the conversion can get a converted
repository too.  (Or maybe we should just provide access to the modified
copy of the repo).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> What was in the back of my mind was that we'd go around and mass-remove
>> $PostgreSQL$ (and any other lurking tags), but only from HEAD and only
>> after the repo conversion.  Although just before it would be okay too.

> You mean we would remove them from CVS?  I don't think that's
> necessarily a good idea; it'd be massive changes for no good reason.

Uh, how is it different from any other mass edit, such as our annual
copyright-year updates, or pgindent runs?

> My idea was to remove them from the repository that would be used for the
> conversion (I think that means editing the ,v files),

Ick ... I'm willing to tolerate a few small manual ,v edits if we have
to do it to make tags consistent or something like that.  I don't think
we should be doing massive edits of that kind.

But anyway, that's not the interesting point.  The interesting point is
what about the historical aspect of it, not whether we want to dispense
with the tags going forward.  Should our repo conversion try to
represent the historical states of the files including the tag strings?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Greg Smith
Date:
On Thu, 28 May 2009, Tom Lane wrote:

> Each released minor version tag must check out the same as from CVS, at 
> least back to some specified point (perhaps 7.4.0).  I'd really prefer 
> to insist on that all the way back.

We'd all like to hope that a conversion process that works for everything
back to 7.4.0 would also give useful results for all the old ones,
too.  And it's worth testing as far back as possible.  I think it's just 
unrealistic to set the bar too high in the off chance that one of these 
old releases has something that's harder to fix than producing that 
version is worth.  That might be the case for some of the 7.1 stuff 
mentioned upthread for example.  If there are only a few stragglers that 
won't play nice, it might be easier to just publish a "git errata" list of 
those releases and move on.

In related news, I wanted to make it a bit easier to track followup on the 
whole "Action Item" list from the meeting.  I converted those to the 
standard format we were already using on the ToDo list, which provides a 
way to check off items that are done.  It may be worth breaking those out 
from the rest of the minutes, so that it's easier to extend them with 
things like these fleshed out git requirements.  Example: 
http://wiki.postgresql.org/wiki/PgCon_2009_Developer_Meeting#Source_Code_Management

Thoughts?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Aidan Van Dyk <aidan@highrise.ca> [090527 17:22]:
> And actually looking at the history of the gpo repo, the branches are all
> messed up with "merges" and stuff that I'm not sure where they are coming
> from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the
> back branches are very bad...

Ok, so seeing the interest in having a "good conversion", I took a stab at
parsecvs this afternoon, probably what I consider the leading "static"
conversion tool.  I put the following patch into it to teach it about
$PostgreSQL$ and make the date formats match what my cvs export had:

aidan@db1-dapper:~/test/parsecvs$ git diff
diff --git a/rcs2git.c b/rcs2git.c
index c13c1f4..de6841d 100644
--- a/rcs2git.c
+++ b/rcs2git.c
@@ -52,7 +52,7 @@ struct diffcmd {
 const int initial_out_buffer_size = 1024;
 char const ciklog[] = "checked in with -k by ";
 
-#define KEYLENGTH 8 /* max length of any of the above keywords */
+#define KEYLENGTH 10 /* max length of any of the above keywords */
 #define KDELIM '$' /* keyword delimiter */
 #define VDELIM ':' /* separates keywords from values */
 #define SDELIM '@' /* string delimiter */
@@ -61,10 +61,10 @@ char const ciklog[] = "checked in with -k by ";
 
 char const *const Keyword[] = {
        0, "Author", "Date", "Header", "Id", "Locker", "Log",
-       "Name", "RCSfile", "Revision", "Source", "State"
+       "Name", "RCSfile", "Revision", "Source", "State", "PostgreSQL"
 };
 enum markers { Nomatch, Author, Date, Header, Id, Locker, Log,
-       Name, RCSfile, Revision, Source, State };
+       Name, RCSfile, Revision, Source, State, PostgreSQL };
 enum stringwork {ENTER, EDIT};
 
 enum expand_mode {EXPANDKKV, EXPANDKKVL, EXPANDKK, EXPANDKV, EXPANDKO, EXPANDKB};
@@ -492,7 +492,7 @@ static void keyreplace(enum markers marker)
        char const *sp = Keyword[(int)marker];
 
        strftime(date_string, 25,
-               "%Y/%m/%d %H:%M:%S", localtime(&Gversion->date));
+               "%Y-%m-%d %H:%M:%S", localtime(&Gversion->date));
 
        if (exp != EXPANDKV)
                out_printf("%c%s", KDELIM, sp);

It takes about 10 minutes to run on my old Xeon.

And a comparison between its conversions and my cvs checkouts:

pg-parsecvs-MANUAL_DIST.diff       0 files changed
pg-parsecvs-REL2_0B.diff           0 files changed
pg-parsecvs-REL6_4.diff            0 files changed
pg-parsecvs-REL6_5_PATCHES.diff    0 files changed
pg-parsecvs-REL7_0_PATCHES.diff    0 files changed
pg-parsecvs-REL7_1_STABLE.diff     0 files changed
pg-parsecvs-REL7_2_STABLE.diff     0 files changed
pg-parsecvs-REL7_3_STABLE.diff     0 files changed
pg-parsecvs-REL7_4_STABLE.diff     0 files changed
pg-parsecvs-REL8_0_0.diff          4 files changed, 5053 insertions(+), 35326 deletions(-)
pg-parsecvs-REL8_0_STABLE.diff     0 files changed
pg-parsecvs-REL8_1_STABLE.diff     0 files changed
pg-parsecvs-REL8_2_STABLE.diff     1 file changed, 1 insertion(+), 1 deletion(-)
pg-parsecvs-REL8_3_STABLE.diff     1 file changed, 1 insertion(+), 1 deletion(-)
pg-parsecvs-Release_1_0_3.diff     0 files changed
pg-parsecvs-WIN32_DEV.diff         0 files changed
pg-parsecvs-ecpg_big_bison.diff    0 files changed
pg-parsecvs-master.diff            1 file changed, 1 insertion(+), 1 deletion(-)
 

Much better!!!

The REL8_0_0 branch still seems funny:

 src/backend/po/ru.po                  | 8416 ++++++++++------
 src/backend/parser/gram.c             |12088 ------------------------
 src/interfaces/ecpg/preproc/pgc.c     | 2887 -----
 src/interfaces/ecpg/preproc/preproc.c |16988 ----------------------------------
 4 files changed, 5053 insertions(+), 35326 deletions(-)
 

3 files are in the REL8_0_0 conversion but not on the cvs branch anymore (but
looking at the CVS ,v files, I'm partial to thinking there was probably some
CVS file hackery around those versions), and it's missing an update to ru.po.

The other difference is the same line for REL8_2_STABLE/REL8_3_STABLE/master:

aidan@db1-dapper:~/test/pg$ cat /tmp/pg-parsecvs-master.diff
diff --git a/src/tools/backend/index.html b/src/tools/backend/index.html
index ec2dcb8..846f002 100644
--- a/src/tools/backend/index.html
+++ b/src/tools/backend/index.html
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 2006/03/11 04:38:41 momjian Exp $ -->
+<!-- $PostgreSQL$ -->
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 

Inspecting $CVSROOT/pgsql/src/tools/backend/index.html,v shows it's actually
got a strange $PostgreSQL$ tag:

The tag came into existence here:

1.35
log
@Add CVS tag lines to files that were lacking them.
@
text
@d1 1
a1 1
<!-- $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.314 2006/03/07 02:54:23 momjian Exp $ -->
 
Note the bogus file name and version.
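
A small sketch of how such bogus tags could be found mechanically, assuming the expanded `$PostgreSQL$` keyword always has the shape seen in this thread (repository path ending in `,v`, then the revision). The function name and the idea of passing in the file's own repository path are assumptions made for illustration.

```python
import re

# Shape assumed from the examples in this thread:
#   $PostgreSQL: <repository path>,v <revision> <date> <time> <author> <state> $
_PG_KEYWORD = re.compile(r"\$PostgreSQL:\s*(\S+),v\s+(\S+)\s")


def check_keyword_path(file_path, line):
    """Compare the path embedded in an expanded keyword with the file's real path.

    Returns None when the line has no expanded $PostgreSQL$ keyword.
    """
    m = _PG_KEYWORD.search(line)
    if m is None:
        return None
    embedded_path, revision = m.group(1), m.group(2)
    return {
        "embedded_path": embedded_path,
        "revision": revision,
        "matches": embedded_path == file_path,
    }
```

Run against the 1.35 delta of index.html quoted above, this would report the guc.c path and revision 1.314 as a mismatch.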

And then it was updated once since then:

1.36
log
@Improve backend flowchart to show more detail.
@
text
@<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 2006/03/11 04:38:41 momjian Exp $ -->
 



But, you have all your branches and tags:

REL2_0B               01038eb Here's a little patch to keep the compiler quiet when compiling PostgreSQL V6.0 on the SPARC Solaris2 platform.
REL6_4                218f738 Retrofit hashtable and shared-mem-size-estimation bug fixes into REL6_4.
REL6_5_PATCHES        07af5e4 Back-patch critical fixes for NUMERIC values in plpgsql functions.
REL7_0_PATCHES        ddb6d9f Back-patch password leak fix for Vaschenko.
REL7_1_STABLE         c1e5c6e Remove stray semicolons in old ecpg preproc grammar ... modern bison versions won't compile it at all with those there.  Probably of only academic interest now, but ...
REL7_2_STABLE         695a260 Remove registration message in all the supported back branches; we had decided to drop it for 7.4, and no one misses it.
REL7_3_STABLE         9ff59c7 Stamp release 7.3.21.
REL7_4_STABLE         4257497 Split the release notes into a separate file for each (active) major branch, as per my recent proposal.  release.sgml itself is now just a stub that should change rarely; ideally, only once per major release to add a new include line. Most editing work will occur in the release-N.N.sgml files.  To update a back branch for a minor release, just copy the appropriate release-N.N.sgml file(s) into the back branch.
REL8_0_0              1df3a89 its that time ... tag it for release
REL8_0_STABLE         a52c648 Split the release notes into a separate file for each (active) major branch, as per my recent proposal.  release.sgml itself is now just a stub that should change rarely; ideally, only once per major release to add a new include line. Most editing work will occur in the release-N.N.sgml files.  To update a back branch for a minor release, just copy the appropriate release-N.N.sgml file(s) into the back branch.
REL8_1_STABLE         04b339b Update relpages and reltuples estimates in stand-alone ANALYZE, even if there's no analyzable attributes or indexes. We also used to report 0 live and dead tuples for such tables, which messed with autovacuum threshold calculations.
REL8_2_STABLE         e2fee50 Update relpages and reltuples estimates in stand-alone ANALYZE, even if there's no analyzable attributes or indexes. We also used to report 0 live and dead tuples for such tables, which messed with autovacuum threshold calculations.
REL8_3_STABLE         18b7ff5 Fix LIKE's special-case code for % followed by _.  I'm not entirely sure that this case is worth a special code path, but a special code path that gets the boundary condition wrong is definitely no good.  Per bug #4821 from Andrew Gierth.
Release_1_0_3         21f8ea0 A small patch from Andrew for the linux port in v1.09
WIN32_DEV             0f1714d Change Win32 rename/unlink timeout to 3 seconds.
ecpg_big_bison        3cb2aa3 Synced yet again. Deactivated backend prepare/execute/deallocate for the time being.
master                fd02d25 Fix compiler warnings on Sun Studio of the sort
master-UNNAMED-BRANCH 4cc264d Make the world at least somewhat safe for zero-column tables, and remove the special case in ALTER DROP COLUMN to prohibit dropping a table's last column.
tags/MANUAL_1_0       0f52f25 Import of PostgreSQL User Manual
tags/PG95-1_01        4d809f0 Postgres95 1.01 Distribution - Virgin Sources
tags/REL2_0           401afd3 Remove include of libpq-fe.h.  This file has nothing to do with libpq.
tags/REL6_5           744f5e3 Update TODO list.
tags/REL7_0           ab6f4fd Change HISTORY to show outer joins in 7.1 or 7.2.
tags/REL7_1           4aa3f7b Remove as-of from HISTORY file.
tags/REL7_1_BETA      63fbd7a Fix bogus makefiles ... these didn't build on platforms that are sticky about being given accurate references to referenced libraries ...
tags/REL7_1_BETA2     a233e6c tag configure as beta2 ..
tags/REL7_1_BETA3     9ef1bcb jump version to beta3 ... beta2 was created and pulled due to a couple of large-ish bugs that Tom and Vadim were able to fix, but to avoid any confusion, beta2 was removed ... and for tag'ng purposes, beta3 is being created ...
tags/REL7_1_2         7b7fbe9 Correct recently-broken avg(interval) definition.  We can't force an initdb to fix this in 7.1 installations, but it seems better to be shipping a correct entry than a wrong one.
 
tags/REL7_2_BETA1     74644d4 Code cleanup.
tags/REL7_2_BETA2     f0d9d23 Update for latest version of horology test.
tags/REL7_2_BETA3     c122aae Some minor tweaks of REINDEX processing: grab exclusive lock a little earlier, make error checks more uniform.
tags/REL7_2_BETA4     374d3a2 tag it as b4, with all the changes that have gone on ...
tags/REL7_2_BETA5     ebb23f8 tag as beta 5 for *hopefully* a very very short beta cycle on this one?
tags/REL7_2_RC1       30f1836 okay, sorry for delay all ... here is the tag for RC1 ...
tags/REL7_2_RC2       7291790 let's roll up rc2 ..
tags/REL7_2           b8e3d8f Stamp configure/configure.in for 7.2, already did register.txt and bug.template.
tags/REL7_2_3         8ddb964 Brand 7.2.3.
tags/REL7_2_4         bfb3ddc Brand 7.2.4.
tags/REL7_2_5         473091d Update 7.2 regression tests to match what you get when using a modern version of Bison.
tags/REL7_2_6         26c477e Stamp release 7.2.6.
tags/REL7_2_7         cdb721b Recommend security@postgresql.org as the contact point for security-related bugs.
tags/REL7_2_8         21f3e3a Update release notes for upcoming re-releases.
 
tags/REL7_3_2         652e3c8 Add mention of CURRENT_SCHEMA for object creation.
tags/REL7_3_4         9a46450 Fix timestamp_date for HAVE_INT64_TIMESTAMP case.
tags/REL7_3_5         ebeb0c0 Brand 7.3.5.
tags/REL7_3_6         718dda1 Brand 7.3.6.
tags/REL7_3_7         e73b50d Wups, seem to have used an ungood version of lynx to generate this.
tags/REL7_3_8         2291a3f Stamp release 7.3.8.
tags/REL7_3_9         cdb721b Recommend security@postgresql.org as the contact point for security-related bugs.
tags/REL7_3_10        21f3e3a Update release notes for upcoming re-releases.
tags/REL7_3_11        10c4262 Stamp release 7.3.11.
tags/REL7_3_12        d65fbbd Stamp 7.3.12.
tags/REL7_3_13        92ee1dc Release-note updates and copy editing.
tags/REL7_3_14        bbfa98c Stamp 7.3.14.
tags/REL7_3_15        b954d6d Stamp release 7.3.15.
tags/REL7_3_16        7ec7706 Stamp 7.3.16.
tags/REL7_3_17        ee65483 Fix markup because older releases couldn't like to refernce pages.
tags/REL7_3_18        2edf36f Stamp release 7.3.18.
tags/REL7_3_19        552a435 Update configure.in for release
tags/REL7_3_20        f3ab989 Update release notes for last-minute fix.
tags/REL7_3_21        9ff59c7 Stamp release 7.3.21.
 
tags/REL7_4_BETA1     9060870 can't mix and match .gz and .bz2 in here ... won't build
tags/REL7_4_BETA2     f2e55e5 update to beta2
tags/REL7_4_BETA3     c2481d7 tag her for beta3, as announced on Friday  ...
tags/REL7_4_BETA4     1df1740 brand her beta4
tags/REL7_4_BETA5     6803ce7 up configure to beta5
tags/REL7_4_RC1       5715d2b tag it Release Candidate 1, as previously discussed
tags/REL7_4_RC2       441a921 autoconf
tags/REL7_4           b1f8cc4 k, tag the release
tags/REL7_4_1         9d947b0 Update HISTORY for 7.4.1 release.
tags/REL7_4_2         3462f95 Some editorial work on 7.4.2 release notes.
tags/REL7_4_3         fc86d4b Remove README.CVS when making a distribution.
tags/REL7_4_4         a9771b2 Stamp 7.4.4.
tags/REL7_4_5         6ff4540 Brand 7.4.5 ... now that was our shortest-lived release ever ...
tags/REL7_4_6         0833822 Stamp release 7.4.6.
tags/REL7_4_7         cdb721b Recommend security@postgresql.org as the contact point for security-related bugs.
tags/REL7_4_8         21f3e3a Update release notes for upcoming re-releases.
tags/REL7_4_9         959dacc COPY's test for read-only transaction was backward; it prohibited COPY TO where it should prohibit COPY FROM.  Found by Alon Goldshuv.
tags/REL7_4_10        b250647 Translation updates
tags/REL7_4_11        92ee1dc Release-note updates and copy editing.
tags/REL7_4_12        7fb4b1d Stamp 7.4.12.
tags/REL7_4_13        d6d136f Stamp release 7.4.13.
tags/REL7_4_14        fedacf2 Stamp 7.4.14.
tags/REL7_4_15        dd03a59 commit before tag ...
tags/REL7_4_16        c06c90d Stamp release 7.4.16.
tags/REL7_4_17        61fc168 Update configure in for new release
tags/REL7_4_18        f3ab989 Update release notes for last-minute fix.
tags/REL7_4_19        01d0a31 Stamp release 7.4.19.
tags/REL7_4_20        f17483e Remove link that pre-8.2 doc tools don't support.
tags/REL7_4_21        7814f48 tag 7.4.21
tags/REL7_4_22        8bde44d tag for 7.4.22
tags/REL7_4_23        0b96d5e tag 7.4.23
tags/REL7_4_24        204cde4 tag 7.4.24
tags/REL7_4_25        3a5e48a tag 7.4.25
 
tags/REL8_0_0BETA1    6eb9cb3 Fix Win32 pg_dumpall check.
tags/REL8_0_0BETA2    9b01845 tag configure beta2
tags/REL8_0_0BETA3    3be70cd update for beta3, and update Copyright date to 2004
tags/REL8_0_0BETA4    4e175ff make sure we tag configure.in as beta4 as well ...
tags/REL8_0_0BETA5    2cdd35b update us to beta5
tags/REL8_0_0RC1      da9f52d tag configure for rc1 ..
tags/REL8_0_0RC2      19e8edf tag files for rc2
tags/REL8_0_0RC3      be040da forgot to autoconf after tag'ng configure.in with rc3
tags/REL8_0_0RC4      7da46c4 upgrade tags to rc4
tags/REL8_0_0RC5      81f6bcb up release to rc5
tags/REL8_0_1         cdb721b Recommend security@postgresql.org as the contact point for security-related bugs.
tags/REL8_0_2         61f3737 Stamp 8.0.2.
tags/REL8_0_3         04b942e Rename encryption section.
tags/REL8_0_4         959dacc COPY's test for read-only transaction was backward; it prohibited COPY TO where it should prohibit COPY FROM.  Found by Alon Goldshuv.
tags/REL8_0_5         b250647 Translation updates
tags/REL8_0_6         92ee1dc Release-note updates and copy editing.
tags/REL8_0_7         65edf6e Stamp 8.0.7.
tags/REL8_0_8         991d699 Stamp release 8.0.8.
tags/REL8_0_9         0d007fb Stamp 8.0.9.
tags/REL8_0_10        f9dba8f tag it
tags/REL8_0_11        12f2780 Stamp release 8.0.11.
tags/REL8_0_12        9988796 Stamp releases notes for 8.2.3, 8.1.8, 8.0.12.
tags/REL8_0_13        5032c16 Update configure for release
tags/REL8_0_14        f3ab989 Update release notes for last-minute fix.
tags/REL8_0_15        89453bc Stamp release 8.0.15.
tags/REL8_0_16        f17483e Remove link that pre-8.2 doc tools don't support.
tags/REL8_0_17        ba9d7d7 tag 8.0.17
tags/REL8_0_18        8611315 tag for 8.0.18
tags/REL8_0_19        e077640 tag for 8.0.19
tags/REL8_0_20        9f80110 commit first then tag 8.0.20
tags/REL8_0_21        13651cb tag 8.0.21
 
 tags/REL8_1_0BETA1    1762d42 fix up a few references to 8.1devel -> 8.1beta1
 tags/REL8_1_0BETA2    6c005da tag it all beta2 ...
 tags/REL8_1_0BETA3    972df1d must commit *after* autoconf, not before
 tags/REL8_1_0BETA4    7b100fc update configure and bugtemplate for beta 4 ...
 tags/REL8_1_0RC1      8dd55b3 tag it for rc1
 tags/REL8_1_0         cdecffb Tag everything for 8.1.0 ... Finally, a relesae on scheduale!!
 tags/REL8_1_1         5f6994f Remove incorrect increment of lineno, per David Fetter. Sync HEAD and 8.1 branches of pgbench.
 tags/REL8_1_2         92ee1dc Release-note updates and copy editing.
 tags/REL8_1_3         da0a737 Stamp 8.1.3.
 tags/REL8_1_4         d17b9c1 Stamp release 8.1.4.
 tags/REL8_1_5         21eeb6a Stamp 8.1.5.
 tags/REL8_1_6         5558874 Links to GUC variables from HISTORY don't work in back branches...
 tags/REL8_1_7         0349a82 Stamp release 8.1.7.
 tags/REL8_1_8         9988796 Stamp releases notes for 8.2.3, 8.1.8, 8.0.12.
 tags/REL8_1_9         2f934c8 Update configure.in for release
 tags/REL8_1_10        f3ab989 Update release notes for last-minute fix.
 tags/REL8_1_11        9ae878a Stamp release 8.1.11.
 tags/REL8_1_12        f17483e Remove link that pre-8.2 doc tools don't support.
 tags/REL8_1_13        b2b4697 tag 8.1.13
 tags/REL8_1_14        7cb6c06 tag for 8.1.14
 tags/REL8_1_15        1123c54 tag 8.1.15
 tags/REL8_1_16        d59f5d8 tagging 8.1.16
 tags/REL8_1_17        9ad74fc tag 8.1.17
 
 tags/REL8_2_BETA1     d1511a2 Tag us Beta1
 tags/REL8_2_BETA2     2630594 Stamp 8.2beta2.
 tags/REL8_2_BETA3     f7ea773 Tag as Beta3 ... two outstanding *known* bugs before RC1 ...
 tags/REL8_2_RC1       33061a5 update for rc1
 tags/REL8_2_0         7d3db29 v8.2.0 is now released ...
 tags/REL8_2_1         b878fc4 tag configure
 tags/REL8_2_2         538ed7c Stamp release 8.2.2.
 tags/REL8_2_3         9988796 Stamp releases notes for 8.2.3, 8.1.8, 8.0.12.
 tags/REL8_2_4         96acc85 Fix markup.
 tags/REL8_2_5         f3ab989 Update release notes for last-minute fix.
 tags/REL8_2_6         019e5ea Stamp release 8.2.6.
 tags/REL8_2_7         08a055a Translation updates
 tags/REL8_2_8         34b0b9f tag 8.2.8
 tags/REL8_2_9         b117db0 tag 8.2.9
 tags/REL8_2_10        25afb91 tag for 8.2.10
 tags/REL8_2_11        aedcad2 tag 8.2.11
 tags/REL8_2_12        37a9110 tag 8.2.12
 tags/REL8_2_13        9f8fa50 tag 8.2.13
 tags/REL8_3_BETA1     0d74383 tag it 8.3beta1 ... the beta cycle begins
 tags/REL8_3_BETA2     28343c6 Stamp 8.3beta2.
 tags/REL8_3_BETA3     0b5494d Fix markup that doesn't work in HISTORY generation.
 tags/REL8_3_BETA4     60becb8 Stamp 8.3beta4.
 tags/REL8_3_RC1       b28079b Stamp release 8.3RC1.
 tags/REL8_3_RC2       17a82cf must commit after autoconf ... and yes, I used the right autoconf
 tags/REL8_3_0         0126093 configure tag'd 8.3.0 and built witih autoconf 2.59
 tags/REL8_3_1         7e8f8b2 Fix inappropriately-timed memory context switch in autovacuum_do_vac_analyze. This accidentally failed to fail before 8.3, because the context we were switching back to was long-lived anyway; but it sure looks risky as can be now.  Well spotted by Pavan Deolasee.
 tags/REL8_3_2         2343441 tag for 8.3.2
 tags/REL8_3_3         84d7f08 tag 8.3.3
 tags/REL8_3_4         08f7e02 tag for 8.3.4
 tags/REL8_3_5         aed4eac commit for 8.3.5
 tags/REL8_3_6         8e9babe Fix plpgsql to not treat INSERT INTO as an INTO-variables clause anywhere in the string, not just at the start.  Per bug #4629 from Martin Blazek.
 tags/REL8_3_7         98b774c tag 8.3.7
 
 tags/REL8_4_BETA1     6c313e8 commit and tag beta1
 tags/REL8_4_BETA2     6e61436 commit for BETA2
 tags/Release-1-6-0    7e5f5ea creation for postgresql-6.1
 tags/Release_1_0_2    f720353 Okay...*last* commit, now to create a release...
 tags/Release_2_0      c081f4a | |Here is a fix for the psql alignment problem.  It turns out that libpq |was trying to determine if the column contained only numeric values so |it could right justify it.  The 'e' values were taked as exponient |values and all columns were considered numeric. | |The patch excludes 'e' and 'E' as being valid first-column numeric |values. |
 tags/Release_2_0_0    b185bb0 changed missed err() change to err_out()
 tags/SUPPORT          2ffc02d Support Docs & Contrib
 tags/creation         7e5f5ea creation for postgresql-6.1
 tags/release-6-3      5b4a2e7 One last change to configure for 'non-gcc' compiler
 

a.


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Andres Freund
Date:
Hi,

On 05/28/2009 06:19 PM, Tom Lane wrote:
> I am hoping that git's cvs server emulation is complete enough that you
> can commit through it --- anybody know?  But that will be just a
> stopgap.
Committing is no problem - you can't tag, branch or merge through it
though (not really surprising, I think).

> BTW, can anyone comment on whether and how we can maintain the current
> split between master repository (that's not even accessible to
> non-committers) and a public mirror?  If only from a standpoint of
> security paranoia, I'd rather like to preserve that split, but I don't
> know how well git will play with it.
Absolutely not a problem (Doing such things is one of the strengths of git).

Andres


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Andres Freund <andres@anarazel.de> [090528 16:07]:

>> BTW, can anyone comment on whether and how we can maintain the current
>> split between master repository (that's not even accessible to
>> non-committers) and a public mirror?  If only from a standpoint of
>> security paranoia, I'd rather like to preserve that split, but I don't
>> know how well git will play with it.

> Absolutely not a problem (Doing such things is one of the strengths of git).

In fact, that's generally the standing procedure for git... The
"master" is "pushed" to by a select few, controlled by SSH key access to
accounts with write access to the files/directories of the git repo
(much like I'm assuming the current CVS master is).  And that's pushed
out to any number of other places, either from cron, or post-receive
hooks.
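
A minimal sketch of the post-receive variant, in Python purely for
illustration. The mirror URL is hypothetical, and the sketch assumes the
account running the hook already has SSH push access to each mirror.
`git push --mirror` pushes all refs (branches and tags) to the remote.

```python
import subprocess

# Hypothetical mirror list; replace with the real public mirror(s).
MIRRORS = ["ssh://git@mirror.example.org/postgresql.git"]


def mirror_commands(mirrors):
    """Build one `git push --mirror` command per mirror URL."""
    return [["git", "push", "--mirror", url] for url in mirrors]


def push_to_mirrors(mirrors, run=subprocess.run):
    """Push to every mirror and return the URLs whose push failed."""
    failures = []
    for command in mirror_commands(mirrors):
        result = run(command)
        if result.returncode != 0:
            failures.append(command[-1])
    return failures

# A hooks/post-receive script in the master repository would finish with
# something like:  sys.exit(1 if push_to_mirrors(MIRRORS) else 0)
```

The same function could be called from cron instead; the hook only makes
the mirror update happen right after each push to the master.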

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Aidan Van Dyk <aidan@highrise.ca> [090528 15:56]:
> Ok, so seeing the interest in having a "good conversion", I took a stab at
> parsecvs this afternoon, probably what I consider the leading "static"
> conversion tool.
> It takes about 10 minutes to run on my old xeon.
> 
> And a comparison between its conversions and my cvs checkouts:
>     pg-parsecvs-MANUAL_DIST.diff   0 files changed
>     pg-parsecvs-REL2_0B.diff       0 files changed
>     pg-parsecvs-REL6_4.diff        0 files changed
>     pg-parsecvs-REL6_5_PATCHES.diff 0 files changed
>     pg-parsecvs-REL7_0_PATCHES.diff 0 files changed
>     pg-parsecvs-REL7_1_STABLE.diff 0 files changed
>     pg-parsecvs-REL7_2_STABLE.diff 0 files changed
>     pg-parsecvs-REL7_3_STABLE.diff 0 files changed
>     pg-parsecvs-REL7_4_STABLE.diff 0 files changed
>     pg-parsecvs-REL8_0_0.diff      4 files changed, 5053 insertions(+), 35326 deletions(-)
>     pg-parsecvs-REL8_0_STABLE.diff 0 files changed
>     pg-parsecvs-REL8_1_STABLE.diff 0 files changed
>     pg-parsecvs-REL8_2_STABLE.diff 1 file changed, 1 insertion(+), 1 deletion(-)
>     pg-parsecvs-REL8_3_STABLE.diff 1 file changed, 1 insertion(+), 1 deletion(-)
>     pg-parsecvs-Release_1_0_3.diff 0 files changed
>     pg-parsecvs-WIN32_DEV.diff     0 files changed
>     pg-parsecvs-ecpg_big_bison.diff 0 files changed
>     pg-parsecvs-master.diff        1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Much better!!!

And for those interested in looking at that repo:
    git clone git://code.highrise.ca/~mountie/pg-static.git

It won't be around forever, and it's *not* incremental.

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Thu, May 28, 2009 at 3:56 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> Ok, so seeing the interest in having a "good conversion", I took a stab at

Awesome!

> Much better!!!
>
> The REL8_0_0 branch seem funny yet:
>         src/backend/po/ru.po                  | 8416 ++++++++++------
>         src/backend/parser/gram.c             |12088 ------------------------
>         src/interfaces/ecpg/preproc/pgc.c     | 2887 -----
>         src/interfaces/ecpg/preproc/preproc.c |16988 ----------------------------------
>         4 files changed, 5053 insertions(+), 35326 deletions(-)
>
> 3 files are in the REL8_0_0 conversion but not on the cvs branch anymore (but
> looking at the CVS ,v files, I'm partial to thinking there was probably some
> CVS file hackery around those versions), and it's missing an update to ru.po.

So we should probably look into fixing this...

> The other difference is the same line for REL8_2_STABLE/REL8_3_STABLE/master:
>        aidan@db1-dapper:~/test/pg$ cat /tmp/pg-parsecvs-master.diff
>        diff --git a/src/tools/backend/index.html b/src/tools/backend/index.html
>        index ec2dcb8..846f002 100644
>        --- a/src/tools/backend/index.html
>        +++ b/src/tools/backend/index.html
>        @@ -1,4 +1,4 @@
>        -<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 2006/03/11 04:38:41 momjian Exp $ -->
>        +<!-- $PostgreSQL$ -->
>         <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>         <html xmlns="http://www.w3.org/1999/xhtml">
>
> Inspecting $CVSROOT/pgsql/src/tools/backend/index.html,v shows it's actually
> got a strange $PostgreSQL$ tag:
>
> The tag came into existence here:
>        1.35
>        log
>        @Add CVS tag lines to files that were lacking them.
>        @
>        text
>        @d1 1
>        a1 1
>        <!-- $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.314 2006/03/07 02:54:23 momjian Exp $ -->
> Note the bogus file name and version.
>
> And then it was updated once since then:
>        1.36
>        log
>        @Improve backend flowchart to show more detail.
>        @
>        text
>        @<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 2006/03/11 04:38:41 momjian Exp $ -->

...and this.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> That might work, but then we better be pretty darn confident that that
> "fresh conversion" is actually correct.  I'd rather have them going
> side-by-side so that we can verify everything before shutting the old
> system off.

I agree, as long as you take non-incremental converters into account
as well. Otherwise, we'd mostly test functionality we don't need later
on (incremental updates).

>> BTW, can anyone comment on whether and how we can maintain the current
>> split between master repository (that's not even accessible to
>> non-committers) and a public mirror?  If only from a standpoint of
>> security paranoia, I'd rather like to preserve that split, but I don't
>> know how well git will play with it.
>
> You can set up one repository to mirror another.

Yes, that's the point of a distributed VCS. The good thing about it is
that everybody is free to work (including committing) *on his own
copy* of the branch and then provide a patch (or patches) for
committers (or gain commit rights and upload his work later on). That
fits pretty well with the Postgres development process, AFAICT.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> That's not the best news I've had today...

Sorry :-(

> To me they sound complex and inconvenient.  I guess I'm kind of
> mystified by why we can't make this work reliably.  Other than the
> "broken tags" issue we've discussed, it seems like the only real issue
> should be how to group changes to different files into a single
> commit.  Once you do that, you should be able to construct a
> well-defined, total function f : <cvs-file, cvs-revision> -> <git
> commit> which is surjective on the space of git commits.  In fact it
> might be a good idea to explicitly construct this mapping and drop it
> into a database table somewhere so that people can sanity check it as
> much as they wish.  Why is this harder than I think it is?

Well, as CVS doesn't guarantee any consistency between files, you end
up with silly situations more often than you think. One of the
simplest possible examples is something like:
  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit
2 (@1.2), but seen from fileB it's the exact opposite. The most
promising approach to solve these problems seems to be based on Graph
Theory, where you work with a graph of dependencies from fileA @ 1.1
to fileA @ 1.2.

To resolve the above situation, you'd have to "split" a blob of
single-file commits into two end-result commits (for monotone / git).
In the above example, you'd have two options to resolve the conflict:
  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:
  commit 2a: fileB @ 1.1
  commit 1:  fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2

(Note that often enough, these have actually been separate commits in
CVS as well, there's just no way to represent that. And no, timestamps
are simply not reliable enough).

Now add tags, branches and cyclic dependencies involving many files
and many hundreds of commits to the example above and you start to get
an idea of the complexity of the problem in general.
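
To make the cycle concrete, here is a small Python sketch. It is not the
cvs_import or cvs2svn algorithm, only an illustration under simplifying
assumptions: a "blob" is a dict from file name to revision, revisions are
compared as dotted number tuples (which ignores branch revisions), and the
function names are made up for this example.

```python
from itertools import combinations


def revision_key(revision):
    """Turn '1.12' into (1, 12) so revisions compare numerically."""
    return tuple(int(part) for part in revision.split("."))


def blob_dependencies(blobs):
    """Return (earlier, later) edges implied by per-file revision order."""
    edges = set()
    for a, b in combinations(blobs, 2):
        for name in blobs[a].keys() & blobs[b].keys():
            key_a = revision_key(blobs[a][name])
            key_b = revision_key(blobs[b][name])
            if key_a < key_b:
                edges.add((a, b))
            elif key_b < key_a:
                edges.add((b, a))
    return edges


def has_cycle(nodes, edges):
    """Kahn's algorithm: a cycle exists if not every node can be ordered."""
    indegree = {node: 0 for node in nodes}
    for _, later in edges:
        indegree[later] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    ordered = 0
    while ready:
        node = ready.pop()
        ordered += 1
        for earlier, later in edges:
            if earlier == node:
                indegree[later] -= 1
                if indegree[later] == 0:
                    ready.append(later)
    return ordered != len(indegree)


conflicting = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}
split = {
    "commit 1a": {"fileA": "1.1"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
    "commit 1b": {"fileB": "1.2"},
}
print(has_cycle(conflicting, blob_dependencies(conflicting)))  # True
print(has_cycle(split, blob_dependencies(split)))              # False
```

Splitting "commit 1" removes the cycle, so the three resulting commits can
be put into a single consistent order.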

See my description and diagrams of the steps used for cvs_import in
monotone at [1] or follow descriptions of how cvs2svn works internally.

A few numbers about a conversion I'm trying for testing my algorithm
and heuristics. It's converting a pretty recent snapshot of the
Postgres repository:
 * running at 100% CPU time since: April, 17
 * Total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing
on good results rather than speed of conversion.

Also note, that monotone uses SQLite, so it actually stores the
results of this conversion in an SQL database, as you proposed.
Recently, a git_export command has been added, so that's definitely
worth a try for converting CVS to git. However, I fear cvs2git is more
mature.

Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone:
http://www.monotone.ca/wiki/CvsImport/



Re: PostgreSQL Developer meeting minutes up

From
Peter Eisentraut
Date:
On Thursday 28 May 2009 20:03:38 Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Right.  Shall we try to spec out exactly what our conversion
> > requirements are?  Here's a shot:
>
> [...]
>
> > Comments?  Other considerations?
>
> Certainly sounds reasonable to me.  I'd be really surprised if that's
> really all that hard to accomplish.  I'd be happy to help with some
> testing too if we feel that the current git repo is in reasonable shape
> to do that testing against (or someone has another).

Sounds like writing a comprehensive test suite against Tom's spec would be the 
first step.  And then this test suite can be run against various conversion 
tools and configurations thereof.



Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Fri, May 29, 2009 at 2:41 AM, Markus Wanner <markus@bluegap.ch> wrote:
> Hi,
> Quoting "Robert Haas" <robertmhaas@gmail.com>:
>> Why is this harder than I think it is?
>
> One of the simplest possible example is something like:

Thanks for the explanation, I understand it better now.  I'm still
dismayed, but at least I know why I'm dismayed.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
>> Ok, so seeing the interest in having a "good conversion", I took a stab at
>> parsecvs this afternoon, probably what I consider the leading "static"
>> conversion tool.

Here are some results from a conversion with cvs2git.

>> It takes about 10 minutes to run on my old xeon.

The conversion with cvs2git certainly took a bit longer, however, I
don't think that matters at all. Everything below a day or two is good
enough, IMO. What counts is the result.

The first step is running cvs2git itself:

cvs2svn Statistics:
------------------
Total CVS Files:              6873
Total CVS Revisions:        140191
Total CVS Branches:          36057
Total CVS Tags:             457515
Total Unique Tags:             171
Total Unique Branches:          21
CVS Repos Size in KB:       377337
Total SVN Commits:           32889
First Revision Date:    Tue Jul  9 08:21:07 1996
Last Revision Date:     Thu May 28 22:02:10 2009

(number of files matches pretty well with my own algorithm, however,
total svn commits is a bit lower, compared to the ~ 40'000 blobs I got).

The output of cvs2git can then be imported with git fast-import:

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     350000
Total objects:       349405 (     19563 duplicates                  )
      blobs  :       132672 (      3255 duplicates     119032 deltas)
      trees  :       183967 (     16308 duplicates     165582 deltas)
      commits:        32766 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:         194 (       664 loads     )
      marks:     1073741824 (    168693 unique    )
      atoms:           5280
Memory total:         16532 KiB
       pools:          2860 KiB
     objects:         13671 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =     124414
pack_report: pack_mmap_calls          =       3674
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  199500913 /  199500913
---------------------------------------------------------------------


The resulting repository contains the following branches. The
unlabeled ones contain only 1-2 files and seem rather irrelevant. In a
next try, I'd disable their creation completely, just wanted to check.
  REL2_0B
  REL6_4
  REL6_5_PATCHES
  REL7_0_PATCHES
  REL7_1_STABLE
  REL7_2_STABLE
  REL7_3_STABLE
  REL7_4_STABLE
  REL8_0_0
  REL8_0_STABLE
  REL8_1_STABLE
  REL8_2_STABLE
  REL8_3_STABLE
  Release_1_0_3
  WIN32_DEV
  ecpg_big_bison
* master
  unlabeled-1.44.2   -> from src/backend/commands/tablecmds.c
  unlabeled-1.51.2   -> from src/test/regress/expected/alter_table.out
  unlabeled-1.59.2   -> from src/backend/executor/execTuples.c
  unlabeled-1.87.2   -> from src/backend/executor/nodeAgg.c
  unlabeled-1.90.2   -> from src/backend/parser/parse_target.c and
                             src/backend/access/common/tupdesc.c

Comparison of the head of each branch between git and CVS (modulo CVS
keyword expansion, which I've filtered out):

ecpg_big_bison.diff:      0 files changed
master.diff:              0 files changed
REL2_0B.diff:             0 files changed
REL6_4.diff:              0 files changed
REL6_5_PATCHES.diff:      0 files changed
REL7_0_PATCHES.diff:      0 files changed
REL7_1_STABLE.diff:       0 files changed
REL7_2_STABLE.diff:       0 files changed
REL7_3_STABLE.diff:       0 files changed
REL7_4_STABLE.diff:       0 files changed
REL8_0_0.diff:            0 files changed
REL8_0_STABLE.diff:       0 files changed
REL8_1_STABLE.diff:       0 files changed
REL8_2_STABLE.diff:       0 files changed
REL8_3_STABLE.diff:       0 files changed
Release_1_0_3.diff:       0 files changed
WIN32_DEV.diff:           0 files changed

I plan to compare the tags as well and test what branch they are in,
but so far cvs2git seems to hold its promises. I'll report back again
within the next few days.
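
For anyone wanting to repeat such a comparison, a rough Python sketch of a
tree diff follows. It is not the script used for the numbers above: the
function names are invented for this example, it skips the `CVS` and `.git`
metadata directories, and it compares file contents byte for byte without
any keyword filtering.

```python
import filecmp
import os

SKIPPED_DIRECTORIES = ("CVS", ".git")


def relative_files(root):
    """Collect file paths under root, relative to root, skipping VCS metadata."""
    found = set()
    for directory, subdirectories, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        subdirectories[:] = [d for d in subdirectories if d not in SKIPPED_DIRECTORIES]
        for filename in filenames:
            found.add(os.path.relpath(os.path.join(directory, filename), root))
    return found


def tree_differences(left, right):
    """Report files present on one side only, and files whose bytes differ."""
    left_files = relative_files(left)
    right_files = relative_files(right)
    changed = sorted(
        path
        for path in left_files & right_files
        if not filecmp.cmp(
            os.path.join(left, path), os.path.join(right, path), shallow=False
        )
    )
    return {
        "only_left": sorted(left_files - right_files),
        "only_right": sorted(right_files - left_files),
        "changed": changed,
    }
```

Calling `tree_differences` on a CVS checkout and a git checkout of the same
branch or tag gives the raw material for a "N files changed" summary like
the one above.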

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090529 11:06]:
> Hi,

> Comparison of the head of each branch between git and CVS (modulo CVS  
> keyword expansion, which I've filtered out):

How did you filter it out, and without the filtering out, how does it
do?

> I plan to compare the tags as well and test what branch they are in, but 
> so far cvs2git seems to hold its promises. I'll report back again within 
> the next few days.

It definitely seems to have figured out the REL8_0_0 confusion that
tripped up parsecvs.  If I'm stuck on another windows project some time
in the near future, I'll try and look into why parsecvs trips up on
those 3 files from REL8_0_0 branch ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
> * Markus Wanner <markus@bluegap.ch> [090529 11:06]:
>> Comparison of the head of each branch between git and CVS (modulo CVS
>> keyword expansion, which I've filtered out):
>
> How did you filter it out

With perl and some regexes.

> and without the filtering out, how does it do?

Uh.. why is that of interest? With content hashing, these keywords do  
more harm than good.

I'd have to check again, but there certainly are differences here and there.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090529 11:18]:
> Hi,
>
> Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
>> * Markus Wanner <markus@bluegap.ch> [090529 11:06]:
>>> Comparison of the head of each branch between git and CVS (modulo CVS
>>> keyword expansion, which I've filtered out):
>>
>> How did you filter it out
>
> With perl and some regexes.
>
>> and without the filtering out, how does it do?
>
> Uh.. why is that of interest? With content hashing, these keywords do  
> more harm than good.

Yes, but the point is you want an exact replica of CVS, right?  Your
git repo should have $PostgreSQL$ and the cvs export/checkout (you do
use -kk, right?) should also have $PostgreSQL$.

The 3 parsecvs errors were that it *didn't* recognize the strange
$PostgreSQL ... Exp $ expansion that cvs did.

But it's important, because on *some* files you *do* want expanded
"keywords" (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
de-couple them from other keywords that they didn't want munging on.

So, I wouldn't consider any conversion good unless it had all these:
    parsecvs-master:contrib/pgcrypto/crypt-des.c: * $FreeBSD: src/secure/lib/libcrypt/crypt-des.c,v 1.12 1999/09/20 12:39:20 markm Exp $
    parsecvs-master:contrib/pgcrypto/crypt-md5.c: * $FreeBSD: src/lib/libcrypt/crypt-md5.c,v 1.5 1999/12/17 20:21:45 peter Exp $
    parsecvs-master:contrib/pgcrypto/md5.c:/*          $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */
    parsecvs-master:contrib/pgcrypto/md5.h:/*         $KAME: md5.h,v 1.3 2000/02/22 14:01:18 itojun Exp $ */
    parsecvs-master:contrib/pgcrypto/rijndael.c:/* $OpenBSD: rijndael.c,v 1.6 2000/12/09 18:51:34 markus Exp $ */
    parsecvs-master:contrib/pgcrypto/rijndael.h: *  $OpenBSD: rijndael.h,v 1.3 2001/05/09 23:01:32 markus Exp $ */
    parsecvs-master:contrib/pgcrypto/sha1.c:/*        $KAME: sha1.c,v 1.3 2000/02/22 14:01:18 itojun Exp $ */
    parsecvs-master:contrib/pgcrypto/sha1.h:/*        $KAME: sha1.h,v 1.4 2000/02/22 14:01:18 itojun Exp $ */
    parsecvs-master:contrib/pgcrypto/sha2.c:/*     $OpenBSD: sha2.c,v 1.6 2004/05/03 02:57:36 millert Exp $ */
    parsecvs-master:contrib/pgcrypto/sha2.h:/*     $OpenBSD: sha2.h,v 1.2 2004/04/28 23:11:57 millert Exp $ */
    parsecvs-master:src/backend/port/darwin/system.c: * $FreeBSD: src/lib/libc/stdlib/system.c,v 1.6 2000/03/16 02:14:41 jasone Exp $
    parsecvs-master:src/port/crypt.c:/*     $NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $ */
    parsecvs-master:src/port/crypt.c:__RCSID("$NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $");
    parsecvs-master:src/port/qsort.c:/*    $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $ */
    parsecvs-master:src/port/qsort_arg.c:/* $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $ */
    parsecvs-master:src/port/strlcat.c: *   $OpenBSD: strlcat.c,v 1.13 2005/08/08 08:05:37 espie Exp $ */
    parsecvs-master:src/port/strlcpy.c:/*  $OpenBSD: strlcpy.c,v 1.11 2006/05/05 15:27:38 millert Exp $    */
 

As well as stuff like:
    parsecvs-master:src/backend/access/index/genam.c: * $PostgreSQL$
    parsecvs-master:src/backend/access/index/indexam.c: * $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/Makefile:# $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/README:$PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/nbtcompare.c: *        $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/nbtinsert.c: * $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/nbtpage.c: * $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/nbtree.c: * $PostgreSQL$
    parsecvs-master:src/backend/access/nbtree/nbtsearch.c: *          $PostgreSQL$
 

Basically, identical to what a cvs export/checkout/update gives you with
a "-kk".

But I'm picky ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Aidan Van Dyk wrote:

> Yes, but the point is you want an exact replica of CVS, right?  Your
> git repo should have $PostgreSQL$ and the cvs export/checkout (you do
> use -kk, right?) should also have $PostgreSQL$.
> 
> The 3 parsecvs errors were that it *didn't* recognize the strange
> $PostgreSQL ... Exp $ expansion that cvs did.

Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them
anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone
because, as Markus says, their expansion interferes with content hashing.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> What was in the back of my mind was that we'd go around and mass-remove
> >> $PostgreSQL$ (and any other lurking tags), but only from HEAD and only
> >> after the repo conversion.  Although just before it would be okay too.
> 
> > You mean we would remove them from CVS?  I don't think that's
> > necessarily a good idea; it'd be massive changes for no good reason.
> 
> Uh, how is it different from any other mass edit, such as our annual
> copyright-year updates, or pgindent runs?

Well, the other mass edits have a purpose.  This one would be only to
help the migration.

> > My idea was to remove them from the repository that would be used for the
> > conversion (I think that means editing the ,v files),
> 
> Ick ... I'm willing to tolerate a few small manual ,v edits if we have
> to do it to make tags consistent or something like that.  I don't think
> we should be doing massive edits of that kind.

Yeah, that idea wasn't all that great after all.

> But anyway, that's not the interesting point.  The interesting point is
> what about the historical aspect of it, not whether we want to dispense
> with the tags going forward.  Should our repo conversion try to
> represent the historical states of the files including the tag strings?

Since we're going to lose them functionally after the conversion, it
doesn't seem that they serve any purpose.  After all, they will not
represent anything on the new repository.

The problem is that they are a problem for the conversion.  Are they
expanded before or after the commit?  Because the very expansion causes
the file to change identity, files being identified by the SHA1 sum of
their contents.
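
To illustrate the point about identity: git names a file's content by the
SHA-1 of a short header plus the bytes, so an expanded keyword yields a
different object than the unexpanded one. A minimal Python sketch (the
function name is made up for this example; the two sample lines are taken
from the index.html diff earlier in the thread):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


collapsed = b"<!-- $PostgreSQL$ -->\n"
expanded = (
    b"<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 "
    b"2006/03/11 04:38:41 momjian Exp $ -->\n"
)

# Same file, same revision, but two different identities in git.
print(git_blob_id(collapsed) != git_blob_id(expanded))  # True
```

So whichever form the converter stores has to be chosen consistently, or
otherwise identical revisions end up as distinct blobs.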

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Alvaro Herrera <alvherre@commandprompt.com> [090529 11:45]:
> Aidan Van Dyk wrote:
> 
> > Yes, but the point is you want an exact replica of CVS, right?  Your
> > git repo should have $PostgreSQL$ and the cvs export/checkout (you do
> > use -kk, right?) should also have $PostgreSQL$.
> > 
> > The 3 parsecvs errors were that it *didn't* recognize the strange
> > $PostgreSQL ... Exp $ expansion that cvs did.
> 
> Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them
> anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone
> because, as Markus says, their expansion interferes with content hashing.

I *think* you're actually agreeing with me.  *Hiding* the diffs that
include munging of keywords is not what we want.  We want the
conversion to *not* munge "keyword-like" things.  (No, $OpenBSD$ is *not*
a keyword in the PostgreSQL CVS repository.  But $PostgreSQL$ *is*.)

So we want the conversion to be identical to:
    cvs export -kk -r $tag

That will have *keywords* be unexpanded; namely these specific ones:
    Author
    Date
    Header
    Id
    Locker
    Log
    Name
    RCSfile
    Revision
    Source
    State
    PostgreSQL
but *not* "keyword-like" entries, like:
    $ NetBSD ... Exp $
    $ FreeBSD ... Exp $
    $ OpenBSD ... Exp $
    $ KAME ... Exp $
which are *not* CVS keywords in the PostgreSQL repository.

i.e. Just like I said, "identical to cvs checkout/export -kk".
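
A rough Python sketch of that rule, for checking a converted tree after the
fact. This is an approximation and not what cvs itself does: the function
name is made up, the keyword list is the one given above, and it makes no
attempt to remove log text that $Log$ has already inserted into a file.

```python
import re

# Keywords that are real CVS keywords in the PostgreSQL repository.
CVS_KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Log", "Name",
    "RCSfile", "Revision", "Source", "State", "PostgreSQL",
)

# Matches an expanded keyword such as "$Id: foo.c,v 1.2 ... Exp $".
EXPANDED_KEYWORD = re.compile(r"\$(" + "|".join(CVS_KEYWORDS) + r"):[^$\n]*\$")


def collapse_keywords(text):
    """Collapse expanded repository keywords to their bare $Keyword$ form."""
    return EXPANDED_KEYWORD.sub(r"$\1$", text)


ours = "<!-- $PostgreSQL: pgsql/src/tools/backend/index.html,v 1.35 2006/03/11 04:38:41 momjian Exp $ -->"
theirs = "/* $OpenBSD: sha2.c,v 1.6 2004/05/03 02:57:36 millert Exp $ */"

print(collapse_keywords(ours))    # <!-- $PostgreSQL$ -->
print(collapse_keywords(theirs))  # unchanged: $OpenBSD$ is not a keyword here
```

Keyword-like strings from other projects pass through untouched, which is
the behaviour asked for above.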


Now, an interesting question: do you want the "perfect" conversion to
contain *other* keyword un-expansion possibilities that would have happened
on any commits on Nov 29/30 2003 when CVSROOT/options contained:
    +tagexpand=iPostgreSQL
If you had checked out something on that day, even with a -kk, $Log$
would have been expanded, because for that day, $Log$ was *not* an
eligible keyword on the PostgreSQL CVS repository.

Whooee... Fun with CVS history....

a.


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

a newish conversion with cvs2git is available to check here:

  git://www.bluegap.ch/

(it's not incremental and will only stay for a few days)


For everybody interested, please check the committer names and emails.
I'm missing the names and email addresses for these committers:

    'barry' : ('barry??', ''),
    'dennis' : ('Dennis??', ''),
    'inoue' : ('inoue??', ''),
    'jurka' : ('jurka??', ''),
    'pjw' : ('pjw??', ''),

And I'm guessing that 'peter' is the same as 'petere':

    'peter' : ('Peter Eisentraut (?)', 'peter_e@gmx.net'),


I've compared all branch heads and all tags with a cvs checkout. The
only differences are keyword expansion errors. Most commonly the RCS
version "1.1" is used in the resulting git repository, instead of
version "1.1.1.1". This also leads to getting dates wrong ($Date$ keyword).

I'm unsure on how to test Tom's requirement that every commit and its
log message is included in the resulting git repository. Feel free to
clone and inspect the mentioned git repository and propose improvements
on the cvs2git options used.

Aidan Van Dyk wrote:
> Yes, but the point is you want an exact replica of CVS right?  Your
> git repo should have $PostgreSQL$ and the cvs export/checkout (you do
> use -kk right) should also have $PostgreSQL$.

No, I'm testing against cvs checkout, as that's what everybody is used to.

> But it's important, because on *some* files you *do* want expanded
> "keywords" (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
> to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
> de-couple them from other keywords that they didn't want munging on.

I don't care half as much about the keyword expansion stuff - that's
doomed to disappear anyway.

What I'm much more interested in is correctness WRT historic contents,
i.e. that git log, git blame, etc.. deliver correct results. That's
certainly harder to check.

In my experience, cvs2svn (or cvs2git) does a pretty decent job at that,
even in case of some corruptions. Plus it offers lots of options to fine
tune the conversion, see the attached configuration I've used.

> So, I wouldn't consider any conversion good unless it had all these:
>
> As well as stuff like:
>     parsecvs-master:src/backend/access/index/genam.c: *       $PostgreSQL$

I disagree here and find it more convenient for the git repository to
keep the "old" RCS versions - as in the source tarballs that got (and
still get) shipped. Just before switching over to git one can (and
should, IMO) remove these tags to avoid confusion.

Regards

Markus Wanner
# (Be in -*- mode: python; coding: utf-8 -*- mode.)

import re

from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
     import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
from cvs2svn_lib.property_setters import ExecutablePropertySetter
from cvs2svn_lib.property_setters import KeywordsPropertySetter
from cvs2svn_lib.property_setters import MimeMapper
from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter

Log().log_level = Log.NORMAL
ctx.revision_recorder = SimpleFulltextRevisionRecorderAdapter(
    CVSRevisionReader(cvs_executable=r'cvs'),
    GitRevisionRecorder('cvs2git-tmp/git-blob.dat'),
    )

ctx.revision_excluder = NullRevisionExcluder()

ctx.revision_reader = None

ctx.sort_executable = r'sort'

ctx.trunk_only = False

ctx.cvs_author_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_log_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_filename_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )

ctx.initial_project_commit_message = (
    'Standard project directories initialized by cvs2git.'
    )

ctx.post_commit_message = (
    'This commit was generated by cvs2git to track changes on a CVS '
    'vendor branch.'
    )

ctx.symbol_commit_message = (
    "This commit was manufactured by cvs2git to create %(symbol_type)s "
    "'%(symbol_name)s'."
    )

ctx.decode_apple_single = False

ctx.symbol_info_filename = None

global_symbol_strategy_rules = [
    ExcludeTrivialImportBranchRule(),
    UnambiguousUsageRule(),
    BranchIfCommitsRule(),
    HeuristicStrategyRule(),

    # Convert all ambiguous symbols as branches:
    AllBranchRule(),
    # Convert all ambiguous symbols as tags:
    AllTagRule(),

    # The last rule is here to choose the preferred parent of branches
    # and tags, that is, the line of development from which the symbol
    # sprouts.
    HeuristicPreferredParentRule(),
    ]

ctx.username = 'cvs2git'

ctx.svn_property_setters.extend([
    CVSBinaryFileEOLStyleSetter(),
    CVSBinaryFileDefaultMimeTypeSetter(),
    DefaultEOLStyleSetter(None),
    SVNBinaryFileKeywordsPropertySetter(),
    KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE),
    ExecutablePropertySetter(),
    ])

ctx.tmpdir = r'cvs2git-tmp'

ctx.cross_project_commits = False
ctx.cross_branch_commits = False
ctx.keep_cvsignore = True

ctx.retain_conflicting_attic_files = True

author_transforms={

    'adunstan' : ('Andrew Dunstan', 'andrew@dunslane.net'),
    'alvherre' : ('Alvaro Herrera', 'alvherre@commandprompt.com'),
    'barry' : ('barry??', ''),
    'bryanh' : ('Bryan Henderson', 'bryanh@giraffe.netgate.net'),
    'darcy' : ('D\'Arcy J.M. Cain', 'darcy@druid.net'),
    'dennis' : ('Dennis??', ''),
    'heikki' : ('Heikki Linnakangas', 'heikki.linnakangas@enterprisedb.com'),
    'inoue' : ('inoue??', ''),
    'ishii' : ('Tatsuo Ishii', 'ishii@sraoss.co.jp'),
    'joe' : ('Joe Conway', 'mail@joeconway.com'),
    'jurka' : ('jurka??', ''),
    'meskes' : ('Michael Meskes', 'meskes@postgresql.org'),
    'mha': ('Magnus Hagander', 'magnus@hagander.net'),
    'momjian' : ('Bruce Momjian', 'bruce@momjian.us'),
    'neilc' : ('Neil Conway', 'neil.conway@gmail.com'),
    'petere' : ('Peter Eisentraut', 'peter_e@gmx.net'),
    'peter' : ('Peter Eisentraut (?)', 'peter_e@gmx.net'),
    'pjw' : ('pjw??', ''),
    'scrappy' : ('Marc G. Fournier', 'scrappy@postgresql.org'),
    'teodor' : ('Teodor Sigaev', 'teodor@sigaev.ru'),
    'tgl' : ('Tom Lane', 'tgl@sss.pgh.pa.us'),
    'vadim' : ('Vadim B. Mikheev', 'vadim4o@yahoo.com'),
    'wieck' : ('Jan Wieck', 'JanWieck@yahoo.com'),

    'cvs2git' : ('cvs2git', 'admin@postgresql.org'),
    }

# This is the main option that causes cvs2svn to output to git rather
# than Subversion:
ctx.output_option = GitOutputOption(
    'cvs2git-tmp/git-dump.dat',
    GitRevisionMarkWriter(),
    max_merges=None,
    author_transforms=author_transforms,
    )

run_options.profiling = False


changeset_database.use_mmap_for_cvs_item_to_changeset_table = True

run_options.set_project(
    r'../postgresql.org/pgsql',

    symbol_transforms=[
        ReplaceSubstringsSymbolTransform('\\','/'),
        NormalizePathsSymbolTransform(),
        ],

    symbol_strategy_rules=global_symbol_strategy_rules,
    )
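
A small helper along these lines could list which author_transforms entries still need a real name or an email address. This is only a sketch: unresolved_authors is a made-up name, not part of cvs2svn, and the sample dictionary repeats just three of the entries from the configuration.

```python
def unresolved_authors(author_transforms):
    """Return the CVS usernames whose entry still looks like a placeholder.

    An entry counts as unresolved if its email is empty or its name
    still contains a question mark.
    """
    return sorted(
        user
        for user, (name, email) in author_transforms.items()
        if not email or "?" in name
    )


# Three entries copied from the configuration, for illustration only.
sample = {
    'barry': ('barry??', ''),
    'peter': ('Peter Eisentraut (?)', 'peter_e@gmx.net'),
    'tgl': ('Tom Lane', 'tgl@sss.pgh.pa.us'),
}
print(unresolved_authors(sample))  # ['barry', 'peter']
```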


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Markus Wanner <markus@bluegap.ch> writes:
> I'm missing the names and email addresses for these committers:

>     'barry' : ('barry??', ''),

Barry Lind, formerly one of the JDBC bunch, been inactive for awhile

>     'dennis' : ('Dennis??', ''),

I suppose this must be Dennis Björklund, but I didn't realize he
used to be a committer.

>     'inoue' : ('inoue??', ''),

Hiroshi Inoue, still active, but ODBC is not part of core anymore

>     'jurka' : ('jurka??', ''),

Kris Jurka, still active, but JDBC is not part of core anymore

>     'pjw' : ('pjw??', ''),

Philip Warner, inactive (still reads the lists though)

> And I'm guessing that 'peter' is the same as 'petere':

>     'peter' : ('Peter Eisentraut (?)', 'peter_e@gmx.net'),

No, that would be Peter Mount, also a retired JDBC hacker.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Markus Wanner <markus@bluegap.ch> writes:

> >     'dennis' : ('Dennis??', ''),
> 
> I suppose this must be Dennis Björklund, but I didn't realize he
> used to be a committer.

IIRC he was given commit privs for translation files.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I suppose this must be Dennis Björklund, but I didn't realize he
>> used to be a committer.

> IIRC he was given commit privs for translation files.

Ah, right, that does ring a bell now.

BTW, Markus: you do realize "thomas" is not me but Tom Lockhart?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/1/09, Markus Wanner <markus@bluegap.ch> wrote:
>  a newish conversion with cvs2git is available to check here:
>
>   git://www.bluegap.ch/
>
>  (it's not incremental and will only stay for a few days)

+1 for the idea of replacing CVS usernames with full names.

The knowledge about CVS usernames will be increasingly obscure.

Also worth mentioning is that there is no need to assign absolutely
up-to-date email addresses, it's enough if they uniquely identify
a person.

>  Aidan Van Dyk wrote:
>  > Yes, but the point is you want an exact replica of CVS right?  Your
>  > git repo should have $PostgreSQL$ and the cvs export/checkout (you do
>  > use -kk right) should also have $PostgreSQL$.
>
>
> No, I'm testing against cvs checkout, as that's what everybody is used to.
>
>
>  > But it's important, because on *some* files you *do* want expanded
>  > "keywords" (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
>  > to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
>  > de-couple them from other keywords that they didn't want munging on.
>
>
> I don't care half as much about the keyword expansion stuff - that's
>  doomed to disappear anyway.

But this is one aspect we need to get right for the conversion.

So preferably we test it sooner not later.

I think Aidan got it right - expand $PostgreSQL$ and others that are
actually expanded on current repo, but not $OpenBSD$ and others
coming from external sources.

>  What I'm much more interested in is correctness WRT historic contents,
>  i.e. that git log, git blame, etc.. deliver correct results. That's
>  certainly harder to check.
>
>  In my experience, cvs2svn (or cvs2git) does a pretty decent job at that,
>  even in case of some corruptions. Plus it offers lots of options to fine
>  tune the conversion, see the attached configuration I've used.
>
>
>  > So, I wouldn't consider any conversion good unless it had all these:
>  >
>
> > As well as stuff like:
>  >       parsecvs-master:src/backend/access/index/genam.c: *       $PostgreSQL$
>
>
> I disagree here and find it more convenient for the git repository to
>  keep the "old" RCS versions - as in the source tarballs that got (and
>  still get) shipped. Just before switching over to git one can (and
>  should, IMO) remove these tags to avoid confusion.

I'd prefer we immediately test full conversion and not leave some
steps to last moment.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Marko Kreen <markokr@gmail.com> wrote:
> On 6/1/09, Markus Wanner <markus@bluegap.ch> wrote:
>  >  a newish conversion with cvs2git is available to check here:
>  >
>  >   git://www.bluegap.ch/
>  >
>  >  (it's not incremental and will only stay for a few days)

Btw this conversion seems broken as it contains random merge commits.

parsecvs managed to do it without them.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
>> I don't care half as much about the keyword expansion stuff - that's
>>  doomed to disappear anyway.
>
> But this is one aspect we need to get right for the conversion.

What's your definition of "right"? I personally prefer the keyword
expansion to match a cvs checkout as closely as possible.

> So preferably we test it sooner not later.

I actually *am* testing against that. As mentioned, the only
differences are insignificant, IMO. For example having "1.1.1.1"
instead of "1.1" (or vice versa, I don't remember).

> I think Aidan got it right - expand $PostgreSQL$ and others that are
> actually expanded on current repo, but not $OpenBSD$ and others
> coming from external sources.

AFAIU Aidan proposed the exact opposite.

I'm proposing to leave both expanded, as in a CVS checkout and as
shipped in the source release tarballs.

> I'd prefer we immediately test full conversion and not leave some
> steps to last moment.

IMO that would amount to changing history, so that a checkout from git
no longer matches a released tarball as closely as possible.

What you call "leave(ing) some steps to last moment" is IMO not part
of the conversion. It's rather a conscious decision to drop these
keywords as soon as we switch to git. This step should be represented
in history as a separate commit, IMO.

What do others think?

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Markus Wanner <markus@bluegap.ch> wrote:
>  Quoting "Marko Kreen" <markokr@gmail.com>:
> > > I don't care half as much about the keyword expansion stuff - that's
> > >  doomed to disappear anyway.
> > >
> >
> > But this is one aspect we need to get right for the conversion.
> >
>
>  What's your definition of "right"? I personally prefer the keyword
> expansion to match a cvs checkout as closely as possible.

This is Definitely Wrong (tm).  You seem to be thinking that comparing
a GIT checkout to a random parallel CVS checkout (e.g. from a .tgz) is the
main use-case.  It is not.  Browsing history and looking at diffs
between versions is.  And expanded CVS keywords would be a total PITA
for that.

> > So preferably we test it sooner not later.
> >
>
>  I actually *am* testing against that. As mentioned, the only differences
> are insignificant, IMO. For example having "1.1.1.1" instead of "1.1" (or
> vice versa, I don't remember).

Why have those at all...

> > I think Aidan got it right - expand $PostgreSQL$ and others that are
> > actually expanded on current repo, but not $OpenBSD$ and others
> > coming from external sources.
> >
>
>  AFAIU Aidan proposed the exact opposite.

Ah, sorry, my thinko.  s/expanded/stripped/.  Take Aidan's description
as authoritative.. :)

>  I'm proposing to leave both expanded, as in a CVS checkout and as shipped
> in the source release tarballs.

No, the noise they add to history would seriously hurt usability.

> > I'd prefer we immediately test full conversion and not leave some
> > steps to last moment.
> >
>
>  IMO that would amount to changing history, so that a checkout from git
> no longer matches a released tarball as closely as possible.

We need to compare against tarballs only when checking the conversion.
And only then.  Writing a few scripts for that should not be a problem.
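
As a rough idea of such a script, the Python sketch below compares a checked-out git tag against an unpacked release tarball. The function names and the SKIP_DIRS set are assumptions made for illustration. Files that exist only in the tarball are reported separately, since a release tarball may contain generated files that were never in the repository.

```python
import filecmp
import os

# Version-control metadata directories to ignore on either side.
SKIP_DIRS = {".git", "CVS"}


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(git_tree, tarball_tree):
    """Return (only_in_git, only_in_tarball, differing) as sorted lists."""
    in_git = list_files(git_tree)
    in_tarball = list_files(tarball_tree)
    differing = sorted(
        path for path in in_git & in_tarball
        # shallow=False forces a byte-for-byte comparison of contents.
        if not filecmp.cmp(os.path.join(git_tree, path),
                           os.path.join(tarball_tree, path), shallow=False)
    )
    return sorted(in_git - in_tarball), sorted(in_tarball - in_git), differing
```

Whether keyword lines should be normalised before the byte comparison depends on which of -kk or -kv the conversion is meant to match.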

>  What you call "leave(ing) some steps to last moment" is IMO not part of the
> conversion. It's rather a conscious decision to drop these keywords as soon
> as we switch to git. This step should be represented in history as a
> separate commit, IMO.

The question is how they should appear in historical commits.

I have no strong opinion whether to edit them out or not in the future.
Doing it during the periodic reindent would be good moment tho'.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
> Btw this conversion seems broken as it contains random merge commits.

Well, that's a feature, not a bug ;-)

When a commit adds a file to the master *and* then to the branch as
well, cvs2git prefers to represent this as a merge from the master
branch, instead of adding the file twice, once on the master and once
on the branch.

This way the target VCS knows it's the *same* file, originating from
one single commit. This may be important for later merges - otherwise
you may suddenly end up with duplicated files after a merge, because
the VCS doesn't know they are in fact the same.

(Okay, git assumes two files to have the same origin/history as long
as they have the same filename. But just rename one of the two, and
you have the same troubles again).

Also note that these situations occur rather frequently in the
Postgres CVS repository. Every back-patch which adds files ends up as
a merge. (One could even argue that in the perfect conversion *all*
back-patches should be represented as merges, rather than as separate
commits).

> parsecvs managed to do it without them.

Now, I'm not calling it broken, but cvs2git's output is arguably
better in that regard.

As you certainly see by now, conversion from CVS is neither simple nor
unambiguous.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
> This is Definitely Wrong (tm).  You seem to be thinking that comparing
> a GIT checkout to a random parallel CVS checkout (e.g. from a .tgz) is the
> main use-case.  It is not.  Browsing history and looking at diffs
> between versions is.  And expanded CVS keywords would be a total PITA
> for that.

That's an argument. Point taken. I'll check if cvs2git supports that as well.

Regards

Markus Wanner




Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Markus Wanner <markus@bluegap.ch> wrote:
>  Quoting "Marko Kreen" <markokr@gmail.com>:
> > Btw this conversion seems broken as it contains random merge commits.
> >
>
>  Well, that's a feature, not a bug ;-)
>
>  When a commit adds a file to the master *and* then to the branch as well,
> cvs2git prefers to represent this as a merge from the master branch, instead
> of adding the file twice, once on the master and once on the branch.
>
>  This way the target VCS knows it's the *same* file, originating from one
> single commit. This may be important for later merges - otherwise you may
> suddenly end up with duplicated files after a merge, because the VCS doesn't
> know they are in fact the same.
>
>  (Okay, git assumes two files to have the same origin/history as long as
> they have the same filename. But just rename one of the two, and you
> have the same troubles again).

Not a problem for git I think - it assumes they are same if they have
same contents...

>  Also note that these situations occur rather frequently in the Postgres CVS
> repository. Every back-patch which adds files ends up as a merge. (One could
> even argue that in the perfect conversion *all* back-patches should be
> represented as merges, rather than as separate commits).

Well, such behaviour may be a feature for some repo with complex CVS
usage, but currently we should aim for simple and clear conversion.

The question is - do such merges make any sense to a human looking at
history - and the answer is no, as no VCS level merge was happening,
just some copying around (if your description is correct).  And
we don't need to add noise for the benefit of GIT as it works fine
without any fake merges.

Our target should be each branch having simple linear history,
without any fake merges.  This will result in minimal confusion
to both humans looking at history and also GIT itself.

So please turn the merge logic off.  If this cannot be turned off,
cvs2git is not usable for conversion.

> > parsecvs managed to do it without them.
> >
>
>  Now, I'm not calling it broken, but cvs2git's output is arguably better in
> that regard.

Seems it contains more complex logic to handle more complex CVS usage
cases, but seems like overkill for us if it creates a mess of history.

>  As you certainly see by now, conversion from CVS is neither simple nor
> unambiguous.

I know, that's why I'm discussing the tradeoffs.  Simple+clear vs.
complex+messy. :)

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090602 07:08]:
> Hi,
>
> Quoting "Marko Kreen" <markokr@gmail.com>:
>>> I don't care half as much about the keyword expansion stuff - that's
>>>  doomed to disappear anyway.
>>
>> But this is one aspect we need to get right for the conversion.
>
> What's your definition of "right"? I personally prefer the keyword  
> expansion to match a cvs checkout as closely as possible.

> AFAIU Aidan proposed the exact opposite.
>
> I'm proposing to leave both expanded, as in a CVS checkout and as  
> shipped in the source release tarballs.

Well, since I have -kk set in my .cvsrc, "mine" matches exactly the CVS
checkout l-)

Basically, I want the git to be identical to the cvs checkout.  If you
use -kk, that means the "PostgreSQL CVS repository keywords" *aren't*
expanded.  If you like -kv, that means they are.

Pick your poison (after all, it's CVS), either way, I think the 2 of
*us* are going to disagree which is best here ;-)

But, whichever way (exact to -kk or exact to -kv), the conversion
should be exact, and there should be no reason to "filter out
keyword-like stuff" in the diffs.

> What you call "leave(ing) some steps to last moment" is IMO not part of 
> the conversion. It's rather a conscious decision to drop these keywords 
> as soon as we switch to git. This step should be represented in history 
> as a separate commit, IMO.
>
> What do others think?

I'm assuming they will get removed from the source eventually too - but
that step is *outside* the conversion.  Somebody could do it now in CVS
before the conversion, or afterwards, but it's still outside the
conversion.


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
> Pick your poison (after all, it's CVS), either way, I think the 2 of
> *us* are going to disagree which is best here ;-)

Marko already convinced me of -kk, I'm trying that with cvs2git.

> But, whichever way (exact to -kk or exact to -kv), the conversion
> should be exact, and there should be no reason to "filter out
> keyword-like stuff" in the diffs.

I just really didn't want to care about keyword expansion. Besides  
lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
;-)

I'll let you know how cvs2git behaves WRT -kk.

Regards

Markus Wanner




Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090602 09:37]:

> Marko already convinced me of -kk, I'm trying that with cvs2git.

Good ;-)

> I just really didn't want to care about keyword expansion. Besides  
> lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
> ;-)

Absolutely...  And one of the reasons I've had -kk in my .cvsrc for
years, even before I started with git.

> I'll let you know how cvs2git behaves WRT -kk.

Cool..

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
> Not a problem for git I think

Knowing that git doesn't track files as "hard" as monotone, I
certainly doubt that.

> - it assumes they are same if they have
> same contents...

Why do you assume they have the same contents? Obviously these are
different branches, where files can (and will!) have different contents.

> Well, such behaviour may be a feature for some repo with complex CVS
> usage, but currently we should aim for simple and clear conversion.

First of all, we should aim for a correct one.

> The question is - do such merges make any sense to a human looking at
> history - and the answer is no, as no VCS level merge was happening,
> just some copying around (if your description is correct).  And
> we don't need to add noise for the benefit of GIT as it works fine
> without any fake merges.

For low expectations of "it works", maybe yes. However if you don't
tell git, it has no chance of knowing that two (different) files
should actually be the same.

Try the following:
 git init
 echo "base" > basefile
 git add basefile
 git commit -m "base commit"
 git checkout -b branch
 echo "hello, world" > testfile
 git add testfile
 git commit testfile -m "addition on branch"
 git checkout master
 echo "hello world" > testfile
 git add testfile
 git commit testfile -m "addition on master"

 # here we are at a similar point like after a lacking conversion, having two
 # distinct, i.e. historically independent files called "testfile"

 git mv testfile movedfile
 git commit -m "file moved"
 git checkout branch
 git merge master
 ls

 # Bang, you suddenly have 'testfile' and 'movedfile', go figure!


I leave it as an exercise for the reader to try the same with a single
historic origin of the file, as cvs2git does the conversion.
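
For those who prefer to reason about it first, the outcome can be reproduced with a toy model of a three-way merge over {path: content} trees. This is a deliberately simplified sketch and not git's actual merge algorithm: merge_paths is a made-up name, there is no rename detection and no content-level merging. It only shows why the common ancestor matters.

```python
def merge_paths(base, ours, theirs):
    """Three-way merge of {path: content} trees, without rename detection.

    A missing path is treated as content None. Raises ValueError when
    both sides changed the same path in different ways.
    """
    merged = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree (possibly both deleted)
        elif o == b:
            result = t      # only "theirs" touched this path
        elif t == b:
            result = o      # only "ours" touched this path
        else:
            raise ValueError("conflict on %s" % path)
        if result is not None:
            merged[path] = result
    return merged


# Independent additions: the file was not in the common ancestor.
base = {"basefile": "base\n"}
branch = {"basefile": "base\n", "testfile": "hello, world\n"}
master = {"basefile": "base\n", "movedfile": "hello world\n"}
print(sorted(merge_paths(base, branch, master)))
# ['basefile', 'movedfile', 'testfile']

# Single origin: the file already exists in the common ancestor.
base2 = {"basefile": "base\n", "testfile": "hello world\n"}
branch2 = dict(base2)
master2 = {"basefile": "base\n", "movedfile": "hello world\n"}
print(sorted(merge_paths(base2, branch2, master2)))
# ['basefile', 'movedfile']
```

In the second case the merge sees that master removed testfile and added movedfile, so the branch ends up with only one copy.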

> Our target should be each branch having simple linear history,
> without any fake merges.  This will result in minimal confusion
> to both humans looking at history and also GIT itself.

I don't consider the above a "minimal confusion". And concerning
humans... you get used to merge commits pretty quickly. I for one am
more confused by a linear history which in fact is not.

As mentioned before, I'd personally favor *all* of the back-ports to
actually be merges of some sort, because that's what they effectively
are. However, that also brings up the question of how we are going to
do back-patches in the future with git.

> So please turn the merge logic off.  If this cannot be turned off,
> cvs2git is not usable for conversion.

As far as I know, it cannot be turned off. Use parsecvs if you want to
get silly side effects later on in history. ;-)

> Seems it contains more complex logic to handle more complex CVS usage
> cases, but seems like overkill for us if it creates a mess of history.

You consider it a mess, I consider it a better and more valid
representation of the mess that CVS is.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Markus Wanner <markus@bluegap.ch> [090602 10:23]:

>  # Bang, you suddenly have 'testfile' and 'movedfile', go figure!
>
> I leave it as an exercise for the reader to try the same with a single  
> historic origin of the file, as cvs2git does the conversion.

Sure, and we can all construct examples where that move is both right and
wrong...  But the point is that in PostgreSQL, (and that may be mainly
because we're using CVS), merges *aren't* something that happens.
Patches are written against HEAD (master) and then back-patched...

If you want to turn PostgreSQL development on its head, then we can
switch this around, so that patches are always done on the oldest
branch, and fixes always merged "forward"...

I'm not going to be the one that pushes that though ;-)

> I don't consider the above a "minimal confusion". And concerning  
> humans... you get used to merge commits pretty quickly. I for one am  
> more confused by a linear history which in fact is not.

But the fact is, everyone using CVS wants a "linear history"..... All
they care about is "cvs update...wait...cvs update ... time ... cvs
update ..."... Everything *was* linear to them.  Any "merge" type things
certainly weren't intentional in CVS...

> As mentioned before, I'd personally favor *all* of the back-ports to
> actually be merges of some sort, because that's what they effectively
> are. However, that also brings up the question of how we are going to do
> back-patches in the future with git.

Well, if people get comfortable with it, I expect that "backports" don't
happen.. Bugs are fixed where they happen, and "merged" forward into
all affected "later development" based on the bugged area.

> As far as I know, it cannot be turned off. Use parsecvs if you want to  
> get silly side effects later on in history. ;-)

Ya, that's one of the reasons I considered parsecvs the leading
candidate...  And why I went through, and showed that with the exception
of the one REL_8_0_0 tip, it *was* an exact copy of the current CVS
repository (minus the 1 messed up tag in the repository).

> You consider it a mess, I consider it a better and more valid  
> representation of the mess that CVS is.

So much better that it makes the history as useless as CVS... I think
one of the reasons people are wanting tomove from CVS to git is that it
makes things *better*...  The "exact" history will *always* be
available, right in CVS if people need it.  I thin the goal is to make
the "git" history as close to CVS as possible, such that it's useful.  I
mean, if we want it to be a "more valid" representation, then really, we
should be doing every file change in a single commit, and "merging" that
file commit into the branch *every* *single* *time*... I don't think
anybody wants our conversion to be that much "better and more valid
representation of the mess that CVS is"...

It's a balance...  We're moving because we want *better* tools and
access, not the same "mess that CVS is".

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Alvaro Herrera
Date:
Aidan Van Dyk escribió:
> * Markus Wanner <markus@bluegap.ch> [090602 10:23]:
> 
> >  # Bang, you suddenly have 'testfile' and 'movedfile', go figure!
> >
> > I leave it as an exercise for the reader to try the same with a single  
> > historic origin of the file, as cvs2git does the conversion.
> 
> Sure, and we can all construct examples where that move is both right and
> wrong...  But the point is that in PostgreSQL, (and that may be mainly
> because we're using CVS), merges *aren't* something that happens.
> Patches are written against HEAD (master) and then back-patched...
> 
> If you want to turn PostgreSQL development on its head, then we can
> switch this around, so that patches are always done on the oldest
> branch, and fixes always merged "forward"...

The Monotone folk call this "daggy fixes" and it seems a clean way to
handle things.

http://www.monotone.ca/wiki/DaggyFixes/

However,

> I'm not going to be the one that pushes that though ;-)

I'm not either.  Maybe someday we'll be familiar enough with the tools
to make things this way, but I think just after the migration we'll
mainly want to be able to press on with development and not waste too
much time learning the new toys.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Aidan Van Dyk <aidan@highrise.ca> writes:
> * Markus Wanner <markus@bluegap.ch> [090602 10:23]:
>> You consider it a mess, I consider it a better and more valid  
>> representation of the mess that CVS is.

> So much better that it makes the history as useless as CVS... I think
> one of the reasons people are wanting to move from CVS to git is that it
> makes things *better*...

FWIW, the tool that I customarily use (cvs2cl) considers commits on
different branches to be "the same" if they have the same commit message
and occur sufficiently close together (within a few minutes).  My
committing habits have been designed around that behavior for years,
and I believe other PG committers have been doing likewise.
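The grouping rule described above can be sketched roughly as follows. This is only an illustration, not cvs2cl's actual implementation: the `Commit` record, the `group_commits` function, the sample commits and the 5-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    branch: str
    message: str
    when: datetime


def group_commits(commits, window=timedelta(minutes=5)):
    """Group commits that share a message and lie close together in time.

    The 5-minute default window is an assumption, not cvs2cl's setting.
    """
    groups = []
    last_group_for = {}  # message -> most recent group with that message
    for c in sorted(commits, key=lambda c: c.when):
        group = last_group_for.get(c.message)
        # Join the existing group only if its latest commit is recent enough.
        if group is not None and c.when - group[-1].when <= window:
            group.append(c)
        else:
            group = [c]
            groups.append(group)
            last_group_for[c.message] = group
    return groups


# Hypothetical commits: one fix applied to three branches, plus a docs change.
commits = [
    Commit("HEAD", "Fix planner crash", datetime(2009, 6, 2, 10, 0)),
    Commit("REL8_3_STABLE", "Fix planner crash", datetime(2009, 6, 2, 10, 2)),
    Commit("REL8_2_STABLE", "Fix planner crash", datetime(2009, 6, 2, 10, 3)),
    Commit("HEAD", "Update docs", datetime(2009, 6, 2, 10, 4)),
]
for group in group_commits(commits):
    print(group[0].message, [c.branch for c in group])
```

A commit with the same message made hours later starts a new group, which is why identical messages committed in quick succession matter for this style of report.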

I would consider a git conversion to be less useful to me, not more,
if it insists on showing me such cases as separate commits --- and if
it then adds useless "merge" messages on top of that, I'd start to get
seriously annoyed.

What we want here is a readable equivalent of the CVS history, not
necessarily something that is theoretically an exact equivalent.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Aidan Van Dyk" <aidan@highrise.ca>:
> Sure, and we can all construct examples where that move is both right and
> wrong...

Huh? The problem is the file duplication. The move is an action of a
committer - it's neither right nor wrong in this example.

I cannot see any use case for seemingly random files popping up out of
nowhere, just because git doesn't know how to merge two files after a
mv and a merge.

> But the point is that in PostgreSQL, (and that may be mainly
> because we're using CVS), merges *aren't* something that happens.
> Patches are written against HEAD (master) and then back-patched...

..which can be (and better is) represented as a merge in git (for the
sake of comfortable automated merging).

> If you want to turn PostgreSQL development on its head, then we can
> switch this around, so that patches are always done on the oldest
> branch, and fixes always merged "forward"...

I'd consider that good use of tools, yes. However, I realize that this
probably is pipe-dreaming...

> But the fact is, everyone using CVS wants a "linear history"..... All
> they care about is "cvs update...wait...cvs update ... time ... cvs
> update ..."... Everything *was* linear to them.  Any "merge" type things
> certainly weren't intentional in CVS...

..no, it just wasn't possible in CVS. Switching to git, people soon
want "merge type things". Heck, it's probably *the* reason for
switching to git.

> So much better that it makes the history as useless as CVS... I think
> one of the reasons people are wanting to move from CVS to git is that it
> makes things *better*...

Yes, especially merging. Please don't cripple that ability just
because CVS once upon a time enforced a linear history.

> The "exact" history will *always* be
> available, right in CVS if people need it.

Agreed. Please note that I mostly talk about a more correct
representation *of history*, as it happened. This has nothing to do
with single commits per file.

> It's a balance...  We're moving because we want *better* tools and
> access, not the same "mess that CVS is".

Agreed. And please cut as many of its burdens of the past as possible, like
linearity. History is not linear and has never been. But I'm stopping
now before getting overly philosophic...

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On Tue, Jun 2, 2009 at 4:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
>
> The Monotone folk call this "daggy fixes" and it seems a clean way to
> handle things.
>
> http://www.monotone.ca/wiki/DaggyFixes/

Is this like what git calls an octopus? I've been wondering what the
point of such things was.

Or maybe not. I thought an octopus was two patches with the same
parent -- ie, two patches that could independently be applied in any
order.

-- 
greg


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>  > * Markus Wanner <markus@bluegap.ch> [090602 10:23]:
>
> >> You consider it a mess, I consider it a better and more valid
>  >> representation of the mess that CVS is.
>
>  > So much better that it makes the history as useless as CVS... I think
>  > one of the reasons people are wanting to move from CVS to git is that it
>  > makes things *better*...
>
>
> FWIW, the tool that I customarily use (cvs2cl) considers commits on
>  different branches to be "the same" if they have the same commit message
>  and occur sufficiently close together (within a few minutes).  My
>  committing habits have been designed around that behavior for years,
>  and I believe other PG committers have been doing likewise.
>
>  I would consider a git conversion to be less useful to me, not more,
>  if it insists on showing me such cases as separate commits --- and if
>  it then adds useless "merge" messages on top of that, I'd start to get
>  seriously annoyed.

They cannot be the same commit in GIT as the resulting tree is different.
You could tie them with some sort of merge commits, but I doubt the
result would be worth the noise.

Also I doubt there is a tool grokking such commits anyway, the merge
discussion above was for full files with exact contents appearing
in several branches.

>  What we want here is a readable equivalent of the CVS history, not
>  necessarily something that is theoretically an exact equivalent.

I suggest setting the goal to be simple and clear representation
of CVS history that we can make sense of later, instead of revising
CVS history to look like we used some better VCS system...

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Markus Wanner <markus@bluegap.ch> wrote:
> [academic nitpicking]

Sorry, not going there.  Just look at the state of VCS systems
that have prioritized academic issues instead of practicality...
(arch/darcs/monotone/etc..)

> > So please turn the merge logic off.  If this cannot be turned off,
> > cvs2git is not usable for conversion.
> >
>
>  As far as I know, it cannot be turned off. Use parsecvs if you want to get
> silly side effects later on in history. ;-)

--no-cross-branch-commits seems sort of that direction?

And what silly side effects are you talking about?  I see only cvs2git
doing silly things...

(I'm talking about only in context of Postgres CVS repo, not in general.)

> > Seems it contains more complex logic to handle more complex CVS usage
> > cases, but seems like overkill for us if it creates a mess of history.
> >
>
>  You consider it a mess, I consider it a better and more valid
> representation of the mess that CVS is.

Note that a merge is not file-level but tree-level.  Also note we don't
use branches for feature development but for major version maintenance.

So how can a single file appearing in 2 branches mean a merge of 2 trees?
How can that be valid?

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Tue, Jun 2, 2009 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>> * Markus Wanner <markus@bluegap.ch> [090602 10:23]:
>>> You consider it a mess, I consider it a better and more valid
>>> representation of the mess that CVS is.
>
>> So much better that it makes the history as useless as CVS... I think
>> one of the reasons people are wanting to move from CVS to git is that it
>> makes things *better*...
>
> FWIW, the tool that I customarily use (cvs2cl) considers commits on
> different branches to be "the same" if they have the same commit message
> and occur sufficiently close together (within a few minutes).  My
> committing habits have been designed around that behavior for years,
> and I believe other PG committers have been doing likewise.

Interesting.  I was wondering why all your commit messages always show
up simultaneously for all the back branches.

> I would consider a git conversion to be less useful to me, not more,
> if it insists on showing me such cases as separate commits --- and if
> it then adds useless "merge" messages on top of that, I'd start to get
> seriously annoyed.

There's no help for them being separate commits, but I agree that
useless merge commits are a bad thing.  There are plenty of ways to
avoid that, though; I've been using git cherry-pick a lot recently,
and I think git rebase --onto also has some potential.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Tom Lane" <tgl@sss.pgh.pa.us>:
> FWIW, the tool that I customarily use (cvs2cl) considers commits on
> different branches to be "the same" if they have the same commit message
> and occur sufficiently close together (within a few minutes).  My
> committing habits have been designed around that behavior for years,
> and I believe other PG committers have been doing likewise.

Yeah, that's how I see things as well.

> I would consider a git conversion to be less useful to me, not more,
> if it insists on showing me such cases as separate commits --- and if
> it then adds useless "merge" messages on top of that, I'd start to get
> seriously annoyed.

Hm.. well, in git, there's no such thing as a commit that spans
multiple branches. So it's impossible to fulfill both of your wishes
here.

parsecvs creates multiple independent commits in such a case.

cvs2git creates a single commit and propagates this to the back
branches with merge commits (however, only if new files are added,
otherwise it does the same as parsecvs).

> What we want here is a readable equivalent of the CVS history, not
> necessarily something that is theoretically an exact equivalent.

Understood. However, readability depends on the user's habits. But
failing to merge due to a lacking conversion potentially hurts
everybody who wants to merge.

Having used merging (in combination with renaming) often enough, I'd
certainly be pretty annoyed if merges suddenly begin to bring up
spurious file duplicates.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
> Sorry, not going there.  Just look at the state of VCS systems
> that have prioritized academic issues instead of practicality...
> (arch/darcs/monotone/etc..)

I already am there. And I don't want to go back, thanks. But my bias
for monotone certainly shines through, yes ;-)

> --no-cross-branch-commits seems sort of that direction?

Yes, that could lead to the same defect. Uhm.. thank you for pointing
that out, I'm not gonna try it, sorry.

> And what silly side effects are you talking about?

I'm talking about spurious file duplicates popping up after a rename
and a merge, see my example in this thread.

>>  You consider it a mess, I consider it a better and more valid
>> representation of the mess that CVS is.
>
> Note that a merge is not file-level but tree-level.

Depends on your point of view. Each file gets merged pretty
individually, but the result ends up in a single commit, yes.

> Also note we don't
> use branches for feature development but for major version maintenance.

So? You think you are never going to merge?

> So how can a single file appearing in 2 branches mean a merge of 2 trees?
> How can that be valid?

I'm not sure what you are questioning here.

I find it perfectly reasonable to build something on top of
REL8_3_STABLE and later on wanting to merge to REL8_4_STABLE. And I
don't want to manually merge my changes, just because of a rename in
8.4 and a bad decision during the migration to git.

(And no, I don't think any of the other git tools will help with this,
due to the academic-nitpick-reasons above).

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/2/09, Markus Wanner <markus@bluegap.ch> wrote:
>  Quoting "Marko Kreen" <markokr@gmail.com>:
> > And what silly side effects are you talking about?
> >
>
>  I'm talking about spurious file duplicates popping up after a rename and a
> merge, see my example in this thread.

The example was not an actual case from Postgres CVS history,
but a hypothetical situation, without checking whether it already works
with GIT.

> > Also note we don't
> > use branches for feature development but for major version maintenance.
> >
>
>  So? You think you are never going to merge?
>
>
> > So how can a single file appearing in 2 branches mean a merge of 2 trees?
> > How can that be valid?
> >
>
>  I'm not sure what you are questioning here.
>
>  I find it perfectly reasonable to build something on top of REL8_3_STABLE
> and later on wanting to merge to REL8_4_STABLE. And I don't want to manually
> merge my changes, just because of a rename in 8.4 and a bad decision during
> the migration to git.
>
>  (And no, I don't think any of the other git tools will help with this, due
> to the academic-nitpick-reasons above).

Merging between branches with GIT is a fine workflow in the future.

But we are currently discussing how to convert CVS history to GIT.
My point is that we should avoid fake merges, to avoid obfuscating
history.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Ron Mayer
Date:
Aidan Van Dyk wrote:
> * Markus Wanner <markus@bluegap.ch> [090602 10:23]:
>> As mentioned before, I'd personally favor *all* of the back-ports to  
>> actually be merges of some sort, because that's what they effectively  
>> are. However, that also brings up the question of how we are going to do
>> back-patches in the future with git.
> 
> Well, if people get comfortable with it, I expect that "backports" don't
> happen.. Bugs are fixed where they happen, and "merged" forward into
> all affected "later development" based on the bugged area.

I imagine the closest thing to existing practices would be that people
would use "git-cherry-pick -x -n" to backport only the commits they
wanted from the current branch into the back branches.

AFAICT, this doesn't record a merge in the GIT history, but looks a lot
like the linear history from CVS - with the exception that the comment
added by "-x" explicitly refers to the exact commit from the main branch.
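As a rough sketch of how a tool could follow that reference: "git cherry-pick -x" appends a line of the form "(cherry picked from commit <id>)" to the commit message, and that line can be pulled out with a regular expression. The `picked_from` helper, the sample message and its hash are made up for illustration, and the pattern assumes 40-character SHA-1 ids.

```python
import re

# Line that `git cherry-pick -x` appends to the commit message.
# Assumes 40-character SHA-1 commit ids.
PICKED_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)


def picked_from(message):
    """Return the original commit ids recorded in a commit message, if any."""
    return PICKED_RE.findall(message)


# Hypothetical back-branch commit message; the hash is invented.
message = (
    "Fix off-by-one in buffer handling\n"
    "\n"
    "(cherry picked from commit 0123456789abcdef0123456789abcdef01234567)\n"
)
print(picked_from(message))
```

A log post-processor could use such a lookup to tie a back-branch commit to its original without relying on the messages being identical.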




Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
> The example was not an actual case from Postgres CVS history,
> but a hypothetical situation, without checking whether it already works
> with GIT.

Of course it is a simplified example, but it resembles what could
happen e.g. to the file doc/src/sgml/generate_history.pl, which got
added from a backported patch after forking off REL8_3_STABLE.

If you create separate commits during the conversion, rename that file
on the master branch and then - for whatever reason - try to merge the
two branches, you will end up having that file twice. That's what I'm
warning about. Changes on either or both sides of the merge make the
situation worse.
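The duplication scenario can be illustrated with a deliberately simplified model: a three-way merge that looks only at sets of file paths and does no rename detection. This is not git's merge algorithm, and `merge_paths` is a hypothetical helper; the names 'testfile' and 'movedfile' follow the earlier example in this thread.

```python
def merge_paths(base, ours, theirs):
    """Three-way merge on sets of paths, without any rename detection.

    A path survives if both sides have it, or if one side added it
    (it is absent from the base). A path that is in the base but was
    removed on either side is dropped.
    """
    result = set()
    for path in ours | theirs:
        in_both = path in ours and path in theirs
        added = path not in base
        if in_both or added:
            result.add(path)
    return result


# Separate commits: the file did not exist at the common ancestor, so the
# rename on master and the independent add on the branch both look like adds.
print(merge_paths(base=set(), ours={"movedfile"}, theirs={"testfile"}))

# Single origin: the common ancestor already has the file, so the rename
# on master shows up as "testfile removed, movedfile added".
print(merge_paths(base={"testfile"}, ours={"movedfile"}, theirs={"testfile"}))
```

In this model the first merge keeps both paths, which is the duplicate being warned about, while the second keeps only the renamed file.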

> Merging between branches with GIT is a fine workflow in the future.

Do you consider the above scenario a fine merge?

> My point is that we should avoid fake merges, to avoid obfuscating
> history.

Understood. It looks like I'm pretty much the only one who cares more
about merge capability than nice looking history :-(

Attached is my current options file for cvs2git, it includes requested
changes by Alvaro and additional names and emails as given by Tom
(thanks again). A current conversion with cvs2git (and with the
merges) results in a repository with exactly 0 differences against any
branch or tag symbol compared to cvs checkout -kk.
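A check of this kind could be scripted along the following lines. This is a minimal sketch, assuming both trees have already been exported into plain directories; `tree_differences` and the demo directory names are hypothetical and this is not the script used for the conversion above.

```python
import filecmp
import os
import tempfile


def tree_differences(left, right, prefix=""):
    """Return relative paths that differ between two directory trees."""
    # dircmp skips version-control directories such as CVS and .git by default.
    cmp = filecmp.dircmp(left, right)
    names = cmp.left_only + cmp.right_only + cmp.common_funny
    # Compare file contents byte for byte rather than trusting stat() data.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    diffs = [os.path.join(prefix, n) for n in names + mismatch + errors]
    for name in cmp.common_dirs:
        diffs += tree_differences(
            os.path.join(left, name),
            os.path.join(right, name),
            os.path.join(prefix, name),
        )
    return sorted(diffs)


# Demo with two throwaway trees that differ in one file.
with tempfile.TemporaryDirectory() as root:
    git_tree = os.path.join(root, "git")
    cvs_tree = os.path.join(root, "cvs")
    for tree, text in ((git_tree, "v1\n"), (cvs_tree, "v2\n")):
        os.makedirs(os.path.join(tree, "src"))
        with open(os.path.join(tree, "src", "main.c"), "w") as f:
            f.write(text)
    print(tree_differences(git_tree, cvs_tree))
```

An empty list for every branch and tag would correspond to the "0 differences" result reported above.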

Regards

Markus Wanner

Attachment

Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner <markus@bluegap.ch> wrote:
> If you create separate commits during the conversion, rename that file on
> the master branch

This is all completely irrelevant to the CVS import. I don't think
we've ever renamed files because CVS can't handle it cleanly.

It does sound to me like we really ought to have merge commits marking
the bug fixes in old releases as merged in the equivalent commits to
later branches based on Tom's commit messages.

That would make the git history match Tom's "same commit message"
implicit CVS history that cvs2cl was giving him. I find git-log's
output including merge commits kind of strange and annoying myself but
having them at least gives us a chance to have a tool that understands
them output something like cvs2cl. Throwing away that information
because we don't like the clutter in the tool output seems like a
short-sighted plan.

That said, the commit log message isn't being lost. We could always
import the history linearly and add the merge commits later if we
decide having them would help some tool implement cvs2cl summaries.

-- 
greg


Re: PostgreSQL Developer meeting minutes up

From
Andres Freund
Date:
Hi,

On 06/03/2009 02:08 PM, Greg Stark wrote:
> On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner<markus@bluegap.ch>  wrote:
> That would make the git history match Tom's "same commit message"
> implicit CVS history that cvs2cl was giving him. I find git-log's
> output including merge commits kind of strange and annoying myself but
> having them at least gives us a chance to have a tool that understands
> them output something like cvs2cl.
"git log --no-merges" hides the actual merge commits if that is what you 
want.
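What that option filters on is simply the parent count: a merge commit is one with more than one parent. A small sketch of the same filter over a toy history follows; the `history` mapping and `without_merges` are made up for illustration.

```python
# Hypothetical history: commit id -> list of parent ids.
history = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "m": ["b", "c"],  # merge commit: two parents
    "d": ["m"],
}


def without_merges(history):
    """Keep only commits with at most one parent, as --no-merges does."""
    return [cid for cid, parents in history.items() if len(parents) < 2]


print(without_merges(history))
```

The merge commits stay in the repository and remain available to tools that want them; they are only left out of the listing.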

Andres


Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
> "git log --no-merges" hides the actual merge commits if that is what you
> want.

Ooh! Life seems so much sweeter now!

Given that we don't have to see them then I'm all for marking bug fix
patches which were applied to multiple branches as merges. That seems
like it would make it easier for tools like gitk to show useful
information analogous to the cvs2cl info.

Given that Tom's been intentionally marking the commits with identical
commit messages we ought to be able to find *all* of them and mark
them properly. That would be way better than only finding patches that
are absolutely identical.

I'm not sure whether we should mark the old branches getting merges
down or the new branches getting merged up. I suspect I'm missing
something but I don't see any reason one is better than the other.

-- 
greg


Re: PostgreSQL Developer meeting minutes up

From
Magnus Hagander
Date:
Greg Stark wrote:
> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
>> "git log --no-merges" hides the actual merge commits if that is what you
>> want.
> 
> Ooh! Life seems so much sweeter now!
> 
> Given that we don't have to see them then I'm all for marking bug fix
> patches which were applied to multiple branches as merges. That seems
> like it would make it easier for tools like gitk to show useful
> information analogous to the cvs2cl info.

Right, if it adds additional metadata that lets the tools do their magic
better, and it's still easy to filter out, I don't see a downside.


> Given that Tom's been intentionally marking the commits with identical
> commit messages we ought to be able to find *all* of them and mark
> them properly. That would be way better than only finding patches that
> are absolutely identical.

Just to be clear, not just Tom. All committers. I was told to do that
right after my first backpatch which *didn't* do it :-)

So it's an established project practice. That has other advantages as
well, of course..


> I'm not sure whether we should mark the old branches getting merges
> down or the new branches getting merged up. I suspect I'm missing
> something but I don't see any reason one is better than the other.

If you go from older to newer, the automatic merge algorithms have a
better chance of doing something smart since they can track previous
changes. At least I think that's how it works.

But I think for most of the changes it wouldn't make a huge difference,
though - manual merging would be needed anyway.

//Magnus


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/3/09, Greg Stark <stark@enterprisedb.com> wrote:
> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
>  > "git log --no-merges" hides the actual merge commits if that is what you
>  > want.
>
>
> Ooh! Life seems so much sweeter now!
>
>  Given that we don't have to see them then I'm all for marking bug fix
>  patches which were applied to multiple branches as merges. That seems
>  like it would make it easier for tools like gitk to show useful
>  information analogous to the cvs2cl info.
>
>  Given that Tom's been intentionally marking the commits with identical
>  commit messages we ought to be able to find *all* of them and mark
>  them properly. That would be way better than only finding patches that
>  are absolutely identical.
>
>  I'm not sure whether we should mark the old branches getting merges
>  down or the new branches getting merged up. I suspect I'm missing
>  something but I don't see any reason one is better than the other.

Although "mark Tom's back-branch fixes as merges" makes much more
sense than "mark new files as merges", it is quite a step up from
"do tags match official releases".

It seems to require noticeable development effort to get an importer
to a level where it can do it.  Will this be a requirement for import?
Or just a good thing to have?  Also how to check if all such merges
are sensible?

And note that such effort will affect only old imported history,
it will not make it easier to handle back-branch fixes in the future...

Various scenarios with git cherry-pick and similar tools would still
result in duplicate commits, so we would need a git log post-processor
anyway if we want to somehow group them together for eg. weekly commit
summary.  And such post-processor would work on old history too.

Maybe that's a better direction to work on, than to potentially risk a
messy history in GIT?

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Greg Stark wrote:
>> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
>>> "git log --no-merges" hides the actual merge commits if that is what you
>>> want.
>>
>> Ooh! Life seems so much sweeter now!
>>
>> Given that we don't have to see them then I'm all for marking bug fix
>> patches which were applied to multiple branches as merges. That seems
>> like it would make it easier for tools like gitk to show useful
>> information analogous to the cvs2cl info.
>
> Right, if it adds additional metadata that lets the tools do their magic
> better, and it's still easy to filter out, I don't see a downside.
>
>> I'm not sure whether we should mark the old branches getting merges
>> down or the new branches getting merged up. I suspect I'm missing
>> something but I don't see any reason one is better than the other.
>
> If you go from older to newer, the automatic merge algorithms have a
> better chance of doing something smart since they can track previous
> changes. At least I think that's how it works.
>
> But I think for most of the changes it wouldn't make a huge difference,
> though - manual merging would be needed anyway.

In practice, isn't it more likely that you would develop the change on
the newest branch and then try to back-port it?  However you do the
import, you're going to want to do subsequent things the same way.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/3/09, Magnus Hagander <magnus@hagander.net> wrote:
> Robert Haas wrote:
>  > On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
> >>> I'm not sure whether we should mark the old branches getting merges
>  >>> down or the new branches getting merged up. I suspect I'm missing
>  >>> something but I don't see any reason one is better than the other.
>  >> If you go from older to newer, the automatic merge algorithms have a
>  >> better chance of doing something smart since they can track previous
>  >> changes. At least I think that's how it works.
>  >>
>  >> But I think for most of the changes it wouldn't make a huge difference,
>  >> though - manual merging would be needed anyway.
>  >
>  > In practice, isn't it more likely that you would develop the change on
>  > the newest branch and then try to back-port it?  However you do the
>  > import, you're going to want to do subsequent things the same way.
>
>
> That's definitely the order in which *I* work, and I think that's how
>  most others do it as well.

That's true, but it's not representable in VCS, unless you use cherry-pick,
which is just UI around patch transport.  But considering separate
local trees (which can optionally contain local per-fix branches),
it is possible to separate the fix-development from final representation.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Magnus Hagander <magnus@hagander.net> [090603 10:13]:
> 
> Right, if it adds additional metadata that lets the tools do their magic
> better, and it's still easy to filter out, I don't see a downside.

Note, that it could (and likely will) have a downside when you get to
doing real merge-based development... A "merge" means that *all* changes
in *both* parents have been combined in *this* commit.  And all merge
tools depend on this.  That's the directed part of the "DAG" in git.  So
if you want to be working in a way that the merge tools work, you
*don't* have master/HEAD merged into REL8_2_STABLE.  You can have
REL8_2_STABLE merged into master/head.
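The point about direction can be shown on a toy DAG: once a merge commit exists, everything reachable from either parent counts as contained in it. The commit names and the `ancestors` helper below are hypothetical; this is a sketch of the reachability rule, not of git's internals.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


# Hypothetical DAG: commit id -> parent ids.
parents = {
    "fork": [],
    "fix_82": ["fork"],                   # bug fix on REL8_2_STABLE
    "feature": ["fork"],                  # new feature on master
    "merge_up": ["feature", "fix_82"],    # REL8_2_STABLE merged into master
    "merge_down": ["fix_82", "feature"],  # master merged into REL8_2_STABLE
}

# Merging up: master now contains the fix, which is what we want to record.
print("fix_82" in ancestors(parents, "merge_up"))
# Merging down: the DAG now claims the back branch contains the new feature.
print("feature" in ancestors(parents, "merge_down"))
```

Both checks succeed, and the second one is the problem: a later real merge would treat the feature as already present on the back branch.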

I'll concede that in GIT, it's flexible (some say arbitrary) enough that
you can *construct* the DAG otherwise, but then you've done something in
such a fashion that the DAG has no bearing on "real merging", and thus
you lose all the power of DAGs "merge tracking" when working on new
real merging....

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Marko Kreen <markokr@gmail.com> [090603 10:26]:
> That's true, but it's not representable in VCS, unless you use cherry-pick,
> which is just UI around patch transport.  But considering separate
> local trees (which can optionally contain local per-fix branches),
> it is possible to separate the fix-development from final representation.

I'll note that in git, cherry-pick is *more* than just "patch
transport".  I would more call it "patch commute".  It does actually
look at the history between the "pick"ed patch, and the current
tree, any merge/fork points, and the differences on each path that lead
to the changes in the current tree and the picked patch.

a.
-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Magnus Hagander
Date:
Robert Haas wrote:
> On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> I'm not sure whether we should mark the old branches getting merges
>>> down or the new branches getting merged up. I suspect I'm missing
>>> something but I don't see any reason one is better than the other.
>> If you go from older to newer, the automatic merge algorithms have a
>> better chance of doing something smart since they can track previous
>> changes. At least I think that's how it works.
>>
>> But I think for most of the changes it wouldn't make a huge difference,
>> though - manual merging would be needed anyway.
> 
> In practice, isn't it more likely that you would develop the change on
> the newest branch and then try to back-port it?  However you do the
> import, you're going to want to do subsequent things the same way.

That's definitely the order in which *I* work, and I think that's how
most others do it as well.


-- 
Magnus Hagander
Self: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Wed, Jun 3, 2009 at 10:20 AM, Marko Kreen <markokr@gmail.com> wrote:
> Various scenarios with git cherry-pick and similar tools would still
> result in duplicate commits, so we would need a git log post-processor
> anyway if we want to somehow group them together for eg. weekly commit
> summary.  And such post-processor would work on old history too.
>
> Maybe that's better direction to work on, than to potentially risk in
> messy history in GIT?

I think it is.  cherry-picking seems like a much better way of
back-patching than merging, so putting a lot of effort into making
merges "work" doesn't seem like a good expenditure of effort.

It seems pretty clear that searching through the histories of each
branch for duplicate commit messages and producing a unified report is
pretty straightforward if we assume that the commit messages are
byte-for-byte identical (or even modulo whitespace changes).  But I
wonder if it would make more sense to include some kind of metadata in
the commit message (or some other property of the commit?  does git
support that?) to make it not depend on that.  I suppose Tom et al.
like the way they do it now, so maybe we should just stick with text
comparison, but it seems a bit awkward to me.
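The text-comparison approach could look roughly like this. It is a minimal sketch that matches messages modulo whitespace and deliberately ignores timestamps; `normalize`, `unified_report` and the sample logs are assumptions for the example, and reading the per-branch logs out of git is left out.

```python
from collections import defaultdict


def normalize(message):
    """Collapse runs of whitespace so formatting differences do not matter."""
    return " ".join(message.split())


def unified_report(branch_logs):
    """Map each normalized commit message to the branches it appears on."""
    report = defaultdict(list)
    for branch, messages in branch_logs.items():
        for message in messages:
            key = normalize(message)
            if branch not in report[key]:
                report[key].append(branch)
    return dict(report)


# Hypothetical per-branch logs; the back-branch message was re-wrapped.
branch_logs = {
    "master": ["Fix race in autovacuum\nlauncher startup.", "Add new feature"],
    "REL8_3_STABLE": ["Fix race in autovacuum launcher\nstartup."],
}
for message, branches in unified_report(branch_logs).items():
    print(branches, message)
```

Explicit metadata in the commit message would replace the `normalize` step with an exact key, at the cost of changing how committers write their messages.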

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/3/09, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Marko Kreen <markokr@gmail.com> [090603 10:26]:
>  > That's true, but it's not representable in VCS, unless you use cherry-pick,
>  > which is just UI around patch transport.  But considering separate
>  > local trees (which can optionally contain local per-fix branches),
>  > it is possible to separate the fix-development from final representation.
>
>
> I'll note that in git, cherry-pick is *more* than just "patch
>  transport".  I would more call it "patch commute".  It does actually
>  look at the history between the "pick"ed patch, and the current
>  tree, any merge/fork points, and the differences on each path that lead
>  to the changes in the current tree and the picked patch.

Well, that's good to know, but this also seems to mean it's a rather bad
tool for back-patching, as you risk including random unwanted commits
too that happened in the HEAD meantime.  But also, it's a very good
tool for forward-patching.

But my point was not about that - rather I was pointing out that
this "patch-commute" will result in duplicate commits, that have
no ties in DAG.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Marko Kreen <markokr@gmail.com> [090603 11:12]:
> Well, thats good to know, but this also seems to mean it's rather bad
> tool for back-patching, as you risk including random unwanted commits
> too that happened in the HEAD meantime.  But also, it's very good
> tool for forward-patching.

It doesn't "pull in commits" in the sense that darcs does... But rather,
it's more like "the patch changes $XXX in $file, but that $file was
really $old_file at the common point between the 2 commits, and
$old_file is still $old_file in the commit I'm trying to apply the patch
to".

It looks at the history of the changes to figure out why (or why
not) they apply, and see if they should still be applied to the same
file, or another file (in case of a rename/moved file in 1 branch), or
if the changed area has been moved drastically in the file in one
branch, and the change should be applied there instead.

> But my point was not about that - rather I was pointing out that
> this "patch-commute" will result in duplicate commits, that have
> no ties in DAG.

Yes.  That's a cherry-pick, if you want a merge, you merge ;-)  But
merge carries the baggage of expectation that *all* changes in both
parents have been combined.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Marko Kreen
Date:
On 6/3/09, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Marko Kreen <markokr@gmail.com> [090603 11:12]:
>  > Well, thats good to know, but this also seems to mean it's rather bad
>  > tool for back-patching, as you risk including random unwanted commits
>  > too that happened in the HEAD meantime.  But also, it's very good
>  > tool for forward-patching.
>
> It doesn't "pull in commits" in the sense that darcs does... But rather,
>  its more like "the patch changes $XXX in $file, but that $file was
>  really $old_file at the common point between the 2 commits, and
>  $old_file is still $old file in the commit I'm trying to apply the patch
>  to".
>
>  It looks at the history of the changes to figure out why (or why
>  not) they apply, and see if they should still be applied to the same
>  file, or another file (in case of a rename/moved file in 1 branch), or
>  if the changed area has been moved drastically in the file in one
>  branch, and the change should be applied there instead.

I'm not certain, but I remember using cherry pick and seeing
several commits in result.  This seems to be a point that needs
to be checked.

>  > But my point was not about that - rather I was pointing out that
>  > this "patch-commute" will result in duplicate commits, that have
>  > no ties in DAG.
>
>
> Yes.  That's a cherry-pick, if you want a merge, you merge ;-)  But
>  merge carries the baggage of expectation that *all* changes in both
>  parents have been combined.

But in forward-merge case it's true.

-- 
marko


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Marko Kreen <markokr@gmail.com> [090603 11:28]:
> I'm not certain, but I remember using cherry pick and seeing
> several commits in result.  This seems to be a point that needs
> to be checked.

I'm not sure what you're recalling, but git cherry-pick takes a single
commit, and applies it as a single commit (or, with -n, doesn't actually
commit it).  That's what it does... There are various *other* tools (like
rebase, am, cherry, etc) which operate on "sets" of commits.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Ron Mayer
Date:
Robert Haas wrote:
> But I
> wonder if it would make more sense to include some kind of metadata in
> the commit message (or some other property of the commit?  does git
> support that?) to make it not depend on that.

From elsewhere in this thread[1], 'The "git cherry-pick" ... "-x" flag adds
a note to the commit comment describing the relationship between the commits.'

If the commit on the main branch had this message
=================
  added a line on the main branch
=================
The commit on the cherry picked branch will have this comment
=================
  added a line on the main branch
  (cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)
=================
where the big hex string identifies the commit on the other branch.
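
As a sketch of how a post-processor could use that note instead of comparing message text, the snippet below extracts the referenced commit id from a commit message. It assumes 40-character SHA-1 ids and the "(cherry picked from commit ...)" wording shown above; the function and variable names are made up for this example.

```python
import re

# "git cherry-pick -x" appends a line of this form to the commit message.
PICKED_FROM = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")


def picked_from(message):
    """Return the source commit id recorded by 'cherry-pick -x', or None."""
    match = PICKED_FROM.search(message)
    return match.group(1) if match else None


message = (
    "added a line on the main branch\n"
    "(cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)\n"
)
origin = picked_from(message)
```

A message without the note yields `None`, so commits made by hand on each branch would still need the text comparison as a fallback.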


[1] http://archives.postgresql.org/pgsql-hackers/2009-06/msg00191.php




Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Greg Stark" <stark@enterprisedb.com>:
> This is all completely irrelevant to the CVS import.

To the CVS import it is, yes. After all, CVS has no notion of renaming
files. But my example is about renaming with git *after* the
conversion. Git *does* support renaming (to some extent). However, it
fails as explained if you feed it with "corrupt" data (the corruption
being the missing link between the two added files - after a rename,
git simply has no chance of knowing it should be the same file).

> I don't think
> we've ever renamed files because CVS can't handle it cleanly.

Yes, that applies to the past. But I think we *are* going to rename
files *after* the switch, because git *can* handle it cleanly - given
a correct import.

If that defect would only affect historic information, I'd not be half
as pestering as I am. But it's such delayed effects which might
surprise you years after the cause, which make me nervous.

> It does sound to me like we really ought to have merge commits marking
> the bug fixes in old releases as merged in the equivalent commits to
> later branches based on Tom's commit messages.

Now, I don't know how you got to that conclusion, but I absolutely agree ;-)

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On 4 Jun 2009, at 09:11, "Markus Wanner" <markus@bluegap.ch> wrote:

> Hi,
>
> Quoting "Greg Stark" <stark@enterprisedb.com>:
>> This is all completely irrelevant to the CVS import.
>
> To the CVS import it is, yes. After all, CVS has no notion of  
> renaming files. But my example is about renaming with git *after*  
> the conversion. Git *does* support renaming (to some extent).  
> However, it fails as explained if you feed it with "corrupt" data  
> (the corruption being the missing link between the two added files -  
> after a rename, git simply has no chance of knowing it should be the  
> same file).
>


Hmm. I see. I'm not sure we've ever added files to back branches  
either. I'm less sure of that though.



>> I don't think
>> we've ever renamed files because CVS can't handle it cleanly.
>
> Yes, that applies to the past. But I think we *are* going to rename  
> files *after* the switch, because git *can* handle it cleanly -  
> given a correct import.
>
> If that defect would only affect historic information, I'd not be  
> half as pestering as I am. But it's such delayed effects which might  
> surprise you years after the cause, which make me nervous.
>
>> It does sound to me like we really ought to have merge commits  
>> marking
>> the bug fixes in old releases as merged in the equivalent commits to
>> later branches based on Tom's commit messages.
>
> Now, I don't know how you got to that conclusion, but I absolutely  
> agree ;-)
>
> Regards
>
> Markus Wanner
>


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Greg Stark" <greg.stark@enterprisedb.com>:
> Hmm. I see. I'm not sure we've ever added files to back branches  
> either. I'm less sure of that though.

We did from time to time. Every merge commit in my current conversion  
contains at least one such file that got added as part of a back  
patch. The perl file mentioned in the example upstream is one of them.

Regards

Markus Wanner




Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Tom Lane" <tgl@sss.pgh.pa.us>:
> BTW, Markus: you do realize "thomas" is not me but Tom Lockhart?

Uh.. thanks, that name has fallen through the cracks, before. I've  
added it now, it will be included in the next sample conversion.

Regards

Markus Wanner




Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Marko Kreen" <markokr@gmail.com>:
>>  I'm not sure whether we should mark the old branches getting merges
>>  down or the new branches getting merged up. I suspect I'm missing
>>  something but I don't see any reason one is better than the other.

As pointed out by others, it doesn't make sense to merge (all commits
since the last merge) from HEAD to the back branches. You'd have to
cherry-pick only the commits which actually have to get back patched.

The "new branches getting merged up" could work. That is, applying the
fix to the oldest back-branch which requires the fix first and then
merge it to all newer ones, including HEAD. However, that would
require some rethinking: instead of creating bugfix-patches for HEAD,
then manually adjust patches for back-branches and then group
committing, you'd have to create a bugfix-patch for the oldest branch
first, commit that and then merge that to the newer branches.

I consider merging a cleaner and simpler operation than
cherry-picking, because merging allows the VCS to keep track of what
needs to be propagated, while with cherry-picking, you'd have to keep
track of that manually (or with the help of other tools).
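
To make the "keeping track" point concrete, here is a toy model of a commit graph, not git's actual implementation. It computes which commits of one branch are not yet reachable from another, which is roughly the set a merge would still have to propagate. The commit names and the `parents` mapping are hypothetical.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself.

    parents maps a commit id to the list of its parent commit ids.
    """
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def pending_merge(parents, source_tip, target_tip):
    """Commits a merge of source into target would still have to bring in."""
    return ancestors(parents, source_tip) - ancestors(parents, target_tip)


# Hypothetical history: "base" is the branch point, "fix" is committed on the
# old branch, "dev" is unrelated work on the new branch.
parents = {"base": [], "fix": ["base"], "dev": ["base"]}
before = pending_merge(parents, "fix", "dev")

# A merge commit records both parents, so the fix becomes reachable.
parents["merge"] = ["dev", "fix"]
after_merge = pending_merge(parents, "fix", "merge")

# A cherry-pick creates a new commit with a single parent and no link to "fix".
parents["pick"] = ["dev"]
after_pick = pending_merge(parents, "fix", "pick")
```

After the merge nothing is pending, while after the cherry-pick the graph still reports "fix" as unmerged, even though its change has been applied. That missing link is what would have to be tracked by other means.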

An example of that is the very same inability to properly track
renames when cherry-picking, just like what I explained for the CVS
conversion.

> It seems to require noticeable development effort to get a importer
> to a level it can do it.  Will this be a requirement for import?
> Or just a good thing to have?  Also how to check if all such merges
> are sensible?

If that's how you'd like to have the CVS repository represented in git
(which I'd support as well), I'd give it a try. With all of the work
I've done for mtn cvs_import I certainly have the necessary experience
in CVS conversion and with the cvs2svn algorithm itself.

> And note that such effort will affect only old imported history,
> it will not make easier to handle back-branch fixes in the future...

Hm.. depends, if you want to merge from older branches to newer ones,
instead of cherry-picking, it would certainly help to get the history
clean.

> Various scenarios with git cherry-pick and similar tools would still
> result in duplicate commits, so we would need a git log post-processor
> anyway if we want to somehow group them together for eg. weekly commit
> summary.  And such post-processor would work on old history too.

I think we should decide on either using merges or using duplicate
commits we try to link somehow. But then, we should IMO use that
scheme for the conversion as well as later on, so as not to get a
messy history, as you put it.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Ron Mayer
Date:
Markus Wanner wrote:
> The "new branches getting merged up" could work. That is, applying the
> fix to the oldest back-branch which requires the fix first and then
> merge it to all newer ones, including HEAD. However, that would require
> some rethinking: instead of creating bugfix-patches for HEAD, then
> manually adjust patches for back-branches and then group committing,
> you'd have to create a bugfix-patch for the oldest branch first, commit
> that and then merge that to the newer branches.

That sounds a bit dangerous too, since I imagine there are some
changes in the old release branches you wouldn't want merged into
the newest releases (say, code affecting sections that got redesigned).

Seems what you'd want to do is create a new branch as close to the point
where the bug was introduced - and then merge that forward into each
of the branches.  This concept was mentioned in a page linked earlier
in the thread[1] and seems like the way monotone recommends people
use their system[2].   See that page for more reasons why they think
it's good.

[1]http://archives.postgresql.org/pgsql-hackers/2009-06/msg00153.php
[2]http://www.monotone.ca/wiki/DaggyFixes/


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Ron Mayer" <rm_pg@cheapcomplexdevices.com>:
> Seems you'd want to do is create a new branch as close to the point
> where the bug was introduced - and then merge that forward into each
> of the branches.

Thank you for pointing this out. As a fan of monotone I certainly know
and like that way. However, for people who are used to CVS, lots of
branching and merging quickly sound dangerous and messy. So I'd like
to keep things as simple as possible while still keeping possibilities
open for the future.

Note that a requirement for daggy fixes is that "the bug is fixed
close to the point where it was introduced". So fixing it on the
oldest stable branch that introduced a bug instead of fixing it on
HEAD and then back-porting would certainly be a step into the right
direction. And I think it would be sufficient in most cases. If not,
we can still enhance that and use daggy fixes later on (as long as we
have a conversion that allows merging, that is).

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
"Markus Wanner" <markus@bluegap.ch> writes:
> Note that a requirement for daggy fixes is that "the bug is fixed  
> close to the point where it was introduced". So fixing it on the  
> oldest stable branch that introduced a bug instead of fixing it on  
> HEAD and then back-porting would certainly be a step into the right  
> direction.

I think it's already been made crystal clear that the people who
actually do this work don't do it that way, and are uninterested in
allowing their tools to force them to do it that way.  Patching from
HEAD back works better for us for a number of reasons, the main one
being that HEAD is the version of the code that's most "swapped into"
our awareness.

However, so long as we can have a separate working copy per branch,
I see no problem with preparing all the versions of a patch and then
committing them back-to-front.  What I'm not clear about is the
mechanics for doing that.  Would someone explain exactly what the
steps should be to produce the nicest-looking git history?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Tom Lane wrote:
> "Markus Wanner" <markus@bluegap.ch> writes:
>   
>> Note that a requirement for daggy fixes is that "the bug is fixed  
>> close to the point where it was introduced". So fixing it on the  
>> oldest stable branch that introduced a bug instead of fixing it on  
>> HEAD and then back-porting would certainly be a step into the right  
>> direction.
>>     
>
> I think it's already been made crystal clear that the people who
> actually do this work don't do it that way, and are uninterested in
> allowing their tools to force them to do it that way.  Patching from
> HEAD back works better for us for a number of reasons, the main one
> being that HEAD is the version of the code that's most "swapped into"
> our awareness.
>   


Yeah, a requirement to work from the back branch forward is quite 
unacceptable IMNSHO. It's also quite unreasonable. The tool is there to 
help, not to force an unnatural work pattern on us.


cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Fri, Jun 5, 2009 at 9:38 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> "Markus Wanner" <markus@bluegap.ch> writes:
>> Note that a requirement for daggy fixes is that "the bug is fixed
>> close to the point where it was introduced". So fixing it on the
>> oldest stable branch that introduced a bug instead of fixing it on
>> HEAD and then back-porting would certainly be a step into the right
>> direction.
>
> I think it's already been made crystal clear that the people who
> actually do this work don't do it that way, and are uninterested in
> allowing their tools to force them to do it that way.  Patching from
> HEAD back works better for us for a number of reasons, the main one
> being that HEAD is the version of the code that's most "swapped into"
> our awareness.
>
> However, so long as we can have a separate working copy per branch,
> I see no problem with preparing all the versions of a patch and then
> committing them back-to-front.  What I'm not clear about is the
> mechanics for doing that.  Would someone explain exactly what the
> steps should be to produce the nicest-looking git history?

I'm sure someone is going to come in here and again recommend merging,
but I'm going to again recommend not merging.  Cherry-picking is the
way to go here.  Or just commit to each branch completely separately
with the same commit message; cherry-pick at least IMO is just a
convenience to help you attempt to apply the patch to a different
branch.
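
One possible shape of the cherry-pick steps, written as a small helper that only builds the command lists and does not run them. It assumes a single working tree that is switched between branches, which differs from the one-working-copy-per-branch setup mentioned above; the abbreviated commit id and the branch names are illustrative only.

```python
def backpatch_commands(commit_id, back_branches):
    """Build the git commands to cherry-pick one commit onto each back branch.

    back_branches should be ordered newest first, matching the habit of
    patching from HEAD backwards.
    """
    commands = []
    for branch in back_branches:
        commands.append(["git", "checkout", branch])
        # -x records the original commit id in the new commit message.
        commands.append(["git", "cherry-pick", "-x", commit_id])
    return commands


# Hypothetical commit id and branch list.
plan = backpatch_commands("189ef03", ["REL8_3_STABLE", "REL8_2_STABLE"])
```

Each cherry-pick may stop on conflicts, so in practice every step would be checked by hand before moving on to the next branch.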

The way you're using commit messages to construct the release notes
really puts limits on what the history has to look like.  I think it
would be good to find a better way to generate release notes that
isn't quite so dependent on having a very tight history, but even if
we do that I think in this particular situation cherry-picking is
going to be less work for the committers than any of the other options
that have been proposed.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm sure someone is going to come in here and again recommend merging,
> but I'm going to again recommend not merging.  Cherry-picking is the
> way to go here.  Or just commit to each branch completely separately
> with the same commit message; cherry-pick at least IMO is just a
> convenience to help you attempt to apply the patch to a different
> branch.

"Commit to each branch separately" is surely the closest analog to what
we have done historically.  What I'm trying to understand is whether
there's an easy variant on that that'd expose the related-ness of the
patch versions in a way git understands, hopefully giving us more
ability to leverage git's capabilities in future.

However, given that we don't do any real development on the back
branches, it might be that trying to be smart about this is a waste of
time anyway.  Surely only the HEAD version of the patch is going to be
something that other developers care about merging with.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Fri, Jun 5, 2009 at 11:37 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> However, given that we don't do any real development on the back
> branches, it might be that trying to be smart about this is a waste of
> time anyway.  Surely only the HEAD version of the patch is going to be
> something that other developers care about merging with.

I think that's about right.  I think there would be some benefit in
developing better tools - release notes seem to be the main issue -
so that, for example, if I develop a complex feature and you think my
code is great (ok, now I'm dreaming), you could actually merge my
commits rather than flattening them.  The EXPLAIN stuff I'm working on
right now is a good example where it's a lot easier to review the
changes piece by piece rather than as a big unit, but I know you won't
want to commit it that way because (1) with CVS, it would be a lot
more work to do that, and (2) it would suck a lot of extra commits
into the data you use to generate release notes, thereby making that
process more complex.

I'm actually going to the trouble of trying to make sure that each of
my commits does one and only one thing that can be separately checked,
tested, and either accepted (hopefully) or rejected (hopefully not).
Hopefully, that will still help with reviewing, but then if you commit
it, it'll probably go in as one stomping commit that changes the
world, or at most as two or three commits that are all still pretty
big.  There are certainly cases where big stomping commits are good (I
have them in my own projects, too, and branches with long histories of
little dumb commits regularly get squashed and rebased before merging)
but I think it would be nice to have other options.

(As a side benefit, if one of my little micro-commits turns out to
have a bug, you can easily revert *just that commit*, without having
to manually sort out exactly which pieces related to that change.)

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> [ about micro commits ]
> (As a side benefit, if one of my little micro-commits turns out to
> have a bug, you can easily revert *just that commit*, without having
> to manually sort out exactly which pieces related to that change.)

I don't actually have a lot of faith in such an approach.  My experience
is that bugs arise from unforeseen interactions of changes, and that
"backing out just one" isn't a useful thing to do, even if none of the
later parts of the patch directly depend on it.

So, yeah, presenting a patch as a series of edits can be useful for
review purposes, but I'm not at all excited about cluttering the
long-term project history with a zillion micro-commits.  One of the
things I find most annoying about reviewing the current commit history
is that Bruce has taken a micro-commit approach to managing the TODO
list --- I was seldom so happy as the day that disappeared from CVS,
because of the ensuing reduction in noise level.
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Robert Haas
Date:
On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> [ about micro commits ]
>> (As a side benefit, if one of my little micro-commits turns out to
>> have a bug, you can easily revert *just that commit*, without having
>> to manually sort out exactly which pieces related to that change.)
>
> I don't actually have a lot of faith in such an approach.  My experience
> is that bugs arise from unforeseen interactions of changes, and that
> "backing out just one" isn't a useful thing to do, even if none of the
> later parts of the patch directly depend on it.
>
> So, yeah, presenting a patch as a series of edits can be useful for
> review purposes, but I'm not at all excited about cluttering the
> long-term project history with a zillion micro-commits.  One of the
> things I find most annoying about reviewing the current commit history
> is that Bruce has taken a micro-commit approach to managing the TODO
> list --- I was seldom so happy as the day that disappeared from CVS,
> because of the ensuing reduction in noise level.

I've never even noticed that noise, even when reviewing older history.
The power of "git log" to get you exactly the commits you care about
is not to be underestimated.

With regard to micro-commits, I don't have hugely strong feelings on
the issue.  I like them in certain situations, and I think that git
makes it feasible to use them that way if you want to; but if you
don't want to, I don't think that's a disaster either.

...Robert


Re: PostgreSQL Developer meeting minutes up

From
Bruce Momjian
Date:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > [ about micro commits ]
> > (As a side benefit, if one of my little micro-commits turns out to
> > have a bug, you can easily revert *just that commit*, without having
> > to manually sort out exactly which pieces related to that change.)
> 
> I don't actually have a lot of faith in such an approach.  My experience
> is that bugs arise from unforeseen interactions of changes, and that
> "backing out just one" isn't a useful thing to do, even if none of the
> later parts of the patch directly depend on it.
> 
> So, yeah, presenting a patch as a series of edits can be useful for
> review purposes, but I'm not at all excited about cluttering the
> long-term project history with a zillion micro-commits.  One of the
> things I find most annoying about reviewing the current commit history
> is that Bruce has taken a micro-commit approach to managing the TODO
> list --- I was seldom so happy as the day that disappeared from CVS,
> because of the ensuing reduction in noise level.

Yea, that was a problem that is now fixed.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Andrew Dunstan <andrew@dunslane.net> [090605 13:55]:

> Yeah, a requirement to work from the back branch forward is quite  
> unacceptable IMNSHO. It's also quite unreasonable. The tool is there to  
> help, not to force an unnatural work pattern on us.

Again, just to make it clear, git isn't going to *force* anyone to
drastically change their workflow.  For people who want to keep a
separate "working directory" per branch, and just work on them as
independently as they do with CVS, *nothing* is going to have to change,
except the possible "git push" step required to actually publish your
committed changes...  But, if you want, you could just also have a
post-commit hook that will do that push for you too, and you just don't
commit until you're sure (a-la-cvs-style):
    cvs update === git stash save && git pull && git stash apply
    cvs commit === git commit -a && git push

The "git stash" is because git won't pull/merge remote work into a
"dirty" workdir... This is the classic conflict CVS mess that git avoids,
and then allows you to use all its powerful merge machinery to "merge"
any of your stashed local changes back into what you've just pulled.

But....

I have a feeling that as people (specifically the committers) get slowly
introduced and exposed to some of the more advanced things git lets you
do, and as you get comfortable with using it, people will *want* to
start altering how they do things, simply because they start to find out
that git really allows them to do what they really want, rather than
what they have "thought they want" because they've been so brainwashed
by CVS...

;-)


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Aidan Van Dyk wrote:
> * Andrew Dunstan <andrew@dunslane.net> [090605 13:55]:
>
>   
>> Yeah, a requirement to work from the back branch forward is quite  
>> unacceptable IMNSHO. It's also quite unreasonable. The tool is there to  
>> help, not to force an unnatural work pattern on us.
>>     
>
> Again, just to make it clear, git isn't going to *force* anyone to
> drastically change their workflow.  

My reaction was against someone saying in effect "don't work that way, 
work this way".

So make your argument to that person ;-)

[...]
> I have a feeling that as people (specifically the comitters) get slowly
> introduced and exposed to some of the more advanced things git lets you
> do, and as you get comfortable with using it, people will *want* to
> start altering how they do thing, simply because they start to find out
> that git really allows them to do what they really want, rather than
> what they have "thought they want" because they've been so brainwashed
> by CVS...
>
>
>   


The whole point is that we want something better *that suits our work 
patterns*. Almost all the backpatching that gets done is by the 
committers. So we have a bunch of concerns that are not relevant to that 
vast majority of developers. In particular, it would be nice to be able 
to make a bunch of changes on different branches and then commit it all 
in one hit. If that's possible, then well and good. If it's not, that's 
a pity.


cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Aidan Van Dyk
Date:
* Andrew Dunstan <andrew@dunslane.net> [090605 14:41]:

> The whole point is that we want something better *that suits our work  
> patterns*. Almost all the backpatching that gets done is by the  
> committers. So we have a bunch of concerns that are not relevant to that  
> vast majority of developers. In particular, it would be nice to be able  
> to make a bunch of changes on different branches and then commit it all  
> in one hit. If that's possible, then well and good. If it's not, that's  
> a pity.

My only concern is that I am seeing 2 requirements emerge:
1) Everything has to work as it currently does with CVS
2) We want better information about how patches relate for possible future stuff

Unfortunately, those 2 requirements are conflicting...  If you (not
anyone personally, but the more general "PostgreSQL committer") want the
repository to properly track the "fixes" and show their relationship,
and extra through all the branches, then you really do want the
"branch-to-fix" and merge the fix forward into all your STABLE/master
branches, like the "daggy" type thing mentioned elsewhere...  But
notice, that is *very* different from the current work patterns based on
the CVS model where everything is completely independent (save the
commit message), and it's a huge change to the way developers work.

If you want to stay with the current CVS style, then you aren't going to
get any closer than "commit messages matching" (or possibly a reference
to another commit as an extra line) that we currently have with CVS.

My suggestion is to keep it simple.  Just work independently, like you
currently do.  You don't want every committer to have to completely
learn the advanced features of a new tool just to use it...  You can use
it as you use the less feature-full tool as you learn all the
features...

But as people start to use the new tool, and start to use its more
advanced features, then it's natural that their results will start to be
reflected in the main repository.

But insisting that people currently comfortable and proficient in the
current work patterns *have* to learn completely new ones for a
"flag-day" type switch and start using them immediately is going to:
 * Piss them off
 * Create great ill-will against the tool
And neither of those will be the fault of the tool itself, but of the
way a "new process" was forced in conjunction with a new tool...

I don't want to see the PG project trying to *force* a radical change in
the way the development/branches currently work at the same time as a
switch to git.  Replace the tool, and allow the current processes and
work-flows to gradually improve.  The process and work-flow improvements
will be an iterative and collaborative process, just like the actual
code improvements, where huge radical patches are generally frowned
upon.

I've used git for a long time, on many different projects.  I do know
how radically it *can* change the process, and how much more efficient
and "natural" the improved processes can be.  But the change is not an
overnight change.  And it's not going to happen unless the people
needing to change *see* its benefits.  And that's going to take time
and experience with the new tool...

Anyways, I said previously that I was over with this thread, but now I
mean it ;-) If someone wants specific git information or help, I'm
available.

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Tom Lane wrote:
> I think it's already been made crystal clear that the people who
> actually do this work don't do it that way, and are uninterested in
> allowing their tools to force them to do it that way.

That's well understood.

> Patching from
> HEAD back works better for us for a number of reasons, the main one
> being that HEAD is the version of the code that's most "swapped into"
> our awareness.

Committing on the oldest back-branch first doesn't necessarily mean
having to develop the patch there.

> However, so long as we can have a separate working copy per branch,
> I see no problem with preparing all the versions of a patch and then
> committing them back-to-front.

That's what I think as well.

However, I bet git could help a lot with creating all the versions of a
patch in the first place. You don't *need* to use that feature, but
preserving the option could help.

> What I'm not clear about is the
> mechanics for doing that.

If you create each of the patches individually, there's not much magic
required from git. It should be trivial to commit those as merges.

> Would someone explain exactly what the
> steps should be to produce the nicest-looking git history?

I fear the cherry-picking approach creates the "nicest-looking" history
(especially to the CVS trained eye).

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Andrew Dunstan wrote:
> Yeah, a requirement to work from the back branch forward is quite
> unacceptable IMNSHO. It's also quite unreasonable.

The monotone page about daggy fixes does quite a good job in explaining
why it is helpful. I think it's how to make best use of these tools. And
it's obviously not the same as what worked well in practice with CVS.
Out of interest, and not necessarily related to Postgres: why do you
think it's unreasonable? Fixing the problem where it was introduced
sounds like the most reasonable place to fix it, IMO.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Markus Wanner <markus@bluegap.ch> writes:
> Out of interest, and not necessarily related to Postgres: why do you
> think it's unreasonable? Fixing the problem where it was introduced
> sounds like the most reasonable place to fix it, IMO.

There are a number of possible reasons, but here are a few that hold for me:

* I always prefer to isolate a bug in HEAD if possible.  It's the
version of the code that's most familiar at the moment, and there are
often new features available that make it easier to test a problem.
So that generally leads to formulating the fix in terms of the HEAD code
first.  After that you start to think about whether (some form of) the
bug exists in back branches and how to fix those branches.

* Experience has shown that later branches tend to have more places
affected by an issue than older ones; eg you might need to touch four
places to fix a bug now, but only three of those places exist in the
older branches.  ISTM you'd be far more likely to miss fixing the
fourth place if you do your initial investigation and fixing/testing
in the oldest affected branch.

* We want HEAD to have the cleanest, most maintainable version of the
fix.  It's not infrequently the case that the most natural way of fixing
a problem varies across branches --- for instance, there might be a
helpful subroutine available in later branches.  If you design the fix
in terms of what works in the oldest branch that has the problem,
you're more likely to come up with something that's suboptimal for later
branches.  For instance in the helpful-subroutine case, I'd be more
likely to decide to back-port the subroutine along with the fix if
I work from HEAD back than if I try to work the other way.

* We are often willing to adopt a fairly invasive fix for HEAD, if
that's what's needed to have a clean maintainable solution, and then
look for a less invasive but klugy solution for the back branches.
Approaching it the other way around would strongly encourage use of
the kluge solution as a permanent fix.


So there are a lot of good reasons to work backwards in patching.
I don't believe that these would be outweighed by some advantage
in the mechanics of applying an unchanging patch to multiple
branches (especially since AFAICT the mechanical advantage would
be pretty darn minimal anyhow).
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Markus Wanner wrote:
> Hi,
>
> Andrew Dunstan wrote:
>   
>> Yeah, a requirement to work from the back branch forward is quite
>> unacceptable IMNSHO. It's also quite unreasonable.
>>     
>
> The monotone page about daggy fixes does quite a good job in explaining
> why it is helpful. I think it's how to make best use of these tools. And
> it's obviously not the same as what worked well in practice with CVS.
> Out of interest, and not necessarily related to Postgres: why do you
> think it's unreasonable? Fixing the problem where it was introduced
> sounds like the most reasonable place to fix it, IMO.
>
>
>   

Half the trouble with this discussion is that it has not been related 
enough to how the Postgres project actually works IMNSHO.

One fact to keep in mind is that, unlike most other FOSS projects, we 
keep quite a large number of branches live. If we don't remove one (and 
so far there is no great reason to that I know of) that number will be 
seven when we release 8.4. There is a huge benefit from this to the user 
community. It means that they can deploy Postgres with confidence that 
they will not have to upgrade for quite a few years. In the corporate 
world, especially, that is a major issue. I occasionally have clients 
running 7.4 or even older versions. Anyway, the large number of branches 
alone means that our patterns are unlikely to match those of other 
projects.

The question we often face in backpatching is not "where did it first 
occur?" but "how far back should we patch it?". Problems are almost 
always discovered near the top of the version list, overwhelmingly on 
the HEAD or most recent stable branches. So the way we work is not to 
try to develop a fix where the problem first occurred (which might not 
even be on a supported branch at all) but as high up the list as the 
problem goes (usually HEAD) and then work out how far down the list to 
apply the fix. And the notion that a fix of any complexity at all is 
going to be simply applicable across six or seven branches simply defies 
our experience. It almost never does. Frequently it won't apply cleanly 
from *any* one branch to another. Even fairly trivial patches can suffer 
from this: the pretty small plperl fixes I applied yesterday and the day 
before, required adjustment going from one branch to the previous one in 
about three out of five back branch cases. Sometimes these adjustments 
are small, sometimes they are quite large. So the idea that we can just 
create a fix on say, the 7.4 branch, and then just merge it forward 
nicely, is just fanciful in most cases, as well as being contrary to our 
methods of work.

Most of this stuff is almost invisible to most of the community. But 
people like Tom work with it every day. And we want to keep Tom 
productive, right? ;-)

cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> [ most of a good summary omitted ]
> ... Even fairly trivial patches can suffer 
> from this: the pretty small plperl fixes I applied yesterday and the day 
> before, required adjustment going from one branch to the previous one in 
> about three out of five back branch cases. Sometimes these adjustments 
> are small, sometimes they are quite large. So the idea that we can just 
> create a fix on say, the 7.4 branch, and then just merge it forward 
> nicely, is just fanciful in most cases, as well as being contrary to our 
> methods of work.

I have heard it claimed that git is more intelligent than plain
diff/patch and could successfully merge patches in cases that currently
require manual adjustment of the sort Andrew describes.  If that's
really true to any significant extent, then it could represent a benefit
large enough to persuade us to alter work flows (at least for simple
patches that don't require significant rethinking across branches).
However, I have yet to see any actual *evidence* in support of this
claim.  How robust is git about dealing with whitespace changes,
nearby variable renamings, and such?

Andrew's plperl patches would be an excellent small test case.  Anybody
want to try them against the experimental git repository and see if git
does any better than plain patch?
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Mark Mielke
Date:
Tom Lane wrote:
> I have heard it claimed that git is more intelligent than plain
> diff/patch and could successfully merge patches in cases that currently
> require manual adjustment of the sort Andrew describes.  If that's
> really true to any significant extent, then it could represent a benefit
> large enough to persuade us to alter work flows (at least for simple
> patches that don't require significant rethinking across branches).
> However, I have yet to see any actual *evidence* in support of this
> claim.  How robust is git about dealing with whitespace changes,
> nearby variable renamings, and such?
>
> Andrew's plperl patches would be an excellent small test case.  Anybody
> want to try them against the experimental git repository and see if git
> does any better than plain patch?

Any revision control system should be able to do better than diff/patch 
as these systems have more information available to them. Normal GIT 
uses the relatively common 3-way merge based upon the most recent common 
ancestor algorithm. Assuming there is a most recent common ancestor that 
isn't "file creation", it will have a better chance of doing the right 
thing.
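
A minimal sketch of a line-based 3-way merge may help show what the
common ancestor buys. This is a simplified diff3-style routine built
on Python's difflib, not git's actual implementation; the function
names and the conflict-marker text are assumptions for illustration.

```python
from difflib import SequenceMatcher


def _sync_regions(base, ours, theirs):
    """Find base ranges left unchanged by both derived versions."""
    a_blocks = SequenceMatcher(None, base, ours, autojunk=False).get_matching_blocks()
    b_blocks = SequenceMatcher(None, base, theirs, autojunk=False).get_matching_blocks()
    regions = []
    ia = ib = 0
    while ia < len(a_blocks) and ib < len(b_blocks):
        a_base, a_pos, a_len = a_blocks[ia]
        b_base, b_pos, b_len = b_blocks[ib]
        start = max(a_base, b_base)
        end = min(a_base + a_len, b_base + b_len)
        if start < end:
            # (base start, base end, start in ours, start in theirs)
            regions.append((start, end,
                            a_pos + (start - a_base),
                            b_pos + (start - b_base)))
        if a_base + a_len < b_base + b_len:
            ia += 1
        else:
            ib += 1
    # Sentinel so the trailing unsynced part is handled too.
    regions.append((len(base), len(base), len(ours), len(theirs)))
    return regions


def three_way_merge(base, ours, theirs):
    """Return (merged_lines, conflict_count) for three lists of lines."""
    merged, conflicts = [], 0
    iz = ia = ib = 0
    for z_start, z_end, a_start, b_start in _sync_regions(base, ours, theirs):
        base_part = base[iz:z_start]
        our_part = ours[ia:a_start]
        their_part = theirs[ib:b_start]
        if their_part == base_part:
            merged += our_part          # only our side changed (or nobody)
        elif our_part == base_part or our_part == their_part:
            merged += their_part        # only their side, or identical edits
        else:
            conflicts += 1
            merged += ["<<<<<<< ours", *our_part, "=======",
                       *their_part, ">>>>>>> theirs"]
        length = z_end - z_start
        merged += base[z_start:z_end]   # lines common to all three
        iz, ia, ib = z_end, a_start + length, b_start + length
    return merged, conflicts


base = ["a", "b", "c", "d", "e"]
ours = ["A", "b", "c", "d", "e"]      # first line changed on our side
theirs = ["a", "b", "c", "d", "E"]    # last line changed on their side
result, conflicts = three_way_merge(base, ours, theirs)
print(result, conflicts)
```

Changes that overlap or sit next to each other fall into the same
unsynced region in this sketch and are reported as a conflict, so it
does not remove the need for manual resolution; it only avoids it
where the two sides touched different parts of the file.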

Systems such as ClearCase have had these capabilities for a long time. 
The difference with distributed version control systems is that they 
absolutely must work well, as every user has their own repository, and 
every repository represents a branch, therefore each user of the system 
is working on a different branch. The need for reliable merges goes up 
under a distributed version control system.

Not to say GIT is truly best-in-class here, but it definitely has 
motivation to be and benefit of being better than diff/patch.

These sorts of tools usually work with another tool such as kdiff3 to 
allow for only the conflicts to be resolved. If you set it up properly, 
you can have the automatic merges completely successful, and kdiff3 or 
similar can present you a graphical interface that allows you to identify 
and resolve the conflicts that require help. I've used these sorts of 
tools long enough to completely take them for granted now, and it feels 
painful to go back to anything more primitive.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: PostgreSQL Developer meeting minutes up

From
Tom Lane
Date:
Mark Mielke <mark@mark.mielke.cc> writes:
> Tom Lane wrote:
>> I have heard it claimed that git is more intelligent than plain
>> diff/patch and could successfully merge patches in cases that currently
>> require manual adjustment of the sort Andrew describes.
>> ...
>> However, I have yet to see any actual *evidence* in support of this
>> claim.

> Any revision control system should be able to do better than diff/patch 
> as these systems have more information available to them. Normal GIT 
> uses the relatively common 3-way merge based upon the most recent common 
> ancestor algorithm. Assuming there is a most recent common ancestor that 
> isn't "file creation", it will have a better chance of doing the right 
> thing.

And I still haven't seen any actual evidence.  Could we have fewer
undocumented assertions and more experimental evidence?  Take Andrew's
plperl patches and see if git does any better with them than plain patch
does.  (If it's not successful with that patch, it's pointless to try it
on any bigger cases, I fear.)
        regards, tom lane


Re: PostgreSQL Developer meeting minutes up

From
Greg Stark
Date:
On Fri, Jun 5, 2009 at 4:37 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
> However, given that we don't do any real development on the back
> branches, it might be that trying to be smart about this is a waste of
> time anyway.  Surely only the HEAD version of the patch is going to be
> something that other developers care about merging with.

For what it's worth that's certainly not true. Any user maintaining a
patched version of the source tree for production use will want to
merge in any patches for older releases. For example anyone using the
CONNECT BY patch with 8.3 will surely want to take any 8.3 patch
releases. Of course EDB in particular has to maintain sources based on
old patch releases as well as the current branch.

That said, I don't see that this really affects the decision here.
These developers will just merge in the patch as it was applied to the
back branch anyways.



--
greg


Re: PostgreSQL Developer meeting minutes up

From
Andrew Dunstan
Date:

Tom Lane wrote:
>> Any revision control system should be able to do better than diff/patch 
>> as these systems have more information available to them. Normal GIT 
>> uses the relatively common 3-way merge based upon the most recent common 
>> ancestor algorithm. Assuming there is a most recent common ancestor that 
>> isn't "file creation", it will have a better chance of doing the right 
>> thing.
>>     
>
> And I still haven't seen any actual evidence.  Could we have fewer
> undocumented assertions and more experimental evidence?  Take Andrew's
> plperl patches and see if git does any better with them than plain patch
> does.  (If it's not successful with that patch, it's pointless to try it
> on any bigger cases, I fear.)
>
>             
>   

The plperl stuff is actually a tough case. In 7.4 we didn't have 
provision for two interpreters, so PERL_SYS_INIT3 is called 
unconditionally, and we didn't have a Windows port either, so the 
comment is also different.

I guess that in itself illustrates the problems.

I also entirely agree with your point about us being more kludgey and 
less invasive on back branches.

cheers

andrew


Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Tom Lane wrote:
> There are a number of possible reasons, but here are a few that hold for me:

Thank you for this very good collection. I'm still wondering about
what's the best way to represent this in git (or others). Cherry-picking
is arguably the simplest variant. Maybe that can be combined with
merging to preserve merge capability. I'll try that...

> So there are a lot of good reasons to work backwards in patching.

Agreed and understood. However, there are good reasons for keeping merge
capability between branches intact as well. I still hope we can get both
somehow, if not, I'm certainly accepting that backward patching is more
important.

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Andrew Dunstan wrote:
> One fact to keep in mind is that, unlike most other FOSS projects, we
> keep quite a large number of branches live.

So far I thought exactly that would be a good reason for migrating to
something like git. Those claim to ease working on multiple branches in
parallel, and in my experience that works pretty well. I'd like to find
a good way to allow the Postgres project to make use of these features
to ease development.

> It means that they can deploy Postgres with confidence that
> they will not have to upgrade for quite a few years. In the corporate
> world, especially, that is a major issue. I occasionally have clients
> running 7.4 or even older versions.

I agree and appreciate that very much as well.

> The question we often face in backpatching is not "where did it first
> occur?" but "how far back should we patch it?".

Uh.. the difference here mostly being *when* the question comes up,
right? Because the possible answers "in 8.1" or "back to 8.1" are pretty
close.

From what I understand now, you are saying here that you work on the
patch and only after that question how far back to apply it. Note that
working on the patch doesn't necessarily mean having to commit it on
HEAD first. I seem to recall a script which has so far been used for CVS
to do the multi-branch commits pretty much at the same time. Is that
correct?

> the pretty small plperl fixes I applied yesterday and the day
> before, required adjustment going from one branch to the previous one in
> about three out of five back branch cases.

I'll give these a try with one of the touted merge algorithms. I'm
curious myself.

> Sometimes these adjustments
> are small, sometimes they are quite large. So the idea that we can just
> create a fix on say, the 7.4 branch, and then just merge it forward
> nicely, is just fanciful in most cases, as well as being contrary to our
> methods of work.

Well, my experience with the Postgres-R patch has been different.
However, that patch is probably not overly invasive.

> Most of this stuff is almost invisible to most of the community.

The daily work maybe, yes. But not the end result, which is known as
rock-solid. I certainly don't want to change that. ;-)

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Tom Lane wrote:
> How robust is git about dealing with whitespace changes,
> nearby variable renamings, and such?

Monotone tracks changes line by line. I'm not sure about git. Kdiff3,
which is used to do the manual merge, if necessary, uses some finer
grained method, AFAIK.

However, there's no special whitespace treatment. Nor anything remotely
as clever as "nearby variable renaming". There's no such magic, the
developer still needs to tell the tool what he wants.

However, I'd argue that monotone (as well as git) do an incredible job
at "remembering" these decisions and merges, so you never need to do a
manual merge twice. (Which I remember doing a lot with diff/patch, quilt
or subversion).

> Andrew's plperl patches would be an excellent small test case.  Anybody
> want to try them against the experimental git repository and see if git
> does any better than plain patch?

I've given that patch a try under monotone (just because I happen to
know that a lot better). The results should be the same as with git.

I've started with the patch against 7.4 (which I know doesn't resemble
the current workflow, but is sufficient for testing merging
capabilities). Merging that to 8.0 worked without any conflicts.
Although the result then differed from Andrew's work in that the
variable dummy_perl_env is declared after the "#ifdef WIN32" block as
opposed to before in 7.4. The addition in the comment ("notably on
Windows") of course also didn't appear automatically.

It merged from 8.0 to 8.1 without any conflicts, results were equal.

Merging from 8.1 to 8.2 resulted in one merge conflict, because of the
additional condition ('if (interp_state == INTERP_NONE)') that got added
between 8.1 and 8.2.

Merging from 8.2 to 8.3 and then to HEAD as well was conflict free
again. The results differ in whitespace changes exclusively.

So, three out of the five merges would have been equally perfect with
automatic merging, while requiring only one single command, which could
even be scripted, because it remains the same over time, i.e. for
monotone it was something similar to:
  mtn propagate REL8_0_STABLE REL8_1_STABLE
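
As a rough sketch of how that could be scripted, the snippet below
only builds the command lines for a chain of branches and prints
them; it does not run monotone. The helper name is hypothetical, the
branch names simply reuse the CVS-style tags from this thread (names
in a real monotone database may differ), and the final step to HEAD
is left out because its branch name is not given here.

```python
def propagate_commands(branches):
    """Return one `mtn propagate SRC DST` argv list per adjacent pair."""
    return [["mtn", "propagate", src, dst]
            for src, dst in zip(branches, branches[1:])]


# Oldest branch first; each fix is propagated one step forward.
chain = ["REL7_4_STABLE", "REL8_0_STABLE", "REL8_1_STABLE",
         "REL8_2_STABLE", "REL8_3_STABLE"]

for argv in propagate_commands(chain):
    print(" ".join(argv))
```

A step that hits a conflict, like 8.1 to 8.2 above, would still need
manual resolution before the following steps can run.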

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Markus Wanner
Date:
Hi,

Greg Stark wrote:
> For what it's worth that's certainly not true. Any user maintaining a
> patched version of the source tree for production use will want to
> merge in any patches for older releases. For example anyone using the
> CONNECT BY patch with 8.3 will surely want to take any 8.3 patch
> releases.

..or port that forward to 8.4, once it's released. To me it doesn't seem
that unreasonable to develop something on top of a stable branch and to
want to migrate that to a newer stable branch or HEAD later on. Git
could certainly help to reduce bit-rotting here, in my experience.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Mark Mielke
Date:
Tom Lane wrote:
>> Any revision control system should be able to do better than diff/patch
>> as these systems have more information available to them. Normal GIT
>> uses the relatively common 3-way merge based upon the most recent common
>> ancestor algorithm. Assuming there is a most recent common ancestor that
>> isn't "file creation", it will have a better chance of doing the right
>> thing.
>
> And I still haven't seen any actual evidence.  Could we have fewer
> undocumented assertions and more experimental evidence?  Take Andrew's
> plperl patches and see if git does any better with them than plain patch
> does.  (If it's not successful with that patch, it's pointless to try it
> on any bigger cases, I fear.)

This comes to the theory vs profiling I suppose. I am a theory person -
I run things in my head. To me, the concept of having more context to
make the right decision, and an algorithm that takes advantage of this
context to make the right decision, is simple and compelling on its own.
Knowing the algorithms that are in use, including how it selects the
most recent common ancestor gives me confidence. You have the
capabilities to test things for yourself. If you have any questions, try
it out. No amount of discussions where others say "it works great" and
you say "I don't believe you until you provide me with output" is going
to get anywhere. I could set up a few scenarios or grab actual patches
and show you particular success cases and particular failure cases, but
will you really believe it? Because you shouldn't. For all you know, I
picked the cases I knew would work and put them up against the cases I
knew would fail.

I've used ClearCase for around 10 years now, and with the exception of
"cherry picking", it has very strong and mature merge support. We rely
on merges being safe while managing many projects much larger than
PostgreSQL. Many of the projects have hundreds of users working on them
at the same time. CVS is *unusable* in these environments. Recently,
however, in spite of investments into ClearCase, we are looking at GIT
as providing *stronger* merge capabilities than ClearCase, specifically
with regard to propagating changes from one release to another. I'm not
going to pull up the last ten years of history and make it available to
you.

Nothing is going to prove this to you other than trying it out for
yourself. People need to be burned by unreliable merge algorithms before
they respect the value of a reliable merge algorithm. People need to
experience reliable merging before they buy the product.

If the theory doesn't work for you, you really are going to have to try
it out for yourself.

Or not.

It doesn't matter to me. :-)

In any case - you raised the question - I explained how it works - and
you shot me down without any evidence of your own. I explained how it
works. It's up to you to try it out for yourself and decide if you are a
believer.

Cheers,
mark

P.S. I'm only a bit insulted by these threads. There are a lot of
sceptical people in the crowd who until now have raised questions which
only make it clear that these people have not ever worked with a capable
SCM system on a major project before. I really shouldn't hold this
against you, which is why I continue to try and provide the theory and
background, so that when you do give it a chance, it will all start to
make sense. You'll try it out - find it works great - and wonder "how
does it do that?" Then, hopefully you can go back to my post (or the
many others who have tried to help out) and read how it works and say
"ah hah! excellent!"

-- 
Mark Mielke <mark@mielke.cc>

Re: PostgreSQL Developer meeting minutes up

From
Nicolas Barbier
Date:
2009/6/7 Markus Wanner <markus@bluegap.ch>:

> However, there's no special whitespace treatment. Nor anything remotely
> as clever as "nearby variable renaming". There's no such magic, the
> developer still needs to tell the tool what he wants.

If I understand correctly, "nearby variable renaming" refers to
changes to the few lines surrounding the changes-to-be-merged. There
is certainly supposed to be an advantage relative to diff/patch here:
as all changes leading to both versions are known (up to some common
ancestor), git doesn't need "context lines" to recognize the position
in the file that is supposed to receive the updates.

Example:

Original file:

a
b
c

Random other changes later (a and c are updated to incorporate "nearby
variable renaming" or somesuch):

extra line
a'
b
c'

(Note that the extra line is important, because if the line numbers
stay the same and the lines-to-update are exactly the same, patch
could just ignore the context lines.)

An update to line b yields:

extra line
a'
b'
c'

This change would not be diff/patch-mergeable to the original file,
because the "context lines" a' and c' wouldn't be found. Git is
smarter than this and doesn't need the context lines; rather it uses
the full history to determine that the change to line 3 becomes a
change to line 2 in the original file. It therefore merges this change
to yield:

a
b'
c

Disclaimer: I don't use git, but I assume that this is how all systems
that are smarter than diff/patch work.

Nicolas


Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Mark Mielke" <mark@mark.mielke.cc>:
> I am a theory person - I run things in my head. To me, the concept
> of having more context to make the right decision, and an algorithm
> that takes advantage of this context to make the right decision, is
> simple and compelling on its own. Knowing the algorithms that are in
> use, including how it selects the most recent common ancestor gives
> me confidence.

That makes me wonder why you are speaking against merges, where
there are common ancestors. I'd argue that in theory (and generally) a
merge yields better results than cherry-picking (where there is no
common ancestor, thus less information). Especially for back-branches,
where there obviously is a common ancestor.

> No amount of discussions where others say "it works great" and you
> say "I don't believe you until you provide me with output" is going
> to get anywhere.

Well, I guess it can be frustrating for both sides. However, I think
these discussions are worthwhile (and necessary) none the less.

As not even those who highly appreciate merge algorithms (you and me,
for example) are in agreement on how to use them (cherry-picking vs.
merging) it doesn't surprise me that others are generally skeptical.

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Ron Mayer
Date:
Robert Haas wrote:
> On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> ... but I'm not at all excited about cluttering the
>> long-term project history with a zillion micro-commits.  One of the
>> things I find most annoying about reviewing the current commit history
>> is that Bruce has taken a micro-commit approach to managing the TODO
>> list --- I was seldom so happy as the day that disappeared from CVS,
>> because of the ensuing reduction in noise level.

For better or worse, git also includes a command "git-rebase" that can
collapse such micro-commits into a larger one.

Quoting the git-rebase man page:

      A range of commits could also be removed with rebase. If we have the
      following situation:

              E---F---G---H---I---J  topicA

      then the command

          git-rebase --onto topicA~5 topicA~3 topicA

      would result in the removal of commits F and G:

              E---H'---I'---J'  topicA


While I wouldn't recommend using this for historical revisionism, I
imagine it could be useful during code-review time when the
micro-commits (from both the patch submitter and patch reviewer)
are interesting.  After the review, the commits could be collapsed
into meaningful-sized-chunks just before they're merged into the
official branches.
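
To spell out the arithmetic in the man page example, here is a toy
model that treats a strictly linear branch as a list of commit names,
oldest first. It is only a sketch for reading the topicA~N notation;
the function name is hypothetical and no git command is run.

```python
def rebase_onto(branch, newbase_back, upstream_back):
    """Model `git rebase --onto tip~newbase_back tip~upstream_back tip`.

    `branch` is a linear history, oldest commit first; the last entry
    is the branch tip.
    """
    tip = len(branch) - 1
    # History up to and including the new base stays untouched.
    kept = branch[: tip - newbase_back + 1]
    # Commits after the upstream commit are replayed on the new base.
    replayed = branch[tip - upstream_back + 1:]
    # Replayed commits get new identities, marked here with a prime.
    return kept + [commit + "'" for commit in replayed]


history = ["E", "F", "G", "H", "I", "J"]    # J is topicA
result = rebase_onto(history, 5, 3)         # topicA~5 is E, topicA~3 is G
print("---".join(result))
```

Everything between the new base (E) and the upstream commit (G),
including G itself, is dropped, which is why F and G disappear and H,
I and J come back as new commits.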




Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Nicolas Barbier" <nicolas.barbier@gmail.com>:
> If I understand correctly, "nearby variable renaming" refers to
> changes to the few lines surrounding the changes-to-be-merged.

Hm.. I took that to mean "changes on the same line". I now realize
that was an overly strict interpretation.

> There
> is certainly supposed to be an advantage relative to diff/patch here:
> as all changes leading to both versions are known (up to some common
> ancestor), git doesn't need "context lines" to recognize the position
> in the file that is supposed to receive the updates.

Yes, that's how I understand it as well. Your example seems fine
(except that it does not make much sense to merge with an ancestor).

I'm not sure if git also works line by line (as does monotone).
However, IIRC kdiff3 uses some finer grained comparison, so it can
even merge unrelated changes on the same line, i.e.:

ancestor: aaa bbb
left:     axa bbb  (modified a -> x)
right:    aaa byb  (modified b -> y)
merge:    axa byb  (contains both modifications)
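
The idea behind that example can be sketched as a character-wise
3-way merge. This is not how kdiff3 is implemented (I have not
checked its algorithm); it is a minimal illustration that only
handles in-place substitutions on strings of equal length, and the
function name is made up for the example.

```python
def merge_within_line(ancestor, left, right):
    """Three-way merge of equal-length strings, one character at a time.

    Returns the merged string, or None if both sides changed the same
    position to different characters.
    """
    if not (len(ancestor) == len(left) == len(right)):
        raise ValueError("this sketch only handles in-place substitutions")
    out = []
    for base, l, r in zip(ancestor, left, right):
        if l == r or r == base:
            out.append(l)       # both sides agree, or only left changed
        elif l == base:
            out.append(r)       # only right changed
        else:
            return None         # both changed differently: conflict
    return "".join(out)


merged = merge_within_line("aaa bbb", "axa bbb", "aaa byb")
print(merged)
```

A purely line-based merge would see the same line modified on both
sides and report a conflict, which is where the finer granularity
helps.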

Regards

Markus Wanner


Re: PostgreSQL Developer meeting minutes up

From
Mark Mielke
Date:
Markus Wanner wrote:
> Quoting "Mark Mielke" <mark@mark.mielke.cc>:
>> I am a theory person - I run things in my head. To me, the concept of 
>> having more context to make the right decision, and an algorithm that 
>> takes advantage of this context to make the right decision, is simple 
>> and compelling on its own. Knowing the algorithms that are in use, 
>> including how it selects the most recent common ancestor gives me 
>> confidence.
>
> That makes me wonder why you are speaking against merges, where 
> there are common ancestors. I'd argue that in theory (and generally) a 
> merge yields better results than cherry-picking (where there is no 
> common ancestor, thus less information). Especially for back-branches, 
> where there obviously is a common ancestor.

Nope - definitely not speaking against merges. Automatic merges = best. 
Automatic cherry picking = second best if the work flow doesn't allow 
for merges. Doing things by hand = bad but sometimes necessary. 
Automatic merges or automatic cherry picking with some manual tweaking 
(hopefully possible from kdiff3) = necessary at times but still better 
than doing things by hand completely. I think you and I are in 
agreement. (Even Tom and I are in agreement on many things - I just 
didn't respond to his well thought out great posts, like the one that 
describes why back patching is often better than forward patching when 
having multiple parallel releases open at the same time)

>> No amount of discussions where others say "it works great" and you 
>> say "I don't believe you until you provide me with output" is going 
>> to get anywhere.
> Well, I guess it can be frustrating for both sides. However, I think 
> these discussions are worthwhile (and necessary) nonetheless.
>
> As not even those who highly appreciate merge algorithms (you and me, 
> for example) are in agreement on how to use them (cherry-picking vs. 
> merging) it doesn't surprise me that others are generally skeptical.

We're in agreement on the merge algorithms I think. :-)

That said, it is a large domain, and there is room for disagreement even 
between those with experience, and you are right that it shouldn't be 
surprising that others are generally sceptical.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: PostgreSQL Developer meeting minutes up

From
"Markus Wanner"
Date:
Hi,

Quoting "Nicolas Barbier" <nicolas.barbier@gmail.com>:
> ISTM that back-patching

I take this to mean "back-patching by cherry picking".

> a change to a file that wasn't modified on the
> back-branch leads exactly to merging a change to a (file-wise)
> ancestor?

Regarding the file's contents - and therefore the immediately visible  
result - that's correct. However, for a merge, the two ancestor  
revisions are stored, whereas with cherry-picking this information is  
lost (at least for git).

So, trying to merge on top of a cherry-pick, git must merge these  
changes again (which might or might not work). Merging on top of  
merging works just fine.
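
As a rough illustration of why that is, here is a small sketch that models history as a map from commit to parents and computes the best common ancestors, in the spirit of git merge-base. The commit names (base, fix, fix2, old, m, cp) and the helper functions are hypothetical, and this is a toy model, not git's implementation.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# Hypothetical history: "fix" and "fix2" land on HEAD, "old" is the back-branch tip.
history = {"base": [], "fix": ["base"], "old": ["base"], "fix2": ["fix"]}

# A merge commit records both parents; a cherry-pick records only one.
with_merge = dict(history, m=["old", "fix"])
with_pick = dict(history, cp=["old"])

print(merge_bases(with_merge, "m", "fix2"))  # {'fix'}
print(merge_bases(with_pick, "cp", "fix2"))  # {'base'}
```

After the merge, the common ancestor has advanced to "fix", so a later merge only has to deal with "fix2". After the cherry-pick, the common ancestor is still "base", so the change from "fix" shows up on both sides and has to be merged again.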

Regards

Markus Wanner



Re: PostgreSQL Developer meeting minutes up

From
kris@shannon.id.au
Date:
2009/6/7 Tom Lane <tgl@sss.pgh.pa.us>:
> So there are a lot of good reasons to work backwards in patching.
> I don't believe that these would be outweighed by some advantage
> in the mechanics of applying an unchanging patch to multiple
> branches (especially since AFAICT the mechanical advantage would
> be pretty darn minimal anyhow).

As another data point, the stable branches of the Linux kernel are
actually maintained this way.  There is a policy that any patch for the
stable branches must already have been included (in some form) in HEAD.
There is no merging going on.  They aren't even using git cherry-pick, but
that's because all backpatching goes into a review list rather than happening
immediately.

The multiple branches and merging that is going on in the linux kernel
is all about development of new features, not fixing of bugs.