Thread: Hacking on PostgreSQL via GIT
Hi Florian,

I am right now running an rsync of the Pg CVS repo to my work machine to get a git import underway. I'm rather keen on seeing your cool PITR Pg project go well, and I have some git+cvs fu I can apply here (being one of the git-cvsimport maintainers) ;-)

For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-)

Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/

cheers,
martin
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/           PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224                 UK: 0845 868 5733 ext 7224
MOB: +64(21)364-017
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------
Martin Langhoff wrote:
> Hi Florian,
>
> I am right now running an rsync of the Pg CVS repo to my work machine to get a git import underway. I'm rather keen on seeing your cool PITR Pg project go well and I have some git+cvs fu I can apply here (being one of the git-cvsimport maintainers) ;-)
>
> For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-)
>
> Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/

Well, now that more than one of us is working with git on PostgreSQL...

I've had a repo conversion running for a while... I only got it to what I consider "stable" last week:
http://repo.or.cz/w/PostgreSQL.git
git://repo.or.cz/PostgreSQL.git

Note that this is a "special" conversion - I intentionally "unmunge" all the $PostgreSQL$ tags in this repo. I hate the keyword expansion; it only serves to turn otherwise automatic merging into a manual process. So I specifically go through and un-munge any keyword-like things before stomping them into GIT.

For those interested in the conversion process, I've used a slightly modified version of fromcvs (a Ruby CVS-to-git/Hg tool), and it runs on all of pgsql in about 20 minutes. I gave up on git-svn (because of both speed and my inability to easily "filter" out keywords, etc.) and git-cvsimport (because cvsps doesn't seem to like pgsql's repo).

I "update" the git repo daily, based on an anonymous rsync of the cvsroot. If the anon-rsync is updated much more frequently, and people think my git conversion should match it, I have no problem having cron run it more often than daily.

Also - note that I give *no* guarantees of its integrity, etc.
I've "diffed" a CVS checkout and a git checkout, and they are *almost* identical. Almost, because it seems like my git repository currently has 3 files that a cvs checkout doesn't:

 backend/parser/gram.c             | 12088 +++++++++++++++++++++++++++
 interfaces/ecpg/preproc/pgc.c     |  2887 ++++++
 interfaces/ecpg/preproc/preproc.c | 16988 ++++++++++++++++++++++++++++++++++

And at this point, I haven't been bothered to see where those files came from (and where they disappear) in CVS, and why my import isn't picking that up... I could probably be pushed to, if others find this repo really useful but those files problematic...
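A comparison like the one described above can be reproduced with a recursive diff that skips each tool's metadata directory. The following is a self-contained toy demo, not Aidan's actual command; the directory and file names are invented:

```shell
# Build two tiny "checkouts" that agree except for one generated file
# that exists only on the git side (mirroring the gram.c situation).
set -e
t=$(mktemp -d)
mkdir -p "$t/cvs-checkout/CVS" "$t/git-checkout/.git"
echo 'same contents' > "$t/cvs-checkout/file.c"
echo 'same contents' > "$t/git-checkout/file.c"
echo 'bison output'  > "$t/git-checkout/gram.c"   # only in the git tree

# --exclude keeps the CVS/ and .git/ metadata out of the comparison;
# diff exits 1 when the trees differ, hence the "|| true".
diff -r --brief --exclude=CVS --exclude=.git \
    "$t/cvs-checkout" "$t/git-checkout" || true
```

With `--brief`, identical files produce no output, so the only line reported is the `gram.c` file present only in the git-side tree.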
* Aidan Van Dyk <aidan@highrise.ca> [070416 14:08]:
> Note that this is a "special" conversion - I intentionally "unmunge" all the $PostgreSQL$ tags in this repo.

Blah - and I just noticed that I actually "missed" the $PostgreSQL$ ones (although I did catch the Date/Modified/From/etc)...

> I hate the keyword expansion; it only serves to turn otherwise automatic merging into a manual process. So I specifically go through and un-munge any keyword-like things before stomping them into GIT.

Expect it to change once more in the next little while ;-)

a.
--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
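The unexpansion Aidan describes can be sketched as a small stream filter applied to every file before it goes into git. This is only an illustration; the keyword list and the function name are assumptions, not his actual conversion code:

```shell
# Collapse expanded RCS/CVS keywords such as
#   $PostgreSQL: pgsql/src/backend/parser/gram.y,v 2.5 2007/04/16 ... Exp $
# back to their bare form ($PostgreSQL$) so merges don't conflict on them.
# The keyword list here is illustrative, not exhaustive.
unexpand_keywords() {
  sed -E 's/\$(PostgreSQL|Id|Header|Date|Author|Revision|Source|RCSfile)(:[^$]*)?\$/$\1$/g'
}

echo '$Id: gram.c,v 1.2 2000/01/26 05:58:25 momjian Exp $' | unexpand_keywords
# -> $Id$
```

The filter is idempotent: an already-bare `$Id$` matches the pattern with an empty second group and is rewritten to itself, so it is safe to run on every import.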
Martin Langhoff wrote:
> Hi Florian,
>
> I am right now running an rsync of the Pg CVS repo to my work machine to get a git import underway. I'm rather keen on seeing your cool PITR Pg project go well and I have some git+cvs fu I can apply here (being one of the git-cvsimport maintainers) ;-)

Cool - I'm new to git, so I really appreciate any help that I can get.

> For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-)

That's how I figured I'd work - though I don't yet understand what the advantage of "rebase" is over "merge".

Currently, I've set up a git repo that pulls in the changes from the SVN repo, and pushes them to my main SoC git repo. On that main repo I have two branches, master and pgsql-head, and I call "cg-merge pgsql-head" if I want to merge with CVS HEAD.

> Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/

Having a git mirror of the pgsql CVS would be great. BTW, I've just checked out repo.or.cz, and noticed that there is already a git mirror of the pgsql CVS: http://repo.or.cz/w/PostgreSQL.git

greetings + thanks,
Florian Pflug
Aidan Van Dyk wrote:
> Well, now that more than one of us is working with git on PostgreSQL...
>
> I've had a repo conversion running for a while... I only got it to what I consider "stable" last week:
> http://repo.or.cz/w/PostgreSQL.git
> git://repo.or.cz/PostgreSQL.git

Ah - that's what I just stumbled over ;-)

> For those interested in the conversion process, I've used a slightly modified version of fromcvs (a Ruby CVS-to-git/Hg tool), and it runs on all of pgsql in about 20 minutes.
>
> I gave up on git-svn (because of both speed and my inability to easily "filter" out keywords, etc.) and git-cvsimport (because cvsps doesn't seem to like pgsql's repo)

Yeah, git-cvsimport didn't work for me either...

> I "update" the git repo daily, based on an anonymous rsync of the cvsroot. If the anon-rsync is updated much more frequently, and people think my git conversion should match it, I have no problem having cron run it more often than daily.
>
> Also - note that I give *no* guarantees of its integrity, etc.
>
> I've "diffed" a CVS checkout and a git checkout, and they are *almost* identical. Almost, because it seems like my git repository currently has 3 files that a cvs checkout doesn't:
>  backend/parser/gram.c             | 12088 +++++++++++++++++++++++++++
>  interfaces/ecpg/preproc/pgc.c     |  2887 ++++++
>  interfaces/ecpg/preproc/preproc.c | 16988 ++++++++++++++++++++++++++++++++++
>
> And at this point, I haven't been bothered to see where those files came from (and where they disappear) in CVS and why my import isn't picking that up... I could probably be pushed to, if others find this repo really useful but those files problematic...

That's interesting - the SVN mirror of the pgsql CVS at http://projects.commandprompt.com/public/pgsql/browser has exactly the same problem with those 3 files, as I found out the hard way ;-)

In the case of pgc.c, I've compared the revision in CVS with the one in SVN.
SVN includes the CVS version 1.5 of this file in trunk, which seems to be the last version of that file in CVS HEAD. Interestingly, http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c shows no trace of the file being deleted from HEAD either - it just shows that it was removed from WIN32_DEV. But still, a CVS checkout doesn't include that file...

Since three tools (cvsweb, git-cvsimport, and whatever commandprompt uses to create the SVN mirror) all come to the same conclusion regarding this file, I think this is caused by some corruption of the CVS repository - but I don't have the cvs-fu to debug this...

greetings,
Florian Pflug
Aidan Van Dyk wrote:
> I've "diffed" a CVS checkout and a git checkout, and they are *almost* identical. Almost, because it seems like my git repository currently has 3 files that a cvs checkout doesn't:
>  backend/parser/gram.c             | 12088 +++++++++++++++++++++++++++
>  interfaces/ecpg/preproc/pgc.c     |  2887 ++++++
>  interfaces/ecpg/preproc/preproc.c | 16988 ++++++++++++++++++++++++++++++++++
>
> And at this point, I haven't been bothered to see where those files came from (and where they disappear) in CVS and why my import isn't picking that up... I could probably be pushed to, if others find this repo really useful but those files problematic...

These files are generated (from gram.y, pgc.l and preproc.y respectively) and are not present in the CVS repo, though I think they have been at some point.

It's strange that other generated files (that have also been in the repo in the past), like preproc.h, are not showing up.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> These files are generated (from gram.y, pgc.l and preproc.y respectively) and are not present in the CVS repo, though I think they have been at some point.
> It's strange that other generated files (that have also been in the repo in the past) like preproc.h are not showing up.

The weird thing about these files is that the CVS history shows commits on HEAD later than the file-removal commit. I don't recall whether Vadim unintentionally re-added the files before making those commits ... but if he did, you'd think it'd have taken another explicit removal to get rid of them in HEAD. More likely, there was some problem in his local tree that allowed a "cvs commit" to think it should update the repository with copies of the derived files he happened to have.

I think this is a corner case that CVS handles in a particular way and the tools people are using to read the repository handle in a different way. Which would be a bug in those tools, since CVS's interpretation must be right by definition.

regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> These files are generated (from gram.y, pgc.l and preproc.y respectively) and are not present in the CVS repo, though I think they have been at some point.
>> It's strange that other generated files (that have also been in the repo in the past) like preproc.h are not showing up.
>
> The weird thing about these files is that the CVS history shows commits on HEAD later than the file-removal commit. I don't recall whether Vadim unintentionally re-added the files before making those commits ... but if he did, you'd think it'd have taken another explicit removal to get rid of them in HEAD. More likely, there was some problem in his local tree that allowed a "cvs commit" to think it should update the repository with copies of the derived files he happened to have.
>
> I think this is a corner case that CVS handles in a particular way and the tools people are using to read the repository handle in a different way. Which would be a bug in those tools, since CVS's interpretation must be right by definition.

The question is whether it'd be acceptable to manually remove that last commit from the repository. I guess simply re-adding and then removing the files again should do the trick, though it'd be cleaner to remove the offending commit in the first place. Should postgres ever decide to switch to another version control system (which I don't advocate), that'd be one obstacle less to deal with...

Or is the risk of causing breakage too high?

greetings,
Florian Pflug
fgp@phlo.org ("Florian G. Pflug") writes:
> Martin Langhoff wrote:
>> Hi Florian,
>> I am right now running an rsync of the Pg CVS repo to my work machine to get a git import underway. I'm rather keen on seeing your cool PITR Pg project go well and I have some git+cvs fu I can apply here (being one of the git-cvsimport maintainers) ;-)
> Cool - I'm new to git, so I really appreciate any help that I can get.
>
>> For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-)
> That's how I figured I'd work - though I don't yet understand what the advantage of "rebase" is over "merge".
>
> Currently, I've set up a git repo that pulls in the changes from the SVN repo, and pushes them to my main SoC git repo. On that main repo I have two branches, master and pgsql-head, and I call "cg-merge pgsql-head" if I want to merge with CVS HEAD.
>
>> Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/
> Having a git mirror of the pgsql CVS would be great.
> BTW, I've just checked out repo.or.cz, and noticed that there is already a git mirror of the pgsql CVS: http://repo.or.cz/w/PostgreSQL.git

This strikes me as being a really super thing, having both Subversion and Git repositories publicly available that are tracking the PostgreSQL sources.

Stepping back to the SCM discussion, people were interested in finding out what merits there were in having these sorts of SCMs, and in finding out what glitches people might discover (e.g. the files where the CVS repository is a bit schizophrenic as to whether they are still there or not...). Having these repositories should allow some of this experimentation to take place now.
I'd be interested in fiddling with a Git repository at some point; I'll happily wait a bit to start drawing from one of these existing ones, to let the dust settle and to let things stabilize a bit.

--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://linuxdatabases.info/info/emacs.html
"Support your local medical examiner - die strangely." -- Blake Bowers
aidan@highrise.ca (Aidan Van Dyk) writes:
> I've "diffed" a CVS checkout and a git checkout, and they are *almost* identical. Almost, because it seems like my git repository currently has 3 files that a cvs checkout doesn't:
>  backend/parser/gram.c             | 12088 +++++++++++++++++++++++++++
>  interfaces/ecpg/preproc/pgc.c     |  2887 ++++++
>  interfaces/ecpg/preproc/preproc.c | 16988 ++++++++++++++++++++++++++++++++++
>
> And at this point, I haven't been bothered to see where those files came from (and where they disappear) in CVS and why my import isn't picking that up... I could probably be pushed to, if others find this repo really useful but those files problematic...

Those three files are normally generated by either flex or bison (gram.c depends on gram.y, pgc.c on pgc.l, and preproc.c on preproc.y); I'd suggest removing those three files from your git repository.

--
"cbbrowne","@","acm.org"  http://cbbrowne.com/info/rdbms.html
"They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown." -- Carl Sagan
* Florian G. Pflug <fgp@phlo.org> [070416 16:16]:
> > I think this is a corner case that CVS handles in a particular way and the tools people are using to read the repository handle in a different way. Which would be a bug in those tools, since CVS's interpretation must be right by definition.

;-)

Would anyone know if these were "hand moved" to the Attic? For instance, I *can't* seem to get non-dead files into the Attic, no matter what I try with my cvs (on Debian). But I haven't gone through the last 8 years of CVS's CVS logs to see if they fixed a bug in the cvs server code that would allow a non-dead HEAD rcs file to be in the Attic...

> The question is whether it'd be acceptable to manually remove that last commit from the repository. I guess simply re-adding and then removing the files again should do the trick, though it'd be cleaner to remove the offending commit in the first place. Should postgres ever decide to switch to another version control system (which I don't advocate), that'd be one obstacle less to deal with...
>
> Or is the risk of causing breakage too high?

Well, I've "hand fixed" this in my conversion process, so my git conversion should not have this problem... I'm not a fan of mucking around by hand in CVS. It's only because of the shortcomings of CVS that it's ever necessary to resort to that. So I don't think re-adding/deleting them is worth it...

I've updated the repo.or.cz/PostgreSQL.git again - and this time it should be pretty good. Consider it "usable" to clone off and follow CVS development with... I won't re-convert the whole thing again, and will just provide daily updates to it now. Unless anybody finds issues with it...

Ignore the "public" branch in there - that got in via an errant push, and I don't know how to remove branches on repo.or.cz. I'm now just putting "conversion notes" up in the public branch... It's *not* a PostgreSQL branch.

a.
Aidan Van Dyk <aidan@highrise.ca> writes:
> Would anyone know if these were "hand moved" to Attic?

Seems unlikely, since there's a commit log entry for the removal. But this all happened seven-plus years ago, and I'm sure there's an old CVS bug involved *somewhere*.

I like the idea of re-adding and then re-removing the files on HEAD. Does anyone think that poses any real risk?

regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [070416 19:03]:
> Aidan Van Dyk <aidan@highrise.ca> writes:
> > Would anyone know if these were "hand moved" to Attic?
>
> Seems unlikely, since there's a commit log entry for the removal. But this all happened seven-plus years ago and I'm sure there's an old CVS bug involved *somewhere*.
>
> I like the idea of re-adding and then re-removing the files on HEAD. Does anyone think that poses any real risk?

No - it even fixed the "hand moved" test case I had set up when trying to figure out how they got that way in the first place...

What I did when I converted the repo was just hand-edit those files to have a state of "dead", to match their position in the Attic for those RCS revs. If you "add" them and remove them, I believe my GIT conversion will actually "follow" that correctly... If not, I'll just rm -Rf it and let it go from scratch "one more time"... I'm glad computers are good at that type of repetitive task...

a.
Aidan Van Dyk <aidan@highrise.ca> writes:
> * Tom Lane <tgl@sss.pgh.pa.us> [070416 19:03]:
>> I like the idea of re-adding and then re-removing the files on HEAD. Does anyone think that poses any real risk?

> No - it even fixed the "hand moved" test I had done trying to create an Attic with, when trying to figure out how they got that way in the first place...

Well, it doesn't work :-(. CVS is definitely a bit confused about the status of these files:

$ touch gram.c
$ cvs add gram.c
cvs add: gram.c added independently by second party
$ cvs remove gram.c
cvs remove: file `gram.c' still in working directory
cvs remove: 1 file exists; remove it first
$ rm gram.c
rm: remove regular empty file `gram.c'? y
$ cvs remove gram.c
cvs remove: nothing known about `gram.c'

So there's no way, apparently, to fix the state of these files through the "front door". Shall we try the proposed idea of hand-moving the files out of the Attic subdirectory, whereupon they should appear live and we can cvs remove them again? I have login on cvs.postgresql.org and can try this, but I'd like confirmation from someone that this is unlikely to break things. Is there any hidden state to be fixed in the CVS repository? I don't see any ...

regards, tom lane
I wrote:
> So there's no way, apparently, to fix the state of these files through the "front door".

I take that back: the right sequence involving a "cvs update" got me into a state where it thought the files were "locally modified", and then I could commit and "cvs remove" and commit again. So hopefully it's all cleaned up now --- at least the states of the files look reasonable in cvsweb.

regards, tom lane
Tom Lane wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>> * Tom Lane <tgl@sss.pgh.pa.us> [070416 19:03]:
>>> I like the idea of re-adding and then re-removing the files on HEAD. Does anyone think that poses any real risk?
>
>> No - it even fixed the "hand moved" test I had done trying to create an Attic with, when trying to figure out how they got that way in the first place...
>
> Well, it doesn't work :-(. CVS is definitely a bit confused about the status of these files:
>
> $ touch gram.c
> $ cvs add gram.c
> cvs add: gram.c added independently by second party
> $ cvs remove gram.c
> cvs remove: file `gram.c' still in working directory
> cvs remove: 1 file exists; remove it first
> $ rm gram.c
> rm: remove regular empty file `gram.c'? y
> $ cvs remove gram.c
> cvs remove: nothing known about `gram.c'
>
> So there's no way, apparently, to fix the state of these files through the "front door". Shall we try the proposed idea of hand-moving the files out of the Attic subdirectory, whereupon they should appear live and we can cvs remove them again? I have login on cvs.postgresql.org and can try this, but I'd like confirmation from someone that this is unlikely to break things. Is there any hidden state to be fixed in the CVS repository? I don't see any ...

Forgive my caution, but I'd suggest trying on a copy first.

cheers

andrew
Andrew Dunstan wrote:
> Tom Lane wrote:
>> So there's no way, apparently, to fix the state of these files through the "front door". Shall we try the proposed idea of hand-moving the files out of the Attic subdirectory, whereupon they should appear live and we can cvs remove them again? I have login on cvs.postgresql.org and can try this, but I'd like confirmation from someone that this is unlikely to break things. Is there any hidden state to be fixed in the CVS repository? I don't see any ...
>
> Forgive my caution, but I'd suggest trying on a copy first.

Too late ;-)

FWIW my CVSup copy seems happy with the change; it reported this when I updated it:

$ pgcvsup
Connected to cvsup.postgresql.org
Updating collection repository/cvs
 Edit pgsql/src/backend/parser/gram.c,v -> Attic
 Edit pgsql/src/backend/utils/mb/encnames.c,v
 Edit pgsql/src/bin/pg_dump/pg_dump.c,v
 Edit pgsql/src/bin/psql/common.c,v
 Edit pgsql/src/include/pg_config.h.win32,v
 Edit pgsql/src/interfaces/ecpg/preproc/pgc.c,v -> Attic
 Edit pgsql/src/interfaces/ecpg/preproc/preproc.c,v -> Attic
 Edit pgsql/src/tools/msvc/Solution.pm,v
Rsync sup/repository/checkouts.cvs
Finished successfully

The gram.c,v file looks good -- it has the expected "state dead;" line. A checked-out tree from that updates fine. A "cvs update" to a checked-out tree direct from the main CVS server also updates fine.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
* Tom Lane <tgl@sss.pgh.pa.us> [070416 21:11]:
> I wrote:
> > So there's no way, apparently, to fix the state of these files through the "front door".
>
> I take that back: the right sequence involving a "cvs update" got me into a state where it thought the files were "locally modified", and then I could commit and "cvs remove" and commit again. So hopefully it's all cleaned up now --- at least the states of the files look reasonable in cvsweb.

And my GIT conversion handled that nicely too. Looks good (at least from the GIT PoV).

Now, on my hand-crafted GIT repo you see them come and go with Tom's commits. But any *real* conversion tracking the *actual* RCS cvs states should have them checked out from 1999 to now, in the state they were in after Vadim's last changes; Tom's first commit will "truncate" them (because he checked them in as empty files), and the second commit will remove them again.

So it's still a "gotcha" if you're trying to get a copy of CVS from ages ago via one of the alternative SCM conversions... But my git one works, so I'll let others worry about the others ;-)

a.
Aidan Van Dyk <aidan@highrise.ca> writes:
> Now, on my hand-crafted GIT repo you see them come and go with Tom's commits. But any *real* conversion tracking the *actual* RCS cvs states should have them checked out from 1999 to now, in the state they were in after Vadim's last changes; Tom's first commit will "truncate" them (because he checked them in as empty files), and the second commit will remove them again.
> So it's still a "gotcha" if you're trying to get a copy of CVS from ages ago via one of the alternative SCM conversions...

It shouldn't be a big problem, assuming the checkout preserves the file dates --- they'll look older than the source files, and so a rebuild will happen anyway in such a checkout.

regards, tom lane
Tom Lane wrote:
> It shouldn't be a big problem, assuming the checkout preserves the file dates --- they'll look older than the source files, and so a rebuild will happen anyway in such a checkout.

Actually, this is a problem with at least SVN. An "svn export" will create files with the original repository dates, but an "svn checkout" will use the current time unless you enable a config option for your local svn client.

Kris Jurka
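The client-side option Kris mentions is, presumably, Subversion's `use-commit-times` setting in the per-user runtime configuration:

```ini
# ~/.subversion/config  (per-user Subversion client configuration)
[miscellany]
# Make "svn checkout" / "svn update" set each file's mtime to its
# last-commit time instead of the time of the checkout.
use-commit-times = yes
```

With this enabled, a fresh checkout gets commit timestamps, so the stale generated files would again look older than their sources and trigger a rebuild.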
Chris Browne wrote:
> This strikes me as being a really super thing, having both Subversion and Git repositories publicly available that are tracking the PostgreSQL sources.
>
> Stepping back to the SCM discussion, people were interested in finding out what merits there were in having these sorts of SCMs, and in finding out what glitches people might discover (e.g. the files where the CVS repository is a bit schizophrenic as to whether they are still there or not...). Having these repositories should allow some of this experimentation to take place now.

Yep. It'd be nice to have official GIT and SVN etc. mirrors of the main CVS repository. There's no pressing reason for the PostgreSQL project to switch from CVS, but we could provide alternatives to developers. As long as you can create a diff to send to pgsql-patches, it doesn't matter which version control system you use.

I'm interested in trying GIT or Monotone myself; presumably they would be good for managing unapplied, work-in-progress patches.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Florian G. Pflug wrote:
>> Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/

Ok - you can now clone from http://git.catalyst.net.nz/postgresql.git - viewable from http://git.catalyst.net.nz/gitweb too. It's 24 hrs behind, and I'm sorting out the updating scripts that will run daily.

The HEAD of CVS is renamed to cvshead there. All the other branches and tags are untouched. Please DO check that the tip of cvshead matches a CVS checkout with -kk. I've had limited time to sanity-check the import ;-)

cheers,
m
Florian G. Pflug wrote:
> Cool - I'm new to git, so I really appreciate any help that I can get.

Great - I am a SoC mentor for 2 other projects (git and moodle) so I've got some time set aside for SoC stuff. You might as well take advantage of it :-)

>> For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-)
>
> That's how I figured I'd work - though I don't yet understand what the advantage of "rebase" is over "merge".

Probably during your development cycle you'll want to merge the changes from cvshead into your dev branch - that's what you seem to be doing. Great. Later, when you are getting things ready for actual merging into CVS, you'll want to prepare a series of patches that apply on top of cvshead. That's where the rebase tools become useful.

> Currently, I've set up a git repo that pulls in the changes from the SVN repo, and pushes them to my main SoC git repo. On that main repo I have two branches, master and pgsql-head, and I call "cg-merge pgsql-head" if I want to merge with CVS HEAD.

You are doing the right thing. If possible, I'd suggest that you use git instead of cogito. Recent git is as user-friendly as cogito. The main difference is that you'll need to learn a bit about the index, and that'll be useful.

>> Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/
>
> Having a git mirror of the pgsql CVS would be great.
> BTW, I've just checked out repo.or.cz, and noticed that there is already a git mirror of the pgsql CVS: http://repo.or.cz/w/PostgreSQL.git

Yes, I've seen it, but I don't know the guy. I can ensure you have a CVS->GIT gateway updated daily or twice daily.
cheers,
martin
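The merge-vs-rebase distinction Martin describes can be seen end-to-end in a throwaway repository. Branch names and file contents below are invented for the demo; this is a sketch of the flow, not Florian's actual setup:

```shell
# Tiny demo: develop on a branch, upstream (cvshead) moves on, then
# rebase the work and export it as a patch series for submission.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.org
git config user.name  "demo"

echo base > file.c
git add file.c && git commit -qm "import of CVS HEAD"
git branch cvshead                   # stands in for the CVS mirror branch

git checkout -qb my-soc-work         # development branch
echo feature >> file.c
git commit -qam "my feature"

git checkout -q cvshead              # meanwhile, upstream commits...
echo upstream > other.c
git add other.c && git commit -qm "upstream change"

git checkout -q my-soc-work
git rebase -q cvshead                # replay "my feature" onto new cvshead
git format-patch -o patches cvshead  # one patch file per commit
```

After the rebase, the branch history is a clean linear series on top of cvshead, so `git format-patch` produces exactly the patches to submit; a merge would instead have left a merge commit that doesn't export as a patch.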
* Martin Langhoff <martin@catalyst.net.nz> [070417 17:32]:
> > Having a git mirror of the pgsql CVS would be great.
> > BTW, I've just checked out repo.or.cz, and noticed that there is already a git mirror of the pgsql CVS: http://repo.or.cz/w/PostgreSQL.git
>
> Yes, I've seen it, but I don't know the guy. I can ensure you have a CVS->GIT gateway updated daily or twice daily.

I'm an unknown here, I know - I've used PostgreSQL for years, but only recently started following the development community... And at this point I'm still pretty much just following, hence my interest in getting a GIT repo of PostgreSQL. GIT is *very* helpful on a "new" code-base.

I have my CVS->GIT conversion running hourly from the anon-rsync of the cvsroot. I don't know the specifics of the PostgreSQL rsync/mirror setup, so I may be pulling it more frequently than it's actually published, but until I hear from someone that tells me I'm taxing too many rsync resources, I'll just leave it that way... The CVS->GIT conversion is cheap - it's the rsync that takes most of the time... I can run it more frequently if people think it would be valuable and the rsync admins don't care...

And remember the warning I gave that my conversion is *not* a direct CVS import - I intentionally *unexpand* all keywords before stuffing them into GIT so that merging and branching can ignore all the keyword conflicts...

a.
Aidan Van Dyk <aidan@highrise.ca> writes: > I have my CVS->GIT conversion running hourly from the anon-rsync of the > cvsroot. I don't know the specifics of the PostgreSQL rsync/mirror > setup, so I may be pulling it more frequently than it's actually > published, but until I hear from someone that tells me I'm taxing to > many rsync resources, I'll just leave it that way... The anoncvs mirror updates once an hour, so you're fine. regards, tom lane
Martin Langhoff wrote: > Aidan Van Dyk wrote: >> And remember the warning I gave that my conversion is *not* a direct CVS >> import - I intentionally *unexpand* all Keywords before stuffing them >> into GIT so that merging and branching can ignore all the Keyword >> conflicts... > > My import is unexpanding those as well to support rebasing and merging > better. > > So - if you are committed to providing your gateway long term to > Florian, I'm happy to drop my gateway in favour of yours. There seem to be other people than me who are interested in a git mirror. Maybe we could declare one of those mirrors the "official" one - I guess things would be easier if all people interested in using git used the same mirror... What do you guys think? > (Florian, before basing your code on either you should get a checkout of > Aidan's and mine and check that the tips of the branches you are working > on match the cvs branches -- the cvsimport code is good but wherever > CVS is involved, there's a lot of interpretation at play, a sanity check > is always good). I actually hoped that I could just take my current git repo, and rebase my branch onto one of those two repos - or does rebase only work from an ancestor to a descendant? greetings, Florian Pflug
Aidan Van Dyk wrote: > I'm an unknown here, I know - I've used PostgreSQL for years, but only > recently started following the development community... And at this I'm probably unknown here as well. Hi everyone ;-) > And remember the warning I gave that my conversion is *not* a direct CVS > import - I intentionally *unexpand* all Keywords before stuffing them > into GIT so that merging and branching can ignore all the Keyword > conflicts... My import is unexpanding those as well to support rebasing and merging better. So - if you are committed to providing your gateway long term to Florian, I'm happy to drop my gateway in favour of yours. (Florian, before basing your code on either you should get a checkout of Aidan's and mine and check that the tips of the branches you are working on match the cvs branches -- the cvsimport code is good but wherever CVS is involved, there's a lot of interpretation at play, a sanity check is always good). cheers, m -- ----------------------------------------------------------------------- Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St OFFICE: +64(4)916-7224 UK: 0845 868 5733 ext 7224 MOB: +64(21)364-017 Make things as simple as possible, but no simpler- Einstein -----------------------------------------------------------------------
* Florian G. Pflug <fgp@phlo.org> [070417 20:30]: > >So - if you are committed to providing your gateway long term to > >Florian, I'm happy to drop my gateway in favour of yours. > > There seem to be other people than me who are interested in a git > mirror. Maybe we could declare one of those mirrors the > "official" one - I guess things would be easier if all people > interested in using git would use the same mirror... > > What do you guys think? I'll provide that gateway as long as I have access to hardware and connectivity that can keep up with PostgreSQL CVS... Of course, the beauty of a DVCS is that we don't really need an official one... And with GIT, you can even "graft" history in if you want. So you could even "start" your GIT work from a cvs checkout of whenever, and "graft" any commit from any of the CVS->GIT conversion history as a parent to your starting point. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
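The grafting Aidan mentions works through the `.git/info/grafts` file, whose lines have the form `<commit> <parent>...`. A self-contained sketch (all names and paths are illustrative; `git checkout --orphan` is newer than the git of this thread, and modern git prefers `git replace --graft`, but the grafts file is still honored):

```shell
set -e
# Throwaway repo with two unrelated histories, joined via a graft.
rm -rf /tmp/graft-demo && mkdir /tmp/graft-demo && cd /tmp/graft-demo
git init -q . && git config user.email you@example.org && git config user.name you
echo one > f && git add f && git commit -qm "tip of the CVS->GIT conversion"
import_tip=$(git rev-parse HEAD)
# An unrelated root commit, as if started from a plain CVS checkout:
git checkout -q --orphan mywork
echo two > f && git add f && git commit -qm "my first patch"
work_root=$(git rev-parse HEAD)
# Graft: declare the imported tip to be the parent of our root commit.
echo "$work_root $import_tip" >> .git/info/grafts
git rev-list --count HEAD   # now 2: history runs through the graft
```

Because all history traversals respect the graft, log, merge, and rebase behave as if the CVS->GIT conversion had always been the ancestor of your work.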
Martin Langhoff <martin@catalyst.net.nz> writes: > Aidan Van Dyk wrote: >> And remember the warning I gave that my conversion is *not* a direct CVS >> import - I intentionally *unexpand* all Keywords before stuffing them >> into GIT so that merging and branching can ignore all the Keyword >> conflicts... > My import is unexpanding those as well to support rebasing and merging > better. Um ... why do either of you feel there's an issue there? We switched over to $PostgreSQL$ a few years ago specifically to avoid creating merge problems for downstream repositories. If there are any other keyword expansions left in the source text I'd vote to remove them. If you have a problem with $PostgreSQL$, why? regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [070418 01:33]: > Um ... why do either of you feel there's an issue there? > > We switched over to $PostgreSQL$ a few years ago specifically to avoid > creating merge problems for downstream repositories. If there are any > other keyword expansions left in the source text I'd vote to remove > them. If you have a problem with $PostgreSQL$, why? Mine is only a generic warning. I convert many CVS repos to GIT, all using the same gateway setup, so I haven't done anything "specific" for PostgreSQL. Most other projects are not as disciplined as PostgreSQL, and I regularly see Modified, Date, Id, Log, etc. keywords, as well as project-specific ones like PostgreSQL, OpenBSD, FreeBSD, etc... Un-expansion *may* not be perfect... a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Tom Lane wrote: > Martin Langhoff <martin@catalyst.net.nz> writes: > > Aidan Van Dyk wrote: > >> And remember the warning I gave that my conversion is *not* a direct CVS > >> import - I intentionally *unexpand* all Keywords before stuffing them > >> into GIT so that merging and branching can ignore all the Keyword > >> conflicts... > > > My import is unexpanding those as well to support rebasing and merging > > better. > > Um ... why do either of you feel there's an issue there? > > We switched over to $PostgreSQL$ a few years ago specifically to avoid > creating merge problems for downstream repositories. If there are any > other keyword expansions left in the source text I'd vote to remove > them. If you have a problem with $PostgreSQL$, why? One weird thing I noticed some time ago is that we have an $Id$ (or was it $Header$? I don't remember) somewhere, which was supposed to be from the upstream repo where we got the file from, but it was being expanded to our local version of the file. We _also_ have the $PostgreSQL$ tag in there, which carries the same info. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > Um ... why do either of you feel there's an issue there? > > We switched over to $PostgreSQL$ a few years ago specifically to avoid > creating merge problems for downstream repositories. If there are any > other keyword expansions left in the source text I'd vote to remove > them. If you have a problem with $PostgreSQL$, why? I have to accept the blame for not researching the repo in the first place. I didn't know about $PostgreSQL$ - from the looks of it, it acts _just_ like $Id$. So I guess you use PostgreSQL instead of Id. As GIT won't touch them, Florian will probably be just fine with his patches, and I doubt they'll be more than a minor annoyance, if at all. Keyword expansions are generally bad because SCM tools should track _content_ - and keyword expansions _modify_ it to add metadata that is somewhat redundant, obtainable in other ways, and should just not be in the middle of the _data_. Those modifications lead to patches that have bogus hunks and sometimes don't apply, MD5/SHA1 checksums that don't match and a whole lot of uncertainty. You can't just say "the content is the same" by comparing bytes or SHA1 digests if the committer, the path or the history are different. And it is a mighty important ability for an SCM. The argument runs much longer than that - and the flamewars are quite entertaining. If anyone's keen we're having one right now on git@vger.kernel.org. I am sure Pg hackers will find parallels between keyword expansion (as a misfeature everyone is used to) and the SQL travesties that early MySQL is famous for. I've picked my poison... ran away from MySQL to Pg, and from CVS/SVN/Arch to GIT. 
Not looking back :-) cheers m -- ----------------------------------------------------------------------- Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St OFFICE: +64(4)916-7224 UK: 0845 868 5733 ext 7224 MOB: +64(21)364-017 Make things as simple as possible, but no simpler- Einstein -----------------------------------------------------------------------
On Wed, Apr 18, 2007 at 06:39:34PM +1200, Martin Langhoff wrote: > Keyword expansions are generally bad because SCM tools should track > _content_ - and keyword expansions _modify_ it to add metadata that is > somewhat redundant, obtainable in other ways, and should just not be in > the middle of the _data_. Those modifications lead to patches that have > bogus hunks and sometimes don't apply, MD5/SHA1 checksums that don't > match and a whole lot of uncertainty. Then how do you tell what version a file is if it's outside of a checkout? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
* Jim C. Nasby <jim@nasby.net> [070418 14:39]: > On Wed, Apr 18, 2007 at 06:39:34PM +1200, Martin Langhoff wrote: > > Keyword expansions are generally bad because SCM tools should track > > _content_ - and keyword expansions _modify_ it to add metadata that is > > somewhat redundant, obtainable in other ways, and should just not be in > > the middle of the _data_. Those modifications lead to patches that have > > bogus hunks and sometimes don't apply, MD5/SHA1 checksums that don't > > match and a whole lot of uncertainty. > > Then how do you tell what version a file is if it's outside of a > checkout? That's what all the fun is about ;-) Some would say that "labelling" the file is the job of the release processes. Others say it's the job of the SCM system... Of course I just sit on the fence because in the work I have to do, I'm quite happy that nothing is "outside of a checkout". GIT is good enough that I have it everywhere. I realise not everyone's that lucky.. ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
* Aidan Van Dyk <aidan@highrise.ca> [070418 15:03]: > > Then how do you tell what version a file is if it's outside of a > > checkout? > > That's what all the fun is about ;-) Some would say that "labelling" the > file is the job of the release processes. Others say it's the job of > the SCM system... Noting that if you take something "outside of a checkout" means you've "released" it from the VCS... -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Aidan Van Dyk wrote: > * Aidan Van Dyk <aidan@highrise.ca> [070418 15:03]: > > > > Then how do you tell what version a file is if it's outside of a > > > checkout? > > > > That's what all the fun is about ;-) Some would say that "labelling" the > > file is the job of the release processes. Others say it's the job of > > the SCM system... > > Noting that if you take something "outside of a checkout" means you've > "released" it from the VCS... Which is not always what happens in reality. Consider for example that we borrowed some files from NetBSD, OpenBSD, Tcl, zic and others. It would be nice to know exactly at what point we borrowed the file, so we can go to the upstream repo and check if there's any bug fix that we should also apply to our local copy. And we _also_ modify locally the file of course, so just digesting the file we have to get a SHA1 (or whatever) identifier is not an option. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, Apr 19, 2007 at 10:07:08AM +1200, Martin Langhoff wrote: > Jim C. Nasby wrote: > > Then how do you tell what version a file is if it's outside of a > > checkout? > > It's trivial for git to answer that - the file will either be pristine, > and then we can just scan for the matching SHA1, or modified, and we can > scan (taking a weee bit more time) which are the "closest matches" in > your history, in what branches and commits. > > The actual scripting for this isn't written just yet -- Linus posted a > proof-of-concept shell implementation along the lines of > > git rev-list --no-merges --full-history v0.5..v0.7 -- > src/widget/widget.c > rev-list > > best_commit=none > best=1000000 > while read commit > do > git cat-file blob "$commit:src/widget/widget.c" > tmpfile > lines=$(diff reference-file tmpfile | wc -l) > if [ "$lines" -lt "$best" ] > then > echo Best so far: $commit $lines > best=$lines > fi > done < rev-list > > and it's fast. One of the good properties of this is that you can ask > for a range of your history (v0.5 to v0.7 in the example) and an exact > path (src/widget/widget.c) but you can also say --all (meaning "in all > branches") and a handwavy "over there", like src. And git will take an > extra second or two on a large repo, but tell you about all the good > candidates across the branches. > > Metadata is metadata, and we can fish it out of the SCM easily - and > data is data, and it's silly to pollute it with metadata that is mostly > incidental. > > If I find time today I'll post to the git list a cleaned up version of > Linus' shell script as > > git-findclosestmatch <head or range or --all> path/to/scan/ \ > randomfile.c Not bad... took you 40 lines to answer my question. Let's see if I can beat that... > > Then how do you tell what version a file is if it's outside of a > > checkout? Answer: you look at the $Id$ (or in this case, $PostgreSQL$) tag. Sorry, tried to get it to 2 lines, but couldn't. 
;) I understand the argument about metadata and all, and largely agree with it. But on the other hand I think a version identifier is a critical piece of information; it's just as critical as the file name when it comes to identifying the information contained in the file. Or does GIT not use filenames, either? :) -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Hi, Alvaro Herrera wrote: > Which is not always what happens in reality. Consider for example that > we borrowed some files from NetBSD, OpenBSD, Tcl, zic and others. It > would be nice to know exactly at what point we borrowed the file, so we > can go to the upstream repo and check if there's any bug fix that we > should also apply to our local copy. And we _also_ modify locally the > file of course, so just digesting the file we have to get a SHA1 (or > whatever) identifier is not an option. I consider such information (i.e. 'where is this file coming from') to be historical information. As such, this information clearly belongs to the VCS sphere and should be tracked and presented by the VCS. Advanced VCSes can import files from other projects and properly track those files or propagate changes on request. Even subversion can do that to some extent. My point here is: given a decent VCS, you don't need such historical information as often as you do with CVS. You can sit back and let the VCS do the job. (Like looking up when the last 'import' of the file from the external project happened, what changed, and merging those changes back into your (locally modified variant of the) file.) And if you really want to dig in the history of your project, you can ask the VCS, which you are going to need anyway for other historic information. Regards Markus
Hi Jim C. Nasby wrote: > I understand the argument about metadata and all, and largely agree with > it. But on the other hand I think a version identifier is a critical > piece of information; it's just as critical as the file name when it > comes to identifying the information contained in the file. If you really want the files in your releases to carry a version identifier, you should let your release process handle that. But often enough, people can't even tell the exact PostgreSQL version they are running. How do you expect them to be able to tell you what version a single file has? For the developers: they have all the history the VCS offers them. There are tags to associate a release with a revision in your repository. And because a decent VCS can handle all the diff'ing, patching and merging you normally need, you shouldn't ever have to process files outside of your repository. So what exactly is the purpose of a version identifier within the file's contents? For whom could such a thing be good? Regards Markus
Jim C. Nasby wrote: > Not bad... took you 40 lines to answer my question. Let's see if I can > beat that... Sure - it'll be 1 line when it's wrapped in a shell script. And then we'll be even. > I understand the argument about metadata and all, and largely agree with > it. But on the other hand I think a version identifier is a critical > piece of information; it's just as critical as the file name when it > comes to identifying the information contained in the file. Surely. It is important, but it's metadata and belongs elsewhere. That the metadata _is_ important doesn't mean you corrupt _data_ with it. Just imagine that MySQL users were used to getting their SQL engine to expand $Oid$ $Tablename$ $PrimaryKey$ in TEXT fields. And that on INSERT/UPDATE those were collapsed. And in comparisons too. Wouldn't you say "that's metadata, can be queried in a thousand ways, does not belong in the middle of the data"? And the _really_ interesting version identifier is usually the "commit" identifier, which gives you a SHA1 of the whole src directory and the history. Projects that use git usually include that SHA1 in their build script, so even if a user compiles off a daily snapshot or a checkout on a random branch of your SCM, you can just ask them "what's the build identifier?" and they'll give you a SHA1. Actually, git can spit out a nicer build identifier that includes the latest tag, so if you see the identifier being v8.2.<sha1>, you know it's not 8.2 "release" but a commit soon after it, identified by that SHA1. 
GIT uses that during its build to insert the version identifier, so: $ git --version git version 1.5.1.gf8ce With that in your hand, you can say # show me what commits on top of the tagged 1.5.1 have I got: $ git log 1.5.1..gf8ce # file src/lib/foo.c at this exact commit $ git show gf8ce:src/lib/foo.c So if you use this identifier (just call `git version`) to - name your tarballs - create a "build-id" file at tarball creation time - tag your builds with a version id And then when you have code out there in the wild, and people report bugs or send you patches, there's a good identifier you can ask for that covers _all_ the files. If it happens that someone reports a bug and says they have 8.2.gg998 and you don't seem to have any gg998 commit after 8.2, you can say with confidence: you are running a patched Pg - please repro with a pristine copy (or show us your code!) :-) cheers, m -- ----------------------------------------------------------------------- Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St OFFICE: +64(4)916-7224 UK: 0845 868 5733 ext 7224 MOB: +64(21)364-017 Make things as simple as possible, but no simpler- Einstein -----------------------------------------------------------------------
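The tag-relative identifier Martin describes is what `git describe` produces. A self-contained sketch (repo path, tag name, and commit messages are illustrative):

```shell
set -e
# Throwaway repo showing the <tag>-<count>-g<sha1> identifier discussed above.
rm -rf /tmp/describe-demo && mkdir /tmp/describe-demo && cd /tmp/describe-demo
git init -q . && git config user.email you@example.org && git config user.name you
echo a > f && git add f && git commit -qm "the 8.2 release"
git tag -a v8.2 -m "8.2 release"        # annotated tag marking the release
echo b >> f && git commit -qam "post-release fix"
# describe names HEAD relative to the newest reachable annotated tag:
git describe    # prints something like v8.2-1-g1234abc
```

The output reads "one commit past v8.2, at abbreviated SHA1 1234abc", which is exactly the kind of build identifier a user can report back.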
Jim C. Nasby wrote: > Then how do you tell what version a file is if it's outside of a > checkout? It's trivial for git to answer that - the file will either be pristine, and then we can just scan for the matching SHA1, or modified, and we can scan (taking a weee bit more time) for the "closest matches" in your history, in what branches and commits. The actual scripting for this isn't written just yet -- Linus posted a proof-of-concept shell implementation along the lines of

git rev-list --no-merges --full-history v0.5..v0.7 -- src/widget/widget.c > rev-list

best_commit=none
best=1000000
while read commit
do
    git cat-file blob "$commit:src/widget/widget.c" > tmpfile
    lines=$(diff reference-file tmpfile | wc -l)
    if [ "$lines" -lt "$best" ]
    then
        echo Best so far: $commit $lines
        best=$lines
        best_commit=$commit
    fi
done < rev-list

and it's fast. One of the good properties of this is that you can ask for a range of your history (v0.5 to v0.7 in the example) and an exact path (src/widget/widget.c) but you can also say --all (meaning "in all branches") and a handwavy "over there", like src. And git will take an extra second or two on a large repo, but tell you about all the good candidates across the branches. Metadata is metadata, and we can fish it out of the SCM easily - and data is data, and it's silly to pollute it with metadata that is mostly incidental. If I find time today I'll post to the git list a cleaned up version of Linus' shell script as

git-findclosestmatch <head or range or --all> path/to/scan/ \ randomfile.c

cheers, m -- ----------------------------------------------------------------------- Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St OFFICE: +64(4)916-7224 UK: 0845 868 5733 ext 7224 MOB: +64(21)364-017 Make things as simple as possible, but no simpler- Einstein -----------------------------------------------------------------------
Martin Langhoff wrote: > So - if you are committed to providing your gateway long term to > Florian, I'm happy to drop my gateway in favour of yours. > (Florian, before basing your code on either you should get a checkout of > Aidan's and mine and check that the tips of the branches you are working > on match the cvs branches -- the cvsimport code is good but wherever > CVS is involved, there's a lot of interpretation at play, a sanity check > is always good). Sorry for responding so late - I was rather busy during the last 1 1/2 weeks with university stuff, and had only very little time to spend on SoC. I've tried to switch my repo to both git mirrors, but there seems to be something strange happening. The checkout pulls a _lot_ of objects (a few hundred thousand), and then takes ages to unpack them all, bloating my local repository (Just rm-ing my local repo takes a few minutes after the checkout). It seems as if git pulls all revisions of all files during the pull - which it shouldn't do as far as I understand things - it should only pull those objects referenced by some head, no? The interesting thing is that exactly the same problem occurs with both of your mirrors... Any ideas? Or is this just how things are supposed to work? greetings, Florian Pflug
* Florian G. Pflug <fgp@phlo.org> [070430 08:58]: > It seems as if git pulls all revisions of all files during the pull - > which it shouldn't do as far as I understand things - it should only > pull those objects referenced by some head, no? Git pulls full history to a common ancestor on the clone/pull. So the first pull on a repo *will* necessarily pull in the full object history. So unless you have a recent common ancestor, it will pull lots. Note that because git uses crypto hashes to identify objects, my conversion and Martin's probably do not have a recent common ancestor (because my header munging probably doesn't match Martin's exactly). > The interesting thing is that exactly the same problem occurs with > both if your mirrors... > > Any ideas? Or is this just how things are supposed to work? Until you have a local repository of it, you'll need to go through the full pull/clone. If you're really not interested in history you can "truncate" history with the --depth option to git clone. That will give you a "shallow repository", which you can use, develop, branch, etc in, but won't give you all the history locally. Also - what version of GIT are you using? I *really* recommend using at least 1.5 (1.5.2.X is current stable). Please, do yourself a favour, and don't use 1.4.4. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
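The `--depth` behaviour Aidan describes can be demonstrated against a local repository instead of the mirrors (paths below are illustrative; the mechanics are the same over git:// or file://):

```shell
set -e
# Build a repo with five commits, then take a shallow clone of it.
rm -rf /tmp/shallow-src /tmp/shallow-clone
mkdir /tmp/shallow-src && cd /tmp/shallow-src
git init -q . && git config user.email you@example.org && git config user.name you
for i in 1 2 3 4 5; do echo $i > f && git add f && git commit -qm "rev $i"; done
# --depth 1 transfers only the newest commit, not the full history:
git clone -q --depth 1 "file:///tmp/shallow-src" /tmp/shallow-clone
git -C /tmp/shallow-clone rev-list --count HEAD   # prints "1"
```

For a repository with the full PostgreSQL CVS history behind it, that difference is hundreds of thousands of objects you never have to unpack.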
Aidan Van Dyk wrote: > * Florian G. Pflug <fgp@phlo.org> [070430 08:58]: > >> It seems as if git pulls all revisions of all files during the pull - >> which it shouldn't do as far as I understand things - it should only >> pull those objects referenced by some head, no? > > Git pulls full history to a common ancestor on the clone/pull. So the > first pull on a repo *will* necessarily pull in the full object history. > So unless you have a recent common ancestor, it will pull lots. Note > that because git uses crypto hashes to identify objects, my conversion > and Martin's probably do not have a recent common ancestor (because my > header munging probably doesn't match Martin's exactly). Ah, OK - that explains things. >> The interesting thing is that exactly the same problem occurs with >> both of your mirrors... >> >> Any ideas? Or is this just how things are supposed to work? > > Until you have a local repository of it, you'll need to go through the > full pull/clone. If you're really not interested in history you can > "truncate" history with the --depth option to git clone. That will give > you a "shallow repository", which you can use, develop, branch, etc in, > but won't give you all the history locally. I'll retry with the "--depth" option - I'm doing development on my powerbook, and OSX seems to cope badly with lots of little files - the initial unpacking took hours - literally.. > Also - what version of GIT are you using? I *really* recommend using at > least 1.5 (1.5.2.X is current stable). Please, do yourself a favour, > and don't use 1.4.4. I'm using 1.5.0 currently - it was the latest stable release when I began to experiment with git. greetings, Florian Pflug