Thread: Git out of sync vs. CVS

Git out of sync vs. CVS

From
Peter Eisentraut
Date:
Maybe I'm hallucinating and someone could check this in their
environment, but it appears to me that the Git repository is missing
parts of two non-recent commits.  See attached patch.

Attachment

Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
2010/1/17 Peter Eisentraut <peter_e@gmx.net>:
> Maybe I'm hallucinating and someone could check this in their
> environment, but it appears to me that the Git repository is missing
> parts of two non-recent commits.  See attached patch.

Not having looked at the repo in detail, but I bet this happened
because the git mirror grabbed its snapshot in the middle of a cvs
commit with multiple files. Since cvs doesn't have atomic commits, I
think that kind of thing can happen. Does that seem possible wrt these
commits specifically?

I don't really know how to fix that. It's kind of hard to do
transaction safe replication from a system without transactions ;)

As for fixing it, I guess we can try the
rewind-to-commit-before-this-and-rerun. That'll break people who have
branched after, but last time it seemed that most people's git clients
would clean that up automatically. Which commits are these exactly?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Peter Eisentraut
Date:
On sön, 2010-01-17 at 20:50 +0100, Magnus Hagander wrote:
> As for fixing it, I guess we can try the
> rewind-to-commit-before-this-and-rerun. That'll break people who have
> branched after, but last time it seemed that most people's git clients
> would clean that up automatically. Which commits are these exactly?

These two belong together:

http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/start-scripts/freebsd?rev=1.5
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/start-scripts/osx/PostgreSQL?rev=1.4

And this is a separate one:

http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/config/python.m4?rev=1.17




Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
2010/1/17 Peter Eisentraut <peter_e@gmx.net>:
> On sön, 2010-01-17 at 20:50 +0100, Magnus Hagander wrote:
>> As for fixing it, I guess we can try the
>> rewind-to-commit-before-this-and-rerun. That'll break people who have
>> branched after, but last time it seemed that most people's git clients
>> would clean that up automatically. Which commits are these exactly?
>
> These two belong together:
>
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/start-scripts/freebsd?rev=1.5
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/start-scripts/osx/PostgreSQL?rev=1.4
>
> And this is a separate one:
>
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/config/python.m4?rev=1.17

Well, if we're going to roll something back in git, it's the git
commits that are interesting... To figure out how far back in time to
go.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> 2010/1/17 Peter Eisentraut <peter_e@gmx.net>:
>> Maybe I'm hallucinating and someone could check this in their
>> environment, but it appears to me that the Git repository is missing
>> parts of two non-recent commits.  See attached patch.

> Not having looked at the repo in detail, but I bet this happened
> because the git mirror grabbed its snapshot in the middle of a cvs
> commit with multiple files. Since cvs doesn't have atomic commits, I
> think that kind of thing can happen.

That would explain a single CVS commit appearing as two separate commits
in the git history; but it hardly seems like an acceptable excuse for
missing changes altogether, which is what I think Peter said he saw.
        regards, tom lane


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
2010/1/17 Tom Lane <tgl@sss.pgh.pa.us>:
> Magnus Hagander <magnus@hagander.net> writes:
>> 2010/1/17 Peter Eisentraut <peter_e@gmx.net>:
>>> Maybe I'm hallucinating and someone could check this in their
>>> environment, but it appears to me that the Git repository is missing
>>> parts of two non-recent commits.  See attached patch.
>
>> Not having looked at the repo in detail, but I bet this happened
>> because the git mirror grabbed its snapshot in the middle of a cvs
>> commit with multiple files. Since cvs doesn't have atomic commits, I
>> think that kind of thing can happen.
>
> That would explain a single CVS commit appearing as two separate commits
> in the git history; but it hardly seems like an acceptable excuse for
> missing changes altogether, which is what I think Peter said he saw.

It's likely the combination of that, and the cvs to git sync script
not considering that this can happen. So when it does the second pass
(once it's all been synced) it detects it as a single commit, and
doesn't re-import it.

We've seen this happen before.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
Magnus Hagander  wrote:
>>>> the Git repository is missing parts of two non-recent commits.

> We've seen this happen before.

That seems like kind of a blasé attitude toward something upon which
some people rely.

When we (at Wisconsin State Courts) were using CVS and had scripts to
automatically merge changes from one branch to another, we saw this
sort of thing unless people were very careful to grab a timestamp in
the past for their ranges and use it throughout the script.  Perhaps
the script is just not careful enough?  (Said in total ignorance of
what the PostgreSQL process here actually is....)
-Kevin


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Magnus Hagander  wrote:
>
>>>>> the Git repository is missing parts of two non-recent commits.
>
>> We've seen this happen before.
>
> That seems like kind of a blasé attitude toward something upon which
> some people rely.

For the record, I am one of those people. I use it for *all* my
postgresql development. And this is a serious pain.

It has been brought up before. Nobody has come up with a completely
safe way to do it, because CVS simply doesn't have the capabilities
required.

And yes, it is annoying to have to deal with the issues with CVS at
the same time as people keep saying CVS is perfectly fine. It's not.
It's just that we are doing our best to work around the issues in it,
and sometimes that leads to these issues.


> When we (at Wisconsin State Courts) were using CVS and had scripts to
> automatically merge changes from one branch to another, we saw this
> sort of thing unless people were very careful to grab a timestamp in
> the past for their ranges and use it throughout the script.  Perhaps
> the script is just not careful enough?  (Said in total ignorance of
> what the PostgreSQL process here actually is....)

That would be one way. However, AFAIK the tool we use (fromcvs)
doesn't support this. If somebody were to extend the tool with that,
it would be much appreciated. It's a Ruby tool though, so there's not
a thing I can do about it myself... And it's basically undocumented.

But yes, if we do that and set the timestamp far enough back in time,
that should make it "reasonably safe". Given how long some operations
can take ((C) year change, release tagging IIRC, stuff like that),
this has to be a fairly large number, which means the git mirror will
lag even further behind. But if that's what we have to pay to make it
safe, I guess we should... The time would have to be long enough to
cover any cvs commit including potential network slowness during it
etc.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Robert Haas
Date:
On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Magnus Hagander  wrote:
>>
>>>>>> the Git repository is missing parts of two non-recent commits.
>>
>>> We've seen this happen before.
>>
>> That seems like kind of a blasé attitude toward something upon which
>> some people rely.
>
> For the record, I am one of those people. I use it for *all* my
> postgresql development. And this is a serious pain.

FWIW, I am in favor of rewinding and making everyone rebase, but I
think we should do it ASAP.

...Robert


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Tuesday, January 19, 2010, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> Magnus Hagander  wrote:
>>>
>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>
>>>> We've seen this happen before.
>>>
>>> That seems like kind of a blasé attitude toward something upon which
>>> some people rely.
>>
>> For the record, I am one of those people. I use it for *all* my
>> postgresql development. And this is a serious pain.
>
> FWIW, I am in favor of rewinding and making everyone rebase, but I
> think we should do it ASAP.

Got time to figure out exactly how far to rewind?

/Magnus


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Aidan Van Dyk
Date:
* Magnus Hagander <magnus@hagander.net> [100119 10:44]:
> > When we (at Wisconsin State Courts) were using CVS and had scripts to
> > automatically merge changes from one branch to another, we saw this
> > sort of thing unless people were very careful to grab a timestamp in
> > the past for their ranges and use it throughout the script.  Perhaps
> > the script is just not careful enough?  (Said in total ignorance of
> > what the PostgreSQL process here actually is....)
>
> That would be one way. However, AFAIK the tool we use (fromcvs)
> doesn't support this. If somebody were to extend the tool with that,
> it would be much appreciated. It's a Ruby tool though, so there's not
> a thing I can do about it myself... And it's basically undocumented.
>
> But yes, if we do that and set the timestamp far enough back in time,
> that should make it "reasonably safe". Given how long some operations
> can take ((C) year change, release tagging IIRC, stuff like that),
> this has to be a fairly large number, which means the git mirror will
> lag even further behind. But if that's what we have to pay to make it
> safe, I guess we should... The time would have to be long enough to
> cover any cvs commit including potential network slowness during it
> etc.

Well, when I was running my conversion, I took a "cheap" way, I just
rsynced twice (with a delay, I don't remember how long I decided was long
enough) and made sure the 2nd rsync didn't do anything, before I let
fromcvs at the copy of CVSROOT.

Sure, it's not perfect either, I based that on the hope that no "single
CVS commit" would have a period of $X of inactivity on the CVSROOT.

Of course, that could all be useless (for my PG conversion) if the PG
CVSROOT was an unstable point-in-time copy of the real CVSROOT, but
I was rsyncing CVSROOT of other projects too, so I needed it for my own
conversions...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
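
A minimal sketch of the "copy twice and make sure nothing changed" check described above. It does not run rsync; as a stand-in, it compares file sizes and modification times of a local copy across a quiet period. The function names, the `quiet_seconds` value and the retry limit are assumptions for illustration, not part of any real conversion script.

```python
import os
import time


def snapshot(root):
    """Map each file path under root to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def wait_until_quiet(root, quiet_seconds, max_tries=10, sleep=time.sleep):
    """Return True once two snapshots taken quiet_seconds apart are identical.

    Returns False if the tree kept changing for max_tries rounds.
    """
    before = snapshot(root)
    for _ in range(max_tries):
        sleep(quiet_seconds)
        after = snapshot(root)
        if after == before:
            return True  # nothing changed during the quiet period
        before = after   # something changed; start a new quiet period
    return False
```

As the message says, this only helps if no single CVS commit pauses for longer than the chosen quiet period, and only if the copy being checked is itself taken directly from the live CVSROOT.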

Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
Magnus Hagander <magnus@hagander.net> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> Magnus Hagander  wrote:
>>
>>>>>> the Git repository is missing parts of two non-recent
>>>>>> commits.
>>
>>> We've seen this happen before.
>>
>> That seems like kind of a blasé attitude toward something upon
>> which some people rely.
> 
> For the record, I am one of those people. I use it for *all* my
> postgresql development. And this is a serious pain.

It appears I took your comment the wrong way.  Apologies.

>> When we (at Wisconsin State Courts) were using CVS and had
>> scripts to automatically merge changes from one branch to
>> another, we saw this sort of thing unless people were very
>> careful to grab a timestamp in the past for their ranges and use
>> it throughout the script. Perhaps the script is just not careful
>> enough?  (Said in total ignorance of what the PostgreSQL process
>> here actually is....)
> 
> That would be one way. However, AFAIK the tool we use (fromcvs)
> doesn't support this. If somebody were to extend the tool with
> that, it would be much appreciated. It's a Ruby tool though, so
> there's not a thing I can do about it myself... And it's basically
> undocumented.
> 
> But yes, if we do that and set the timestamp far enough back in
> time, that should make it "reasonably safe". Given how long some
> operations can take ((C) year change, release tagging IIRC, stuff
> like that), this has to be a fairly large number, which means the
> git mirror will lag even further behind. But if that's what we
> have to pay to make it safe, I guess we should... The time would
> have to be long enough to cover any cvs commit including potential
> network slowness during it etc.

My Ruby skills are minimal, but we've got some Ruby gurus around
here -- maybe between my rough skills and a few impositions on the
others I could wrangle something.  Is there any particular version I
should be looking at?  The last official version I can find is
0.0.0.132 from May 3, 2009.

Although, if there's not some reasonably obvious fix (like
subtracting some fixed amount of time from a timestamp they're
already grabbing), perhaps we should just plan on limping along
until we can convert to git.

Oh, and what sort of delay do you feel would be "long enough to
cover any cvs commit including potential network slowness during it
etc."?
-Kevin


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Oh, and what sort of delay do you feel would be "long enough to
> cover any cvs commit including potential network slowness during it
> etc."?

Why should the script make any assumptions about delay at all?
It seems to me that the problem comes from failing to check for
changed files, no more and no less.  It would be much less of an
issue if a non-atomic CVS commit showed up as two separate GIT
commits with similar log messages.
        regards, tom lane
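
One way to read the suggestion above, as a minimal sketch: record which (path, revision) pairs have already been imported, and on every pass commit whatever is new, even when its log message matches an earlier commit. This is an assumption about how such a check could work, not a description of what fromcvs does; the function name and the tuple layout are hypothetical.

```python
def pending_commits(cvs_revisions, already_imported):
    """Group not-yet-imported revisions into commits keyed by (author, log).

    cvs_revisions: iterable of (path, rev, author, log) tuples as seen in
    the CVS repository on this pass.
    already_imported: set of (path, rev) pairs recorded by earlier passes.
    """
    groups = {}
    for path, rev, author, log in cvs_revisions:
        if (path, rev) in already_imported:
            continue  # this file revision is already in git
        groups.setdefault((author, log), []).append((path, rev))
    return groups
```

With this bookkeeping, a file revision that lands after the first pass is never skipped: it shows up in the next pass as a second commit carrying the same log message.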


Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Oh, and what sort of delay do you feel would be "long enough to
>> cover any cvs commit including potential network slowness during
>> it etc."?
> 
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less.  It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

I was trying to be accommodating; if Magnus's take on this isn't a
consensus, I'll put forward in a little more detail what I had in
mind.

What we did with our scripts was to grab the current time *from the
CVS server* (since not all clocks are necessarily set accurately)
and use that as the end of a time range.  The end of the previous
time range was recorded on successful completion; we would use that
as the start of a time range.  Done carefully, that allows no
commits to be missed.  The only way something could be done twice
would be for the process to die after it had pushed through some
changes and before it reached completion and saved the time.

Now, I haven't looked at the fromcvs code yet to know how easy or
hard it would be to use this logic within that package, so this is
still pretty hand-wavy.
-Kevin
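
The time-range scheme described above can be sketched roughly as follows. This is a hedged illustration only: the `Revision` type, the `import_window` name and the optional `safety_margin` are hypothetical, and how the server time is obtained is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    path: str
    rev: str
    committed_at: float  # timestamp as recorded by the CVS server


def import_window(revisions, last_upper, server_now, safety_margin=0.0):
    """Select revisions with last_upper < committed_at <= upper.

    The upper bound is fixed once, before scanning, from the server's
    clock. It is returned so the caller can save it as the next run's
    lower bound, and only after the import has completed successfully.
    """
    upper = server_now - safety_margin
    picked = [r for r in revisions if last_upper < r.committed_at <= upper]
    return sorted(picked, key=lambda r: r.committed_at), upper
```

Because each run's upper bound becomes the next run's lower bound, the windows tile the timeline with no gap and no overlap; a revision can only be handled twice if a run dies before the new bound is saved.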


Re: Git out of sync vs. CVS

From
Aidan Van Dyk
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [100119 11:47]:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Oh, and what sort of delay do you feel would be "long enough to
> > cover any cvs commit including potential network slowness during it
> > etc."?
> 
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less.  It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

Well, I guess you could say:
 "fromcvs should go back and recheck all the previous work it's done,
 and double check and make sure no new files have changed for the
 timestamp/log message pair it's already done, because CVS isn't atomic"
 

But, I think that path leads to craziness... I mean, how far back?  CVS
is "non-atomic" enough that 2 (well, $N) people can commit separate
stuff, all with overlapping time stamps, and they can even commit stuff
in the "past" if they really want...

But, all I have to say is it's not perfect, pretty good, just deal with the
things as they come, after all, it's "CVS"

;-)

If you want better than "pretty good", drop CVS, do a one-time
conversion (a la parsecvs/cvs2git) and get on with life...  As long as
CVS is the tool of choice, pretty good is really good...

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> I haven't looked at the fromcvs code yet to know how easy or
> hard it would be to use this logic within that package

Well, now I have looked.  It's about 2,000 lines of pretty dense
Ruby code (not as many comments as one would hope, especially since
there appears to be *no* other documentation of any sort).  On a
quick scan, they seem to be *trying* to do what I suggested, which
means that some sort of fix could probably be worked out, but that
the issue could be subtle enough that it could be hard to find.

Perhaps it is as simple, though, as using the client's time instead
of the CVS server's time -- that's one of the things I've seen cause
problems for this sort of thing using CVS before.  I haven't spotted
where they're getting the time.

Is there anyone fluent in Ruby who wants to look at this and see how
they're getting it?

http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs/log/132

By the way, is anyone working on fixing up the current problem?
I've been talking about trying to prevent recurrences, but that's
not gonna help get the current problem solved....
-Kevin


Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
I wrote:
> Perhaps it is as simple, though, as using the client's time
> instead of the CVS server's time -- that's one of the things I've
> seen cause problems for this sort of thing using CVS before.

I got a brief consult with a Ruby programmer here under the "if it's
less than ten minutes you don't have to schedule it through a
manager" rule.  From what we can see, fromcvs scans for all entries
*after* a "previous run" time, but it isn't setting an upper bound
on time during the scan.  I haven't found where it saves the time
for the lower limit of the next run, but I rather suspect that it
grabs the current time near the end of the scan.  If this is an
accurate assessment, to avoid a window for lost commits, we'd have
to fix a time before we started the scan to use as the upper bound
for CVS commits to handle, and use it for the "previous run" time.

There's still the possible issue of *whose* clock we're using for
this.

Reality check: does the frequency of lost CVS commits within git
seem consistent with this theory?
-Kevin
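
To make the suspected window concrete, here is a toy simulation. It is not fromcvs code, and the theory above is itself unconfirmed; the function name, the one-tick-per-file clock and the timestamps are invented for illustration. A revision lands in a file that the scan has already passed. In the unbounded mode the next run's lower bound is taken at the end of the scan, so that revision is never picked up; in the bounded mode the bound is fixed before the scan starts.

```python
def run_import(repo, lower, start, late_commit=None, bounded=False):
    """Scan files in sorted order, advancing the clock one tick per file.

    repo maps path -> list of revision timestamps (mutated in place when
    late_commit is given).
    late_commit is (path, timestamp): a revision that lands in the
    repository just after `path` has been scanned.
    Returns (imported_revisions, lower_bound_for_next_run).
    """
    now = start
    upper = start if bounded else float("inf")
    imported = []
    for path in sorted(repo):
        for ts in repo[path]:
            if lower < ts <= upper:
                imported.append((path, ts))
        now += 1  # scanning a file takes time
        if late_commit and late_commit[0] == path:
            repo[path].append(late_commit[1])  # arrives behind the scan
    # Unbounded: "previous run" time is taken at the end of the scan.
    # Bounded: it is the upper bound fixed before the scan began.
    next_lower = start if bounded else now
    return imported, next_lower


repo = {"a": [1], "b": [1]}
_, next_lower = run_import(repo, 0, 10, late_commit=("a", 10.5))
lost, _ = run_import(repo, next_lower, 20)  # the 10.5 revision is skipped

repo = {"a": [1], "b": [1]}
_, next_lower = run_import(repo, 0, 10, late_commit=("a", 10.5), bounded=True)
found, _ = run_import(repo, next_lower, 20, bounded=True)  # picked up here
```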


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Tue, Jan 19, 2010 at 16:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> Magnus Hagander  wrote:
>>>
>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>
>>>> We've seen this happen before.
>>>
>>> That seems like kind of a blasé attitude toward something upon which
>>> some people rely.
>>
>> For the record, I am one of those people. I use it for *all* my
>> postgresql development. And this is a serious pain.
>
> FWIW, I am in favor of rewinding and making everyone rebase, but I
> think we should do it ASAP.

Ok, I started looking at this.

First, it's not at all clear to me what Peter means with his comments.
But it happens to be that one of the commits he's referring to is all
the way back in August. So we'd have to rewind it all that way. Do we
really want to do that, or do we want to do a manual commit on the
repository bringing it back in sync instead? (either by knowing what's
wrong with those commits, or do a complete diff of cvs head vs git
head)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Wed, Jan 20, 2010 at 09:52, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Jan 19, 2010 at 16:59, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>>> <Kevin.Grittner@wicourts.gov> wrote:
>>>> Magnus Hagander  wrote:
>>>>
>>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>>
>>>>> We've seen this happen before.
>>>>
>>>> That seems like kind of a blasé attitude toward something upon which
>>>> some people rely.
>>>
>>> For the record, I am one of those people. I use it for *all* my
>>> postgresql development. And this is a serious pain.
>>
>> FWIW, I am in favor of rewinding and making everyone rebase, but I
>> think we should do it ASAP.
>
> Ok, I started looking at this.
>
> First, it's not at all clear to me what Peter means with his comments.
> But it happens to be that one of the commits he's referring to is all
> the way back in August. So we'd have to rewind it all that way. Do we
> really want to do that, or do we want to do a manual commit on the
> repository bringing it back in sync instead? (either by knowing what's
> wrong with those commits, or do a complete diff of cvs head vs git
> head)

Actually, such a correction patch would be nice and short. Attached
for reference. Thoughts?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment
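
The "complete diff of cvs head vs git head" mentioned above could be approximated with a tree comparison along these lines. This is only a sketch, assuming two local checkouts on disk; the function name and the ignore list are assumptions, and the attached patch itself is not reproduced here.

```python
import filecmp
import os

# Version-control bookkeeping that is expected to differ between checkouts.
IGNORED = ["CVS", ".git", ".cvsignore", ".gitignore"]


def tree_differences(left, right, ignore=IGNORED):
    """Return sorted relative paths that differ or exist on only one side."""
    found = []

    def walk(cmp, prefix):
        # Compare file contents byte for byte rather than trusting size/mtime.
        _same, differ, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in cmp.left_only + cmp.right_only + differ + errors:
            found.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=ignore), "")
    return sorted(found)
```

An empty result would mean the two heads agree on every file outside the ignored names; any path in the list is a candidate for a manual fixup commit.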

Re: Git out of sync vs. CVS

From
Heikki Linnakangas
Date:
Magnus Hagander wrote:
> On Wed, Jan 20, 2010 at 09:52, Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Jan 19, 2010 at 16:59, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>>>> <Kevin.Grittner@wicourts.gov> wrote:
>>>>> Magnus Hagander  wrote:
>>>>>
>>>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>>>> We've seen this happen before.
>>>>> That seems like kind of a blasé attitude toward something upon which
>>>>> some people rely.
>>>> For the record, I am one of those people. I use it for *all* my
>>>> postgresql development. And this is a serious pain.
>>> FWIW, I am in favor of rewinding and making everyone rebase, but I
>>> think we should do it ASAP.
>> Ok, I started looking at this.
>>
>> First, it's not at all clear to me what Peter means with his comments.
>> But it happens to be that one of the commits he's referring to is all
>> the way back in August. So we'd have to rewind it all that way. Do we
>> really want to do that, or do we want to do a manual commit on the
>> repository bringing it back in sync instead? (either by knowing what's
>> wrong with those commits, or do a complete diff of cvs head vs git
>> head)
> 
> Actually, such a correction patch would be nice and short. Attached
> for reference. Thoughts?

That seems better than rewinding the history all the way back to August.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Git out of sync vs. CVS

From
Robert Haas
Date:
On Wed, Jan 20, 2010 at 4:27 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Magnus Hagander wrote:
>> On Wed, Jan 20, 2010 at 09:52, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Tue, Jan 19, 2010 at 16:59, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>>>>> <Kevin.Grittner@wicourts.gov> wrote:
>>>>>> Magnus Hagander  wrote:
>>>>>>
>>>>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>>>>> We've seen this happen before.
>>>>>> That seems like kind of a blasé attitude toward something upon which
>>>>>> some people rely.
>>>>> For the record, I am one of those people. I use it for *all* my
>>>>> postgresql development. And this is a serious pain.
>>>> FWIW, I am in favor of rewinding and making everyone rebase, but I
>>>> think we should do it ASAP.
>>> Ok, I started looking at this.
>>>
>>> First, it's not at all clear to me what Peter means with his comments.
>>> But it happens to be that one of the commits he's referring to is all
>>> the way back in August. So we'd have to rewind it all that way. Do we
>>> really want to do that, or do we want to do a manual commit on the
>>> repository bringing it back in sync instead? (either by knowing what's
>>> wrong with those commits, or do a complete diff of cvs head vs git
>>> head)
>>
>> Actually, such a correction patch would be nice and short. Attached
>> for reference. Thoughts?
>
> That seems better than rewinding the history all the way back to August.

It seems pretty horrible to me.  That means we'll have a range of
times 5 months long for which the git repository doesn't match CVS.

Admittedly, I understand that this is going to be extremely painful
for anyone who (like Heikki) has to manage a substantial private
branch.

I haven't been in a hurry to see us move to git because the git mirror
is, for most purposes, just as good.  But if the git mirror is going
to start sucking, then I'm in a hurry.  The way I used to work before
I learned git seems laughable now, and I do NOT want to go back.

...Robert


Re: Git out of sync vs. CVS

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> On Wed, Jan 20, 2010 at 4:27 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Magnus Hagander wrote:
>>> Actually, such a correction patch would be nice and short. Attached
>>> for reference. Thoughts?
>> That seems better than rewinding the history all the way back to August.
> 
> It seems pretty horrible to me.  That means we'll have a range of
> times 5 months long for which the git repository doesn't match CVS.
> 
> Admittedly, I understand that this is going to be extremely painful
> for anyone who (like Heikki) has to manage a substantial private
> branch.

I won't object to rewinding, it should be fairly painless to rebase.

> I haven't been in a hurry to see us move to git because the git mirror
> is, for most purposes, just as good.  But if the git mirror is going
> to start sucking, then I'm in a hurry.  The way I used to work before
> I learned git seems laughable now, and I do NOT want to go back.

My feelings exactly. I'm not in a hurry to switch because the mirror is
good enough for me. But if *I* have to spend time fixing the mirror
every few weeks, I'm not happy. Magnus has been kind enough to handle
the last mirror troubles, but I believe he shares the feeling.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Magnus Hagander wrote:
>> Actually, such a correction patch would be nice and short. Attached
>> for reference. Thoughts?

> That seems better than rewinding the history all the way back to August.

+1 ... I'm just an interested observer not a user of the git repository,
but this approach seems far less work for everyone concerned.
        regards, tom lane


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Wed, Jan 20, 2010 at 15:36, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 20, 2010 at 4:27 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Magnus Hagander wrote:
>>> On Wed, Jan 20, 2010 at 09:52, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Tue, Jan 19, 2010 at 16:59, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> On Tue, Jan 19, 2010 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>> On Mon, Jan 18, 2010 at 01:53, Kevin Grittner
>>>>>> <Kevin.Grittner@wicourts.gov> wrote:
>>>>>>> Magnus Hagander  wrote:
>>>>>>>
>>>>>>>>>>> the Git repository is missing parts of two non-recent commits.
>>>>>>>> We've seen this happen before.
>>>>>>> That seems like kind of a blasé attitude toward something upon which
>>>>>>> some people rely.
>>>>>> For the record, I am one of those people. I use it for *all* my
>>>>>> postgresql development. And this is a serious pain.
>>>>> FWIW, I am in favor of rewinding and making everyone rebase, but I
>>>>> think we should do it ASAP.
>>>> Ok, I started looking at this.
>>>>
>>>> First, it's not at all clear to me what Peter means with his comments.
>>>> But it happens to be that one of the commits he's referring to is all
>>>> the way back in August. So we'd have to rewind it all that way. Do we
>>>> really want to do that, or do we want to do a manual commit on the
>>>> repository bringing it back in sync instead? (either by knowing what's
>>>> wrong with those commits, or do a complete diff of cvs head vs git
>>>> head)
>>>
>>> Actually, such a correction patch would be nice and short. Attached
>>> for reference. Thoughts?
>>
>> That seems better than rewinding the history all the way back to August.
>
> It seems pretty horrible to me.  That means we'll have a range of
> times 5 months long for which the git repository doesn't match CVS.

Yes.

But how bad is that really the way we do things now? It still works
perfectly fine for development against HEAD, which I believe is what
most people are using it for at this point. (As long as somebody keeps
finding these things when they happen, that is)

I'm going to do the fixup for now. We can always rewind past that one
later if we have to, it's not like it's going to get any worse.


> Admittedly, I understand that this is going to be extremely painful
> for anyone who (like Heikki) has to manage a substantial private
> branch.

Well, git actually picks that up reasonably well these days, but it's
still a bit of a pain. Also, all the links people have posted will no
longer be valid, etc.


> I haven't been in a hurry to see us move to git because the git mirror
> is, for most purposes, just as good.  But if the git mirror is going
> to start sucking, then I'm in a hurry.  The way I used to work before
> I learned git seems laughable now, and I do NOT want to go back.

I can only agree with this. I would very much like to see that
discussion opened again - after we've released 9.0. But for that
reason, it'd be good if we could take care of the issues listed on the
wiki page before that happens :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Tue, Jan 19, 2010 at 21:07, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> I wrote:
>
>> Perhaps it is as simple, though, as using the client's time
>> instead of the CVS server's time -- that's one of the things I've
>> seen cause problems for this sort of thing using CVS before.
>
> I got a brief consult with a Ruby programmer here under the "if it's
> less than ten minutes you don't have to schedule it through a
> manager" rule.  From what we can see, fromcvs scans for all entries
> *after* a "previous run" time, but it isn't setting an upper bound
> on time during the scan.  I haven't found where it saves the time
> for the lower limit of the next run, but I rather suspect that it
> grabs the current time near the end of the scan.  If this is an
> accurate assessment, to avoid a window for lost commits, we'd have
> to fix a time before we started the scan to use as the upper bound
> for CVS commits to handle, and use it for the "previous run" time.
>
> There's still the possible issue of *whose* clock we're using for
> this.
>
> Reality check: does the frequency of lost CVS commits within git
> seem consistent with this theory?

Well, supposedly all our servers are synced with NTP. I know the main
cvs server is, and the git server is, but it goes past the anoncvs
server which is a hub.org server so I don't know for sure there - but
I think it is? So I don't think it's the machines-out-of-sync issue.
Or at least the window for that is *really* small.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jan 19, 2010 at 21:07, Kevin Grittner
>> Reality check: does the frequency of lost CVS commits within git
>> seem consistent with this theory?

> Well, supposedly all our servers are synced with NTP. I know the main
> cvs server is, and the git server is, but it goes past the anoncvs
> server which is a hub.org server so I don't know for sure there - but
> I think it is? So I don't think it's the machines-out-of-sync issue.
> Or at least the window for that is *really* small.

I have noticed that CVS operations (at least from the user's viewpoint)
work in local time.  So even if the clocks are synced, a different TZ
setting could conceivably lead to issues.
        regards, tom lane
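
The time zone concern above can be illustrated with a small sketch. It assumes two hypothetical hosts whose clocks agree but whose zone settings differ, and it shows how far apart the same zone-less wall-clock stamp lands when each host reads it as local time. The offsets and the stamp are made up for illustration; nothing here is taken from CVS or fromcvs, and the thread does not establish that this is what actually happened.

```python
from datetime import datetime, timedelta, timezone

# A wall-clock stamp with no zone attached (illustrative value only).
stamp = "2010-01-19 21:07:00"
naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")

# Two hypothetical hosts: clocks in sync, zone settings different.
host_a = timezone(timedelta(hours=-5))
host_b = timezone(timedelta(hours=1))

# Each host interprets the same stamp as its own local time.
as_a = naive.replace(tzinfo=host_a)
as_b = naive.replace(tzinfo=host_b)

# The two readings are different instants, apart by the zone difference.
skew = as_a - as_b
print(skew)  # 6:00:00
```

A cutoff computed on one host and compared against stamps read on the other would be off by that amount, unless both sides convert to UTC first.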


Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I have noticed that CVS operations (at least from the user's
> viewpoint) work in local time.  So even if the clocks are synced,
> a different TZ setting could conceivably lead to issues.

Hmmm...  If that were the issue I would think we'd've seen the
problem more often.  From reading over the Ruby code, it appears to
me that if a commit happens when fromcvs is scanning for recent
commits, and the commit touches a part the scan has already passed, we'd
see anomalies like this, although my weak Ruby skills leave me less
than 100% sure.  The same skill deficiency means it would take me at
least three FTE days to fix the flaw in fromcvs, which I'd have to
do off-hours.  So add me to the list of people who think that if
these are going to be recurring, we should look at moving from cvs
to git as soon as 9.0 is released.
-Kevin
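
The fix suggested earlier in the thread (pick a time before the scan starts, use it as the upper bound, and store it as the next "previous run" time) can be sketched roughly as follows. This is not fromcvs code. The function names and the in-memory list of (timestamp, name) pairs are hypothetical, and the sketch covers only the window bookkeeping, not the fact that a multi-file CVS commit is not atomic.

```python
from datetime import datetime, timedelta, timezone


def commits_in_window(commits, lower, upper):
    """Return the commits whose timestamp satisfies lower < ts <= upper."""
    return [c for c in commits if lower < c[0] <= upper]


def sync_pass(commits, previous_run, scan_start):
    """Run one pass bounded above by scan_start (sampled before scanning).

    Returns the commits to import and the value to store as the next
    pass's "previous run" time.
    """
    picked = commits_in_window(commits, previous_run, scan_start)
    return picked, scan_start


t0 = datetime(2010, 1, 19, 12, 0, tzinfo=timezone.utc)
commits = [
    (t0 + timedelta(minutes=1), "commit A"),
    (t0 + timedelta(minutes=5), "commit B"),  # lands while pass 1 is scanning
]

# Pass 1 starts at t0+4min; commit B is later than the bound, so it is deferred.
first, marker = sync_pass(commits, t0, t0 + timedelta(minutes=4))
# Pass 2 starts at t0+10min and resumes exactly where pass 1 stopped.
second, marker = sync_pass(commits, marker, t0 + timedelta(minutes=10))

print([name for _, name in first])   # ['commit A']
print([name for _, name in second])  # ['commit B']
```

Because the stored marker is the same value that bounded the scan, consecutive windows meet without a gap or an overlap, so a commit that arrives mid-scan is picked up whole by the next pass instead of being partly seen.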


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> So the list really isn't very long. I think it's perfectly possible to
> clear it off before the release. Because we still only want to change
> after the release, or are you saying once those are fixed, we can
> change even if we happen to be in beta at the time?

When and if we have the prerequisite tasks done, it'll be time enough to
think about exactly when to schedule the move.  Given the amount of
movement on the prerequisites in the past year, I'm not planning to
worry about it today.
        regards, tom lane


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> So add me to the list of people who think that if
>>> these are going to be recurring, we should look at moving from
>>> cvs to git as soon as 9.0 is released.
>> 
>> The gating factor is not release schedule; it is the still-
>> unaddressed tasks that must be done before we can consider moving.
>> http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git
>
> If you think people can work on that list without risk of delaying
> the release, OK.  I was assuming that such work would be too
> disruptive to work on at this point in a release cycle, and might
> possibly pull time from folks who would otherwise be working on the
> release.  Do you disagree?

Oh, if you meant that people should start dealing with those tasks after
release, that's fine with me.  I read your comment to be that we should
schedule the move for immediately after release, prerequisites or no.
        regards, tom lane


Re: Git out of sync vs. CVS

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> So add me to the list of people who think that if
> these are going to be recurring, we should look at moving from cvs
> to git as soon as 9.0 is released.

The gating factor is not release schedule; it is the still-unaddressed
tasks that must be done before we can consider moving.
http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git
        regards, tom lane


Re: Git out of sync vs. CVS

From
Magnus Hagander
Date:
On Thu, Jan 21, 2010 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> So add me to the list of people who think that if
>> these are going to be recurring, we should look at moving from cvs
>> to git as soon as 9.0 is released.
>
> The gating factor is not release schedule; it is the still-unaddressed
> tasks that must be done before we can consider moving.
> http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git

Assuming git-cvsserver works as advertised (which we should verify of
course) there are really only two points left:
"Confirm past releases can be built identically from Git, using binary diff "
which I intend to look at, and
"Provide backport examples "
which Heikki has promised to look at
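
For the binary-diff item, a minimal sketch of a byte-for-byte comparison of two exported trees might look like this. The helper names are hypothetical, and it assumes both trees have already been exported to plain directories (for example one from CVS and one from Git) with the version-control metadata directories left out.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def tree_differences(left, right):
    """List relative paths that are missing on one side or differ in content."""
    left_files = relative_files(left)
    right_files = relative_files(right)
    differing = left_files ^ right_files  # present on only one side
    for rel in left_files & right_files:
        left_path = os.path.join(left, rel)
        right_path = os.path.join(right, rel)
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not filecmp.cmp(left_path, right_path, shallow=False):
            differing.add(rel)
    return sorted(differing)
```

An empty result from `tree_differences(cvs_export, git_export)` would mean the two exports match file for file. Any path in the list is a candidate for the kind of missing change reported at the start of this thread.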


Unless the NLS scripts actually do commits, in which case they also
have to be changed.

So the list really isn't very long. I think it's perfectly possible to
clear it off before the release. Because we still only want to change
after the release, or are you saying once those are fixed, we can
change even if we happen to be in beta at the time?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Git out of sync vs. CVS

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> So add me to the list of people who think that if
>> these are going to be recurring, we should look at moving from
>> cvs to git as soon as 9.0 is released.
> 
> The gating factor is not release schedule; it is the still-
> unaddressed tasks that must be done before we can consider moving.
> http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git

If you think people can work on that list without risk of delaying
the release, OK.  I was assuming that such work would be too
disruptive to work on at this point in a release cycle, and might
possibly pull time from folks who would otherwise be working on the
release.  Do you disagree?
-Kevin