Robert Haas <robertmhaas@gmail.com> writes:
> 1. The new conversion seems to have stolen the apostrophe from "D'Arcy
> J.M. Cain <darcy@druid.net>", rendering him "DArcy J.M. Cain
> <darcy@druid.net>".
Yeah, I see that too. It's probably bad input rather than the
converter's fault ;-)
> 2. Any non-ASCII characters in, for example, contributors' names show
> up differently in the two repos. Generally, the original repo is OK
> and the new repo is garbled; although I found one very old example
> that went the other way.
What it looks like to me is that a Latin1->UTF8 conversion has been
applied to the log text. Which might be a good idea if it all *was*
Latin1, but a fair-sized percentage isn't. Applying this conversion to
UTF8 entries results in garbage, of course. Even if this could be done
reliably, I think this counts as editorializing on the historical
record, and should be switched off if possible.
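To illustrate the failure mode, here is a minimal Python sketch of a blanket
Latin1->UTF8 conversion. It is not the converter's actual code; the function
name latin1_to_utf8 is made up for the example. A log entry that really was
Latin1 comes out fine, while one that was already UTF8 gets encoded a second
time and turns into garbage.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Blindly treat the input bytes as Latin1 and re-encode them as UTF8.

    Hypothetical helper: it mimics what an unconditional conversion does,
    with no attempt to detect the real encoding of the input.
    """
    return raw.decode("latin-1").encode("utf-8")


# An e-acute as it would be stored in a Latin1 log entry: one byte.
latin1_entry = "é".encode("latin-1")
# The same character in a log entry that was already UTF8: two bytes.
utf8_entry = "é".encode("utf-8")

# Genuine Latin1 input converts correctly.
good = latin1_to_utf8(latin1_entry).decode("utf-8")

# UTF8 input is "converted" byte by byte, so each byte of the original
# sequence becomes its own character in the result.
garbled = latin1_to_utf8(utf8_entry).decode("utf-8")

print(repr(good))
print(repr(garbled))
```

Because every byte sequence is valid Latin1, the decode step can never fail,
so the conversion has no way to notice that its input was not Latin1.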
> There are also a number of commits that differ in order between the
> two repos, and an even larger number where commits are duplicated or
> merged in one repository relative to the other.
I suspect that this is an artifact of the converter trying to merge
nearby commits into one commit, which it more or less *has* to do for
sanity since CVS commits aren't atomic. I don't have a problem with
the concept, but I notice cases where the converted commit has a
timestamp some minutes later than what the cvs2cl output claims.
I suspect that later timestamp is what the converter was using as a cutoff time.
Would it be possible to make sure that the converted commit is always
timestamped with the latest individual file update timestamp from the
included CVS commits?
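A rough Python sketch of the rule being asked for, under stated assumptions:
the FileRevision record, the group_commits function, and the five-minute
window are all invented for illustration and say nothing about how the real
converter groups things. Per-file revisions with the same author and log
message are merged while they stay close together in time, and the merged
commit is stamped with the latest per-file timestamp rather than with the
cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRevision:
    """One per-file CVS revision (hypothetical record layout)."""
    path: str
    author: str
    log: str
    when: datetime


def group_commits(revs, window=timedelta(minutes=5)):
    """Merge revisions sharing author and log message into commits.

    A revision joins the current group for its (author, log) key if it is
    no more than `window` after the previous revision in that group.
    The window size is an assumption, not a known converter setting.
    """
    groups = []
    current = {}  # (author, log) -> most recent group for that key
    for rev in sorted(revs, key=lambda r: r.when):
        key = (rev.author, rev.log)
        group = current.get(key)
        if group is None or rev.when - group[-1].when > window:
            group = []
            groups.append(group)
            current[key] = group
        group.append(rev)
    return groups


def commit_timestamp(group):
    """Stamp the merged commit with the latest per-file timestamp."""
    return max(rev.when for rev in group)


revs = [
    FileRevision("src/a.c", "tgl", "Fix foo", datetime(2010, 8, 1, 12, 0, 0)),
    FileRevision("src/b.c", "tgl", "Fix foo", datetime(2010, 8, 1, 12, 0, 40)),
    FileRevision("doc/a.sgml", "tgl", "Fix foo", datetime(2010, 8, 1, 12, 1, 10)),
    FileRevision("src/a.c", "tgl", "Fix foo", datetime(2010, 8, 1, 13, 0, 0)),
]
groups = group_commits(revs)
for g in groups:
    print(commit_timestamp(g), [r.path for r in g])
```

With this rule the converted commit's timestamp always matches one of the
per-file timestamps, so it can be checked directly against cvs2cl output.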
regards, tom lane