Subject: [revml] Re: incremental conversion from other SCM to svn by vcp
From: Barrie Slaymaker (barr@slaysys.com)
Date: 06/17/2003 06:21:36 AM
List: com.perforce.revml

[Sorry for the delayed replies, was out of town & offline, thankfully :]

On Sat, Jun 14, 2003 at 01:40:29AM +0800, Chia-liang Kao wrote:

On Fri, Jun 13, 2003 at 11:21:35AM -0500, kfo@collab.net wrote:

- The subversion dest driver for VCP handles branches pretty well, because VCP itself already does the branch-point finding. So branches are created as copies, with the appropriate files added or removed when necessary. The revision history of your sample repository at http://svn.openfoundry.org/svn/sympa/ shows this.

actually the vcp core only does per-file branch point deduction. the revision used for branching in svn semantics (svn cp trunk -r <which>) is decided by the create_branch function i wrote in VCP::Dest::svn.

VCP::Source::cvs does detect branch creation and marks it with "placeholder" revisions (no delta, the rev id is the magic branch number, like "1.1.2.0", etc). This is so that CVS branches that do not contain any changed files still result in a branch in the destination repository and so that, in systems like Perforce, the branching of all files in a new branch can occur in a single operation. The changeset aggregator should put all the branch-founding placeholders in a single changeset so that VCP::Dest::foo can do the branch as a single operation.
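As a rough illustration of the grouping described above, here is a minimal Python sketch (VCP itself is Perl). The `Rev` record, its fields, and the function name are hypothetical and are not VCP's actual data model; the only idea taken from the message is that placeholder revisions for one branch end up together in one changeset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rev:
    """Hypothetical revision record; not VCP's real structure."""
    path: str
    rev_id: str
    branch: str            # name of the branch this rev belongs to
    placeholder: bool = False


def branch_founding_changesets(revs):
    """Group placeholder revs so each branch gets exactly one changeset.

    Returns {branch_name: [placeholder revs for that branch]}.
    Non-placeholder revs are ignored here; they would be aggregated
    by whatever normal changeset logic applies.
    """
    by_branch = defaultdict(list)
    for rev in revs:
        if rev.placeholder:
            by_branch[rev.branch].append(rev)
    return dict(by_branch)
```

A destination driver could then perform one branch operation per key instead of one per file.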

- But it doesn't handle tags, because VCP doesn't deduce the tag points in the same way. Instead, it just marks the tags per file revision... which doesn't help us much in Subversion.

it shouldn't be hard to implement that, since the deduction is pretty much what I did for the branching point: decide the `global point' from the points of every file.
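One way to read "decide the global point from the points of every file" is as an interval intersection. The Python sketch below is an assumption about how that could work, not the logic in VCP::Dest::svn: for each file, take the destination revision in which the tagged file revision was committed and the destination revision in which it was superseded (or None if it is still current), then look for a single destination revision at which every file is at its tagged revision. The function name and input shape are hypothetical.

```python
import math


def global_tag_point(spans):
    """Find one destination revision that matches a tag on every file.

    spans maps path -> (first, superseded), where `first` is the
    destination revision that introduced the tagged file revision and
    `superseded` is the revision that replaced it (None if none has).

    Returns the earliest revision R with first <= R < superseded for
    all files, or None if the tag does not correspond to a single
    repository-wide revision.
    """
    if not spans:
        return None
    lo = max(first for first, _ in spans.values())
    hi = min(math.inf if end is None else end for _, end in spans.values())
    return lo if lo < hi else None
```

When this returns None, the tag cannot be expressed as a single copy of one revision and would need per-file handling.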

Can somebody point me to a deep description of what svn means by a tag?

- The conversion time seems a bit slow to me (7 hours for 2000 svn revs with four branches). Extrapolating from cvs2svn.py's performance right now, I think it would do that in 10 minutes at the most. But perhaps there are optimizations you are planning?

as i saw in the profiling from the vcp log, svn commit takes some time; the longest is 20 sec or so for one large commit. but the bottleneck right now is how it extracts every revision from cvs: doing cvs checkout -r <revision> <onefile> for every file. i'll be implementing fast retrieval from cvs by setting a date tag and verifying the resulting revision; hopefully this will cut conversion time. more important is that the conversion is incremental, so even if the very first conversion of a large repository is slow, subsequent conversions of newly committed files won't take long.

Try also the direct read of the source files. I'd like to take the RCS file parser and have it cache (on disk) reversed deltas from the head back to the oldest revision retrieved, then apply these reversed deltas as it "cvs checkout"s each new revision. This will prevent it from spawning cvs each time (ugh), and will make it more efficient because it can apply the patches in a going-forward direction.

VCP::Source::revml does the roll-forward-and-patch operation already, using VCP::Patch (a limited, all-Perl, and thus slower but \000-safe, patch routine), so the remaining operation here is to reverse any previously unreversed patches as a checkout is simulated and store them on disk.
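For readers unfamiliar with the RCS layout this relies on: an RCS file stores the head revision in full, and each older trunk revision as a delta made of `aN C` (add C lines after source line N) and `dN C` (delete C lines starting at source line N) commands. The Python sketch below only shows walking from the head back through such deltas to recover every revision; it is not VCP::Patch, it does not build or cache the reversed (forward) deltas proposed above, and the function names are made up for illustration.

```python
import re

_CMD = re.compile(r"^([ad])(\d+) (\d+)$")


def apply_rcs_delta(lines, delta):
    """Apply one RCS-style delta (a list of lines) to a list of text lines.

    Line numbers in the delta are 1-based and refer to `lines`, the
    text the delta is applied to.
    """
    out = []
    pos = 0                      # count of source lines consumed so far
    i = 0
    while i < len(delta):
        m = _CMD.match(delta[i])
        if m is None:
            raise ValueError("bad delta command: %r" % delta[i])
        op, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == "d":
            out.extend(lines[pos:start - 1])   # keep lines before the cut
            pos = start - 1 + count            # skip the deleted lines
        else:
            out.extend(lines[pos:start])       # keep through line `start`
            pos = max(pos, start)
            out.extend(delta[i:i + count])     # then the added text
            i += count
    out.extend(lines[pos:])
    return out


def revisions_oldest_first(head, deltas_from_head):
    """Rebuild every revision from the head text and its stored deltas."""
    texts = [head]
    for delta in deltas_from_head:
        texts.append(apply_rcs_delta(texts[-1], delta))
    texts.reverse()
    return texts
```

Doing this in-process is what avoids spawning a cvs child for each file revision.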

This should be a lot faster than spawning CVS kids, and hopefully even faster and less error-prone than using the -d$DATE command. But it would only apply to CVSROOTs on the local fs, so the -d$DATE optimization you discuss would be very nice for the :pserver: and :ext: variants.

- Barrie