Subject:[revml] Re: incremental conversion from other SCM to svn by vcp
From:Barrie Slaymaker (barr@slaysys.com)
Date:06/17/2003 06:30:42 AM
List:com.perforce.revml

On Fri, Jun 13, 2003 at 01:08:14PM -0500, kfo@collab.net wrote:

Some questions:

- Have you tested the driver on any really big repositories, like the FreeBSD CVS repository (2.3 gigs)? Also, that one's good because it has a lot of edge cases -- twice-deleted files, branches where some files are branched much later than others ("split" branches), etc.

Perforce is testing this internally for cvs->p4 situations. We have gotten through the XFree86 tree, which contains some odd situations as well (like two branch tags applied to the same magic version number).
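For readers unfamiliar with the "magic" numbers mentioned above: CVS records a branch tag in an RCS file's symbol table with an extra 0 in the second-to-last position, so 1.2.0.4 names branch 1.2.4. The Python sketch below is illustrative and is not VCP code. It normalizes such numbers and reports any branch number claimed by more than one tag, which is the XFree86 situation described above. The symbol table is assumed to be already parsed into a plain dict; reading the ,v file itself is out of scope, and the function names are hypothetical.

```python
from collections import defaultdict


def branch_number(rev: str) -> str:
    """Convert a CVS magic branch number (e.g. '1.2.0.4') to its real
    branch number ('1.2.4'). Other numbers are returned unchanged."""
    parts = rev.split(".")
    # Magic branch numbers have an even number of components (at least 4)
    # with '0' in the second-to-last position.
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
        del parts[-2]
    return ".".join(parts)


def shared_branches(symbols: dict[str, str]) -> dict[str, list[str]]:
    """Map each branch number to its tags, keeping only branch numbers
    that two or more tags point at."""
    by_branch: dict[str, list[str]] = defaultdict(list)
    for tag, rev in symbols.items():
        number = branch_number(rev)
        # After normalization, branch numbers have an odd number of
        # components (1.2.4, or a vendor branch like 1.1.1); plain
        # revision tags such as 1.2 have an even number and are skipped.
        if len(number.split(".")) % 2 == 1:
            by_branch[number].append(tag)
    return {n: sorted(tags) for n, tags in by_branch.items() if len(tags) > 1}
```

For example, with symbols `{"REL_1": "1.2", "BR_A": "1.2.0.4", "BR_B": "1.2.0.4"}` the result is `{"1.2.4": ["BR_A", "BR_B"]}`: a converter has to decide which tag names the destination branch and how to represent the other.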

- Is it holding a lot of state in memory, such as all the branch paths and things like that?

VCP holds all of the "state" in SDBM files *except* the list of revisions to transfer. That's in RAM now and is slated to go to disk very soon (it's preventing full FreeBSD cvs->p4 testing due to excessive RAM utilization).

This will also allow "scan once, then convert, test, edit, convert, test, etc." cycling.
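Moving the revision list out of RAM can be done with any on-disk key/value store. As a rough analogue of VCP's SDBM approach (VCP is written in Perl, and this is not its code), the Python sketch below uses the standard `dbm.dumb` module to persist scanned revisions keyed by a zero-padded sequence number, so later convert/test passes can reread them without rescanning the source. The class name and the JSON record layout (`name`, `rev`, `time`) are hypothetical choices for illustration.

```python
import dbm.dumb
import json
import os
import tempfile


class RevisionStore:
    """Hypothetical on-disk revision list: scan once, iterate many times."""

    def __init__(self, path: str):
        # Flag 'c' opens the database, creating it if it does not exist.
        self._db = dbm.dumb.open(path, "c")

    def add(self, seq: int, record: dict) -> None:
        # Zero-padded keys make lexical key order match numeric order.
        self._db[f"{seq:012d}"] = json.dumps(record)

    def __len__(self) -> int:
        return len(self._db)

    def records(self):
        # dbm gives no ordering guarantee, so sort the keys. Only the small
        # keys are held in memory; each record is read from disk on demand.
        for key in sorted(self._db.keys()):
            yield json.loads(self._db[key])

    def close(self) -> None:
        self._db.close()


def demo() -> list:
    """Write two records, reopen the store, and read them back in order."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "revisions")
        store = RevisionStore(path)
        store.add(2, {"name": "src/main.c", "rev": "1.5", "time": 1000000060})
        store.add(1, {"name": "src/main.c", "rev": "1.4", "time": 1000000000})
        store.close()
        # Reopening shows the records were written to disk, not just
        # cached in the first store object.
        store = RevisionStore(path)
        out = list(store.records())
        store.close()
        return out
```

Because the scan results live on disk, a failed or edited conversion run can start again from `records()` instead of repeating the expensive source scan.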

As I saw in the profiling from the VCP log, svn commit takes some time; the longest is 20 seconds or so for one large commit. But the bottleneck right now is how it extracts every revision from CVS: doing cvs checkout -r <revision> <onefile> for every file. I'll be implementing fast retrieval from CVS by setting a date tag and verifying the resulting revision, which will hopefully improve conversion time. More important, though, the conversion is incremental, so even if the very first conversion of a large repository is slow, subsequent conversion of newly committed files won't take long.
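One possible reading of the date-based retrieval plan above, sketched in Python rather than taken from the driver: pending revisions are grouped by commit time so that each distinct time costs one `cvs checkout -D` instead of one `cvs checkout -r` per file, and afterwards the working directory's `CVS/Entries` file is read to confirm which revision each file actually received. Mismatches are left for a per-file `-r` fallback. Assumptions: trunk revisions only (branches would also need `-r <branch>`), argument lists are built but not executed, the function names are hypothetical, and the UTC ISO-style date string should be checked against the `-D` formats your CVS version accepts.

```python
from collections import defaultdict
from datetime import datetime, timezone


def plan_checkouts(revisions):
    """Group (path, rev, unix_time) tuples by commit time and build one
    'cvs checkout -D' argument list per distinct time (trunk only)."""
    by_time = defaultdict(list)
    for path, _rev, when in revisions:
        by_time[when].append(path)
    plans = []
    for when in sorted(by_time):
        # -D selects the newest revision at or before this moment.
        stamp = datetime.fromtimestamp(when, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S UTC")
        plans.append(["cvs", "checkout", "-D", stamp, *sorted(by_time[when])])
    return plans


def entries_revisions(entries_text):
    """Map file name -> revision from the text of a CVS/Entries file."""
    revs = {}
    for line in entries_text.splitlines():
        # File lines look like /name/revision/timestamp/options/tagdate;
        # directory lines start with 'D' and are skipped.
        if not line.startswith("/"):
            continue
        fields = line.split("/")
        revs[fields[1]] = fields[2]
    return revs


def mismatches(expected, entries_text):
    """Names whose checked-out revision differs from the expected one;
    these would need an individual 'cvs checkout -r' fallback."""
    got = entries_revisions(entries_text)
    return sorted(name for name, rev in expected.items() if got.get(name) != rev)
```

The verification step matters because a date only approximates a revision: two commits to the same file within one second, or clock skew on the server, can make `-D` return a different revision than intended.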

Well, the total conversion time is still important -- many sites will be converting once and then using just Subversion. For them, the main issue is "How long will my developers be shut out of the repository during this conversion?"

VCP can run against a live repo once (slowly) and then be used to grab changes made while it was running the first time. So the lockout period is the duration of the second (fast) conversion.
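The two-pass approach above needs a way to pick out only what changed during the first run. A minimal Python sketch, assuming the first pass recorded every (path, revision) pair it transferred; the `Rev` type and `second_pass` function are hypothetical, not VCP's API.

```python
from typing import Iterable, NamedTuple


class Rev(NamedTuple):
    path: str
    rev: str
    time: int  # commit time, seconds since the epoch


def second_pass(scanned: Iterable[Rev], converted: set[tuple[str, str]]) -> list[Rev]:
    """Return revisions the first pass did not transfer, oldest first.

    Keying on (path, rev) rather than a single timestamp cutoff means the
    result does not depend on the CVS server's clock agreeing with the
    conversion host's clock.
    """
    pending = [r for r in scanned if (r.path, r.rev) not in converted]
    return sorted(pending, key=lambda r: (r.time, r.path))
```

Only the output of `second_pass` has to be transferred while developers are locked out, which is why the lockout period tracks the size of the late changes rather than the size of the repository.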

That said, VCP::Source::cvs must get much faster.

Likewise, we found with VCP::Dest::p4 that direct API access (please tell me someone's working on a Perl SubVersion::LibSVN or some such) is much faster for a destination driver, even one that is optimized to spawn as few child processes as possible.

If 2000 revisions took 7 hours, then (say) the main GNU toolchain repository would probably take several days to convert.
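To make that estimate concrete, assume the per-revision cost stays constant and take a hypothetical repository of 50,000 revisions (the thread does not give the GNU toolchain's actual revision count):

$$
\frac{7\ \text{h} \times 3600\ \text{s/h}}{2000\ \text{revisions}} = 12.6\ \text{s/revision}
$$

$$
50{,}000\ \text{revisions} \times 12.6\ \text{s/revision} = 630{,}000\ \text{s} = 175\ \text{h} \approx 7.3\ \text{days}
$$

Any fixed per-revision overhead, such as spawning a child process per file, scales linearly in the same way, which is why cutting it matters more than shaving time off individual large commits.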

Ew.

(I'm not sure how using a date instead of a revision will help your CVS retrieval time. I would think the Subversion commits are a huge bottleneck; outputting to a dumpfile and then loading it might save a lot of time.)

This approach was also considered for VCP::Dest::p4. It is probably the fastest way to go, but it would require maintaining backward compatibility as svn evolves. Why does svn take so long to commit (an operation that should be lightning fast), and when is it likely to get faster?
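For the dumpfile route discussed above, a destination driver would write records in Subversion's dump format and leave the actual commits to `svnadmin load`. The Python sketch below emits a deliberately simplified version-2 dump: file nodes only, placed at the repository root, with no revision 0, no directory nodes, and no checksums. It is an illustrative assumption, not VCP code; the function names are hypothetical, and the output should be checked against the dump format notes in the Subversion source tree before relying on it. Every header it writes is also something a driver would have to keep in step with as the format evolves.

```python
def props_block(props: dict[str, str]) -> bytes:
    """Serialize properties as K/V length-prefixed pairs ending in PROPS-END."""
    out = b""
    for key, value in props.items():
        k, v = key.encode(), value.encode()
        out += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    return out + b"PROPS-END\n"


def revision_record(num: int, log: str, author: str, date: str) -> bytes:
    """Revision header plus its svn:log, svn:author, and svn:date props."""
    props = props_block({"svn:log": log, "svn:author": author, "svn:date": date})
    return (b"Revision-number: %d\n" % num
            + b"Prop-content-length: %d\n" % len(props)
            + b"Content-length: %d\n\n" % len(props)
            + props + b"\n")


def file_node(path: str, action: str, text: bytes) -> bytes:
    """A file node ('add' or 'change') carrying the full file text."""
    props = props_block({})
    return (b"Node-path: %s\n" % path.encode()
            + b"Node-kind: file\n"
            + b"Node-action: %s\n" % action.encode()
            + b"Prop-content-length: %d\n" % len(props)
            + b"Text-content-length: %d\n" % len(text)
            # Content-length covers the property block and the text.
            + b"Content-length: %d\n\n" % (len(props) + len(text))
            + props + text + b"\n\n")


def dump(revisions) -> bytes:
    """revisions: list of (log, author, svn_date, [(path, action, text), ...]).
    Revisions are numbered from 1 in list order."""
    out = b"SVN-fs-dump-format-version: 2\n\n"
    for num, (log, author, date, nodes) in enumerate(revisions, start=1):
        out += revision_record(num, log, author, date)
        for path, action, text in nodes:
            out += file_node(path, action, text)
    return out
```

A result could be written to a file and fed to `svnadmin load` on a fresh repository; the speed gain comes from loading many revisions in one server-side process instead of running one client commit per change.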

- Barrie