6 messages in com.perforce.revml: [revml] Conversion of Perl Perforce r...

From                Sent On
John Peacock        19 May 2004 09:26
Barrie Slaymaker    20 May 2004 19:31
John Peacock        21 May 2004 04:26
Chia-liang Kao      30 May 2004 16:13
Barrie Slaymaker    31 May 2004 07:53
John Peacock        08 Jun 2004 12:13
Subject: [revml] Conversion of Perl Perforce repository to Subversion - Part 1
From: Barrie Slaymaker (barr@slaysys.com)
Date: 05/20/2004 07:31:28 PM
List: com.perforce.revml

On Wed, May 19, 2004 at 12:26:30PM -0400, John Peacock wrote:

> I have [stupidly] agreed to test the feasibility of converting the main Perl repository from Perforce to Subversion. Initially, this would be to provide a read-only public repository; eventually, it might lead to development being moved permanently from P4 to SVN. I have two questions: the first is more of a possible design issue with VCP, and the second is more of a practical question based on my incomplete understanding of VCP, so I'll leave the second question for another message.

> I am using CLKao's svk to mirror the Perforce repository, which ultimately uses VCP to do the heavy lifting. I've attempted the conversion twice, and both times the server eventually swapped itself almost to death due to the huge RAM requirements (first with 512MB, then with 2GB of physical memory installed). Based on my reading of the LIMITATIONS section in VCP::Dest::revml, the odds are good that the basic design is flawed for such a large conversion (64k revisions).

I hope you're not trying to use the VCP::Dest::revml driver for serious conversions. Even if it didn't hog a lot of disk space, going to RevML and then away from RevML is going to be terribly slow.

The VCP::Dest::revml driver is definitely not meant to convert huge repositories. It's a research and testing driver until someone comes up with a good use case for RevML. We originally set out to develop RevML with VCP's precursor serving as a desktop extractor/inserter to and from RevML, but there seems to be no constituency for RevML the language. Doing conversions by extracting from the source to RevML and then from RevML into the destination is going to be much less efficient than going directly from one repository to another.

That being said, should a need for production support for RevML arise, VCP's RevML drivers could be optimized to cache only a few files and refresh the cache from the source repository, but only if the source repository is not itself RevML.
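Purely as an illustration of the caching idea above, here is a Python sketch of a bounded, least-recently-used cache that keeps only a few file bodies in memory and goes back to the source repository for anything it has evicted. Nothing here is VCP code: the class name and the `fetch` callback (standing in for a re-read from a p4 or cvs source) are hypothetical, and the refetch step presumes a non-RevML source, per the caveat above.

```python
from collections import OrderedDict
from typing import Callable


class BoundedFileCache:
    """Keep at most `capacity` file bodies in memory; refetch evicted ones.

    `fetch` is a hypothetical callback that re-reads a revision's content
    from the source repository. It is an assumption, not a VCP API.
    """

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.fetches = 0  # number of trips back to the source repository

    def get(self, rev_id: str) -> bytes:
        if rev_id in self._items:
            # Cache hit: mark this entry as most recently used.
            self._items.move_to_end(rev_id)
            return self._items[rev_id]
        # Cache miss: re-read the content from the source repository.
        body = self.fetch(rev_id)
        self.fetches += 1
        self._items[rev_id] = body
        if len(self._items) > self.capacity:
            # Evict the least recently used entry so memory stays bounded.
            self._items.popitem(last=False)
        return body


# Usage: a fake source that just synthesizes file contents.
cache = BoundedFileCache(2, lambda rev: f"contents of {rev}".encode())
for rev in ["a#1", "b#1", "a#1", "c#1", "b#1"]:
    cache.get(rev)
print(cache.fetches)  # 4
```

The `fetches` counter makes the trade-off visible: memory never holds more than `capacity` file bodies, at the cost of extra trips to the source for files that were evicted.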

The RAM limitation should not apply to other drivers, though I can't speak for the svn drivers. If you're seeing massive RAM use when using VCP::Source::{p4,cvs,vss} and VCP::Dest::p4, then I need to get to the bottom of it. But I don't think that's what you're doing.

If you want to send me a copy of the Perl repository, I can work with it here to narrow down the problem; the core VCP filters and the {p4,vss,cvs} drivers need to be RAM-friendly.

> I don't know where to start looking. I assume that if I could find out which hash is being used to store the metadata, I could convert it to a tied hash and trade performance for being able to actually finish the conversion. I'm not even sure whether this is a flaw in VCP::Dest::svk or in one of the other modules that make up VCP.

> Any hints and directions to start my hunt would be appreciated.
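The tied-hash idea in the question has a close Python analogue in the standard `shelve` module, which provides a dict-like object whose values are pickled into a dbm file instead of being held in RAM (with the fallback `dbm.dumb` backend, the key index itself still lives in memory). The sketch below assumes a hypothetical per-revision metadata record keyed by depot path and revision; it shows the general trade, where every lookup goes to disk, and is not how VCP or svk actually store their metadata.

```python
import os
import shelve
import tempfile


def build_metadata_store(path, revisions):
    """Write (rev_id, metadata_dict) pairs to a disk-backed mapping.

    The keys and metadata fields are hypothetical examples, not VCP's schema.
    With the default writeback=False, each assignment is pickled straight to
    the dbm file, so the values do not accumulate in memory.
    """
    with shelve.open(path) as store:
        for rev_id, meta in revisions:
            store[rev_id] = meta
    return path


# Usage: store 1000 fake revisions, then read one back from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "revs")
    build_metadata_store(
        path,
        ((f"//depot/perl/file{i}.c#1", {"action": "add", "change": i})
         for i in range(1000)),
    )
    with shelve.open(path, flag="r") as store:
        print(store["//depot/perl/file42.c#1"])  # {'action': 'add', 'change': 42}
```

Passing a generator keeps the input side lazy as well, so only one record needs to be in memory at a time while the store is built.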

To isolate the RAM usage, you can try the conversion with the null: destination, first with no filters, then (on the second and later runs) with the filters that VCP's log file reports it used in the p4->svn conversion.
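The approach above amounts to running several candidate conversions and comparing their peak memory. As a rough sketch, assuming a Unix system and Python 3.9 or later, the helper below runs one command line and reports that child's peak resident set size via `os.wait4`. The demo commands are stand-ins that merely allocate memory; the real vcp command lines depend on your repository specs and filters and are not reproduced here.

```python
import os
import sys


def peak_rss_of(argv):
    """Run argv as a child process and return (exit_code, peak RSS).

    os.wait4 returns the resource usage of this one child, so runs can be
    compared individually. ru_maxrss is in kilobytes on Linux and bytes on
    macOS, so compare runs on the same machine rather than reading the
    number as an absolute size. Unix-only.
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


if __name__ == "__main__":
    # Stand-in commands: substitute each candidate conversion run here.
    runs = [
        ("baseline", [sys.executable, "-c", "pass"]),
        ("allocates ~50MB", [sys.executable, "-c", "x = b'x' * 50_000_000"]),
    ]
    for label, argv in runs:
        code, peak = peak_rss_of(argv)
        print(f"{label}: exit={code} peak_rss={peak}")
```

Comparing the null:-destination run with no filters against runs that add the logged filters would then point at whichever stage makes the peak jump.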

By far the most common data structure is the VCP::Rev object, so tracing the lifecycle of VCP::Rev instances is likely to turn up some information. To conserve memory, however, VCP::Rev is a packed data structure in memory, and many of the standard strings are stored in tied hashes so that VCP::Rev instances need only contain ints. Forcing a core dump and looking at it with the strings command might be informative (in case I forgot to tie a hash).
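To make the packing idea concrete, here is a Python sketch of string interning: each distinct string (user, branch, action, and so on) is stored once in a lookup table, and each revision record holds only small ints packed with `struct`. The field list and the 16-byte layout are assumptions for illustration; the thread does not show VCP::Rev's real layout, and VCP keeps its tables in tied hashes rather than the plain dict and list used here.

```python
import struct


class StringTable:
    """Map each distinct string to a small int and back (interning)."""

    def __init__(self):
        self._ids = {}       # string -> id
        self._strings = []   # id -> string

    def intern(self, s: str) -> int:
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, i: int) -> str:
        return self._strings[i]


# Hypothetical layout: user, branch and action ids plus a revision
# number, as four little-endian unsigned 32-bit ints (16 bytes).
REV_FORMAT = struct.Struct("<IIII")


def pack_rev(table, user, branch, action, rev_num):
    return REV_FORMAT.pack(
        table.intern(user), table.intern(branch), table.intern(action), rev_num
    )


def unpack_rev(table, packed):
    user_id, branch_id, action_id, rev_num = REV_FORMAT.unpack(packed)
    return (table.lookup(user_id), table.lookup(branch_id),
            table.lookup(action_id), rev_num)


table = StringTable()
rec = pack_rev(table, "user1", "main", "edit", 3)
print(len(rec), unpack_rev(table, rec))  # 16 ('user1', 'main', 'edit', 3)
```

Because the repeated strings live only in the table, tens of thousands of revisions sharing a handful of users and branches cost 16 bytes each plus one copy of each string. A string that bypasses the table ends up copied into every record, which is the kind of leak the strings-on-a-core-dump check would reveal.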

- Barrie