Subject:[revml] Adding Branching to vcp and RevML
From:Barrie Slaymaker (barr@slaysys.com)
Date:02/01/2002 09:01:21 AM
List:com.perforce.revml

Here's a first cut; comments please!

Thanks,

Barrie

-------------------------------------------------------------------------

vcp Branching Design
====================

Version:    0.1
Author:     Barrie Slaymaker <barr@slaysys.com>
Discuss at: rev@perforce.com

I'm trying to figure out the most appropriate way to give RevML and vcp the ability to track and replicate branches and merges. This document is cvs- and Perforce-specific at the moment, but I've tried to describe concepts rather than product features, so hopefully it won't be too far from here to a general solution.

Please think about all the SCMs you know and consider what we'd need to alter to accommodate them, thanks. VSS is especially important in this regard, as we have a project underway to support it.

Definitions
===========

Every product and engineer seems to use unique terminology for branching and repositories, some of which conflict with terminology that vcp already uses (source and destination, for instance). Here is a list of terms and what they mean in the context of vcp and this document that may help head off some confusion:

version - A specific version of a file. This is easy to confuse with "revision" (defined just below); I do it all the time, though I've tried to keep this document consistent.

delta - the changes applied to one version to produce another.

revision - A description of the change from one version to another, usually including a delta, but sometimes (esp. with binary files) including the entire new version of a file.

source - the repository being read from (possibly a RevML file).

destination - the repository being written to (possibly a RevML file).

transfer - the act of extracting revisions from the source and applying them to the destination.

metadata - all data not in the actual files/revisions being moved.

base version - the version before the current version. In the context of a branch, this is the version of the file on the "main" line from which the branched version was created.

target version - the version created by branching from a base version.

Goals
=====

1. Record the origin (location and version) for each file version created by branching. We'll call this a "branch record", akin to Perforce's "integration record".

2. Branch records must be able to apply to groups of files or single files, depending on the source repository's branching methodology.

3. Revisions affected by branching should not need to refer to the branch record.

4. Branch records need to be able to capture generic and product-specific metadata. Generic metadata includes base and target versions. Product-specific metadata includes Perforce branch views.

5. Handle merges as well as possible. Merges don't really affect cvs, but when doing p4->p4 transfers it would be nice to simulate merges; I'm not sure how to do this easily. I'd like to do the equivalent of a "p4 resolve" that lets me alter the file on disk instead of going through the interactive resolve, and then have "p4 submit" pick up the file as it does after a "p4 edit"; something like the "am" action in interactive mode, for example. Is there a p4 incantation for that? I've never had to do it and haven't been able to concoct one here. As a fallback, we could use P4MERGE and supply "m\n" to an interactive "p4 resolve" session.

7. A branch map is a mapping of source repository branches to destination repository branches. A branch map will contain the branch records. Branch maps should be extractable, probably using a command like:

vcp cvs:/module branchmap:foo.bmap

The resulting files would be XML, and perhaps also YAML (http://yaml.org/). The motivation for optionally supporting YAML is that it is a less cluttered format that administrators may find easier to read and alter. It may not be enough easier to warrant the extra effort, however, especially in the first implementation. XML is required so that branch maps can exist "naturally" within RevML, and so that existing tools (generic XML tools, textual interfaces and GUIs) can be brought to bear on branch maps without implementing a new format with an uncertain future (YAML).

8. Branch mapping files should be editable by hand and then (optionally) usable when doing a transfer:

vcp cvs:/module/... --branchmap=foo.bmap p4://depot

This will allow multiple transfers to take place with the same branch map.

9. If an external branch map is specified for a transfer, it should be an error if a new branch has appeared that is not in the branch map. This error should occur before any changes are made to the destination repository, and (possibly) a new branch map file could be created (foo.bmap.001 or such) that contains the contents of foo.bmap with the missing branches added. The error should be suppressible with an option. This goal is intended to prevent accidentally missing a branch or using the wrong branch map, while making it easy (by copying a file) to add new branch mappings as new branches appear.
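A minimal sketch of the goal 9 pre-flight check, in Python for illustration only. It assumes the branch map has already been loaded into a dict from source branch name to destination branch name; UnmappedBranchError, preflight, next_bmap_name and the allow_unmapped flag are all hypothetical names, not existing vcp options.

```python
from pathlib import Path


class UnmappedBranchError(Exception):
    """A source branch has no entry in the external branch map."""

    def __init__(self, missing, suggested_path, suggested_map):
        super().__init__(
            "branches not in branch map: " + ", ".join(missing)
            + f"; a candidate map could be written to {suggested_path}")
        self.missing = missing
        self.suggested_path = suggested_path
        self.suggested_map = suggested_map


def next_bmap_name(bmap_path):
    """Return the first unused sibling name: foo.bmap.001, foo.bmap.002, ..."""
    n = 1
    while Path(f"{bmap_path}.{n:03d}").exists():
        n += 1
    return f"{bmap_path}.{n:03d}"


def preflight(source_branches, branch_map, bmap_path, allow_unmapped=False):
    """Check every source branch against branch_map before any destination write.

    branch_map maps source branch name -> destination branch name (an
    assumption for this sketch). Missing branches are added to a copy,
    mapped to their own name as a placeholder, so an administrator can
    review the suggestion and copy it over the original map.
    """
    missing = sorted(set(source_branches) - set(branch_map))
    if missing and not allow_unmapped:
        suggested = dict(branch_map)
        for name in missing:
            suggested[name] = name  # placeholder guess, meant to be edited by hand
        raise UnmappedBranchError(missing, next_bmap_name(bmap_path), suggested)
    return missing
```

Raising before any destination write, and handing back a suggested file name and contents rather than silently writing them, keeps the administrator in charge of which mappings are added.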

10. Branch maps should not be required when doing a transfer if it is possible to make intelligent guesses about branch names. An inability to make intelligent guesses should cause an error message and exit without altering the destination repository.

11. Both "external wrapper" and integrated text and GUI clients should be supported by the branch map concepts and implementation.

Design
======

It is useful to describe the information that defines a branch using two categories: "branch metadata", which is associated with the branch itself and not with individual files affected by the branch, and "per-file branch metadata", which is largely a mapping of which files were branched from which versions (and perhaps by whom and when).

Examples of branch metadata are:
- branch name/tag/label
- location in repository
- whether or not to transfer the branch
- product-specific data, like:
  - Perforce's branch view
  - cvs's branch (and "magic branch") number
- vcp/RevML's assigned branch id
- perhaps a branch comment

Examples of per-file branch metadata are:
- base version (what Perforce calls "source" or "theirs")
- location in repository
- version id (<rev_id> in RevML terms)

This metadata is all distinct from the actual file metadata, which for the first version in the branch would contain such data as:
- user performing the branch
- when the branch was performed
- the comment entered while branching
- the branch id (if necessary; with cvs the <rev_id> contains a branch number, and with Perforce the file's location should be enough to identify the branch in most, if not all, cases; we need to identify any counterexamples to this assumption)
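One way to keep the three categories apart in code is a separate record type for each. A minimal Python sketch; the class and field names are hypothetical, loosely following the RevML element names used elsewhere in this document:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BranchMetadata:
    """Describes a branch as a whole, independent of any one file."""
    branch_id: str                  # vcp/RevML-assigned branch id
    name: str                       # branch name/tag/label
    location: Optional[str] = None  # location in the repository
    transfer: bool = True           # whether to transfer this branch
    comment: Optional[str] = None   # optional branch comment
    # Product-specific data, e.g. a p4 branch view or a cvs magic branch number.
    product_specific: dict = field(default_factory=dict)


@dataclass
class PerFileBranchMetadata:
    """Records which version of which file a branched file came from."""
    branch_id: str
    base_rev_id: str                # base version ("source"/"theirs" in Perforce)
    location: Optional[str] = None  # location in the repository
    rev_id: Optional[str] = None    # version id of the branched file


@dataclass
class FileMetadata:
    """Ordinary per-revision metadata; enough to move revisions on its own."""
    name: str
    rev_id: str
    user_id: str
    time: str
    comment: str = ""
    branch_id: Optional[str] = None  # only when location/rev_id can't identify the branch
```

Keeping FileMetadata free of any required branch fields mirrors reason 1 below: revisions can still be moved when the user fully specifies the branches.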

This distinction between branch, per-file branch, and file metadata is made for several reasons:

1. File metadata is the minimal subset needed to move revisions; if the branches to operate on in the source and destination repositories are fully specified by the user, no branch or per-file branch metadata is needed.

2. It is a goal to be able to store branch metadata and perhaps per-file branch metadata externally in branch mapping files to allow them to be altered to control the transfer process in a reusable manner.

3. Branch metadata can contain information that is common to all of the branched files, and is often the only thing a user may need to alter. A side effect of this is that branch metadata must occur before the per-file branch metadata in the information stream (whether that stream is RevML or vcp's internal transfer process).

4. It is far more likely that a user will want to review and alter the branch metadata than the per-file branch metadata to control whether and how a branch is transferred.

5. Branch metadata may exist before the branch is actually made (to wit, Perforce's branch views may exist before the branch is performed).

6. In RevML and in the inner workings of vcp, branch metadata will need to come before revision records (<rev> elements) so that the receiving processor can store them in a lookup table to be consulted when the file revisions are processed.

7. Some errors (nonexistent branches, and branches that do exist but haven't been configured in the branch map, for instance) should be detected using branch maps before a transfer begins.
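Reason 6 implies a simple ordering rule for any stream consumer. The sketch below uses hypothetical record shapes ((kind, dict) pairs) and a hypothetical OrderError: it stores branch records in a lookup table and resolves each revision's optional branch id against it. A revision naming a branch that hasn't been seen yet is reported as an ordering error; revisions without a branch id are left unresolved.

```python
class OrderError(Exception):
    """A revision referred to a branch record that had not appeared yet."""


def process_stream(records):
    """Consume (kind, data) records in stream order.

    Returns (name, rev_id, dest_root) tuples, where dest_root comes from
    the branch lookup table, or None for revisions with no branch id.
    """
    branches = {}  # lookup table: branch id -> branch metadata
    resolved = []
    for kind, data in records:
        if kind == "branch":
            branches[data["id"]] = data
        elif kind == "rev":
            branch_id = data.get("branch_id")
            if branch_id is not None and branch_id not in branches:
                raise OrderError(
                    f"rev {data['name']}#{data['rev_id']} refers to "
                    f"unknown branch {branch_id}")
            dest_root = branches[branch_id]["dest_root"] if branch_id else None
            resolved.append((data["name"], data["rev_id"], dest_root))
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return resolved
```

Because the lookup table is filled before any revision needs it, a single pass suffices; a consumer that accepted branch records in any order would instead have to buffer revisions.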

It is likely that the per-file branch metadata will be packaged with the metadata for the revision that creates a branched file, though only when necessary.

The branch maps will be representable using a subset of RevML that can occur either within a "normal" RevML file (inside the <revml> element) or as a separate document. Only one branch map may occur in any file.

As with RevML, a branch map may contain elements describing the source repository for auditing purposes (i.e., so you can read the file and see just what it contains some months after you created it :), but these will not be used in processing, except possibly to give more informative errors:

branch r1_0 in source repository not found in branch map foo.bmap.
NOTE: foo.bmap does not appear to be for this repository
repository details:
    ...    <=== extracted from repository
branch map details:
    ...    <==== extracted from branch map file
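Since the repository details in a branch map are for auditing only, one plausible use for them is building the message above. A hedged sketch, assuming both sides have already been reduced to dicts of simple fields such as rep_type and source_root (key names borrowed from the XML example in this document); unmapped_branch_message is a hypothetical name, and only keys present on both sides are compared:

```python
def unmapped_branch_message(branch, bmap_path, repo_info, map_info):
    """Build the 'branch not found' error, adding a NOTE and both sets of
    details when the map's recorded repository differs from the live one."""
    lines = [f"branch {branch} in source repository not found "
             f"in branch map {bmap_path}."]
    shared = sorted(set(repo_info) & set(map_info))
    if any(repo_info[k] != map_info[k] for k in shared):
        lines.append(f"NOTE: {bmap_path} does not appear to be for this repository")
        lines.append("repository details:")
        lines += [f"    {k}: {repo_info[k]}" for k in shared]
        lines.append("branch map details:")
        lines += [f"    {k}: {map_info[k]}" for k in shared]
    return "\n".join(lines)
```

The details only change the wording of an error that would be raised anyway; they never decide whether the transfer proceeds.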

Here's an initial cut at a branch map describing a CVS repository in XML. Sorry for the wide-screen effect; I can reformat it if it drives people's email clients bonkers. I'm picking on cvs here because I'm mostly concerned with cvs->foo transfers, though the other direction is necessary for testing purposes.

<branch_map>

  <!-- metadata that applies to the entire branch map -->
  <source_root>/foo/bar</source_root>      <!-- all source paths are relative to this -->
  <dest_root>//depot/foo/bar</dest_root>   <!-- user supplied; all destination paths are relative to this -->
  <time>2000-01-01 00:00:00Z</time>        <!-- when this map was created -->
  <rep_type>cvs</rep_type>
  <rep_desc>Concurrent Versions System (CVS) 1.10.7 (client/server)</rep_desc>

  <!-- branch metadata -->

  <branches>
    <branch id="mainline">
      <dest_root>//depot/main</dest_root>        <!-- where to put mainline files -->
    </branch>
    <branch id="branch-1">
      <source_id>release_1</source_id>
      <dest_id>release_1</dest_id>               <!-- edit this to change in a transfer -->
      <dest_root>//depot/release_1</dest_root>   <!-- only needed to override the <branch_map/source_root/dest_root> -->
    </branch>
    <!-- ...more branches... -->
  </branches>
</branch_map>
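Reading such a map into the lookup table mentioned earlier is straightforward with Python's standard xml.etree.ElementTree. This sketch assumes that a per-branch <dest_root> replaces the map-wide one and that a branch without one inherits it; the branch-2 entry is a hypothetical addition to show that fallback, the sample omits the comments, and load_branch_map is a hypothetical name:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<branch_map>
  <source_root>/foo/bar</source_root>
  <dest_root>//depot/foo/bar</dest_root>
  <rep_type>cvs</rep_type>
  <branches>
    <branch id="mainline">
      <dest_root>//depot/main</dest_root>
    </branch>
    <branch id="branch-1">
      <source_id>release_1</source_id>
      <dest_id>release_1</dest_id>
      <dest_root>//depot/release_1</dest_root>
    </branch>
    <branch id="branch-2">
      <source_id>release_2</source_id>
      <dest_id>rel2</dest_id>
    </branch>
  </branches>
</branch_map>"""


def load_branch_map(xml_text):
    """Return {branch id: settings}, applying the map-wide dest_root default."""
    root = ET.fromstring(xml_text)
    default_dest_root = root.findtext("dest_root")  # direct child of <branch_map>
    table = {}
    for branch in root.iterfind("branches/branch"):
        table[branch.get("id")] = {
            "source_id": branch.findtext("source_id"),
            "dest_id": branch.findtext("dest_id"),
            # A per-branch <dest_root> overrides the map-wide one.
            "dest_root": branch.findtext("dest_root", default_dest_root),
        }
    return table
```

For example, load_branch_map(SAMPLE)["branch-2"]["dest_root"] falls back to //depot/foo/bar, while branch-1 keeps its own //depot/release_1.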

It is likely that the best place to put per-file branch metadata is with the existing per-revision information for the first revision of a file in the branch, like the items marked ** here:

<rev>
  <name>a/deeply/buried/file</name>
  <type>text</type>
  <cvs_info>Some info cvs might emit about this file</cvs_info>
  <rev_id>1.23.2.1</rev_id>
  <time>2000-01-01 12:00:08Z</time>
  <base_rev_id>1.23</base_rev_id>      <!-- ** -->
  <base_rev_name>foo</base_rev_name>   <!-- ** -->
  <user_id>cvs_t_user</user_id>
  <label>achoo08</label>
  <label>blessyou08</label>
  <comment>comment 2</comment>
  <delta type="diff-u" encoding="none">@@ -1 +1 @@
-a/deeply/buried/file, revision 1, char 0x01="<char code="0x01" />"
+a/deeply/buried/file, revision 2, char 0x09=" "
</delta>
  <digest type="MD5" encoding="base64">Dint+VF10zKgeQcxVRuU9g</digest>
</rev>

The <base_rev_id> tag already exists and is ideal for identifying the version number of the base file. The <base_rev_name> tag is new, and is only necessary when the location or name of the file in the repository has changed as part of the branch process. Like the existing <name>, it is relative to the <dest_root> ("<dest_root>" is spelled "<rev_root>" in the current RevML version, 0.28; that needs to change, both for clarity and to support branch maps more effectively).
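For cvs sources, the branch point (the <base_rev_id> of a branch's first revision) and the "magic" branch number cvs records for the branch tag can both be derived from the <rev_id> itself. A minimal sketch with a hypothetical function name; it ignores vendor branches (odd-numbered branches such as 1.1.1), whose tags cvs records without the magic 0:

```python
def cvs_branch_info(rev_id):
    """Split a cvs revision number on a branch into its parts.

    '1.23.2.1' -> branch number '1.23.2', branch point '1.23', and the
    magic branch number '1.23.0.2' recorded for the branch tag.
    Returns None for trunk revisions such as '1.23' (or malformed input).
    """
    parts = rev_id.split(".")
    if len(parts) < 4 or len(parts) % 2:
        return None
    branch = parts[:-1]        # drop the revision number within the branch
    branch_point = parts[:-2]  # the version the branch was created from
    magic = branch_point + ["0", branch[-1]]
    return {
        "branch_number": ".".join(branch),
        "branch_point": ".".join(branch_point),
        "magic_branch_number": ".".join(magic),
        # Only the first revision on a branch has the branch point as its base.
        "first_on_branch": parts[-1] == "1",
    }
```

For the <rev> above, cvs_branch_info("1.23.2.1") gives a branch point of 1.23, matching its <base_rev_id>.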