Subject: Re(2): storing binaries in SCCS
From: Nick...@aperture.com (Nick@aperture.com)
Date: 09/23/1998 02:10:51 AM
List: com.perforce.perforce-user

jeff_bowles at hotmail.com,Internet writes:

1. Storing uuencoded versions of binaries is a terrible solution. On an academic level, it'll work - but in practice the act of checking in (or out) a binary file means doing large numbers of text delta operations on [uuencoded text representations of] revisions that have little in common with each other. I've seen checkins of individual binary files take 30-50 minutes per file in a system that did this.

2. I've heard that more recent versions of RCS just treat binaries as text files, looking for '\n' when it can find 'em as to denote the "end of line". This is still pretty bad.

There are algorithms for diffing binary files without the '\n' kludge. We use a Mac program called UpdateMaker which creates updates of Mac programs or any other file and does an excellent job of just isolating the changes in binary Mac resources.

UpdateMaker was written by "Michael Hamel <michael at otago.ac.nz>". Perhaps he could be contacted for information on the diffing algorithm he uses. UpdateMaker is distributed by AD Instruments. The UpdateMaker home page is at http://www.adisoft.com/updatemaker/updatemaker.html.

The DOS program "patch", which creates PC-based patch files, performs a similar operation.

Perhaps there is a decent program lying around in the public domain for diffing binaries. Given a decent binary diff program, Perforce could use the same strategy for storing binaries as it does for text.
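
To make the idea of diffing binaries without the '\n' kludge concrete, here is a minimal Python sketch of one common approach: index fixed-size blocks of the old file, then describe the new file as "copy" and "insert" operations. This is not UpdateMaker's algorithm or anything Perforce does; the names make_delta and apply_delta and the 16-byte block size are assumptions chosen for illustration.

```python
BLOCK = 16  # assumed block size; real tools tune this


def make_delta(old: bytes, new: bytes, block: int = BLOCK):
    """Describe `new` as copies from `old` plus literal inserts."""
    # Index each aligned block of the old file by its content.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)

    ops = []               # ("copy", offset, length) or ("insert", data)
    literal = bytearray()  # bytes of `new` not found in `old`
    i = 0
    while i < len(new):
        off = index.get(new[i:i + block])
        if off is None:
            literal.append(new[i])
            i += 1
            continue
        # Extend the match forward for as long as the bytes agree.
        length = block
        while (off + length < len(old) and i + length < len(new)
               and old[off + length] == new[i + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)


# Usage: a few bytes inserted into the middle of a binary blob.
old = bytes(range(256)) * 4
new = old[:100] + b"\x00PATCH\x00" + old[100:]
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

Because matches are found by content rather than by line boundaries, a small insertion costs a short literal run plus a couple of copy operations instead of a delta against every following "line".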

Nick Pisarro, Jr. Aperture Technologies, Inc.

P.S. Diffing binaries sounds like a fun project for some programmer type--myself if I weren't so buried with other projects. I wrote the diff/merge program we used, before Perforce came around, for the source control of our Mac product. It used balanced binary search trees to avoid the breakdown some diff programs have when an insert or delete goes into the thousands of lines. Diffing is basically a search/sort process.

Interestingly, I took the opposite approach to Perforce. That is, the full source of the base file was kept, and a diff was stored for each change from there. (Perforce keeps the full source of the latest revision, with diffs for the previous changes.) I could do this efficiently because the merge program would merge all diffs with the base file in one pass. Branching could be done more easily. But I’m not sure which method is ultimately the better approach when all ramifications are considered.
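
The two storage layouts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names ForwardStore and ReverseStore and the helpers line_delta and apply_line_delta are hypothetical, the diffing is delegated to the standard difflib module, and deltas are applied one at a time rather than merged with the base in a single pass as my old program did.

```python
import difflib


def line_delta(a, b):
    """Edits (start, end, replacement lines) that turn list a into list b."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_line_delta(a, delta):
    """Apply edits produced by line_delta to the list a."""
    out, pos = [], 0
    for i1, i2, lines in delta:
        out.extend(a[pos:i1])  # unchanged lines before this edit
        out.extend(lines)      # replacement (empty for a pure delete)
        pos = i2
    out.extend(a[pos:])
    return out


class ForwardStore:
    """Full base revision plus a forward delta for each later revision."""

    def __init__(self, base):
        self.base, self.deltas = list(base), []

    def commit(self, new):
        prev = self.get(len(self.deltas))
        self.deltas.append(line_delta(prev, list(new)))

    def get(self, n):
        text = self.base
        for d in self.deltas[:n]:
            text = apply_line_delta(text, d)
        return list(text)


class ReverseStore:
    """Full latest revision plus a reverse delta back to each earlier one."""

    def __init__(self, base):
        self.head, self.deltas = list(base), []

    def commit(self, new):
        # deltas[k] turns revision k+1 back into revision k.
        self.deltas.append(line_delta(list(new), self.head))
        self.head = list(new)

    def get(self, n):
        text = self.head
        for d in reversed(self.deltas[n:]):
            text = apply_line_delta(text, d)
        return list(text)
```

In this sketch, fetching the newest revision is free for ReverseStore but costs one delta application per revision for ForwardStore, while the oldest revision is the cheap one for ForwardStore. That is the trade-off in a nutshell.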