| Subject: | [jamming] Proteus - An alternative to make |
|---|---|
| From: | Eric...@metrotools.com (Eric...@metrotools.com) |
| Date: | 01/25/2000 04:03:41 PM |
| List: | com.perforce.jamming |
I posted this to comp.software.config-mgmt, but I thought some folks on this list might find it interesting too. I hope this isn't viewed as intrusive. So far the reaction has been lukewarm. Someone's bound to like it.
Proteus - A New Approach To Make
* A Criticism of Make
Proteus was born out of frustration. A make utility is a crucial part of the software development process. Yet make alone is never enough. Most developers and source code managers build up a warehouse of scripts and utilities to squeeze out the desired behavior. But even after building up such an arsenal, development groups must continually wage war with their ad hoc make system.
The costs of the silent war build through the years. Most developers within an organization are incapable of making bug fixes or enhancements to their build process. Those that can fear the complex dependencies of the various utilities cobbled together. Educating new developers about the vagaries of the build process becomes a rite of passage. As the source code base grows, the build system fails to scale and creaks along like band-aids applied to a sinking ship.
While there are a number of flavors of make, this critique focuses on the functionality common to Unix make, GNU make, Digital's MMS, and Microsoft's NMAKE, and refers to that collective common feature set simply as make.
The only prior knowledge required is an understanding of the prototypical make relationship, which is this:
    target : dependent1 dependent2 dependent3 ...
            shell action1
            shell action2
In words, the relationship operates like so - if the target is out of date with respect to any dependent on the right hand side of the relationship, the shell actions are invoked. Each dependent's own out of date state is determined first, by recursively locating a relationship in which that dependent is the target and analyzing its out of date relationship to its dependents.
* Out of Date Relationship
The simplest hardwired assumption is the method of determining if something is out of date. In general, make assumes that the target and dependent each have a direct file counterpart from which a timestamp can be extracted. From here, make performs a timestamp comparison between the target file and dependent file. Should the dependent's timestamp be newer than the target's timestamp, the target is deemed out of date. When the target is out of date, the shell actions are invoked to bring the target up to date.
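The comparison just described can be sketched in a few lines of Python. This is only an illustration of the rule, not the code of any particular make; the function name `is_out_of_date` is made up for this example, and treating a missing target as out of date is an assumption about typical behavior.

```python
import os

def is_out_of_date(target, dependents):
    """Timestamp out of date relationship (illustrative sketch).

    Assumption: a target with no file counterpart is always out of date.
    Otherwise the target is out of date when any dependent's modification
    time is newer than the target's.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependents)
```

Note that nothing in this check looks at what is inside the files. Only the clock on each file matters.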
For this discussion, this relationship will be called the out of date relationship. To put it formally, make uses a timestamp out of date relationship, which is a grave mistake. Consider the following development scenario.
Suppose there are two development teams. The Tools Team designs a low-level class framework for implementing an object persistence model. The Tools Team has its own build and release cycle separate from others in the organization. The release process consists of delivering their source code base in its entirety to other internal development groups.
Now consider the timestamp out of date relationship from the perspective of those receiving the work of the Tools Team. Suppose the Applications Team has been hard at work with v1.0 of the Tools Team's object persistence framework. Since the Applications Team has been under much pressure, they are on a daily build cycle to speed the QA process and fold in the rest of the final development. The effect of this cycle is to cause each object module and executable to have a very recent timestamp.
Let's say the Tools Team has finished work on v1.5 of their object persistence framework. After testing their library of source code, they release it to others within the organization. For the sake of completeness, let's say the Tools Team's last build was on October 1st with all timestamps for their source code base dating from the previous day, September 30th. On October 10th, the Applications Team receives the release and decides to rebuild their system against these new changes.
Now comes the challenge. The source code that the Tools Team delivered is in some sense old. It dates from September 30th. But given that the Applications Team rebuilds every day, the Applications Team will have object modules and executables that date from after October 1st. Yet, those binaries were built using the previous version of the Tools Team's library.
As a result of this situation, when one goes to rebuild the system with the new object persistence framework source files, nothing will be rebuilt. That's because, from a timestamp perspective, all of the resulting object modules and executables are in fact up to date in relation to the files they depend on. Yet from a conceptual perspective, the files they depend on have changed.
A common solution to this problem is to modify the timestamp of all the source files of the Tools Team's library. This would cause all object modules and executables to rebuild, which would guarantee a correct build. Yet this is very unappealing for two significant reasons: it is quite hackish, and it is terribly inefficient. Surely not everything has changed in the source code base, so why waste so much time rebuilding when perhaps very little has changed? The timestamp out of date relationship is not an accurate way to communicate changed source files. Yet make does not allow one to control the behavior of the out of date relationship. There are of course better out of date relationship algorithms to use, but rather than having one algorithm foisted on them, the make developer should have a choice.
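To make the idea of a selectable out of date relationship concrete, here is a rough Python sketch. It is not how Proteus does it; every name here (`newer_by_timestamp`, `changed_by_content`, `needs_rebuild`, `record_build`) is invented for the example, and the `state` dictionary is a hypothetical stand-in for whatever persistent store a real tool would keep between builds. The content-based rule is one of several possible alternatives to timestamps.

```python
import hashlib
import os

def file_digest(path):
    """Hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def newer_by_timestamp(target, dependent, state):
    """The hardwired rule: the dependent's timestamp is newer than the target's."""
    return os.path.getmtime(dependent) > os.path.getmtime(target)

def changed_by_content(target, dependent, state):
    """Alternative rule: the dependent's bytes differ from those recorded
    the last time the target was built (hypothetical in-memory store)."""
    return state.get((target, dependent)) != file_digest(dependent)

def needs_rebuild(target, dependents, relation, state):
    """Apply whichever out of date relationship the make developer chose."""
    if not os.path.exists(target):
        return True
    return any(relation(target, dep, state) for dep in dependents)

def record_build(target, dependents, state):
    """Remember what each dependent looked like when the target was built."""
    for dep in dependents:
        state[(target, dep)] = file_digest(dep)
```

In the Tools Team scenario, a header delivered with a September 30th timestamp but new contents would pass `newer_by_timestamp` untouched, while `changed_by_content` would flag only the object modules whose dependents actually differ.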
* Dynamic Dependency Generation
Any large-scale system implemented in C or C++ will have a substantial source to header file dependency structure. No development group can be expected to manually create this information. Yet as a vital part of the build process, make does not offer a simple mechanism to conveniently generate this information. The root of the problem is the way in which make evaluates the makefile. Make has two phases of operation. The first phase is the syntactical parsing of the makefile. During this phase, the underlying tree of target to dependent information is built up. In the second phase, the tree is evaluated and then brought up to date.
In order for make to operate, the complete dependency tree needs to be computed as per the first phase. After all, there's no way to know what to actually rebuild in the second phase unless one has the whole dependency tree. But to compute the whole dependency tree can be time consuming, especially for a large system. Thus, we're left with a system where the second phase imposes a high cost on the execution of the first phase.
A large source code base makes it too expensive to compute the whole dependency tree each time. A more efficient process would be to compute the dependencies only for any files that have changed. But this is the very type of operation that only the second phase of make can perform. The result is a situation where we need the features of the evaluation phase to help us generate the information for the tree population phase.
Some utilities solve this issue via a recursive invocation mechanism, but this is an inefficient hack. The recursive invocations are inherently unable to share information without further hacks. The effect here is that each visited header file must be recursively evaluated for its complete chain of headers. More work would need to be reintroduced in order to eliminate this inefficiency.
The basic cause is that the two phases should really exist as one integrated phase. Why not allow one to populate the tree as it's evaluated? Not only would this result in greater flexibility, but it would result in greater efficiency too. The tree is grown only to the size that is needed when it is needed. The theme here is lazy evaluation.
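As a rough illustration of growing the tree lazily, the Python sketch below scans a file for its headers only when that file is first visited and remembers the answer, so a header shared by many sources is read once. The class name `LazyDeps`, the single `include_dir`, and the simple regular expression for quoted `#include` lines are all simplifying assumptions made for this example, not a description of Proteus.

```python
import os
import re

# Assumption: only quoted includes matter, and all headers live in one directory.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

class LazyDeps:
    """Grow the dependency tree only as nodes are visited."""

    def __init__(self, include_dir):
        self.include_dir = include_dir
        self._direct = {}   # path -> headers it includes directly
        self.scans = 0      # number of files actually read so far

    def direct(self, path):
        """Headers included directly by path, scanned on first request only."""
        if path not in self._direct:
            self.scans += 1
            with open(path) as f:
                names = INCLUDE_RE.findall(f.read())
            self._direct[path] = [os.path.join(self.include_dir, n) for n in names]
        return self._direct[path]

    def closure(self, path, seen=None):
        """All headers reachable from path, each one scanned at most once."""
        seen = set() if seen is None else seen
        for header in self.direct(path):
            if header not in seen:
                seen.add(header)
                self.closure(header, seen)
        return seen
```

Because the cache is shared across calls, asking for the headers of a second source file costs only the files not yet seen, which is the information sharing that recursive invocations lack.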
* Parallel Build Capabilities
As the complexity of the products created grows, so grows the source code base. With this growth comes the increased build time associated with it. While great strides have been made in distributed systems and multi-tasking operating systems, make is largely unable to put those resources to use. A make system should be able to independently build multiple components, but also have the skill to synchronize those components that do depend on each other. A make system is more than just file dependency; it's that and conceptual dependency at the global scope.
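One way to picture "build independently, synchronize where needed" is the Python sketch below, which uses the standard `concurrent.futures` thread pool. It is only an illustration under simple assumptions: `graph` maps every target to the targets it depends on, `action` is a callable that brings one target up to date, and the function name `parallel_build` is invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def parallel_build(graph, action, max_workers=4):
    """Run action(target) for every target in graph.

    Independent targets run concurrently; a target is submitted only after
    everything it depends on has finished.
    """
    remaining = {t: set(deps) for t, deps in graph.items()}
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Every target whose dependents are all built can start now.
            ready = [t for t, deps in remaining.items() if deps <= done]
            for t in ready:
                del remaining[t]
                running[pool.submit(action, t)] = t
            if not running:
                raise ValueError("dependency cycle or missing target")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()              # re-raise any failure from the action
                done.add(running.pop(fut))
    return done
```

With a graph such as `{"foo.obj": [], "bar.obj": [], "foo.exe": ["foo.obj", "bar.obj"]}`, the two object modules may be built at the same time, while the link step waits for both.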
* Shell Commands Are Inadequate Actions
As part of every make process, one needs to implement the actions that actually bring the target up to date with respect to the dependents. Unfortunately, make implements these update actions as shell operations. For those operating systems with a strong shell, this direct access to the shell is a blessing given make's weak variable handling tools. But for those make developers running under weaker shells, this turns into a hideous curse. Outside of the Unix based shells, the alternatives, such as OpenVMS's DCL and Microsoft's infamous "dos box", offer a very weak feature set.
Even if these shells were stronger, the point still remains - direct invocation of shell commands is the wrong approach. This hampers portability of the make system. In order to accommodate the ever-growing list of operating systems and Byzantine shells, the make developer has to push their makefile through mind-numbing contortions.
Rather than rely on the shell as a warmed-over interpreter, the good make system should offer its own notion of a programmable function. This would allow the make developer to implement more complex behavior than could be achieved directly in the shell. In addition, by using an actual programming language, the update actions can become more operating system independent.
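As a small illustration of an update action written as a function rather than a shell line, consider the Python sketch below. The `Rule` class and the `install_copy` action are hypothetical names made up for this example and do not describe Proteus's actual interface. The point is only that the action calls the language's own library instead of `cp` on one system and `COPY` on another.

```python
import os
import shutil

class Rule:
    """A target, its dependents, and an update action that is a plain
    callable rather than a list of shell command lines (illustrative)."""

    def __init__(self, target, dependents, action):
        self.target = target
        self.dependents = dependents
        self.action = action

    def update(self):
        """Bring the target up to date by calling the action."""
        return self.action(self.target, self.dependents)

def install_copy(target, dependents):
    """Portable copy action: copy the first dependent to the target,
    creating the target's directory if needed."""
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    shutil.copyfile(dependents[0], target)
    return target
```

The same rule then runs unchanged wherever Python runs, and the action can use real flow control, return values, and exceptions.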
* A VPATH that Works
The goal of any large-scale make system is to minimize the amount of work that needs to be done for local development. Actually reaching this goal proves to be frustratingly difficult. One of the ways to reach this goal is to have a make system that draws upon centrally built binaries so that local development only rebuilds what's been changed locally.
To make the issue clear, let's consider the following example. This isn't the ideal way or the common way in which source code is shared, but it will give you an idea as to the issue involved here.
    /shared_directory/src      ; Complete set of source files
    /shared_directory/src/bin  ; Binaries produced from the above
In the /shared_directory/src directory, we find a complete set of source files that forms a complete library. The src directory contains all headers and sources needed to build the library. And under the bin directory, we'll find all of the binaries that are to be produced from that source code base. Now, if a developer were to perform any local development, they would have a directory chain that looks something like this.
    /usr/ejohnson/devel/src
    /usr/ejohnson/devel/src/bin
Note that the hierarchy of directories for local development is the same as that of the centralized development. In order for the user to do any useful local development, the library must be completely rebuilt. Thus the local directory becomes a complete mirror of the central directory. If the library is sufficiently large, this will consume a considerable amount of time and disk space. This becomes particularly painful for the developer when the change is minuscule compared to the overall size of the package.
The ideal way to handle this issue is to allow the local developer to invoke a make system that can draw upon the centrally built binaries when possible. More importantly though, the make system should have enough smarts to know when the local changes require rebuilding of central source files. To concretize this last point, let's assume a simple source code base. The library to be built consists of three source files, foo.h, foo.cpp, and bar.cpp. Furthermore, let's assume that both source files, foo.cpp and bar.cpp, include foo.h. Thus, any changes to foo.h would require the recompiling of foo.cpp and bar.cpp.
For simplicity's sake, let's assume that the centrally built library is completely up to date. Furthermore, let's assume that the developer would like to modify foo.h without directly modifying any other source file. This means that in the local directory, the developer will only have foo.h and no other source file.
In this scenario, the make system should recompile both source files from the central directory and place the binary output into the local directory. The sources to be recompiled should not be copied into the local directory, nor should the binaries be placed into the central directory. Once both binaries are placed into the local binary directory, the library would be relinked. The difficulty in the above scenario is in handling the recompilation of the source file. That's because when the source file is recompiled, the binary is placed into the local directory. This means that the target we were looking at has been given a new home. To put it differently, the link action for the library will need to be told that the object module was placed into the local directory rather than the central directory.
This point is particularly important to grasp, yet difficult to convey, so let's reconsider this issue from make's perspective. When rebuilding the library, the make system will first consider a relationship like this.
    foo.exe : foo.obj bar.obj
            [link actions]
The foo.exe will be produced locally, but the object modules, foo.obj and bar.obj, could be found either centrally or locally. Let's suppose that there are no copies locally, so when the make system goes to look for them, the object modules will be found in the central directory.
Thus the make system is now effectively working with a relationship like this.
    /central/bin/foo.obj : foo.cpp foo.h
            [compiler actions]
As with foo.obj and bar.obj, foo.cpp and foo.h will need to be searched for in a similar path-like way. Thus, we'll look for those source files locally and then centrally. In this case, we'll discover that the object module in the central directory is out of date with respect to foo.h (the local copy). This will result in the source file being recompiled into the local binary directory.
This is all well and good, but we're left with the odd effect of the target not actually being built. In some sense, foo.obj was rebuilt. But the actual target, as reported or known to the make system, /central/bin/foo.obj, is still out of date. With a little hand waving, we've brought a different foo.obj up to date.
In order to be completely correct though, the make system needs to have a way to push the new name of that target back up the evaluation sequence. This means that the link relationship, the one with foo.obj and bar.obj on the right hand side, would need to be informed of its dependents' new home. This push back of new target home information is critical for the success of a shared directory build system.
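The push back can be sketched in Python as a function that returns where the target ended up, so the caller links against that path rather than the one it first found. This is a simplified illustration, not Proteus's implementation. The names `find` and `resolve`, the `rules` mapping, and the `compile_fn` callback are all invented for the example, and it assumes every source file exists somewhere on the search path and that the plain timestamp rule is in use.

```python
import os

def find(name, search_path):
    """Return the first existing directory/name along search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None

def resolve(name, rules, src_path, bin_path, compile_fn):
    """Bring the object module `name` up to date and return where it now lives.

    rules maps an object name to the names of its source dependents.
    src_path and bin_path list the local directory first, then the central one.
    The return value is the push back: callers must use it instead of the
    path they would have found on their own.
    """
    found = find(name, bin_path)
    sources = [find(dep, src_path) for dep in rules[name]]
    stale = found is None or any(
        os.path.getmtime(s) > os.path.getmtime(found) for s in sources)
    if not stale:
        return found
    local = os.path.join(bin_path[0], name)
    compile_fn(sources, local)   # output always goes to the local directory
    return local
```

The link step would then be fed `[resolve(n, ...) for n in ("foo.obj", "bar.obj")]`, which yields central paths when nothing changed locally and local paths once foo.h has been modified, without the central binaries ever being touched.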
* Make is a Lousy Programming Language
What sums up all of the previous criticisms against make is this simple observation. A makefile really needs to be thought of as a program. It's a tool to be used and customized by developers for their own needs. But as a language in which to write a program, make's suite of functionality leaves much to be desired.
Thus, the final and fundamental criticism of make is that it is not a professional language in which one could write any large-scale, portable program. The flow control constructs are weak. There's no support for procedures with return values, which prevents any top-down design. Error handling is cryptic and handicapped. In addition, there's no real notion of structures to create aggregated data units. Make, as a development language, is deficient and stunted.
The birth of Proteus really began with the above observations. It began with the goal of implementing a good make utility that had a real language in it. But rather than write yet another scripting language, a simple laundry list of must-have language features was developed.
  * Broad based support across popular and fringe operating systems
  * Intelligent variable handling - including scoped variables
  * Primitive OOP support - classes and polymorphism
  * Thread support to implement parallel evaluation of build trees
  * Strong ties to an operating system's shell
  * Some measure of error handling
The two most popular scripting languages that match most of the above criteria are Python and Perl. Unfortunately, Perl's thread support is experimental and incomplete. Thus, the only scripting language that satisfies the above requirements is Python.
To summarize -
Proteus is a framework for building a make system. It's written entirely in Python and is freely distributable. If you'd like a copy, send me email.
-Eric Johnson ejoh...@metrotools.com




