3 messages in com.perforce.jamming: [jamming] Improved Header Scan Cache ...

| From | Sent On | Attachments |
|---|---|---|
| Matt Armstrong | 03 Jan 2002 11:01 | |
| Craig McPheeters | 09 Jan 2002 12:01 | |
| Matt Armstrong | 10 Jan 2002 08:18 | |
| Subject: | [jamming] Improved Header Scan Cache for Jam |
|---|---|
| From: | Matt Armstrong (jamm...@perforce.com) |
| Date: | 01/03/2002 11:01:33 AM |
| List: | com.perforce.jamming |
I just submitted code to //guest/matt_armstrong/jam/hdrscan_cache that implements a header scan cache for Jam.
This code is an incremental improvement over Craig McPheeters' original version in //guest/craig_mcpheeters/jam/src/. I've talked with Craig and he plans to roll most or all of my changes into his version.
I have even higher hopes -- I'd like it to make it into stock jam. Rationale:
- A header scan cache can improve things when HDRGRIST is in use. For example, with stock Jam if you always set HDRGRIST to $(SOURCE_GRIST), standard headers such as /usr/include/stdio.h will now get scanned once for each SubDir. With the header scan cache, common headers will be scanned only once.
This makes it practical to always use HDRGRIST. This means that the stock Jambase can support multiple header files of the same name. I think this rectifies a frequently encountered weakness in Jam.
It is important to point out that you get this benefit regardless of whether the cache is saved to disk.
- The header scan cache is persistent across runs of Jam only if the user wants it (controlled via the HCACHEFILE variable). So by default Jam will not sprinkle cache files all over the source tree, and it is possible to use LOCATE to put the persistent copy of the cache in, e.g., a build output directory.
Storing the header cache on disk can bring real benefits. On the medium sized project I use jam for, it seems to speed jam startup (time to first build action) by a factor of 6. People are happy to wait 15 seconds instead of 90.
It is important to point out that about half of this speedup occurs even if the cache is not persistent, since our project makes heavy use of HDRGRIST to correctly find all the header files in the project.
- The cache is implemented in such a way that it can never change the semantics of what Jam does. The call to a target's HDRRULE will be identical with or without the cache code.
Here is the text of the README.header_scan_cache that is part of the submit.
This change implements a header scan cache in a form that (cross fingers) can be incorporated into the stock version of Jam.
This code is taken from //guest/craig_mcpheeters/jam/src/ on the Perforce public depot. Many thanks to Craig McPheeters for making his code available. It is delimited by the OPT_HEADER_CACHE_EXT #define within the code.
Jam has a facility to scan source files for other files they might include. This code implements a cache of these scans, so the entire source tree need not be scanned each time jam is run. This brings the following benefits:
- If a file would otherwise be scanned multiple times in a single jam run (because the same file is represented by multiple targets, perhaps each with a different grist), it will now be scanned only once. In this way, things are faster even if the cache file is not present when Jam is run.
- If a cache entry is present in the cache file when Jam starts, and the file has not changed since the last time it was scanned, Jam will not bother to re-scan it. This markedly reduces Jam startup times for large projects.
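The two benefits above can be illustrated with a minimal Python sketch. This is not the C code in the depot; the class name `HeaderScanCache`, the `read_text` callback, and the simplified `#include` regular expression are all assumptions made for illustration. The point it shows is that the cache is keyed by the bound file path rather than by the gristed target name, so several targets that resolve to the same file share one scan, and an unchanged timestamp avoids a re-scan.

```python
import re

# Simplified stand-in for a header pattern; not Jam's actual HDRPATTERN.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^">]+)[">]', re.MULTILINE)


class HeaderScanCache:
    """Hypothetical in-memory scan cache keyed by the bound file path."""

    def __init__(self):
        self.entries = {}  # path -> (mtime, list of included names)
        self.scans = 0     # how many real scans were performed

    def scan(self, path, mtime, read_text):
        """Return the includes of `path`, scanning only on a cache miss."""
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # hit: file unchanged since the last scan
        self.scans += 1
        includes = INCLUDE_RE.findall(read_text(path))
        self.entries[path] = (mtime, includes)
        return includes
```

With this sketch, two differently gristed targets bound to the same `/usr/include/stdio.h` would both call `scan` with the same path and timestamp, and only the first call would read the file.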
This code has improvements over Craig McPheeters' original version. I've described all of these changes to Craig and he intends to incorporate them back into his version. The changes are:
- The actual name of the cache file is controlled by the HCACHEFILE Jam variable. If HCACHEFILE is left unset (the default), reading and writing of a cache file is not performed. The cache is always used internally regardless of HCACHEFILE, which helps when HDRGRIST causes the same file to be scanned multiple times.
Setting LOCATE and SEARCH on the HCACHEFILE works as well, so you can place it anywhere on disk you like or even search for it in several directories. You may also set it in your environment to share it amongst all your projects.
- The .jamdeps file is in a new format that allows binary data to be in any of the fields, in particular the file names. The original code would break if a file name contained the '@' or '\n' characters. The format is also versioned, allowing upgrades to automatically ignore old .jamdeps files. The format remains human readable. In addition, care has been taken to not add the entry into the header cache until the entire record has been successfully read from the file.
- The cache stores the value of HDRPATTERN with each cache entry, and it is compared along with the file's date to determine if there is a cache hit. If the HDRPATTERN does not match, it is treated as a cache miss. This allows HDRPATTERN to change without worrying about stale cache entries. It also allows the same file to be scanned multiple times with different HDRPATTERN values.
- Each cache entry is given an "age" which is the maximum number of times a given header cache entry can go unused before it is purged from the cache. This helps clean up old entries in the .jamdeps file when files move around or are removed from your project.
You control the maximum age with the HCACHEMAXAGE variable. If set to 0, no cache aging is performed. Otherwise it is the number of times jam must be run before an unused cache entry is purged. The default for HCACHEMAXAGE if left unset is 100.
- Jambase itself is changed.
SubDir now always sets HDRGRIST to $(SOURCE_GRIST) so header scanning can deal with multiple header files of the same name in different directories. With the header cache, this no longer incurs a performance penalty -- a given file will still only be scanned once.
The FGristSourceFiles rule is now just an alias for FGristFiles. Header files do not necessarily have global visibility, and the header cache eliminates any performance penalty this might otherwise incur.
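The HDRPATTERN check and the aging rule from the list above can be sketched together in Python. This is an illustration under stated assumptions, not the submitted C code: the names `Entry`, `lookup`, `start_run`, and `purge` are hypothetical, the cache is keyed by path only, and the exact boundary at which an entry is purged (here, once its age reaches the maximum) is a guess at the intended semantics.

```python
from dataclasses import dataclass

DEFAULT_MAX_AGE = 100  # default for HCACHEMAXAGE as stated in the README


@dataclass
class Entry:
    mtime: int        # file timestamp at the time of the scan
    hdrpattern: str   # HDRPATTERN value used for the scan
    includes: list    # names found by the scan
    age: int = 0      # runs since this entry was last used


def start_run(cache):
    """Age every entry once at the start of a run (assumed policy)."""
    for entry in cache.values():
        entry.age += 1


def lookup(cache, path, mtime, hdrpattern):
    """Return cached includes, or None on a miss.

    A hit requires both the timestamp and HDRPATTERN to match.
    """
    entry = cache.get(path)
    if entry is None or entry.mtime != mtime or entry.hdrpattern != hdrpattern:
        return None
    entry.age = 0  # the entry was used in this run
    return entry.includes


def purge(cache, max_age=DEFAULT_MAX_AGE):
    """Drop entries unused for max_age runs; max_age == 0 disables aging."""
    if max_age == 0:
        return dict(cache)
    return {p: e for p, e in cache.items() if e.age < max_age}
```

Because a changed HDRPATTERN is simply a miss, the caller re-scans and overwrites the entry, so a stale result is never handed to HDRRULE.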
Because of all these improvements, the following claims can be made about this header cache implementation that cannot be made about Craig McPheeters' original version.
- The semantics of a Jam run will never be different because of the header cache (the HDRPATTERN check ensures this).
- It will never be necessary to delete .jamdeps to fix obscure jam problems or purge old entries.
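The file-format properties that support these claims (binary-safe fields, a version marker, and committing an entry only after its whole record has been read) can be sketched in Python as follows. The layout shown here is an assumption for illustration only and is not claimed to match the real .jamdeps format: the version string, the length-prefixed field encoding, and the function names are all hypothetical.

```python
import io

VERSION_LINE = b"hypothetical-hcache-v1\n"  # assumed marker, not the real one


def write_field(out, data):
    """Write one field as '<decimal length>\\t<raw bytes>\\n'."""
    out.write(str(len(data)).encode("ascii") + b"\t" + data + b"\n")


def read_field(inp):
    """Read one field; return None if it is truncated or malformed."""
    header = b""
    while True:
        ch = inp.read(1)
        if ch == b"":
            return None  # end of file before the field was complete
        if ch == b"\t":
            break
        header += ch
    if not header.isdigit():
        return None
    size = int(header)
    data = inp.read(size)
    if len(data) != size or inp.read(1) != b"\n":
        return None
    return data


def save(out, cache):
    """Write the version marker, then one record per cached path."""
    out.write(VERSION_LINE)
    for path, includes in cache.items():
        write_field(out, path)
        write_field(out, str(len(includes)).encode("ascii"))
        for name in includes:
            write_field(out, name)


def load(inp):
    """Load records; ignore the file entirely if the version differs."""
    cache = {}
    if inp.readline() != VERSION_LINE:
        return cache
    while True:
        path = read_field(inp)
        count = read_field(inp)
        if path is None or count is None or not count.isdigit():
            break
        includes = [read_field(inp) for _ in range(int(count))]
        if None in includes:
            break  # incomplete record: never added to the cache
        cache[path] = includes
    return cache
```

Since every field carries its own length, a file name containing '@' or a newline round-trips unchanged, and the lengths are plain decimal text, so the file stays readable in an editor.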
-- matt




