|Subject:||Adding bsdiff to the base system|
|From:||Colin Percival (coli...@wadham.ox.ac.uk)|
|Date:||Apr 7, 2005 12:35:57 am|
Olaf Wagner wrote:
In article <424B...@wadham.ox.ac.uk> you wrote:
At present portsnap is the only mechanism available by which most users can securely maintain an up-to-date copy of the FreeBSD ports tree; it also provides some other advantages over cvsup (reduced bandwidth and ports INDEX/INDEX-5/INDEX-6 files).
Just out of interest: how does it do that? I've not tested it yet, but what intelligence or knowledge does it use to be so much more efficient (1/10) than CVSup? (I myself haven't found anything as efficient as CVSup yet, at least for replicating CVS repositories...)
Exactly: the key phrase is "replicating CVS repositories". CVSup is a tool for replicating CVS repositories; portsnap is a tool for checking out the latest version of all the files in the repository. CVSup is solving a very difficult problem; portsnap is solving a very simple problem -- so it's not all that surprising that portsnap can be a bit more efficient.
The reason portsnap is more efficient lies in how portsnap and CVSup determine which files need to be updated. The ports tree contains roughly 71000 files, and the first thing the CVSup client does is list all of these files and send that list to the server.
In contrast, portsnap has an index file -- containing, roughly speaking, that same list -- and the portsnap client merely sends the sha256 hash of this index file to the server, which responds with either "I recognize that index -- here's a patch which will turn it into the latest index" or "I don't recognize that -- here's the new index". Because these indices have no user-serviceable parts (in fact, mucking about with the files in /usr/local/portsnap at all is strongly discouraged), there is a very good chance that the portsnap server will have a useful patch.
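To make the exchange above concrete, here is a minimal sketch of the hash-then-patch handshake. It is a toy model, not portsnap's code: the names `ToyServer`, `make_patch` and `apply_patch` are hypothetical, the index is assumed to be a sorted list of unique `path|hash` lines, and the "patch" is a plain set difference of lines standing in for whatever binary patch format a real server would send.

```python
import hashlib


def index_hash(index: bytes) -> str:
    """SHA256 of the index file; this is all the client uploads."""
    return hashlib.sha256(index).hexdigest()


def make_patch(old: bytes, new: bytes) -> dict:
    """Toy patch (assumption): the lines to drop and the lines to add."""
    old_lines, new_lines = set(old.splitlines()), set(new.splitlines())
    return {"remove": sorted(old_lines - new_lines),
            "add": sorted(new_lines - old_lines)}


def apply_patch(old: bytes, patch: dict) -> bytes:
    """Rebuild the new index from the old one plus a toy patch."""
    lines = (set(old.splitlines()) - set(patch["remove"])) | set(patch["add"])
    return b"\n".join(sorted(lines)) + b"\n"


class ToyServer:
    """Hypothetical server that remembers earlier indexes by their hash."""

    def __init__(self, latest: bytes, older: list):
        self.latest = latest
        self.known = {index_hash(o): o for o in older}

    def respond(self, client_hash: str):
        if client_hash == index_hash(self.latest):
            return ("current", None)            # nothing to send
        if client_hash in self.known:
            # "I recognize that index -- here's a patch"
            return ("patch", make_patch(self.known[client_hash], self.latest))
        # "I don't recognize that -- here's the new index"
        return ("full", self.latest)


old = b"ports/a|1111\nports/b|2222\n"
new = b"ports/a|1111\nports/b|3333\nports/c|4444\n"
server = ToyServer(new, [old])

kind, payload = server.respond(index_hash(old))
updated = apply_patch(old, payload) if kind == "patch" else payload
```

The point of the sketch is that the upload is a fixed-size hash regardless of how many files the tree contains; a locally edited index would simply hash to something the server has never seen and fall through to the "full" case.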
As a result, while CVSup uses (in this initial stage) bandwidth which is proportional to the number of files in the ports tree, portsnap uses bandwidth proportional to the number of files which have been modified, which is typically around 1% of the tree per day.
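As a rough back-of-the-envelope illustration of that scaling, the sketch below plugs in the figures quoted above (about 71000 files, about 1% modified). The per-entry size of 60 bytes is purely an assumed placeholder, and the model covers only the "which files changed?" stage, not the transfer of the changed files themselves, so the output is an illustration of proportionality rather than a measurement of either tool.

```python
TOTAL_FILES = 71_000        # approximate size of the ports tree, from the text
MODIFIED_FRACTION = 0.01    # "around 1% of the tree", from the text
BYTES_PER_ENTRY = 60        # ASSUMPTION: average bytes to describe one file
HASH_BYTES = 32             # size of a raw SHA256 digest


def full_listing_cost(total_files: int, bytes_per_entry: int) -> int:
    """Cost proportional to every file in the tree (list everything)."""
    return total_files * bytes_per_entry


def index_patch_cost(total_files: int, modified_fraction: float,
                     bytes_per_entry: int, hash_bytes: int = HASH_BYTES) -> int:
    """Cost proportional to modified files only (send hash, get a patch)."""
    modified = round(total_files * modified_fraction)
    return hash_bytes + modified * bytes_per_entry


listing = full_listing_cost(TOTAL_FILES, BYTES_PER_ENTRY)
patched = index_patch_cost(TOTAL_FILES, MODIFIED_FRACTION, BYTES_PER_ENTRY)
print(f"full listing: {listing} bytes, hash + index patch: {patched} bytes")
```

Whatever value is chosen for the per-entry size, the first figure grows with the size of the tree while the second grows only with the number of changed entries.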
When it comes to the actual distribution of patches to files in the tree, portsnap is also marginally more efficient than CVSup, due to differences in how they encode the patches, but the real gains come in the process of identifying which files need to be updated.
Colin Percival

PS. CVSup's inefficiency in dealing with large trees containing a small number of updated files isn't only relevant in the context of updating a ports tree; it is even more notable when tracking the security branches of the src tree. In the paper in which I introduced FreeBSD Update, I gave an example of where FreeBSD Update -- which distributes binary updates to the base system -- used less than half of the bandwidth needed by CVSup for the task of applying the corresponding updates to the src tree.