Subject:Adding bsdiff to the base system
From:Colin Percival (coli@wadham.ox.ac.uk)
Date:Apr 7, 2005 12:35:57 am
List:org.freebsd.freebsd-arch

Olaf Wagner wrote:

In article <424B@wadham.ox.ac.uk> you wrote:

At present portsnap is the only mechanism available by which most users can securely maintain an up-to-date copy of the FreeBSD ports tree; it also provides some other advantages over cvsup (reduced bandwidth and ports INDEX/INDEX-5/INDEX-6 files).

Just out of interest: how does it do that? I've not tested it yet, but what intelligence or knowledge does it use to be so much more efficient (1/10) than CVSup? (I myself haven't found anything as efficient as CVSup yet, at least for replicating CVS repositories...)

"at least for replicating CVS repositories": exactly. CVSup is a tool for replicating CVS repositories; portsnap is a tool for checking out the latest version of all the files in the repository. CVSup is solving a very difficult problem; portsnap is solving a very simple problem -- so it's not all that surprising that portsnap can be a bit more efficient.

The reason portsnap is more efficient lies in how portsnap and CVSup determine which files need to be updated. The ports tree contains roughly 71000 files, and the first thing the CVSup client does is list all of these files and send that list to the server.

In contrast, portsnap has an index file -- containing, roughly speaking, that same list -- and the portsnap client merely sends the SHA-256 hash of this index file to the server. The server responds with either "I recognize that index -- here's a patch which will turn it into the latest index" or "I don't recognize that -- here's the new index". Because these indices have no user-serviceable parts (in fact, mucking about with the files in /usr/local/portsnap at all is strongly discouraged), the client's index is almost always one which the server has seen before, so there is a very good chance that the portsnap server will have a useful patch.
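The exchange described above can be sketched roughly as follows. This is a toy model and not portsnap's actual wire protocol or file format: the `Server` class, the `path|hash` index lines, and the helper names are all assumptions made for illustration, and the "patch" here is simply a set of added and removed lines rather than a real binary patch.

```python
import hashlib


def index_hash(index_lines):
    """SHA-256 of an index, modelled here as a list of text lines."""
    data = "\n".join(sorted(index_lines)).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def make_patch(old_lines, new_lines):
    """Toy 'patch': the lines to remove and the lines to add."""
    old, new = set(old_lines), set(new_lines)
    return {"remove": sorted(old - new), "add": sorted(new - old)}


def apply_patch(old_lines, patch):
    """Apply a toy patch and return the resulting sorted index."""
    lines = set(old_lines) - set(patch["remove"])
    lines |= set(patch["add"])
    return sorted(lines)


class Server:
    """Hypothetical server that remembers every index it has published."""

    def __init__(self):
        self.latest = []
        self.known = {}  # hash of a published index -> that index's lines

    def publish(self, index_lines):
        self.latest = sorted(index_lines)
        self.known[index_hash(self.latest)] = self.latest

    def handle(self, client_hash):
        old = self.known.get(client_hash)
        if old is not None:
            # "I recognize that index: here's a patch to the latest index."
            return ("patch", make_patch(old, self.latest))
        # "I don't recognize that: here's the new index."
        return ("full", list(self.latest))


def client_update(server, local_index):
    """Send only the hash of the local index; return (reply kind, new index)."""
    kind, payload = server.handle(index_hash(local_index))
    if kind == "patch":
        return kind, apply_patch(local_index, payload)
    return kind, payload
```

In this sketch a client whose index was never modified locally always takes the "patch" branch, and a client with a hand-edited index falls back to downloading the full index, which is why leaving those files alone matters.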

As a result, while CVSup uses (in this initial stage) bandwidth which is proportional to the number of files in the ports tree, portsnap uses bandwidth proportional to the number of files which have been modified since the last update; typically around 1% of the tree changes per day.
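As a rough illustration of the difference in scaling described above, the following back-of-the-envelope estimate compares the two approaches. The file count and the 1% figure come from the text; the 50 bytes per list entry is purely an assumption, and the estimate ignores compression and protocol overhead, so only the ratio is meaningful.

```python
FILES_IN_TREE = 71_000    # figure quoted in the text
CHANGED_FRACTION = 0.01   # "around 1%" from the text
BYTES_PER_ENTRY = 50      # assumption: average size of one list/index entry
HASH_BYTES = 32           # a SHA-256 digest is 32 bytes long


def full_list_bytes(n_files, bytes_per_entry=BYTES_PER_ENTRY):
    """Cost when the client sends an entry for every file in the tree."""
    return n_files * bytes_per_entry


def hash_plus_patch_bytes(n_files, changed_fraction,
                          bytes_per_entry=BYTES_PER_ENTRY):
    """Cost when the client sends one hash and receives only changed entries."""
    changed = round(n_files * changed_fraction)
    return HASH_BYTES + changed * bytes_per_entry


if __name__ == "__main__":
    print(full_list_bytes(FILES_IN_TREE))                          # 3550000
    print(hash_plus_patch_bytes(FILES_IN_TREE, CHANGED_FRACTION))  # 35532
```

Under these assumptions the initial stage costs about a hundred times less when only 1% of the entries have changed, and the gap narrows as the time between updates grows.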

When it comes to the actual distribution of patches to files in the tree, portsnap is also marginally more efficient than CVSup, due to differences in how they encode the patches, but the real gains come in the process of identifying which files need to be updated.