Subject: Re: OpenAFS on FreeBSD 8.1
From: Benjamin Kaduk (kad@MIT.EDU)
Date: Jul 28, 2010 8:21:04 pm
List: org.freebsd.freebsd-afs

Hi Jan,

Sorry for the long delay in responding -- mail piled up a bit during a busy week.

On Fri, 23 Jul 2010, Jan Henrik Sylvester wrote:

On 07/23/2010 12:30, Jan Henrik Sylvester wrote:

I listed a few directories without it blocking for long periods of time, as it did in my last round of testing. Good. Copying a huge file from AFS was terribly slow (even for my DSL connection), but it progressed steadily and I was able to abort it without deadlocking or crashing. Copying a 16MB file to AFS blocked a parallel "ls -l" on the same directory I was copying to,

I'm pretty sure that we're holding an exclusive vnode lock when we're not supposed to, but haven't looked into why the lock diagnostics don't complain about it.

but it eventually finished. The file was not corrupted. Great.

I did more testing from the university against both of the AFS' I had been testing before. Copying a few MB from AFS and copying a 16MB file to AFS were both fine (showing 6MB/s while copying).

Trying to copy a 512MB file to AFS locked up all of AFS after two seconds, during which it showed copy rates of 40MB/s (although the network is only 100Mbit/s). After increasing the AFS cache size to 512MB, almost all of the file got copied before AFS locked up. With a cache of 1GB, the file got copied without a deadlock or corruption. (All this is on MP; I have not tried disabling all but one core.)

Do you remember if this was with the git-based port or the 1.5.75 linked from the status report? The latter has an extra patch which band-aids around a reference-counting bug when we need to reclaim used vnodes due to a space crunch.

Rebooting the machine after having done nothing but the successful copy of the 512MB file, I got: Fatal trap 12: page fault while in kernel mode

Hm, hard to do much about that without a backtrace. I've seen occasional errors when shutting down afsd (various manifestations), but I'd say it completes successfully at least half the time (umount -f, that is).

Overall, the only problems I ran into during my tests were copying files larger than the cache size and shutting down afsd. So far, AFS seems to be becoming usable for me (even on MP).

Glad to hear things are getting better.

On Fri, 23 Jul 2010, Jan Henrik Sylvester wrote:

I did not expect my problems to have vanished, but I wanted to try again.

Should I use the git based port http://stuff.mit.edu/afs/sipb.mit.edu/user/kaduk/freebsd/openafs/openafs-devel.shar.txt you pointed me to earlier for testing? Or should I always use http://web.mit.edu/freebsd/openafs/openafs.shar that you posted to the Quarterly Status Report?

I would probably stick to the git-based port, as that will give more useful reports when things break (such as the one you mention below). As I mentioned above, there is one patch in the latter shar which is not in git; it's http://gerrit.openafs.org/2321 . You can add it to the git-based port by stopping after the 'make patch' stage, going into the work directory and running:

    git pull git://git.openafs.org/openafs refs/changes/21/2321/1

and then proceeding with the configure, build, and install stages.
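
As a rough illustration of those steps, here is a minimal shell sketch. PORTDIR is a hypothetical location for the unpacked port, and using `make -V WRKSRC` to locate the work directory is an assumption about how this port is laid out; only the `git pull` line comes from the instructions above. The script defaults to a dry run that just prints each command.

```shell
#!/bin/sh
# Sketch: pull gerrit change 2321 into the git-based port between the
# 'make patch' stage and the remaining stages.
# PORTDIR is a hypothetical path; point it at wherever the shar was unpacked.
PORTDIR="${PORTDIR:-/usr/ports/net/openafs-devel}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_gerrit_2321() {
    run cd "$PORTDIR" || return 1
    run make patch || return 1
    # Assumption: WRKSRC is the git checkout created by the port.
    if [ "$DRY_RUN" = 1 ]; then
        wrksrc='<WRKSRC>'
    else
        wrksrc="$(make -V WRKSRC)"
    fi
    run cd "$wrksrc" || return 1
    run git pull git://git.openafs.org/openafs refs/changes/21/2321/1 || return 1
    run cd "$PORTDIR" || return 1
    # 'make install' continues with the configure, build, and install stages.
    run make install
}

apply_gerrit_2321
```

Running it with DRY_RUN=0 would execute the commands for real; the final install step normally needs root.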

With both, I run into the same problem compiling on FreeBSD 8.1. http://svn.freebsd.org/viewvc/base?view=revision&revision=209524 changed the definition of ifa_ifwithnet. In rx/rx_kernel.h, FreeBSD 8.1 needs the same definition of rx_ifaddr_withnet as AFS_OBSD46_ENV (while FreeBSD 8.0 needs the generic one). Should FreeBSD 8.0 still be supported?

I'll try to get that fix in this weekend (if not sooner). I only have 9-current test boxes, and I think Derrick only has 8.0, so 8.1-specific things would otherwise rely on me noticing relevant changes in the commit emails that go by; this doesn't work very well when I don't have much time to read them :)

With the git based port, I get an error on "kldload libafs": "can't load libafs: Exec format error" (missing symbol?) -- openafs-1.5.75 (the other port) does not seem to have this problem.

Sounds like someone introduced a regression since then; thanks for the report.

Starting afsd, I realized that I had not updated my CellServDB and thus tried to shut down afsd, which complained about afs still being mounted. Trying to umount /afs, I got a segfault in the kernel. (I had not actually accessed /afs before doing that.) I guess restarting afsd is not possible for now. (No big deal.)

It ... should be possible, though it is not fully reliable. Be sure to unload and reload the kernel module between unmounting /afs and restarting afsd, though.
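
For illustration, that order of operations could be sketched in shell roughly as follows. The `umount -f` and `kldload libafs` commands appear elsewhere in this thread; `kldunload libafs` is the assumed counterpart for unloading, and AFSD_CMD is a placeholder because the actual afsd invocation and its options are site-specific. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: restart AFS by unmounting /afs, cycling the kernel module,
# and then starting afsd again.
# AFSD_CMD is a placeholder; substitute the afsd command line used locally.
AFSD_CMD="${AFSD_CMD:-afsd}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_afs() {
    run umount -f /afs || return 1
    # Unload and reload the module before afsd is started again.
    run kldunload libafs || return 1
    run kldload libafs || return 1
    # Left unquoted on purpose so options in AFSD_CMD are split into words.
    run $AFSD_CMD
}

restart_afs
```

Since the unmount itself is reported above as not fully reliable, stopping at the first failing step (as the `|| return 1` checks do) seems the safer choice.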

pagsh does not immediately crash anymore -- another improvement, even if it is minor compared to FreeBSD not crashing anymore using AFS.

BTW: Thanks for all your work!

Cheers, Jan Henrik