|Luigi Rizzo||Apr 19, 2012 6:12 am|
|Slawa Olhovchenkov||Apr 19, 2012 11:53 am|
|Andre Oppermann||Apr 19, 2012 1:05 pm|
|Luigi Rizzo||Apr 19, 2012 1:26 pm|
|K. Macy||Apr 19, 2012 1:34 pm|
|Luigi Rizzo||Apr 19, 2012 2:03 pm|
|K. Macy||Apr 19, 2012 2:06 pm|
|Andre Oppermann||Apr 19, 2012 2:11 pm|
|K. Macy||Apr 19, 2012 2:17 pm|
|Andre Oppermann||Apr 19, 2012 2:19 pm|
|Andre Oppermann||Apr 19, 2012 2:26 pm|
|K. Macy||Apr 19, 2012 2:35 pm|
|K. Macy||Apr 19, 2012 2:36 pm|
|Luigi Rizzo||Apr 19, 2012 2:43 pm|
|Andre Oppermann||Apr 19, 2012 3:36 pm|
|Luigi Rizzo||Apr 19, 2012 11:16 pm|
|Alexander V. Chernikov||Apr 20, 2012 1:26 am|
|Andre Oppermann||Apr 20, 2012 2:00 am|
|Andre Oppermann||Apr 20, 2012 2:25 am|
|John Baldwin||Apr 20, 2012 5:11 am|
|Luigi Rizzo||Apr 20, 2012 7:26 am|
|K. Macy||Apr 20, 2012 9:28 am|
|Luigi Rizzo||Apr 20, 2012 11:46 am|
|Bruce Evans||Apr 20, 2012 11:33 pm|
|Adrian Chadd||Apr 21, 2012 7:14 pm|
|K. Macy||Apr 22, 2012 7:04 am|
|Andre Oppermann||Apr 24, 2012 6:16 am|
|Luigi Rizzo||Apr 24, 2012 6:44 am|
|Li, Qing||Apr 24, 2012 7:15 am|
|K. Macy||Apr 24, 2012 8:03 am|
|K. Macy||Apr 24, 2012 8:05 am|
|Luigi Rizzo||Apr 24, 2012 9:16 am|
|K. Macy||Apr 24, 2012 9:18 am|
|Fabien Thomas||Apr 24, 2012 9:34 am|
|Li, Qing||Apr 24, 2012 10:39 am|
|Li, Qing||Apr 24, 2012 10:42 am|
|Bjoern A. Zeeb||Apr 24, 2012 5:01 pm|
|Maxim Konovalov||Apr 25, 2012 2:21 am|
|Slawa Olhovchenkov||Apr 25, 2012 3:19 am|
|K. Macy||Apr 25, 2012 8:44 am|
|Bjoern A. Zeeb||Apr 25, 2012 11:53 am|
|George Neville-Neil||May 1, 2012 7:27 am|
|Luigi Rizzo||May 1, 2012 8:21 am|
|George Neville-Neil||May 1, 2012 10:33 am|
|Bjoern A. Zeeb||May 1, 2012 2:08 pm|
|Luigi Rizzo||May 1, 2012 2:22 pm|
|Luigi Rizzo||May 3, 2012 9:32 am|
Subject: Re: more network performance info: ether_output()
From: Bjoern A. Zeeb (bzee...@lists.zabbadoz.net)
Date: May 1, 2012 2:08:42 pm
On 1. May 2012, at 15:40 , Luigi Rizzo wrote:
On Tue, May 01, 2012 at 10:27:42AM -0400, George Neville-Neil wrote:
On Apr 20, 2012, at 15:03 , Luigi Rizzo wrote:
Continuing my profiling of network performance, another place where we waste a lot of time is if_ethersubr.c::ether_output().
From the beginning of ether_output() to the final call to ether_output_frame(), the code takes slightly more than 210ns on my i7-870 CPU running at 2.93 GHz + TurboBoost. In particular:
- the route does not have a MAC address (lle) attached, which causes arpresolve() to be called every time. This consumes about 100ns, and happens also with locally sourced TCP. Using the flowtable cuts this time down to about 30-40ns.
- another 100ns is spent copying the MAC header into the mbuf and then checking whether a local copy should be looped back. Unfortunately the code here is a bit convoluted, so the header fields are copied twice, using memcpy on the individual pieces.
Note that all the above happens not just with my udp flooding tests, but also with regular TCP traffic.
I'm really glad you're working on this. I may have missed it in the thread, but are you tracking these issues somewhere so we can pick them up and fix them?
Also, how are you doing the measurements?
The measurements are done with tools/tools/netrate/netsend and kernel patches to return from sendto() at various places in the stack (from the syscall entry point down to the device driver). A patch is attached.
I think that was lost on the way. Can you mail it or put it somewhere and send a link?
You don't really need netmap to run it, it was just a convenient place to put the variables.
I am not sure how much we can "fix"; there are multiple expensive functions on the tx path, and probably also on the rx path.
My hope, at least for the tx path, is that we can find a way to install a "fastpath" handler in the socket. When no handler is installed (e.g. on the first packet, or for unsupported protocols/interfaces), everything works as usual. Then, when the packet reaches the bottom of the stack, we try to update the socket with a copy of the headers generated in the process and the name of the fastpath function to be called. Subsequent transmissions can then shortcut the stack and go straight to the device output routine.
I don't have data on the receive path, or good ideas on how to proceed -- the advantage of the tx path is that traffic is implicitly classified, whereas that may not be the case for incoming traffic, and classification might be the expensive step.
Hopefully we'll have time to discuss this next week in Ottawa.
-- Bjoern A. Zeeb You have to have visions! It does not matter how good you are. It matters what good you do!
_______________________________________________ free...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "free...@freebsd.org"