We ran a number of similar tests to determine the overhead imposed by
KVM and the limitations of the nova-network architecture. We found that
the VMs themselves could consistently saturate the network link
available to the host system, whether it was 1GE or 10GE, given
relatively modern node and network hardware. With the default
VLANManager network setup there isn't much you can do to scale your
outbound connectivity beyond the hardware you can reasonably drive
with a single node, but with multi-host nova-network we were able to
run many nodes in parallel, scaling our outbound bandwidth roughly
linearly. We got 10 nodes, with a single VM per node, each
running 4 TCP streams, up to 99 gigabits on a dedicated cross-country
link. Some tuning was needed, but nothing particularly outlandish
compared with the tuning required to do the same with bare metal.
We've been meaning to do a full writeup, but haven't had time yet.
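The post doesn't say what the tuning actually was, but for a long, fat (high bandwidth-delay-product) path it typically centers on growing the TCP socket buffers so a single stream can keep the cross-country pipe full. A hedged sketch of that kind of host-side tuning (the specific values here are illustrative assumptions, not the ones used in the tests above):

```shell
# Hypothetical sysctl tuning for a high-BDP path; values are examples only.
# Raise the ceiling on socket buffer sizes (here 64 MB).
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
# min / default / max buffers for TCP autotuning.
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"
# A deeper per-device input queue helps at 10GE packet rates.
sysctl -w net.core.netdev_max_backlog=30000
```

Roughly the same knobs apply on bare metal, which matches the observation that the VM tuning wasn't anything outlandish by comparison.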
TSO and GRO can cover a multitude of path-length sins :)
That is one of the reasons netperf does more than just bulk transfer :)
When I was/am measuring the "scaling" of an SMP node, I would use
aggregate, burst-mode, single-byte netperf TCP_RR tests to maximize
packets per second while minimizing the actual bandwidth consumed.
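As a sketch of that setup (assuming a netperf built with burst-mode support and a placeholder remote host named "remotehost"): several concurrent single-byte TCP_RR instances, each keeping multiple transactions in flight, so the node is stressed on packet rate rather than throughput.

```shell
# Hedged sketch: aggregate burst-mode, single-byte TCP_RR.
# Requires netperf configured with --enable-burst; "remotehost" and the
# instance/burst counts are illustrative assumptions.
for i in $(seq 1 8); do
  # -r 1,1 : 1-byte request, 1-byte response
  # -b 16  : keep 16 transactions in flight (burst mode)
  # -D     : set TCP_NODELAY so tiny packets aren't coalesced
  netperf -H remotehost -t TCP_RR -l 30 -- -r 1,1 -b 16 -D &
done
wait
```

Summing the per-instance transaction rates gives the aggregate packets-per-second figure for the node.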
And if there is a concern about flows coming and going, there is the
TCP_CRR test, which is like the TCP_RR test except that each
transaction is a freshly created and torn-down TCP connection.
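A minimal sketch of that test, again with "remotehost" as a placeholder:

```shell
# TCP_CRR: each transaction is connect -> 1-byte request/response -> close,
# so the reported rate reflects connection setup/teardown cost too.
netperf -H remotehost -t TCP_CRR -l 30 -- -r 1,1
```

Comparing the TCP_CRR rate against the TCP_RR rate on the same path gives a rough handle on the per-connection overhead.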