TSO and GRO can cover a multitude of path-length sins :)
That is one of the reasons netperf does more than just bulk transfer :)
When I measure the "scaling" of an SMP node I use aggregate,
burst-mode, single-byte netperf TCP_RR tests to maximize the
packets per second while minimizing the actual bandwidth consumed.
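
A minimal sketch of such a run, assuming a netperf that was configured
with --enable-burst (the test-specific -b option needs it) and a
netserver already listening on the target. The helper names, the host
name, and the instance count, burst size and run length are placeholders
for illustration only, not recommended values.

```python
import subprocess


def tcp_rr_command(host, burst=16, seconds=30):
    """Build one single-byte, burst-mode TCP_RR netperf command line."""
    return [
        "netperf",
        "-t", "TCP_RR",      # request/response test
        "-H", host,          # system running netserver
        "-l", str(seconds),  # test length in seconds
        "--",                # test-specific options follow
        "-r", "1,1",         # 1-byte request, 1-byte response
        "-b", str(burst),    # extra transactions in flight (--enable-burst)
        "-D",                # TCP_NODELAY, so small sends are not coalesced
    ]


def run_aggregate(host, instances=4, **kwargs):
    """Start several netperfs concurrently and return their raw output."""
    procs = [
        subprocess.Popen(tcp_rr_command(host, **kwargs),
                         stdout=subprocess.PIPE, text=True)
        for _ in range(instances)
    ]
    return [p.communicate()[0] for p in procs]


if __name__ == "__main__":
    # Only print the command; run_aggregate() needs netperf installed.
    print(" ".join(tcp_rr_command("target.example.com")))
```

The aggregate figure is then the sum of the per-instance transaction
rates; how many instances and how deep a burst it takes to saturate a
given node is something to find by experiment.
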
And if there is a concern about flows coming and going, there is the
TCP_CRR test, which is like the TCP_RR test except that each transaction
runs over a freshly created and then torn-down TCP connection.
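
To make the difference between the two transaction patterns concrete,
here is a small loopback sketch using plain Python sockets. It is not
netperf code and measures nothing; the function names and the one-byte
echo server are assumptions made up for the illustration.

```python
import socket
import threading


def echo_server(listener, n_conns):
    """Accept n_conns connections in turn, echoing bytes until each closes."""
    for _ in range(n_conns):
        conn, _ = listener.accept()
        with conn:
            while True:
                data = conn.recv(1)
                if not data:
                    break
                conn.sendall(data)


def rr(addr, transactions):
    """TCP_RR style: one connection carries every request/response pair."""
    with socket.create_connection(addr) as s:
        for _ in range(transactions):
            s.sendall(b"x")
            assert s.recv(1) == b"x"
    return 1  # connections opened


def crr(addr, transactions):
    """TCP_CRR style: connect, one request/response, close, every time."""
    for _ in range(transactions):
        with socket.create_connection(addr) as s:
            s.sendall(b"x")
            assert s.recv(1) == b"x"
    return transactions  # connections opened


def demo(transactions=100):
    """Run both patterns against a local echo server; return connection counts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(16)
    server = threading.Thread(target=echo_server,
                              args=(listener, 1 + transactions))
    server.start()
    addr = listener.getsockname()
    counts = (rr(addr, transactions), crr(addr, transactions))
    server.join()
    listener.close()
    return counts


if __name__ == "__main__":
    print(demo(20))
```

Same number of transactions in both cases, but the second pattern pays
for connection setup and teardown on every one of them, which is the
part of the path TCP_CRR is meant to exercise.
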
I've just put a script into the netperf repository at netperf.org which
will use novaclient.v1_1 to launch three instances of a specified flavor
and run the runemomniaggdemo.sh script on one of them, targeting the
other two. It doesn't do TCP_CRR, and it is not geared towards
scores/hundreds/thousands of instances.