| Subject: | Radio update, update | |
|---|---|---|
| From: | Ron Goldman (Ron....@sun.com) | |
| Date: | Mar 11, 2009 1:37:49 am | |
| List: | net.java.dev.spots.dev | |
Here's an update on my improvements to the SDK radio stack.
To really stress the OTA code I decided to test flashing the latest Transducerlib (which now weighs in at 501,525 bytes). The new radio code was able to do so over 5 hops (which was all the SPOTs I had when doing the testing), though I got lots of failures at 5 hops due to route timeouts. The sending times and some radio statistics are given later.
From my various experiments here is what I learned & the code I changed:
MAC-layer
---------
First, there are large numbers of ChannelAccessFailures & No ACK errors. Since the MAC layer tries several times, usually the transmission eventually succeeds and does not result in an exception being thrown. However anything that can be done to minimize these errors should help. In particular I increased the number of retries for ChannelAccessFailures from 4 to 5. I doubt this made any practical difference. (I also changed the backoff delay to zero for the first attempt.)
For the No ACK errors I changed the wait times between resends from (0, 50, 200) to (10, 50, 120) milliseconds for the (1st, 2nd, 3rd) retry. No ACKs occur because the packet was not received due to a collision or a receive buffer overflow. As collisions are random, the delay time should probably also be random with a max of (10, 50, 120), and we might try that later. For the overflow it is important to give the receiving SPOT time to recognize the problem & recover. As John Daniels pointed out, that might mean waiting for a partial GC to finish. After making this change I think I saw slightly fewer No ACK & buffer overflow errors, but I did not do enough testing to really confirm it.
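As a rough illustration of the randomized variant mentioned above, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of the SDK; only the caps of (10, 50, 120) come from this message, and drawing uniformly between zero and the cap is an assumption.

```java
import java.util.Random;

// Hypothetical helper, not part of the SPOT SDK. Only the caps
// (10, 50, 120) are taken from the message; the rest is an assumption.
class NoAckRetryDelay {
    // Maximum wait in milliseconds before the 1st, 2nd and 3rd retry.
    static final int[] MAX_DELAY_MS = {10, 50, 120};

    private final Random random;

    NoAckRetryDelay(Random random) {
        this.random = random;
    }

    // Fixed delay for the given retry (1-based), as in the change described above.
    static int fixedDelayMs(int retry) {
        return MAX_DELAY_MS[retry - 1];
    }

    // Randomized delay: uniform in [0, cap] for the given retry (1-based).
    int randomDelayMs(int retry) {
        return random.nextInt(MAX_DELAY_MS[retry - 1] + 1);
    }

    public static void main(String[] args) {
        NoAckRetryDelay delays = new NoAckRetryDelay(new Random());
        for (int retry = 1; retry <= MAX_DELAY_MS.length; retry++) {
            System.out.println("retry " + retry
                    + ": fixed " + fixedDelayMs(retry) + " ms"
                    + ", random " + delays.randomDelayMs(retry) + " ms");
        }
    }
}
```

With random delays, two senders that collided are less likely to retry at the same instant, while the cap still leaves the receiver some time to recover from an overflow on later retries.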
LowPan-layer
------------
The real improvements in reliability were due to changes to the LowPan & Radiostream code. At the LowPan level I added delays after transmitting each packet so that the transmitting thread (A) would not send another packet until the first had been forwarded out of radio range: the delay gives the receiver (B) time to read & retransmit the packet, plus additional time for the next hop (C) to do likewise. Any less delay and the next packet sent by A would likely collide with the previous packet being forwarded by C (i.e. the hidden terminal problem). The delay is based on the number of hops the packet is being sent. This change made a noticeable improvement. (A possible optimization would be to use the same delay for 3 or more hops as by then the packet is out of radio range of A.)
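The hop-based delay and the possible 3-hop cap can be sketched as follows. This is only an illustration: the class name, the linear formula and the per-hop value of 20 ms are assumptions, since the message does not give the actual numbers used in the LowPan code.

```java
// Hypothetical sketch of a hop-based send delay; not the SDK implementation.
class HopDelay {
    // ASSUMPTION: placeholder value, the real per-hop delay is not stated.
    static final int PER_HOP_DELAY_MS = 20;
    // After this many hops the packet is assumed to be out of radio range
    // of the original sender, so a longer delay should not be needed.
    static final int MAX_HOPS_FOR_DELAY = 3;

    // Delay that grows linearly with the number of hops to the destination.
    static int delayMs(int hops) {
        if (hops < 1) {
            throw new IllegalArgumentException("hops must be >= 1");
        }
        return hops * PER_HOP_DELAY_MS;
    }

    // Optimized variant: the same delay for 3 or more hops.
    static int cappedDelayMs(int hops) {
        return delayMs(Math.min(hops, MAX_HOPS_FOR_DELAY));
    }

    public static void main(String[] args) {
        for (int hops = 1; hops <= 5; hops++) {
            System.out.println(hops + " hops: delay " + delayMs(hops)
                    + " ms, capped " + cappedDelayMs(hops) + " ms");
        }
    }
}
```

The capped variant would shorten the wait per packet on long routes without reintroducing the hidden terminal collisions near the sender.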
Radiostream
-----------
The biggest difference was to shorten the Radiostream retransmit times, which then allowed recovery from transmission errors. The previous timeout value of 15 seconds was greater than the connection timeout value (5 seconds), so no retransmissions ever occurred. The timeout is now set to 500 milliseconds times the number of hops to the destination.
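To make the arithmetic concrete, here is a small sketch of the new timeout rule next to the old value. The class and method names are hypothetical; the numbers (500 ms per hop, 15 seconds, 5 seconds) are from this message, and the check simply assumes a retransmission can only happen if its timer fires before the connection timeout.

```java
// Hypothetical sketch, not the Radiostream code itself.
class RetransmitTimeout {
    static final int PER_HOP_TIMEOUT_MS = 500;          // new rule: 500 ms per hop
    static final int OLD_RETRANSMIT_TIMEOUT_MS = 15000; // previous fixed value
    static final int CONNECTION_TIMEOUT_MS = 5000;      // connection timeout

    // New retransmit timeout for a destination the given number of hops away.
    static int retransmitTimeoutMs(int hops) {
        return PER_HOP_TIMEOUT_MS * hops;
    }

    // ASSUMPTION: a retransmission only happens if the retransmit timer
    // expires before the connection itself times out.
    static boolean canRetransmit(int retransmitTimeoutMs) {
        return retransmitTimeoutMs < CONNECTION_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println("old timeout can retransmit: "
                + canRetransmit(OLD_RETRANSMIT_TIMEOUT_MS));
        for (int hops = 1; hops <= 5; hops++) {
            int timeout = retransmitTimeoutMs(hops);
            System.out.println(hops + " hops: " + timeout
                    + " ms, can retransmit: " + canRetransmit(timeout));
        }
    }
}
```

Under these assumptions even a 5-hop route gets a 2.5 second timeout, leaving room for a retransmission before the 5 second connection timeout.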
Several times when trying to flash the library over 5 hops I was able to successfully transmit all of the bits, but the final end-to-end ACK got lost and the operation failed. This is a basic failing of the current Radiostream design. One fix would be to have the sender transmit an ACK when it receives the end-to-end ACK.
OTA-layer
---------
Several minor changes were made to the OTA code to make it better able to handle errors and either recover or end the OTA session. A more significant change was to eliminate the CRC stream that was being added on top of the Radiostream for sending data; Radiostream is supposed to guarantee delivery, so there is no need to add a checksum-based stream to it. Another minor change was to reduce the size of the chunks sent before requiring an end-to-end ACK.
AODV
----
Another problem that came up when flashing the library over several hops was that No ACKs can cause the route between the sender & receiver to be dropped. Sometimes the route wasn't reestablished quickly enough and the attempt to transmit timed out, especially over 5 hops. The AODV routing algorithm is not designed for static nodes and would need to be modified so that it takes several No ACK failures before it drops a route.
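One way such a modification might look is a per-neighbor counter that only reports a route as broken after several consecutive No ACK failures. This is a sketch in plain Java, not the SDK's AODV code: the class, its methods, the string addresses and the threshold of 3 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tolerate a few No ACK failures before dropping a route.
class RouteFailureTracker {
    private final int threshold;
    // Consecutive No ACK failures per next-hop address.
    private final Map<String, Integer> consecutiveNoAcks = new HashMap<>();

    RouteFailureTracker(int threshold) {
        this.threshold = threshold;
    }

    // Records a No ACK for the next hop. Returns true once the number of
    // consecutive failures reaches the threshold, i.e. the route should be dropped.
    boolean recordNoAck(String nextHop) {
        int count = consecutiveNoAcks.merge(nextHop, 1, Integer::sum);
        if (count >= threshold) {
            consecutiveNoAcks.remove(nextHop);
            return true;
        }
        return false;
    }

    // A successful transmission resets the failure count for that next hop.
    void recordSuccess(String nextHop) {
        consecutiveNoAcks.remove(nextHop);
    }

    public static void main(String[] args) {
        RouteFailureTracker tracker = new RouteFailureTracker(3); // assumed threshold
        String nextHop = "relay-1"; // placeholder address
        for (int i = 1; i <= 3; i++) {
            System.out.println("No ACK " + i + ": drop route = "
                    + tracker.recordNoAck(nextHop));
        }
    }
}
```

For static nodes, an occasional No ACK is more likely a collision or overflow than a vanished neighbor, so resetting the count on any success keeps a working route alive.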
What next?
----------
The SPOT SDK radio stack is now much more robust, but still suffers from 2 problems: dropped end-to-end ACKs in Radiostreams & a routing manager not well suited to many SPOT installations. Rather than try to improve Radiostream or our AODV implementation, we are planning on switching to an implementation of TCP that Pete is finishing up & investigating more appropriate routing protocols.
Also left to do is tune the delays & timeouts now used by LowPan & Radiostream to improve performance.
-- Ron --
--------------------------------------
timings:

1 hop flashlibrary: 2:50

2 hop flashlibrary: 6:40
     [java] CRC errors = 14
     [java] Overflow errors = 45
     [java] ChannelAccessFailures = 677
     [java] No ACKs = 92
     [java] Partial GC's = 143
     [java] Full GC's = 3

3 hop flashlibrary: 8:38
     [java] CRC errors = 54
     [java] Overflow errors = 25
     [java] ChannelAccessFailures = 1761
     [java] No ACKs = 400
     [java] Partial GC's = 172
     [java] Full GC's = 3

4 hop flashlibrary: 11:10
     [java] CRC errors = 189
     [java] Overflow errors = 37
     [java] ChannelAccessFailures = 2181
     [java] No ACKs = 516
     [java] Partial GC's = 209
     [java] Full GC's = 3

relay 3:
     [java] CRC errors = 126
     [java] Overflow errors = 58
     [java] ChannelAccessFailures = 5084
     [java] No ACKs = 1077
     [java] Partial GC's = 224
     [java] Full GC's = 4

relay 2:
     [java] CRC errors = 136
     [java] Overflow errors = 58
     [java] ChannelAccessFailures = 3168
     [java] No ACKs = 724
     [java] Partial GC's = 228
     [java] Full GC's = 4

5 hop flashlibrary: 17:14 - this attempt failed
     [java] CRC errors = 604
     [java] Overflow errors = 64
     [java] ChannelAccessFailures = 2541
     [java] No ACKs = 429
     [java] Partial GC's = 247
     [java] Full GC's = 4

relay 4:
     [java] CRC errors = 569
     [java] Overflow errors = 68
     [java] ChannelAccessFailures = 6498
     [java] No ACKs = 1909
     [java] Partial GC's = 270
     [java] Full GC's = 4

relay 3:
     [java] CRC errors = 326
     [java] Overflow errors = 58
     [java] ChannelAccessFailures = 4462
     [java] No ACKs = 1713
     [java] Partial GC's = 274
     [java] Full GC's = 5

relay 2:
     [java] CRC errors = 387
     [java] Overflow errors = 59
     [java] ChannelAccessFailures = 4526
     [java] No ACKs = 831
     [java] Partial GC's = 272
     [java] Full GC's = 5
     [java] Partial GC's = 273

relay 1:
     [java] CRC errors = 252
     [java] Overflow errors = 47
     [java] ChannelAccessFailures = 4455
     [java] No ACKs = 623
     [java] Partial GC's = 267
     [java] Full GC's = 4

5 hop flashlibrary: 16:31 - succeeded
     [java] CRC errors = 237
     [java] Overflow errors = 48
     [java] ChannelAccessFailures = 2545
     [java] No ACKs = 582
     [java] Partial GC's = 254
     [java] Full GC's = 4
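
For comparing the runs, the times above can be turned into an effective end-to-end rate. The sketch below assumes each time covers the transfer of the full 501,525-byte library, so the result includes all protocol overhead and retries; the class and method names are made up for this illustration.

```java
// Hypothetical helper for turning the listed flash times into bytes per second.
class FlashThroughput {
    // Size of the library that was flashed, from the message above.
    static final int LIBRARY_BYTES = 501525;

    // Converts a "minutes:seconds" string such as "2:50" to seconds.
    static int toSeconds(String minSec) {
        String[] parts = minSec.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Effective rate in whole bytes per second (integer division).
    // ASSUMPTION: the time covers the whole library transfer.
    static int bytesPerSecond(String minSec) {
        return LIBRARY_BYTES / toSeconds(minSec);
    }

    public static void main(String[] args) {
        // Times of the successful runs listed above, by hop count.
        String[] times = {"2:50", "6:40", "8:38", "11:10", "16:31"};
        for (int i = 0; i < times.length; i++) {
            System.out.println((i + 1) + " hop: " + times[i] + " -> "
                    + bytesPerSecond(times[i]) + " bytes/s");
        }
    }
}
```

On these figures the 1-hop run works out to roughly 2,950 bytes per second and the successful 5-hop run to roughly 506 bytes per second.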





