|Ron Goldman||Mar 11, 2009 1:37 am|
|John Daniels||Mar 11, 2009 2:01 am|
|Ron Goldman||Mar 11, 2009 2:15 am|
|John Daniels||Mar 11, 2009 2:46 am|
|Markus Bestehorn||Mar 11, 2009 6:24 am|
|Bart Braem||Mar 11, 2009 6:52 am|
|Robert Taylor||Mar 11, 2009 7:24 am|
|John Daniels||Mar 11, 2009 7:56 am|
|Bart Braem||Mar 11, 2009 8:11 am|
|Kurt Smolderen||Mar 11, 2009 11:04 am|
|Ron Goldman||Mar 11, 2009 4:25 pm|
|Ron Goldman||Mar 11, 2009 4:47 pm|
|Ron Goldman||Mar 11, 2009 6:13 pm|
|Ron Goldman||Mar 11, 2009 6:26 pm|
|Stephan Kessler||Mar 16, 2009 12:04 am|
|Subject:||Re: Radio update, update|
|From:||Robert Taylor (das....@gmail.com)|
|Date:||Mar 11, 2009 7:24:36 am|
It's great that you're getting to the bottom of all this. Reading your last email, I recall that CTP adds a backoff at L3 when forwarding packets to prevent the hidden-node problem you describe in radiostream. The value they use was empirically derived using TinyOS and MicaZs; what value did you arrive at? I can add this to the Java CTP code.
There is a tech report I recently read that you might find useful when dealing with MAC layer stuff.
Kannan Srinivasan, Prabal Dutta, Arsalan Tavakoli, and Philip Levis "Understanding the Causes of Packet Delivery Success and Failure in Dense Wireless Sensor Networks." Technical Report SING-06-00.
2009/3/11 Bart Braem <bart...@ua.ac.be>:
On 11 Mar 2009, at 09:38, Ron Goldman wrote:
Here's an update on my improvements to the SDK radio stack.
To really stress the OTA code I decided to test flashing the latest Transducerlib (which now weighs in at 501,525 bytes). The new radio code was able to do so over 5 hops (which used all the SPOTs I had available for the testing), though I got lots of failures at 5 hops due to route timeouts. The sending times and some radio statistics are given later.
From my various experiments here is what I learned & the code I changed:
First, there are large numbers of ChannelAccessFailures & No ACK errors. Since the MAC layer retries several times, the transmission usually succeeds eventually and no exception is thrown. However, anything that can be done to minimize these errors should help. In particular, I increased the number of retries for ChannelAccessFailures from 4 to 5, though I doubt this made any practical difference. (I also changed the backoff delay to zero for the first attempt.)
For the No ACK errors I changed the wait times between resends from (0, 50, 200) to (10, 50, 120) for the (1st, 2nd, 3rd) retry. No ACKs occur because the packet was not received, due either to a collision or to a receive buffer overflow. As collisions are random, the delay time should probably also be random, with maxima of (10, 50, 120); we might try that later. For the overflow case it is important to give the receiving SPOT time to recognize the problem & recover. As John Daniels pointed out, that might mean waiting for a partial GC to finish. After making this change I think I saw slightly fewer No ACK & buffer overflow errors, but I did not do enough testing to really confirm it.
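That randomized resend delay could be sketched like this in Java (the `RetryBackoff` class is purely illustrative, not actual SDK code; only the (10, 50, 120) maxima come from the change described above):

```java
import java.util.Random;

public class RetryBackoff {
    // Maximum wait (ms) before the 1st, 2nd and 3rd resend after a No ACK.
    private static final int[] MAX_DELAY_MS = {10, 50, 120};
    private static final Random rnd = new Random();

    // Pick a random delay up to the per-attempt maximum, so two senders
    // whose packets collided are unlikely to collide again on the retry.
    // Attempts beyond the 3rd reuse the last maximum.
    public static int delayForRetry(int attempt) {
        int max = MAX_DELAY_MS[Math.min(attempt, MAX_DELAY_MS.length - 1)];
        return 1 + rnd.nextInt(max); // 1..max ms, never zero
    }
}
```

A fixed delay only helps against buffer overflow at the receiver; the random component is what breaks repeated collisions between the same pair of senders.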
If we understand the 802.15.4 standard correctly, you have to do a random back-off in order to avoid collisions in the CSMA scheme. This seems like a good idea, to avoid simultaneous channel access when multiple nodes react to the same event and to avoid synchronization.
The real improvements in reliability were due to changes to the LowPan & Radiostream code. At the LowPan level I added delays after transmitting each packet so that the transmitting thread (A) would not send another packet until the first had been forwarded out of radio range: the delay gives the receiver (B) time to read & retransmit the packet, plus additional time for the next hop (C) to do likewise. Any less delay and the next packet sent by A would likely collide with the previous packet being forwarded by C (i.e. the hidden terminal problem). The delay is based on the number of hops the packet is being sent. This change made a noticeable improvement. (A possible optimization would be to use the same delay for 3 or more hops as by then the packet is out of radio range of A.)
To us, it seems a bit strange to solve the hidden node problem at the LowPan layer, as 802.15.4 already defines the random back-off. Would it be possible to move this improvement to the MAC layer? Then any other implementations of higher layers would also be able to use regular 802.15.4, which seems like a good feature.
What do you mean by the delay being based on the number of hops the packet is sent over? Would a larger number of hops imply a larger delay? Based on our experience, accounting for the transmission of a packet over 3 hops should be enough. A good estimate could be 3 times the transmission delay of the packet, plus some milliseconds to allow for processing.
If I understand this correctly, perhaps you could combine this part with the previous optimisation and split the delay into a fixed part (based on the back-off status, to get around the hidden node problem) and a random part (to avoid synchronization from happening).
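The hop-based delay with the fixed-plus-random split suggested above could be sketched as follows (the class name, the per-hop time, and the jitter bound are all assumptions for illustration; the real per-hop forwarding time would have to be measured on actual SPOTs):

```java
import java.util.Random;

public class ForwardingDelay {
    private static final Random rnd = new Random();
    // Assumed time (ms) for one hop to receive and retransmit a packet;
    // a placeholder value, to be replaced by a measured one.
    private static final int PER_HOP_MS = 15;
    // Beyond 3 hops the packet is out of radio range of the sender,
    // so counting more hops would only waste time.
    private static final int MAX_HOPS_COUNTED = 3;
    private static final int MAX_JITTER_MS = 10;

    // Fixed part grows with hop count (capped at 3); the random part
    // keeps nodes reacting to the same event from staying synchronized.
    public static int interPacketDelay(int hops) {
        int counted = Math.min(hops, MAX_HOPS_COUNTED);
        return counted * PER_HOP_MS + rnd.nextInt(MAX_JITTER_MS + 1);
    }
}
```

Capping the fixed part at 3 hops matches both Ron's suggested optimization and Bart's "3 times the transmission delay" estimate.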
Another problem that came up flashing the library over several hops was that No ACKs can cause the route between the sender & receiver to be dropped. Sometimes the route wasn't reestablished quickly enough and the attempt to transmit timed out---especially over 5 hops. The AODV routing algorithm is not designed for static nodes and would need to be modified so that it takes several No ACK failures before it drops a route.
Why do No ACKs cause a route to be dropped? AODV should work fine for fairly static topologies if you completely implement layer-2 feedback. Chapter 6.2 of the RFC reads:
Each time a route is used to forward a data packet, its Active Route Lifetime field of the source, destination and the next hop on the path to the destination is updated to be no less than the current time plus ACTIVE_ROUTE_TIMEOUT. Since the route between each originator and destination pair is expected to be symmetric, the Active Route Lifetime for the previous hop, along the reverse path back to the IP source, is also updated to be no less than the current time plus ACTIVE_ROUTE_TIMEOUT. The lifetime for an Active Route is updated each time the route is used regardless of whether the destination is a single node or a subnet.
So each time a route is used, it should become fresh afterwards. (I guess the data interval was smaller than ACTIVE_ROUTE_TIMEOUT.) However, I do realize this can create a risk that routes never change because they are considered perfect. That's why chapter 6.10 reads:
A node can maintain accurate information about its continued connectivity to these active next hops, using one or more of the available link or network layer mechanisms, as described below.
- Any suitable link layer notification, such as those provided by IEEE 802.11, can be used to determine connectivity, each time a packet is transmitted to an active next hop. For example, absence of a link layer ACK or failure to get a CTS after sending RTS, even after the maximum number of retransmission attempts, indicates loss of the link to this active next hop.
If a link to the next hop cannot be detected by any of these methods, the forwarding node SHOULD assume that the link is lost, and take corrective action by following the methods specified in Section 6.11.
So you could implement an algorithm that counts the number of consecutive lost ACKs at the MAC layer and marks the route as invalid if, e.g., 3 or more ACKs in a row were dropped. My guess is that this will substantially lower the AODV overhead, as fewer RREQs are needed. The difficulty is of course choosing the right number of lost ACKs before a route is considered invalid; I guess 3 would be fine.
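A minimal Java sketch of that consecutive-No-ACK counter (the class and method names are hypothetical, not part of the SPOT SDK or any AODV implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class LinkFailureDetector {
    // Consecutive lost ACKs before we declare the link to a next hop dead.
    private static final int MAX_CONSECUTIVE_NO_ACKS = 3;
    private final Map<Long, Integer> noAckCount = new HashMap<Long, Integer>();

    // Called by the MAC layer after each transmission attempt to nextHop.
    // Returns true when the route through nextHop should be invalidated
    // (Section 6.11 corrective action in AODV terms).
    public boolean reportTransmission(long nextHop, boolean acked) {
        if (acked) {
            noAckCount.remove(Long.valueOf(nextHop)); // one success resets the count
            return false;
        }
        Integer prev = noAckCount.get(Long.valueOf(nextHop));
        int failures = (prev == null) ? 1 : prev.intValue() + 1;
        noAckCount.put(Long.valueOf(nextHop), Integer.valueOf(failures));
        return failures >= MAX_CONSECUTIVE_NO_ACKS;
    }
}
```

Resetting on any successful ACK is what distinguishes a transient collision burst from a genuinely lost link, so a single No ACK no longer tears down a working route.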
Of course these are just my 2 eurocents, so feel free to react.
--
Bart Braem
PATS research group - IBBT
Dept. of Mathematics and Computer Sciences
University of Antwerp
Campus Middelheim, G3.30
Middelheimlaan 1
B-2020 Antwerpen, Belgium
Phone: +32 (0)3 265.32.91
Fax: +32 (0)3 265.37.77
Web: www.pats.ua.ac.be