Bug 606

Summary: arp does an ip route lookup
Product: ns-3 Reporter: Mathieu Lacage <mathieu.lacage>
Component: internetAssignee: ns-bugs <ns-bugs>
Status: RESOLVED FIXED    
Severity: critical CC: boyko, tomh
Priority: P2    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: partial solution, see comments
Proposed patch (adds Ipv4::SelectSourceAddress())

Description Mathieu Lacage 2009-06-25 02:35:50 UTC
ARP does _ip_ route lookups for outgoing _arp_ messages. I have to confess
that I find this weird and unexpected. What I would have expected is
that:

1) SendReply uses the 'to' ip of the request as a 'from' address, so, no
lookup needed.

2) SendRequest is given the source address from Lookup

3) the source address is stored in the arp cache entry to give to the
ArpRequestCallback.
Comment 1 Tom Henderson 2009-06-25 12:38:33 UTC
(In reply to comment #0)
> ARP does _ip_ route lookups for outgoing _arp_ messages. I have to confess
> that I find this weird and unexpected. 

Perhaps, but you will also find that Linux Arp calls into the Linux IP routing system too.

> What I would have expected is
> that:
> 
> 1) SendReply uses the 'to' ip of the request as a 'from' address, so, no
> lookup needed.
> 
> 2) SendRequest is given the source address from Lookup

In Linux, arp_solicit() needs to pick source address, so it calls inet_select_addr().  We probably should consider to add this explicit function to Ipv4 but right now it is provided implicitly by the route output call which sets a source address.  We could also possibly change Arp::Lookup() as you suggest to pass this source address in as a parameter.  I don't care too strongly either way.

> 
> 3) the source address is stored in the arp cache entry to give to the
> ArpRequestCallback.
> 

I would be fine with what you suggest but I don't think it is a blocker issue for the release since this is some internal API.
Comment 2 Mathieu Lacage 2009-06-25 14:44:55 UTC
(In reply to comment #1)
> I would be fine with what you suggest but I don't think it is a blocker issue
> for the release since this is some internal API.

agreed. 

Comment 3 Pavel Boyko 2009-08-03 08:18:25 UTC
Created attachment 541 [details]
partial solution, see comments
Comment 4 Pavel Boyko 2009-08-03 08:18:59 UTC
(In reply to comment #0)
> ARP does _ip_ route lookups for outgoing _arp_ messages. I have to confess
> that I find this weird and unexpected. What I would have expected is
> that:
> 
> 1) SendReply uses the 'to' ip of the request as a 'from' address, so, no
> lookup needed.
> 
> 2) SendRequest is given the source address from Lookup
> 
> 3) the source address is stored in the arp cache entry to give to the
> ArpRequestCallback.
> 

This bug is critical for any on-demand routing protocol. Suppose you need to send some control traffic to find valid route, this will need arp request-reply handshake, this needs valid route, this will try to send control traffic and  loops forever.

I have attached simple patch eliminating route lookup in SendArpReply (Mathieu point 1) above) -- please review, this resolves infinite loop issue. Points 2 and 3 remain unfixed. 
Comment 5 Tom Henderson 2009-08-04 00:51:31 UTC
(In reply to comment #4)
> (In reply to comment #0)
> > ARP does _ip_ route lookups for outgoing _arp_ messages. I have to confess
> > that I find this weird and unexpected. What I would have expected is
> > that:
> > 
> > 1) SendReply uses the 'to' ip of the request as a 'from' address, so, no
> > lookup needed.
> > 
> > 2) SendRequest is given the source address from Lookup
> > 
> > 3) the source address is stored in the arp cache entry to give to the
> > ArpRequestCallback.
> > 
> 
> This bug is critical for any on-demand routing protocol. Suppose you need to
> send some control traffic to find valid route, this will need arp request-reply
> handshake, this needs valid route, this will try to send control traffic and 
> loops forever.

aren't route requests typically broadcast or multicast?

Comment 6 Pavel Boyko 2009-08-04 03:13:08 UTC
(In reply to comment #5)
> 
> aren't route requests typically broadcast or multicast?

Sure, but replies are unicast. Frankly speaking loop description above is a little bit simplified -- I can draw exact packet exchange sequence if you need it.  

Comment 7 Tom Henderson 2009-08-06 00:33:10 UTC
(In reply to comment #4)
> (In reply to comment #0)
> > ARP does _ip_ route lookups for outgoing _arp_ messages. I have to confess
> > that I find this weird and unexpected. What I would have expected is
> > that:
> > 
> > 1) SendReply uses the 'to' ip of the request as a 'from' address, so, no
> > lookup needed.
> > 
> > 2) SendRequest is given the source address from Lookup
> > 
> > 3) the source address is stored in the arp cache entry to give to the
> > ArpRequestCallback.
> > 
> 
> This bug is critical for any on-demand routing protocol. Suppose you need to
> send some control traffic to find valid route, this will need arp request-reply
> handshake, this needs valid route, this will try to send control traffic and 
> loops forever.
> 
> I have attached simple patch eliminating route lookup in SendArpReply (Mathieu
> point 1) above) -- please review, this resolves infinite loop issue. Points 2
> and 3 remain unfixed. 
> 

(In reply to comment #6)
> (In reply to comment #5)
> > 
> > aren't route requests typically broadcast or multicast?
> 
> Sure, but replies are unicast. Frankly speaking loop description above is a
> little bit simplified -- I can draw exact packet exchange sequence if you need
> it.  
> 

I believe you.  I suggest we apply your patch and leave the bugs open for 2 and 3 above.

Comment 8 Pavel Boyko 2009-08-06 02:17:13 UTC
> I believe you.  I suggest we apply your patch and leave the bugs open for 2 and
> 3 above.

 Done in 5ef92ccda11a.  

Comment 9 Pavel Boyko 2009-12-10 08:09:16 UTC
This bug is becoming annoying. There is fresh duplicate bug 755 and the same bug is reported by Mariusz Skrocki in his ns-3 vs. JiST/SWANS performance races.

The typical crash scenario is 
 1. send ARP request when route is known and schedule retry
 2. lost the request
 3. route becomes invalid during retry timeout 
 4. ARP request retry will crash with NS_ASSERT (route != 0).    

Mathieu, Tom -- I'd kindly ask you to agree on the way it should be fixed. Now I set "P2" since it is reported by several users and "critical" since it leads to crash.
Comment 10 Tom Henderson 2009-12-10 14:00:14 UTC
(In reply to comment #9)
> This bug is becoming annoying. There is fresh duplicate bug 755 and the same
> bug is reported by Mariusz Skrocki in his ns-3 vs. JiST/SWANS performance
> races.
> 
> The typical crash scenario is 
>  1. send ARP request when route is known and schedule retry
>  2. lost the request
>  3. route becomes invalid during retry timeout 
>  4. ARP request retry will crash with NS_ASSERT (route != 0).    
> 
> Mathieu, Tom -- I'd kindly ask you to agree on the way it should be fixed. Now
> I set "P2" since it is reported by several users and "critical" since it leads
> to crash.

I agree that it should be solved now.  As I said, I would support a patch that passed the address into the request, and cached it, or I would also support some kind of inet_select_addr() function as Linux seems to handle it, that wouldn't cause an assert.

I can take a crack at this tomorrow (fri) if you want, or if you plan to patch it yourself, that's fine too-- let me know your preference.
Comment 11 Tom Henderson 2009-12-10 15:51:34 UTC
*** Bug 755 has been marked as a duplicate of this bug. ***
Comment 12 Tom Henderson 2009-12-10 17:56:24 UTC
Would a solution like this be adequate and appropriate?  We would need a MANET test case to see whether we deal with the "node moves away while ARP is trying to resolve" use case.  I haven't looked yet at whether this source address should be cached by arp.

diff -r f7a4e1b3f632 src/internet-stack/arp-l3-protocol.cc
--- a/src/internet-stack/arp-l3-protocol.cc	Thu Dec 10 12:42:20 2009 +0100
+++ b/src/internet-stack/arp-l3-protocol.cc	Thu Dec 10 14:51:26 2009 -0800
@@ -23,7 +23,6 @@
 #include "ns3/net-device.h"
 #include "ns3/object-vector.h"
 #include "ns3/trace-source-accessor.h"
-#include "ns3/ipv4-route.h"
 
 #include "ipv4-l3-protocol.h"
 #include "arp-l3-protocol.h"
@@ -322,15 +321,14 @@
   header.SetDestination (to);
   Socket::SocketErrno errno_;
   Ptr<Packet> packet = Create<Packet> ();
-  Ptr<Ipv4Route> route = ipv4->GetRoutingProtocol ()->RouteOutput (packet, header, interface, errno_);
-  NS_ASSERT (route != 0);
+  Ipv4Address source = ipv4->InetSelectAddr (cache->GetDevice (), to, RT_SCOPE_LINK);
   NS_LOG_LOGIC ("ARP: sending request from node "<<m_node->GetId ()<<
             " || src: " << cache->GetDevice ()->GetAddress () <<
-            " / " << route->GetSource () <<
+            " / " << source <<
             " || dst: " << cache->GetDevice ()->GetBroadcast () <<
             " / " << to);
   arp.SetRequest (cache->GetDevice ()->GetAddress (),
-		  route->GetSource (),
+		  source,
                   cache->GetDevice ()->GetBroadcast (),
                   to);
   packet->AddHeader (arp);

--- a/src/node/ipv4.h	Thu Dec 10 12:42:20 2009 +0100
+++ b/src/node/ipv4.h	Thu Dec 10 14:51:26 2009 -0800
@@ -216,6 +216,22 @@
   virtual bool RemoveAddress (uint32_t interface, uint32_t addressIndex) = 0;
 
   /**
+   * Similar to Linux inet_select_addr.  If a device is provided but no
+   * unicast Ipv4Address is provided, the system selects any primary 
+   * address on the Ipv4Interface on that device.  If the destination address
+   * provided is non-broadcast, the source address returned will be 
+   * one corresponding to the same subnet (possibly constrained by 
+   * scope or device).
+   *
+   * \param nd output NetDevice (optionally provided, only to constrain the search)
+   * \param dst Destination address 
+   * \param scope Typically covers a site, link, or host
+   * \returns an Ipv4Address that falls within the same subnet as the 
+   *  destination address and with the same or smaller scope 
+   */
+  virtual Ipv4Address InetSelectAddr (Ptr<const NetDevice> nd, Ipv4Address dst,  uint8_t scope);
Comment 13 Pavel Boyko 2009-12-11 01:45:05 UTC
(In reply to comment #12)
> Would a solution like this be adequate and appropriate?  We would need a MANET
> test case to see whether we deal with the "node moves away while ARP is trying
> to resolve" use case.  I haven't looked yet at whether this source address
> should be cached by arp.

  I think this is exactly what is needed now. I will try to commit test case today. 

  Thank you. 

> diff -r f7a4e1b3f632 src/internet-stack/arp-l3-protocol.cc
> --- a/src/internet-stack/arp-l3-protocol.cc    Thu Dec 10 12:42:20 2009 +0100
> +++ b/src/internet-stack/arp-l3-protocol.cc    Thu Dec 10 14:51:26 2009 -0800
> @@ -23,7 +23,6 @@
>  #include "ns3/net-device.h"
>  #include "ns3/object-vector.h"
>  #include "ns3/trace-source-accessor.h"
> -#include "ns3/ipv4-route.h"
> 
>  #include "ipv4-l3-protocol.h"
>  #include "arp-l3-protocol.h"
> @@ -322,15 +321,14 @@
>    header.SetDestination (to);
>    Socket::SocketErrno errno_;
>    Ptr<Packet> packet = Create<Packet> ();
> -  Ptr<Ipv4Route> route = ipv4->GetRoutingProtocol ()->RouteOutput (packet,
> header, interface, errno_);
> -  NS_ASSERT (route != 0);
> +  Ipv4Address source = ipv4->InetSelectAddr (cache->GetDevice (), to,
> RT_SCOPE_LINK);
>    NS_LOG_LOGIC ("ARP: sending request from node "<<m_node->GetId ()<<
>              " || src: " << cache->GetDevice ()->GetAddress () <<
> -            " / " << route->GetSource () <<
> +            " / " << source <<
>              " || dst: " << cache->GetDevice ()->GetBroadcast () <<
>              " / " << to);
>    arp.SetRequest (cache->GetDevice ()->GetAddress (),
> -          route->GetSource (),
> +          source,
>                    cache->GetDevice ()->GetBroadcast (),
>                    to);
>    packet->AddHeader (arp);
> 
> --- a/src/node/ipv4.h    Thu Dec 10 12:42:20 2009 +0100
> +++ b/src/node/ipv4.h    Thu Dec 10 14:51:26 2009 -0800
> @@ -216,6 +216,22 @@
>    virtual bool RemoveAddress (uint32_t interface, uint32_t addressIndex) = 0;
> 
>    /**
> +   * Similar to Linux inet_select_addr.  If a device is provided but no
> +   * unicast Ipv4Address is provided, the system selects any primary 
> +   * address on the Ipv4Interface on that device.  If the destination address
> +   * provided is non-broadcast, the source address returned will be 
> +   * one corresponding to the same subnet (possibly constrained by 
> +   * scope or device).
> +   *
> +   * \param nd output NetDevice (optionally provided, only to constrain the
> search)
> +   * \param dst Destination address 
> +   * \param scope Typically covers a site, link, or host
> +   * \returns an Ipv4Address that falls within the same subnet as the 
> +   *  destination address and with the same or smaller scope 
> +   */
> +  virtual Ipv4Address InetSelectAddr (Ptr<const NetDevice> nd, Ipv4Address
> dst,  uint8_t scope);
Comment 14 Pavel Boyko 2009-12-11 02:26:30 UTC
(In reply to comment #13)
> We would need a MANET test case to see whether we deal with the "node moves away while ARP is trying
> to resolve" use case.

  It happened to be very easy:

$ ./waf configure --enable-examples
$ ./waf --run "aodv --ns3::ArpCache::AliveTimeout=1s --size=3"
Creating 3 nodes 120 m apart.
Starting simulation for 10 s ...
PING  10.0.0.3 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=0 ttl=63 time=5 ms
64 bytes from 10.0.0.3: icmp_seq=1 ttl=63 time=3 ms
64 bytes from 10.0.0.3: icmp_seq=2 ttl=63 time=3 ms
64 bytes from 10.0.0.3: icmp_seq=3 ttl=63 time=1 ms
assert failed. file=../src/internet-stack/arp-l3-protocol.cc, line=326, cond="route != 0"

  I will convert it to the (crashing) test case in the routing-aodv-regression test suite and push.
Comment 15 Pavel Boyko 2009-12-11 03:18:44 UTC
(In reply to comment #14) 
>   I will convert it to the (crashing) test case in the routing-aodv-regression
> test suite and push.

 Done in [e6a70e535416]. Test suite routing-aodv-regression should crash until this bug will be fixed. After bugfix one should write and commit reference traces for this new test. To do so please:

1. Change const bool WRITE_VECTORS = false; to true in src/routing/aodv/test/aodv-regression.cc
2. Run ./test.py -s routing-aodv-regression
3. Add newly generated traces: hg add src/routing/aodv/test/*.pcap
4. Revert changes in src/routing/aodv/test/aodv-regression.cc
5. Commit and push reference traces.
Comment 16 Tom Henderson 2009-12-12 02:28:26 UTC
Created attachment 698 [details]
Proposed patch (adds Ipv4::SelectSourceAddress())

This proposed fix implements an ns-3 version of Linux inet_select_addr() and uses it in ArpL3Protocol to select a source address.  

It turns the aodv regression failure in test.py from CRASH to FAIL, where the failure is due to differing pcap traces, and all other test results remain good.
Comment 17 Tom Henderson 2009-12-12 02:35:39 UTC
(In reply to comment #15)
> (In reply to comment #14) 
> >   I will convert it to the (crashing) test case in the routing-aodv-regression
> > test suite and push.
> 
>  Done in [e6a70e535416]. Test suite routing-aodv-regression should crash until
> this bug will be fixed. After bugfix one should write and commit reference
> traces for this new test. To do so please:
> 
> 1. Change const bool WRITE_VECTORS = false; to true in
> src/routing/aodv/test/aodv-regression.cc
> 2. Run ./test.py -s routing-aodv-regression
> 3. Add newly generated traces: hg add src/routing/aodv/test/*.pcap
> 4. Revert changes in src/routing/aodv/test/aodv-regression.cc
> 5. Commit and push reference traces.

I can clear the failure with the new patch by following the above steps, so if
you are OK with the patch fix, I'll push it over the weekend.
Comment 18 Tom Henderson 2009-12-14 01:25:24 UTC
should now be fixed with changeset 1a6375ac5431