Bug 740

Summary: OLSR MprCompute () works wrong
Product: ns-3 Reporter: Kirill Andreev <andreev>
Component: routingAssignee: ns-bugs <ns-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: boyko, gjcarneiro, tomh
Priority: P5    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: Test-case
Propsed fix
Test-case
Fix #2
Fix #2, to be applied above first fix.

Description Kirill Andreev 2009-11-12 11:10:36 UTC
Created attachment 650 [details]
Test-case

A simple tes-tcase (3x3 grid) was conducted and MPRs are selelcted wrong. Test-case is attached and proposal fix is attached
Comment 1 Kirill Andreev 2009-11-12 11:11:02 UTC
Created attachment 651 [details]
Propsed fix
Comment 2 Kirill Andreev 2009-11-12 11:14:05 UTC
Created attachment 652 [details]
Test-case
Comment 3 Tom Henderson 2009-11-12 16:20:26 UTC
(In reply to comment #2)
> Created an attachment (id=652) [details]
> Test-case

You seemingly read my mind-- I wanted to create a similar test case and rename the existing one to OLSR header (as you did), based on the previous bug 733 that was fixed this week.  The only different that I had in mind would be to instead export a trace source for "MprChanges" that exported the MPR set every time it changed, and hook that.  But the existing GetMprSet () works fine too.

+1
Comment 4 Gustavo J. A. M. Carneiro 2009-11-13 09:32:58 UTC
Hi, I am the (mostly-absent) maintainer of the OLSR code, but I see the patch was committed without me having time to even look at the issue, much less approve.  While I love that a unit test was added, I have serious doubts about the patch:

     2.1 --- a/src/routing/olsr/olsr-routing-protocol.cc	Thu Nov 12 14:08:51 2009 +0100
     2.2 +++ b/src/routing/olsr/olsr-routing-protocol.cc	Fri Nov 13 11:47:02 2009 +0300
     2.3 @@ -716,6 +716,10 @@
     2.4                if (max == NULL || nb_tuple->willingness > max->willingness)
     2.5                  {
     2.6                    max = nb_tuple;
     2.7 +                  for (TwoHopNeighborSet::iterator newCovered = N2.begin (); newCovered != N2.end (); newCovered++)
     2.8 +                    {
     2.9 +                      coveredTwoHopNeighbors.insert (newCovered->twoHopNeighborAddr);
    2.10 +                    }
    2.11                    max_r = r;
    2.12                  }
    2.13                else if (nb_tuple->willingness == max->willingness)

We are inside a typical for loop that searches for a "maximum" on a set of values.  Your change assumes that, each time a new maximum is found it will be selected, and immediately updates the coveredTwoHopNeighbors with the assumption that the candidate will be selected.  However, as the search for maximum continues, a new maximum may be found...  This leads to potentially prematurely considering some 2-hop neighbors already covered when in fact they are not.

    2.14 @@ -740,11 +744,12 @@
    2.15        if (max != NULL)
    2.16          {
    2.17            mprSet.insert (max->neighborMainAddr);
    2.18 -          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin ();
    2.19 -               twoHopNeigh != N2.end (); )
    2.20 +          // Remove the nodes from N2 which are now covered by a node in the MPR set.
    2.21 +          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin (); twoHopNeigh != N2.end (); )
    2.22              {
    2.23 -              if (twoHopNeigh->neighborMainAddr == max->neighborMainAddr)
    2.24 +              if (coveredTwoHopNeighbors.find (twoHopNeigh->twoHopNeighborAddr) != coveredTwoHopNeighbors.end ())
    2.25                  {
    2.26 +                  NS_LOG_LOGIC ("2-hop neigh. " << twoHopNeigh->twoHopNeighborAddr << " is already covered by an MPR.");
    2.27                    twoHopNeigh = N2.erase (twoHopNeigh);
    2.28                  }
    2.29                else

This code makes use of coveredTwoHopNeighbors.  In the previous version only N2 was updated and consulted in this last part of the code (step 4 of the algorythm).  It was assumed that a two-hop neighbor is covered if it is not listed in N2 anymore, so there really was no need to update both coveredTwoHopNeighbors and N2 at the same time, right?

It was never explained in this bug report what MPR set was being selected and which one should in theory have been selected.  Keep in mind that OLSR contains an heuristic for find a _good_ MPR set; it is not guaranteed to find the _best_ MPR set.  Find the best set is an NP-complete problem...
Comment 5 Pavel Boyko 2009-11-17 07:49:38 UTC
  Hi, Gustavo,

  I will support this bug while Kirill is on the vacation. 

(In reply to comment #4)
> Hi, I am the (mostly-absent) maintainer of the OLSR code, but I see the patch
> was committed without me having time to even look at the issue, much less
> approve.  

  I'm terribly sorry, we will never do it again.

> While I love that a unit test was added, I have serious doubts about
> the patch:

  I agree with you that proposed (and commited) "fix" is completely wrong. 

  As I can see, the real error was in the lines:

          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin (); twoHopNeigh != N2.end (); )
            {
              if (twoHopNeigh->neighborMainAddr == neighbor->neighborMainAddr)
                {
                  NS_LOG_LOGIC ("2-hop neigh. " << twoHopNeigh->twoHopNeighborAddr << " is already covered by an MPR.");
                  twoHopNeigh = N2.erase (twoHopNeigh);
                }
              else
                {
                  twoHopNeigh++;
                }
            }

  you were used to remove covered 2-hop neighbors from N2 set. This code ignores the fact that N2 set is made of pairs {neighbor, two hop neighbor} and that the same 2-hop neighbor can appear multiple times in N2 with different 1-hop neighbors. All this records must be deleted when two hop neighbor is covered by MPR. 

  The simplest topology which illustrates this is 

  O -- A
  |    |
  B -- C  

  where O selects MPR set and N = {A, B} N2 = {{A, C}, {B, C}}. When A is selected as MPR your code will remove tuple {A, C} from N2 but will not remove record {B, C} and this will case B to be erroneously selected as the second MPR.

  The proposed fix-to-fix is attached (It should be applied to ns-3-dev as is), please take a look.

> It was never explained in this bug report what MPR set was being selected and
> which one should in theory have been selected. 

  I agree that current unit test topology is overcomplicated. If you don't object I will simplify existing unit test to reproduce minimal topology drawn above and add some more MPR selection tests. You can do this by yourself if you like to.  

>  Keep in mind that OLSR contains
> an heuristic for find a _good_ MPR set; it is not guaranteed to find the _best_
> MPR set.  Find the best set is an NP-complete problem...

  I know. Now we are talking about correct implementation of recommended heuristic from RFC 3626 8.3.1.
Comment 6 Pavel Boyko 2009-11-17 07:50:01 UTC
Created attachment 655 [details]
Fix #2
Comment 7 Pavel Boyko 2009-11-17 07:52:33 UTC
Created attachment 656 [details]
Fix #2, to be applied above first fix.
Comment 8 Gustavo J. A. M. Carneiro 2009-11-18 05:36:51 UTC
*sigh* I am stuck spending my time fixing Python bindings, so it might take a while for me to get to this.
Comment 9 Pavel Boyko 2009-11-18 05:41:05 UTC
(In reply to comment #8)
> *sigh* I am stuck spending my time fixing Python bindings, so it might take a
> while for me to get to this.

  No problem. Meanwhile, can I commit new OLSR unit & system tests (not fixes just tests :) without your explicit permission?
Comment 10 Gustavo J. A. M. Carneiro 2009-11-18 05:49:40 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > *sigh* I am stuck spending my time fixing Python bindings, so it might take a
> > while for me to get to this.
> 
>   No problem. Meanwhile, can I commit new OLSR unit & system tests (not fixes
> just tests :) without your explicit permission?

Sure.  Just please leave the MprCompuation code intact for a while until I have time to review it.  It is a very complicated piece of code, not easy to review it... :-/
Comment 11 Gustavo J. A. M. Carneiro 2009-11-19 06:39:46 UTC
Comment on attachment 656 [details]
Fix #2, to be applied above first fix.

Cristal clear patch.  Please commit.