Bug 991

Summary: InterferenceHelper assertion failure (cond="current >= previous")
Product: ns-3 Reporter: Quincy Tse <quincy.tse>
Component: wifiAssignee: Nicola Baldo <nicola>
Status: RESOLVED FIXED    
Severity: major CC: andreev, ns-bugs, quincy.tse
Priority: P3    
Version: ns-3-dev   
Hardware: All   
OS: All   
URL: https://groups.google.com/group/ns-3-users/browse_thread/thread/16e032696f2fe5dc
Attachments: Test case - extension to existing wifi-test.cc
Patch against existing wifi-test.cc
Patch to fixproblem associated with test case.
Patch to fixproblem associated with test case.
One more fix

Description Quincy Tse 2010-09-10 00:37:06 UTC
Under some circumstances, the assertion (current >= previous) in InterferenceHelper::CalculatePer fails.

Other details available on Google Groups thread https://groups.google.com/group/ns-3-users/browse_thread/thread/16e032696f2fe5dc.

Test code reproducing problem not yet available.
Comment 1 Quincy Tse 2010-09-10 00:44:49 UTC
By placing asserts around the file, I've isolated that InterferenceHelper::CalculateNoiseInterferenceW may construct an NiChanges that is out of order.

Assertion check on the ordering of m_niChanges passed.
Assertion check on ni after the for loop passed (as expected)

Assertion FAIL after inserting the start NiChange.

Assertion passed after appending the final NiChange (if the above insert was commented out).

So - it seems the assumption that the start of the event is always before/the same time as the list of NiChanges may be false.
Comment 2 Quincy Tse 2010-09-10 02:28:07 UTC
The following sequence of event (at YansWifiPhy) probably (haven't confirmed yet) cause this assertion failure:

YansWifiPhy::StartReceivePacket, packet dropped (never went into RX state)
YansWifiPhy::StartReceivepacket, interface went into RX state, before or after the previous packet
No further invocation of YansWifiPhy::StartReceivePacket before finish receiving the 2nd packet.

State of m_niChanges in this case, after all StartReceivePacket have completed but before YansWifiPhy::EndReceive (noting only the time of the NiChange events):
i)   {start pkt1, end pkt1, start pkt2, end pkt2} OR
ii)  {start pkt1, start pkt2, end pkt1, end pkt2} OR
iii) {start pkt1, start pkt2, end pkt2, end pkt1}.

In case i):
After the loop, ni is:
{end pkt1, start pkt2, end pkt2}

After the insert:
{m_firstPower, end pkt1, start pkt2, end pkt2} <- the m_firstPower is tagged with start pkt2's event time, making  ni out of order.
----------------
In summary:
* InterferenceHelper::CalculateNoiseInterferenceW expects m_niChanges to start with the latest "add noise" event, with all previous events aggregated into m_firstPower.
* InterferenceHelper::AppendEvent clean up m_niChanges to fit the expectation only when it's receiving, therefore there is a chance that there may be some sequence of noise events before the latest "add noise" event, which occurs when no "add noise" event happen once the interface started receiving.
Comment 3 Quincy Tse 2010-09-10 04:08:03 UTC
I'm still trying to work out how to write the test case (haven't written one for NS-3 yet).

In terms of bug fix - I think we can probably just change AppendEvent to clean up m_niChanges every time AppendEvent is called, but i have no idea why the separate case for m_rxing is there.

I also think maybe NiChanges should also be a std::list<NiChange> instead of a std::vector - I don't seem to see any random access to instances of it, and a std::list might make deletion a bit more efficient. (Would the efficiency concern be the reason why we don't clean up m_niChanges every time?)
Comment 4 Nicola Baldo 2010-09-10 05:57:42 UTC
Quincy, thank you very much for your effort in isolating this bug.

Kirill, you are the author of the latest InterferenceHelper code (see bug 799), can you please give a look?
Comment 5 Kirill Andreev 2010-09-10 06:32:26 UTC
(In reply to comment #4)
> Quincy, thank you very much for your effort in isolating this bug.
> 
> Kirill, you are the author of the latest InterferenceHelper code (see bug 799),
> can you please give a look?

Yes, that is my bug, I accept it
Comment 6 Kirill Andreev 2010-09-10 06:53:44 UTC
(In reply to comment #3)
> I'm still trying to work out how to write the test case (haven't written one
> for NS-3 yet).
> 
> In terms of bug fix - I think we can probably just change AppendEvent to clean
> up m_niChanges every time AppendEvent is called, but i have no idea why the
> separate case for m_rxing is there.
> 


Adding special case of m_rxing is need for the following:
  - when you are not receiving a packet, you need not remember
    all NIChanges before current moment.
  - when you are receiving a packet, you need to remember all
    NiChanges to calculate PER properly.
So, m_rxing keeps a list of NiChanges as small as possible.

All events that happen in the future MUST be stored in NiChanges.



> I also think maybe NiChanges should also be a std::list<NiChange> instead of a
> std::vector - I don't seem to see any random access to instances of it, and a
> std::list

std::list was rejected because it works too slow, vector keeps sorted by time NI changes, and going through the list is slower than going through the vector (GetPosition method loses performance if you use a list, because it uses binary search. I have played with profiler a lot, and I have tried to use a list. Deleting the element kills performance much less than binary search improves it).


> might make deletion a bit more efficient. (Would the efficiency
> concern be the reason why we don't clean up m_niChanges every time?)
Comment 7 Kirill Andreev 2010-09-10 07:02:42 UTC
(In reply to comment #2)
> The following sequence of event (at YansWifiPhy) probably (haven't confirmed
> yet) cause this assertion failure:
> 
> YansWifiPhy::StartReceivePacket, packet dropped (never went into RX state)
> YansWifiPhy::StartReceivepacket, interface went into RX state, before or after
> the previous packet
> No further invocation of YansWifiPhy::StartReceivePacket before finish
> receiving the 2nd packet.
> 
> State of m_niChanges in this case, after all StartReceivePacket have completed
> but before YansWifiPhy::EndReceive (noting only the time of the NiChange
> events):
> i)   {start pkt1, end pkt1, start pkt2, end pkt2} OR
> ii)  {start pkt1, start pkt2, end pkt1, end pkt2} OR
> iii) {start pkt1, start pkt2, end pkt2, end pkt1}.
> 
> In case i):
> After the loop, ni is:
> {end pkt1, start pkt2, end pkt2}
> 
> After the insert:
> {m_firstPower, end pkt1, start pkt2, end pkt2} <- the m_firstPower is tagged
> with start pkt2's event time, making  ni out of order.
> ----------------
> In summary:
> * InterferenceHelper::CalculateNoiseInterferenceW expects m_niChanges to start
> with the latest "add noise" event, with all previous events aggregated into
> m_firstPower.
> * InterferenceHelper::AppendEvent clean up m_niChanges to fit the expectation
> only when it's receiving, therefore there is a chance that there may be some
> sequence of noise events before the latest "add noise" event, which occurs when
> no "add noise" event happen once the interface started receiving.

Well, it is not so easy to understand what is really happens, test scenario will help a lot.
Comment 8 Quincy Tse 2010-09-10 08:20:38 UTC
Writing test case isn't my strong point (it'll probably take me quite a while as I haven't done one in NS3 before).

Scenario:
1. At time t0, packet1 (with duration t1) arrives at a node, does not trigger a transition to RX state (eg. too low Rx Power), then
2. At time t2 > t0+t1, pkt2 (with duration t3) arrive at the node with sufficient power to cause transition to RX state (node should be at CCA_BUSY or IDLE state before)
3. No other packets arrives.

Just after step 1, m_niChanges is the vector consisting of
{(t0, power of pkt1), (t0+t1, -power pkt1)}, Simulation::Now() is t0.

Just after step 2, m_niChanges is
{(t0, power of pkt1), (t0+t1, -power of pkt1), (t2, power of pkt2), (t2+t3, -power of pkt2)}, Simulation::Now() is t2.

At time t2+t3, YanWifiPhy::EndReceive is called (for finish receiving pkt2).
InterferenceHelper::CalculateNoiseInterferenceW is eventually called.

CalculateNoiseInterferenceW goes through a for loop from (m_niChanges.begin() + 1) to the NiChange corresponding to the end of pkt2

At the end of the for loop in InterferenceHelper::CalculateNoiseInterferenceW, *ni is:
{(t0+t1, -power of pkt1), (t2, power of pkt2)}

CalculateNoiseInterferenceW then insert a NiChange at the start of the vector with time t2 to account for m_firstPower, giving:
{(t2, m_firstPower), (t0+t1, -power of pkt1), (t2, power of pkt2)}
*** Note that the ordering is now incorrect!! ***

Finally, CalculateNoiseInterferenceW appends a final NiChange, giving *ni:
{(t2,...), (t0+t1,...), (t2,...), (t2+t3, 0)}
Comment 9 Quincy Tse 2010-09-10 08:23:15 UTC
Also, I'll be away from my computer until Tuesday (GMT+10) - my ability to continue work on this will be limited during this time.
Comment 10 Quincy Tse 2010-09-14 01:49:05 UTC
Sorry the scenario I described isn't the problem (I accidentally transposed the time when m_rxing gets updated when reading the code). Comment #1 is still correct though.

Below, I note NiChange as "(truncated time in hex, delta)"

From analysing my program as it crashed (at InterferenceHelper::CalculatePer:

== m_niChanges is {(0bebc217, 6.447e-11), (0bed67f7, -6.447e-11), (11e1a317, 6.447e-11), (11e348f7, -6.447e-11)}
== m_firstPower is 0
== event corresponds is at time 11e1a317 (corresponds to 3rd item on the vector)
== *ni is {(11e1a317, 0), (0bed67f7, -6.447e-11), (11e1a317, 6.447e-11), (11e348f7, 0)}

This seems to suggests that at the time my simulation crashed, one packet (time 0bebc217--0bed67f7) had finished receiving, but somehow stayed in m_niChanges, causing the incorrect ordering.

Maybe m_rxing somehow didn't get updated?
Comment 11 Quincy Tse 2010-09-14 02:57:16 UTC
Created attachment 969 [details]
Test case - extension to existing wifi-test.cc

Found one reason why m_rxing might not have been updated (I don't think it's the cause of my problem though - I'll have to try it out).

In the attached test case, the following happened:
Node (called rxOnly) started receiving packet
rxOnly changed channel before finishing the packet
rxOnly tried unsuccessfully to receive packet (power too low)
rxOnly started receiving packet
rxOnly tries to calculate PER when finished receiving packet.

This scenario is unlikely to my the problem affecting my code as I do not change channels in my code.
Comment 12 Quincy Tse 2010-09-14 03:00:49 UTC
Created attachment 970 [details]
Patch against existing wifi-test.cc
Comment 13 Quincy Tse 2010-09-14 03:23:52 UTC
Created attachment 971 [details]
Patch to fixproblem associated with test case.

I can't test the patch out yet - I updated to the newest changesets and NS3 doesn't compile.
Comment 14 Quincy Tse 2010-09-14 23:18:42 UTC
Created attachment 972 [details]
 Patch to fixproblem associated with test case.

Moved the m_interference.NotifyRxEnd() call to under the case where the PHY is in RX state only (unnecessary for all other cases). This should help detect inconsistencies between PHY and InterferenceHelper states (if exists).
Comment 15 Quincy Tse 2010-09-15 00:02:16 UTC
Confirmed that the patch (Attachment #972 [details]) fixes the error in the channel switching case.

My code is still crashing though - I will keep on trying...
Comment 16 Quincy Tse 2010-09-15 01:07:01 UTC
(In reply to comment #15)
> My code is still crashing though - I will keep on trying...

Found the error with my simulation crashing - the problem was on my side and has nothing to do with this bug.

I can't identify and further problems with the InterferenceHelper code.
Comment 17 Kirill Andreev 2010-09-16 05:39:35 UTC
(In reply to comment #15)
> Confirmed that the patch (Attachment #972 [details]) fixes the error in the channel

Actually, I did not test the case of channel switching.
> switching case.
> 
> My code is still crashing though - I will keep on trying...

Is it possible to send your code to me? Or it needs a lot of changes in NS-3 code? Your code will help me a lot.
Comment 18 Kirill Andreev 2010-09-16 06:04:06 UTC
(In reply to comment #17)
> (In reply to comment #15)
> > Confirmed that the patch (Attachment #972 [details] [details]) fixes the error in the channel
> 
> Actually, I did not test the case of channel switching.
> > switching case.
> > 
> > My code is still crashing though - I will keep on trying...
> 
> Is it possible to send your code to me? Or it needs a lot of changes in NS-3
> code? Your code will help me a lot.

Sorry, I did not see the testcase.
Comment 19 Kirill Andreev 2010-09-16 08:15:42 UTC
Created attachment 979 [details]
One more fix

I think, that the problem arises when we do not cancel m_rxing flag at the channel swithcing. I have changed EraseAllEvents instead of changing yans-wifi-phy. Thank you very much for proposed test-case. This test works with patch provided.
Comment 20 Nicola Baldo 2010-11-17 11:28:57 UTC
Many thanks to both Quincy and Kirill for the very good work on this bug!

changeset:   6665:27dab861cb2a
user:        Quincy Tse <quincy.tse@nicta.com.au>
date:        Wed Nov 17 17:26:23 2010 +0100
summary:     TestCase for bug 991

changeset:   6666:21a9b1879f27
tag:         tip
user:        Kirill Andreev <andreev@iitp.ru>
date:        Wed Nov 17 17:30:19 2010 +0100
summary:     fixed bug 991