Bug 737

Summary: Backoff procedure is not invoked when transmission is deferred
Product: ns-3 Reporter: Kirill Andreev <andreev>
Component: wifi Assignee: Nicola Baldo <nicola>
Status: RESOLVED INVALID    
Severity: enhancement CC: alexa.gerancho, boyko, kevjay, ns-bugs, yacoubmassad
Priority: P5    
Version: ns-3-dev   
Hardware: All   
OS: All   
See Also: https://www.nsnam.org/bugzilla/show_bug.cgi?id=912
https://www.nsnam.org/bugzilla/show_bug.cgi?id=1465
https://www.nsnam.org/bugzilla/show_bug.cgi?id=2369
Attachments: Proposed fix
illustration of 1x3 grid scenario

Description Kirill Andreev 2009-11-09 12:41:29 UTC
According to 9.2.8 of 802.11-2007 (ACK procedure): "If a PHY-RXSTART.indication does not occur during the ACKTimeout interval, the STA concludes that the transmission of the MPDU has failed, and this STA shall invoke its backoff procedure upon expiration of the ACKTimeout interval." The same applies to the CTS timeout.

The current implementation starts the backoff procedure inside DCA-TXOP, but only DcfManager knows about CTS and ACK timeouts.

So a method like "StartBackoffProcedure" would have to be added to DcaTxop and EdcaTxop, but then the backoff procedure would be initiated from both DcaTxop and DcfManager. What about making DcfManager the only place that starts the backoff procedure for all queues?
Comment 1 Mathieu Lacage 2009-11-16 10:22:59 UTC
I am sorry but I don't see what the problem is: I believe that we do the right thing with regard to the paragraph you quote from the standard.

Maybe you could try to give a few more details about the failing testcase you have ?
Comment 2 Kirill Andreev 2009-12-02 08:20:05 UTC
I have found a clearer explanation of what is wrong in DcfManager:
Chapter 9.1.1 says: "After
deferral, or prior to attempting to transmit again immediately after a successful transmission, the STA shall
select a random backoff interval and shall decrement the backoff interval counter while the medium is idle."
So DcfManager::IsBusy () must check that the AIFSn of the queue that requested access has elapsed since GetAccessGrantStart () (rather than checking only whether we are receiving, transmitting, or NAV-busy).

I observed this while looking at pcap traces in mesh: a station does not compute a backoff when it retransmits a frame, because the frame is retransmitted immediately after being received, while the medium is idle (the medium is idle during the SIFS before the ACK!).

Am I right?
Comment 3 Kirill Andreev 2009-12-02 08:41:05 UTC
Also, it seems to me that the first version of the patch in bug 555 solves this problem.
Comment 4 Kirill Andreev 2009-12-02 08:44:46 UTC
Addition to the previous comment: with the first patch in bug 555, IsBusy () is not needed.
Comment 5 Kirill Andreev 2009-12-02 09:55:12 UTC
Created attachment 689 [details]
Proposed fix
Comment 6 Mathieu Lacage 2009-12-31 05:45:41 UTC
(In reply to comment #2)
> a station does not
> calculate a backoff when it retransmits a frame, because frame was
> retransmitted immediately after receiving, when the medium was idle (medium was
> idle inside a SIFS before ACK!).


I am sorry but I really do not understand this description of your testcase. Please, can you try to provide a more detailed description ?
Comment 7 Kirill Andreev 2010-01-11 10:41:09 UTC
I have made a short illustration of how DcfManager operates when a frame is retransmitted immediately after it is received. Running the mesh script with a 1x3 grid and a simple debugging print, I observed that the backoff procedure is not invoked even when the station defers its transmission.
Comment 8 Kirill Andreev 2010-01-11 10:42:11 UTC
Created attachment 717 [details]
illustration of 1x3 grid scenario

Illustration
Comment 9 Kirill Andreev 2010-01-25 06:35:49 UTC
Also, the main problem with this bug is the broken dcf-manager-test, in which every test fails.
Mathieu, could you please review my previous comment?
Comment 10 Pavel Boyko 2010-01-25 06:54:31 UTC
  Hi, Mathieu, 

  I'd like to draw your attention to this bug, since it appears to be critical in multihop mesh/manet networks. The reason is that none of our models really accounts for processing delays, so all re-transmissions (say, forwarding RREQ in AODV) occur simultaneously. A large number of exactly simultaneous transmissions leads to a significantly overestimated collision probability and even to wrong protocol operation. An example of this behavior was reported to me recently by Kuba Wierusz.

  Fixing bug 737 gives us a simple workaround for this problem without the need to account explicitly for processing delay. Indeed, at present, when a wifi device is asked to retransmit a packet without any delay, it defers this for DIFS. After the proposed bugfix, every deferred TX will start a backoff -- exactly what is needed to avoid artificial collisions.
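
The size of this effect can be put in numbers with a minimal sketch. It assumes n stations that receive the same frame, want to forward it at the same instant, and all hear each other; if they back off, each draws a slot uniformly from 0..cw. The function name and the example window of 15 are illustrative assumptions, not ns-3 code.

```cpp
#include <cassert>

// Probability that at least two of n stations draw the same backoff slot,
// when each draws independently and uniformly from the cw + 1 slots 0..cw.
// cw = 0 models "no backoff": every station transmits in the same slot.
double SameSlotProbability(unsigned n, unsigned cw)
{
    assert(n >= 1);
    const double slots = static_cast<double>(cw) + 1.0;
    double allDistinct = 1.0;
    for (unsigned i = 0; i < n; ++i)
    {
        // The i-th station must avoid the i slots already taken.
        const double freeSlots = slots - static_cast<double>(i);
        if (freeSlots <= 0.0)
        {
            return 1.0; // more stations than slots: a clash is certain
        }
        allDistinct *= freeSlots / slots;
    }
    return 1.0 - allDistinct;
}
```

For two forwarders, SameSlotProbability(2, 0) is 1 (a certain clash without backoff), while SameSlotProbability(2, 15) is 1/16.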

  What do you think?

  Regards,
  Pavel
Comment 11 Mathieu Lacage 2010-02-26 04:34:05 UTC
(In reply to comment #10)
>   I'd like to draw your attention to this bug, since it appears to be critical
> in multihop mesh/manet networks. The reason is that none of our models really
> account for processing delays and all re-transmissions (say forwarding RREQ in
> AODV) occur simultaneously. Large number of exactly simultaneous transmissions
> leads to significantly overestimated collision probability and even to wrong
> protocol operation. Recent example of this behavior was reported to me recently
> by Kuba Wierusz.

Thanks a lot for the detailed diagram by Kirill; I understand this problem better now. The key issue I was confused about was the term "retransmission" in the context of the MAC. For me, it meant a retransmission attempt after a failed transmission. For Kirill and you, it means a MAC-level forwarding attempt.

>   Fixing bug 737 gives us simple workaround for this problem without the need
> of explicit accounting for processing delay. Indeed, now when wifi device is
> asked to retransmit a packet without any delay it will defer this for DIFS.
> After proposed bugfix every deferred TX will start backoff -- exactly what is
> needed to avoid artificial collisions. 

My initial reaction to this proposed solution is that it is wrong: we should not unconditionally start a backoff after a packet reception: it goes against both the spirit and the letter of the 802.11 spec.

It seems to me that the problem is not that we need to model processing delays: we need to model non-deterministic _varying_ processing delays which change from one station to another and, potentially, from one packet to another within the same station, right? If so, I would support adding a delay in MacLow when we receive a packet, before forwarding it to the upper layers, and drawing that delay from a RandomVariable whose default is a Gaussian distribution centered around 10us with a non-zero variance. I will ask a colleague what a decent value would be for the mean/variance to model some PC-class hardware.
Comment 12 Pavel Boyko 2010-02-26 05:44:01 UTC
> My initial reaction to this proposed solution is that it is wrong: we should
> not unconditionally start a backoff after a packet reception: 

  We do not propose to unconditionally start backoff after a packet reception. We propose to start backoff for _all_ deferred transmissions, including the ones deferred because DIFS has not elapsed yet (as in the case of too-fast forwarding).

> it goes against both the spirit and the letter of the 802.11 spec.

  I don't think so. Take a look at 9.1.1 of 802.11-2007: "After deferral, or prior to attempting to transmit again immediately after a successful transmission, the STA shall select a random backoff interval and shall decrement the backoff interval counter while the medium is idle." To see that "deferral" means "the medium was busy or DIFS had not elapsed", take a look at Fig. 9.3 ibid: "Defer access interval" = "Medium busy" + "DIFS".
 
> It seems to me that the problem is not that we need to model processing delays:
> we need to model non-deterministic _varying_ processing delays which change
> from one station to another, and, potentially, from one packet to another
> within the same station, right ? If so, I would support adding a delay in
> MacLow when we receive a packet before forwarding it to the upper layers and
> making that delay be picked from a RandomVariable with a default value of being
> a gaussian distribution centered around 10us with a non-zero value for the
> variance. I will ask a colleague what a decent value would be for the
> mean/variance to model some PC-class hardware.

  Sure, you are right that a good solution is to start accounting for processing delays. But I am very uncomfortable with all ad-hoc solutions in this field. Why 10 us? Why Gaussian (to say nothing of negative delays)? Why "some PC-class"? Why at wifi/mac-low? What about Ethernet, wimax, and all future models?

  I propose to a) apply the suggested DCF patch and b) start a public discussion of modeling processing delays in ns-3.
Comment 13 Mathieu Lacage 2010-02-26 07:43:58 UTC
(In reply to comment #12)
> > My initial reaction to this proposed solution is that it is wrong: we should
> > not unconditionally start a backoff after a packet reception: 
> 
>   We do not propose to unconditionally start backoff after a packet reception.
> We propose to start backoff for _all_ deferred transmissions including the ones
> deferred because DIFS is not passed yet (as in the case of
> too-fast-forwarding).   
> 
> > it goes against both the spirit and the letter of the 802.11 spec.
> 
>   I don't think so. Take a look at 9.1.1 of 802.11-2007: "After deferral, or
> prior to attempting to transmit again immediately after a  successful
> transmission, the STA shall select a random backoff interval and shall
> decrement the backoff interval counter while the medium is idle." To understand
> that "deferral" means "medium was busy or DIFS wasn't passed" take a look at
> Fig. 9.3 ibid: "Defer access interval" = "Medium busy" + "DIFS".

That is not fully correct. See section 9.2.4:

A STA desiring to initiate transfer [...] shall invoke the CS mechanism [...] to determine the busy/idle state of the medium. If the medium is busy, the STA shall defer until the medium is determined to be idle without interruption for a period of time equal to DIFS [...]. After this DIFS [...] medium idle time, the STA shall then generate a random backoff period [...] before transmitting, unless the backoff timer already contains a nonzero value, in which case the selection of a random number is not needed and not performed.

Note, specifically, the last part of the last sentence: "unless the backoff timer already contains a nonzero value" which is precisely what is happening here.
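
The rule quoted from 9.2.4 can be sketched as follows. This is a minimal illustration, not the DcfManager code: the BackoffState struct, the function name, and the use of std::mt19937 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical per-queue state; the field name is illustrative only.
struct BackoffState
{
    std::uint32_t slots = 0; // remaining backoff slots
};

// Called once the medium has been idle for DIFS after a deferral.
// A new random backoff is drawn only if the timer holds zero; a nonzero
// timer is kept as it is and simply resumes counting down.
// Returns true if a new backoff value was drawn.
bool OnDifsIdleAfterDeferral(BackoffState& state, std::uint32_t cw, std::mt19937& rng)
{
    if (state.slots != 0)
    {
        return false; // timer already nonzero: no new selection
    }
    std::uniform_int_distribution<std::uint32_t> pick(0, cw);
    state.slots = pick(rng);
    return true;
}
```

A queue that still holds, say, 3 slots keeps them, while a queue at zero draws a fresh value in 0..cw.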

> > It seems to me that the problem is not that we need to model processing delays:
> > we need to model non-deterministic _varying_ processing delays which change
> > from one station to another, and, potentially, from one packet to another
> > within the same station, right ? If so, I would support adding a delay in
> > MacLow when we receive a packet before forwarding it to the upper layers and
> > making that delay be picked from a RandomVariable with a default value of being
> > a gaussian distribution centered around 10us with a non-zero value for the
> > variance. I will ask a colleague what a decent value would be for the
> > mean/variance to model some PC-class hardware.
> 
>   Sure you are right that good solution is to start account for processing
> delays. But I am very uncomfortable with all ad-hoc solutions in this field.
> Why 10 us? Why gaussian (saying nothing about negative delays)? Why "some
> PC-class"? Why at wifi/mac-low? What about Ethernet, wimax, and all future
> models? 

Oops, I removed the relevant part from my initial comment: that delay would model the interrupt latency between the MacLow and the higher-level layers, which is on the order of 10us on PC-style hardware with a nice RTOS (and closer to something like 10ms with a standard Linux OS, but with a very high variance). And, of course, you need to make that delay non-negative, but that is a detail.
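
A non-negative Gaussian delay of this kind could be drawn roughly as below. This is a sketch using the C++ standard library rather than the ns-3 RandomVariable classes; the function name is hypothetical, and clamping negative draws to zero is only one possible way to enforce non-negativity.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Draws a processing delay in microseconds from a Gaussian with the given
// mean and standard deviation, then clamps negative draws to zero.
// Clamping is an assumption of this sketch, not something settled in the
// discussion; it slightly distorts the distribution near zero.
double DrawProcessingDelayUs(std::mt19937& rng, double meanUs, double stddevUs)
{
    assert(stddevUs > 0.0);
    std::normal_distribution<double> gaussian(meanUs, stddevUs);
    return std::max(0.0, gaussian(rng));
}
```

Calling DrawProcessingDelayUs(rng, 10.0, stddev) once per received packet, with an independently seeded generator per station, would give delays that vary between stations and between packets; the standard deviation is left open here because the thread does not give one.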

>   I propose to a) apply suggested DCF patch b) start public discussion of
> modeling processing delays in ns-3.

I would be fine with a generic discussion about this (b) but I do not think it is needed to deal with this issue.
Comment 14 Kirill Andreev 2010-02-26 08:42:20 UTC
(In reply to comment #13)
> (In reply to comment #12)

> 
> That is not fully correct. See section 9.2.4:
> 
> A STA desiring to initiate transfer [...] shall invoke the CS mechanism [...]
> to determine the busy/idle state of the medium. If the medium is busy, the STA
> shall defer until the medium is determined to be idle without interruption for
> a period of time equal to DIFS [...]. After this DIFS [...] medium idle time,
> the STA shall then generate a random backoff period [...] before transmitting,
> unless the backoff timer already contains a nonzero value, in which case the
> selection of a random number is not needed and not performed.
> 
> Note, specifically, the last part of the last sentence: "unless the backoff
> timer already contains a nonzero value" which is precisely what is happening
> here.
> 
So, in addition, we must check here that the backoff counter for the given queue is zero, and this check is performed. So I cannot see where the error in this patch is.
Comment 15 Mathieu Lacage 2010-02-26 08:55:48 UTC
(In reply to comment #14)

> So, in addition, we must check here, that backoff counter for a given queue is
> zero, which is performed. So, I can not understand, where is an error in this
> patch.

Where are you doing this ? Which patch are you talking about ?
Comment 16 Mathieu Lacage 2010-02-26 09:06:09 UTC
(In reply to comment #8)
> Created an attachment (id=717) [details]
> illustration of 1x3 grid scenario
> 
> Illustration

When is the RequestAccess method called in this scenario ?
Comment 17 Kirill Andreev 2010-02-26 09:07:56 UTC
(In reply to comment #15)
> (In reply to comment #14)
> 
> > So, in addition, we must check here, that backoff counter for a given queue is
> > zero, which is performed. So, I can not understand, where is an error in this
> > patch.
> 
> Where are you doing this ? Which patch are you talking about ?

The following patch:

diff -r ed0b2d9301a1 src/devices/wifi/dcf-manager.cc
--- a/src/devices/wifi/dcf-manager.cc    Tue Dec 01 18:34:11 2009 +0300
+++ b/src/devices/wifi/dcf-manager.cc    Wed Dec 02 17:55:15 2009 +0300
@@ -375,7 +375,7 @@
    * by notifying the collision to the user.
    */
   if (state->GetBackoffSlots () == 0 && 
-      IsBusy ())
+      GetBackoffStartFor (state) > Simulator::Now ())
     {
       MY_DEBUG ("medium is busy: collision");
       /* someone else has accessed the medium.

When we request access, we check that the backoff counter is zero and, if it is,
check when the given queue may start to transmit (taking into account DIFS,
EIFS, etc.; GetAccessGrantStart is called). If the queue may start to transmit
immediately, we do not start a backoff; otherwise we start one. Note that we
never start a backoff twice.
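
The difference between the two conditions in the patch can be illustrated with a simplified model. The struct, its fields, and the helper names below are hypothetical stand-ins for the DcfManager state, and the timing rule (last activity plus DIFS) is a simplification that ignores EIFS and per-queue AIFS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of the timing state. Times are in ns.
struct MediumState
{
    std::int64_t lastRxEnd = 0;  // end of the last reception
    std::int64_t lastTxEnd = 0;  // end of our last transmission
    std::int64_t lastNavEnd = 0; // end of the last NAV reservation
    std::int64_t difs = 0;       // idle time required before access
};

// Busy only while receiving, transmitting, or inside the NAV.
bool IsBusy(const MediumState& m, std::int64_t now)
{
    return now < std::max({m.lastRxEnd, m.lastTxEnd, m.lastNavEnd});
}

// Earliest instant at which access may be granted: last activity plus DIFS.
std::int64_t BackoffStart(const MediumState& m)
{
    return std::max({m.lastRxEnd, m.lastTxEnd, m.lastNavEnd}) + m.difs;
}

// Condition before the patch: a queue with zero slots starts a backoff
// only if the medium is busy right now.
bool StartsBackoffOriginal(const MediumState& m, std::int64_t now, unsigned slots)
{
    return slots == 0 && IsBusy(m, now);
}

// Condition after the patch: a queue with zero slots starts a backoff
// whenever it cannot transmit immediately, including while DIFS is running.
bool StartsBackoffPatched(const MediumState& m, std::int64_t now, unsigned slots)
{
    return slots == 0 && BackoffStart(m) > now;
}
```

With access requested at the exact instant a reception ends, the original condition starts no backoff, while the patched condition does because DIFS has not yet elapsed.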
Comment 18 Kirill Andreev 2010-02-26 09:09:35 UTC
(In reply to comment #16)
> (In reply to comment #8)
> > Created an attachment (id=717) [details]
> > illustration of 1x3 grid scenario
> > 
> > Illustration
> 
> When is the RequestAccess method called in this scenario ?

Exactly after RX, because a frame to be forwarded goes through the upper layer immediately
Comment 19 Mathieu Lacage 2010-02-26 09:31:15 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > (In reply to comment #8)
> > > Created an attachment (id=717) [details]
> > > illustration of 1x3 grid scenario
> > > 
> > > Illustration
> > 
> > When is the RequestAccess method called in this scenario ?
> 
> Exactly after RX, because a frame to be forwarded goes through the upper layer
> immediately

Are you _sure_ that your backoff slots are zero when RequestAccess is called in your testcase ?
Comment 20 Mathieu Lacage 2010-02-26 09:43:28 UTC
(In reply to comment #19)
> Are you _sure_ that your backoff slots are zero when RequestAccess is called in
> your testcase ?

If they are zero, what is the NAV status in IsBusy ? Does MacLow::NotifyNav call MacLow::DoNavStartNow just before the call to RequestAccess ?
Comment 21 Kirill Andreev 2010-02-26 10:12:01 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Are you _sure_ that your backoff slots are zero when RequestAccess is called in
> > your testcase ?
> 
> If they are zero, what is the NAV status in IsBusy ? Does MacLow::NotifyNav
> call MacLow::DoNavStartNow just before the call to RequestAccess ?

I was mistaken about the testcase. Consider the same situation with a forwarded broadcast frame.
1. The NAV will be zero immediately after RX, and the medium is idle.
2. The backoff slots will be zero (the last transmission was a long time ago).
The debug output is the following:

RX end OK at 81757537ns this = 0x809d5b8 (from DcfManager::NotifyRxEndOkNow)
Request access at 81757537ns, this = 0x809d5b8 (from DcfManager::RequestAccess)
remaining slots are:0 (from DcfManager::RequestAccess(state))
lastNavend is81757537ns, this = 0x809d5b8 (from IsBusy ())

So, broadcast is forwarded without backoff.
Comment 22 Kirill Andreev 2010-02-26 10:40:08 UTC
The same situation occurs with unicast, because the NAV is not set if hdr.GetAddr1 () != m_self
Comment 23 Pavel Boyko 2010-03-03 07:18:53 UTC
  Mathieu,

> Oops, I removed the relevant part from my initial comment: that delay would
> model the interrupt latency between the MacLow and the higher-level layers
> which is something on the order of 10us on PC-style hardware with a nice RTOS
> (and closer to something like 10ms with a standard linux OS but with a very
> high variance). And, of course, you need to make that delay non-negative but
> that is a detail.

  After some thought, I definitely agree on adding an (adjustable) interrupt latency with some meaningful default numbers to the wifi low MAC. Could you do this?

  The question of changing/not changing backoff logic as proposed by Kirill remains.
Comment 24 Nicola Baldo 2010-05-14 11:33:50 UTC
Some time has passed, so I'll try to wrap up the discussion:

1) as for the backoff behavior, my understanding is that the arguments for applying the proposed patch are not convincing, so I am closing the bug.

2) I just filed the new bug 912 to keep track of the issue of modeling the processing delays. Please continue the discussion there if you are interested.