Bugzilla – Bug 1022
Possibly inappropriate ASSERT in tcp-socket-impl.cc
Last modified: 2010-11-17 19:24:24 UTC
In a long-running sim, the following assert failed -- lines 1516-1519 in tcp-socket-impl.cc, version 3.9 released:

    //buffer this, it'll be read by call to Recv
    UnAckData_t::iterator i =
        m_bufferedData.find (tcpHeader.GetSequenceNumber () );
    NS_ASSERT(i == m_bufferedData.end ()); //no way it should have been found

This is in TcpSocketImpl::NewRx(), when it handles a duplicate packet.

I gather m_bufferedData holds data that the tcp stack has received from the peer, but which the application hasn't read yet. So this assert fails when the stack receives a duplicate packet before the application reads the original packet.

Granted that's unlikely, but is that really an error? Doesn't that just mean that the application is slow? Or (as I think happened in our case) the network was slow and the original packet and the duplicate arrived back-to-back?

So I think this assert isn't appropriate. If that condition is true, the stack should either discard the newly-arrived duplicate, or replace the original with the duplicate. I think that if we just remove the assert (and the iterator declaration) the subsequent code will do the latter, and the Ptr<> mechanism will release the original packet without a memory leak.

When we commented that assert out and re-ran the sim, it completed without a problem.
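To illustrate the two options (discard the duplicate, or replace the original), here is a minimal standalone sketch. It does not use ns-3: FakePacket, RxBuffer, DuplicatePolicy and BufferSegment are hypothetical names invented for this example, std::shared_ptr stands in for ns-3's Ptr<>, and the buffer is assumed to behave like a std::map keyed by sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a packet; this is NOT ns3::Packet.
struct FakePacket {
  std::string payload;
};

// Hypothetical receive buffer keyed by TCP sequence number.
// std::shared_ptr plays the role of a reference-counted smart pointer.
using RxBuffer = std::map<uint32_t, std::shared_ptr<FakePacket>>;

// What to do when a segment with the same sequence number is already buffered.
enum class DuplicatePolicy { Replace, Discard };

// Buffers a segment. Returns true if the sequence number was already present,
// i.e. the segment is a duplicate of data the application has not read yet.
bool BufferSegment(RxBuffer& buf, uint32_t seq,
                   std::shared_ptr<FakePacket> pkt, DuplicatePolicy policy) {
  assert(pkt != nullptr);  // a null packet would be a real programming error
  auto it = buf.find(seq);
  if (it == buf.end()) {
    buf.emplace(seq, std::move(pkt));  // first arrival: just store it
    return false;
  }
  if (policy == DuplicatePolicy::Replace) {
    // Overwriting the entry drops the last reference to the original
    // packet, so it is released without a leak.
    it->second = std::move(pkt);
  }
  // With Discard, pkt simply goes out of scope here and is released.
  return true;
}
```

In this sketch a duplicate is treated as an ordinary event rather than an assertion failure: the caller learns from the return value that the sequence number was already buffered, and either policy leaves exactly one packet stored per sequence number.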
(In reply to comment #0)
> In a long-running sim, the following assert failed -- lines 1516-1519 in
> tcp-socket-impl.cc, version 3.9 released:
>
> //buffer this, it'll be read by call to Recv
> UnAckData_t::iterator i =
> m_bufferedData.find (tcpHeader.GetSequenceNumber () );
> NS_ASSERT(i == m_bufferedData.end ()); //no way it should have been found
>
> This is in TcpSocketImpl::NewRx(), when it handles a duplicate packet.
>
> I gather m_bufferedData holds data that the tcp stack has received from the
> peer, but which the application hasn't read yet. So this assert fails when the
> stack receives a duplicate packet before the application reads the original
> packet.
>
> Granted that's unlikely, but is that really an error? Doesn't that just mean
> that the application is slow? Or (as I think happened in our case) the network
> was slow and the original packet and the duplicate arrived back-to-back?
>
> So I think this assert isn't appropriate. If that condition is true, the stack
> should either discard the newly-arrived duplicate, or replace the original with
> the duplicate. I think that if we just remove the assert (and the iterator
> declaration) the subsequent code will do the latter, and the Ptr<> mechanism
> will release the original packet without a memory leak.
>
> When we commented out that assert out and re-ran the sim, it completed without
> a problem.

I tend to agree with you. How long is your long-running test case (bytes transferred, link data rate, propagation delay)?
We're simulating a p2p sharing network with 800 nodes in 5 point-to-point-star clusters. 4 clusters are 100 Gbps; 1 is 200 Kbps. We do not simulate packet loss, but (for obvious reasons!) the peers on the 200 Kbps cluster experience considerable delays.

The assert failed after about 1200 sec of sim time, or about a day of wall-clock time. Lord only knows how many gigabytes were transferred. When we removed the assert and tried again, it ran for almost 2 days of real time, and terminated cleanly at our scheduled stop time of 2000 sim seconds.

In other words, it's not practical to use that as a regression test.
pushed in changeset 3e7336abae57