Bugzilla – Bug 1009
Tx Sequence number assert fail in TCP
Last modified: 2010-11-17 19:23:55 UTC
When 100's of TCP connections come and go on a slow network, I occasionally get this assert to fail in tcp_socket_impl.cc: bool TcpSocketImpl::ProcessPacketAction (...) { ..... case NEW_ACK: ..... NS_ASSERT(tcpHeader.GetAckNumber () <= m_nextTxSequence); That's line 1024 in 3.9 released. It seems like the problem is in the following code, which sets m_nextTxSequence to the sequence number saved in m_finSequence: bool TcpSocketImpl::ProcessPacketAction (...) { .... case ACK_TX: NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX"); if(tcpHeader.GetFlags() & TcpHeader::FIN) { ++m_nextRxSequence; //bump this to account for the FIN m_nextTxSequence = m_finSequence; } SendEmptyPacket (TcpHeader::ACK); break; That's lines 945-953 in 3.9 released. There seem to be two problems with that. First, very often m_finSequence is 0, indicating that it had never been set. Second, m_finSequence may be less than the current m_nextTxSequence. Often it's only 1 less, which seems to be okay. However, once in a while it's 100's of packets less than m_nextTxSequence. I worked around the problem by making two changes to that block. First, if m_finSequence is 0, I just ignore it -- I don't set m_nextTxSequence. Second, if m_finSequence is less than m_nextTxSequence, I also ignore it. Eg, I changed that to: case ACK_TX: NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX"); if(tcpHeader.GetFlags() & TcpHeader::FIN) { ++m_nextRxSequence; //bump this to account for the FIN if (m_finSequence == g_zeroSeqn) { NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN finSeqn==0 ERROR:" << " m_nextTxSequence=" << m_nextTxSequence); } else if (m_finSequence < m_nextTxSequence) { NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN SEQUENCE ERROR:" << " m_finSequence=" << m_finSequence << " m_nextTxSequence=" << m_nextTxSequence); } else { m_nextTxSequence = m_finSequence; } } SendEmptyPacket (TcpHeader::ACK); break; With those changes, everything works okay. I realize this might not be the best solution; I'll leave it to someone who's more familiar with this code and TCP to decide whether this cures the disease or just treats the symptoms. Of course, if you decide it's curing the disease, it would be better to add an explicit "m_finSequenceHasBeenSet" flag rather than assuming that 0 is always an invalid sequence number.
When 100's of TCP connections are created and closed on a slow network, occasionally this assert fails (tcp_socket_impl.cc, line 1024, 3.9 released): bool TcpSocketImpl::ProcessPacketAction (...) { ..... case NEW_ACK: ..... NS_ASSERT(tcpHeader.GetAckNumber () <= m_nextTxSequence); It seems like the problem is in the following code, which sets m_nextTxSequence to the sequence number saved in m_finSequence (lines 945-953 in 3.9 released): bool TcpSocketImpl::ProcessPacketAction (...) { .... case ACK_TX: NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX"); if(tcpHeader.GetFlags() & TcpHeader::FIN) { ++m_nextRxSequence; //bump this to account for the FIN m_nextTxSequence = m_finSequence; } SendEmptyPacket (TcpHeader::ACK); break; There seem to be two problems with that. First, very often m_finSequence is 0, indicating that it had never been set. Second, sometimes m_finSequence is less than the current m_nextTxSequence. Often it's only 1 less, which seems to be okay. However, once in a while it's 100's of packets less than m_nextTxSequence. I worked around the problem by making two changes to that block. First, if m_finSequence is 0, I just ignore it -- I don't set m_nextTxSequence. Second, if m_finSequence is less than m_nextTxSequence, I also ignore it. Eg, I changed that to: case ACK_TX: NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX"); if(tcpHeader.GetFlags() & TcpHeader::FIN) { ++m_nextRxSequence; //bump this to account for the FIN if (m_finSequence == g_zeroSeqn) { NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN finSeqn==0 ERROR:" << " m_nextTxSequence=" << m_nextTxSequence); } else if (m_finSequence < m_nextTxSequence) { NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN SEQUENCE ERROR:" << " m_finSequence=" << m_finSequence << " m_nextTxSequence=" << m_nextTxSequence); } else { m_nextTxSequence = m_finSequence; } } SendEmptyPacket (TcpHeader::ACK); break; With those changes, everything works okay. I realize this might not be the best solution; I'll leave it to someone who's more familiar with this code and TCP to decide whether this cures the disease or just treats the symptoms. Of course, if you decide it's curing the disease, it would be better to add an explicit "m_finSequenceHasBeenSet" flag rather than assuming that 0 is always an invalid sequence number.
Opps -- my comment #1 is just a restatement of my original description of the problem. When I submitted the first original description, the bug tracker complained that my session had timed out, so I thought it had lost the first version. I'd delete comment #1, but I don't see any way to do that.
I looked at this and m_finSequence seems to be trying to track different things at different places in the code (the transmit sequence number of the FIN, or the received sequence number of the FIN). Does the patch attached fix the problem for you? It tries to remove the cases in which m_finSequence has any relation to transmit sequence numbers.
Created attachment 1004 [details] patch to test
Yes, that patch fixes the problem; my tests run to completion. Thank you!
pushed in changeset 46e44607166f