Bugzilla – Bug 872
ns3::PcapFileWrapper::Write explodes stack
Last modified: 2010-04-17 02:17:54 UTC
This: bool 93 PcapFileWrapper::Write (Time t, Ptr<const Packet> p) 94 { 95 uint8_t buffer[PcapFile::SNAPLEN_DEFAULT]; makes the stack explode in my simulation scenarios. It's not very pretty.
There are two options here: - create the temporary buffer somewhere else. (say, on the heap) and try to make sure that we share it among all PcapFileWrapper instances. This is not fun to implement and it's going to bring more problems in the multithreaded branch because we will have to make sure that the creation and destruction of this shared instance is correct and that different PcapFileWrapper instances do not use a shared temporary buffer if they are used from different threads. - make PcapFile provide an overloaded ::Write method which takes a Ptr<Packet> and re-implement PcapFile to use stdc++ io to call Packet::Copy (std::ostream &os). A side-effect of this would be avoiding the extra temporary copy done in PcapFileWrapper::Write
> bool > PcapFileWrapper::Write (Time t, Ptr<const Packet> p) > { > uint8_t buffer[PcapFile::SNAPLEN_DEFAULT]; I'm not sure what you mean when you say "makes the stack explode" in your simulation scenarios. The default user stack limit on my home machine and ns-old is, [craigdo@craigdo-fc12] ~ > ulimit -s 10240 ten megabytes. On ns-regression it's [craigdo@ns-regression] ~ > ulimit -s 8192 eight megabytes. I believe pthreads defaults to two megabytes for 64-bit architectures and one megabyte for 64-bit architectures. This is not a recursive call, so there's no way this 64K buffer can blow a stack on what I assume are these typical Linuces and environments, so I assume you are doing something where the stack is much smaller (simu?). What requirements do you have here? What order of magnitude stack makes your code explode? Does this 64K by itself take you over a hard limit? Is 32K stack frame a problem? 16K? So I think there's really a bigger question lurking here. We clearly have a new, severe resriction on stack usage. What is it really? Are there other new restrictions? I think people developing for standard Linux are going to have memory limit numbers closer to megabytes in their heads, rather than 64K. When I wrote this, 64K seemed on the large side for a stack allocation, but not completely outrageous given the amount of headroom there is. Anyway, if this buffer is causing you grief I think you may be shaving it too close on stack sizes, but I will happily remove this if it is in your way.
(In reply to comment #2) > > bool > > PcapFileWrapper::Write (Time t, Ptr<const Packet> p) > > { > > uint8_t buffer[PcapFile::SNAPLEN_DEFAULT]; > > I'm not sure what you mean when you say "makes the stack explode" in your explode == segfault because we hit the stack guard page. > I believe pthreads defaults to two megabytes for 64-bit architectures and one > megabyte for 64-bit architectures. It depends on how many threads you use in your application. i.e., as you create threads, the stack of the main thread is used for other threads. So, you start with a couple of megs for the main thread but you lose it quickly for that main thread if you call pthread_create. > This is not a recursive call, so there's no way this 64K buffer can blow a > stack on what I assume are these typical Linuces and environments, so I assume > you are doing something where the stack is much smaller (simu?). > > What requirements do you have here? What order of magnitude stack makes your > code explode? Does this 64K by itself take you over a hard limit? Is 32K > stack frame a problem? 16K? I think that it's generally a bad idea to allocate anything bigger than 1 or 2K on the stack in one go (which does not say anything about total stack usage) for a library because you don't know how it will be used. > So I think there's really a bigger question lurking here. We clearly have a > new, severe resriction on stack usage. What is it really? Are there other new > restrictions? I think people developing for standard Linux are going to have I do not know. I am merely discovering these limits myself. > memory limit numbers closer to megabytes in their heads, rather than 64K. When > I wrote this, 64K seemed on the large side for a stack allocation, but not > completely outrageous given the amount of headroom there is. > > Anyway, if this buffer is causing you grief I think you may be shaving it too > close on stack sizes, but I will happily remove this if it is in your way. I have no control over what the pthread library does for the main thread so, it's not something I can fix myself.
(In reply to comment #3) > It depends on how many threads you use in your application. i.e., as you create > threads, the stack of the main thread is used for other threads. So, you start > with a couple of megs for the main thread but you lose it quickly for that main > thread if you call pthread_create. oops, I take that back. What is happening here is that it's not the main thread (whose stack is indeed pretty big) which is calling down into this function. It's another thread and, indeed, that thread has a relatively small (16K) stack (which was picked by the pthread library). So, yes, I could change the attributes of that thread to get a bigger stack but 16K to 32K is pretty much the order of magnitude of what I can afford to give each thread. Giving them more makes it really hard to control memory usage for large numbers of nodes/applications. Ideally, we would have dynamically-sized stacks and I have plans to have that at some point but it's not going to happen soon :/
Created attachment 830 [details] use std::ostream in PcapFile and get rid of stack-allocated variables This patch deals with the problem for me. I chose to go with the std::ostream path because I was worried that: - it was not acceptable to have a per-PcapFileWrapper 64k buffer for temporary storage when there is already a buffer in the underlying FILE * - I would not be able to get the sharing right if I tried to share the 64k buffer among PcapFileWrapper instances A couple of notes: - I removed support for the "a" open mode because I did not find any user in our codebase and I felt pain in implementing it correctly. Consequently, I #if 0 the corresponding test. - I changed a couple of public APIs which took std::string for the open mode and replaced them with std::ios::openmode - I know that the proper type is std::ios_base::openmode but std::ios::openmode was easier to type - I removed the bool return value used to indicate read/write/init/open errors to use instead a pair of Fail/Clear functions to mirror the behavior of the underlying std::iostream. This was easier to implement and it did not propagate very far in our API so, I felt it was worth it. - of course, I removed the stack-allocated buffers in PcapFileWrapper::Write* functions Feel free to do what you think is right with this patch.
The fix I had in mind was more like six lines than a 65K patch, but I told you a few months back that you could swap in the slower stream-based code if you wanted to write it and didn't care about the performance. I wouldn't have done it, but I'm a man of my word. Feel free to update your patch so that Tom's new tcp test works, rescan bindings, push and close out the bug.
changeset 9787dc9fdd84
Maybe I did something wrong, but, after I updated my local copy of ns3-dev it does not compile anymore. It looks like this changeset brakes ns3tcp-interop-test-suite.cc: in that file at lines 117 and 122 there are calls to the function PcapFile::Open with parameters "r" and "w". I guess that now those should be replaced by std::ios_base::out and std::ios_base::in As this is clearly related to this bug I did not open a new one.
(In reply to comment #8) > Maybe I did something wrong, but, after I updated my local copy of ns3-dev it > does not compile anymore. It looks like this changeset brakes > ns3tcp-interop-test-suite.cc: in that file at lines 117 and 122 there are calls > to the function PcapFile::Open with parameters "r" and "w". I guess that now > those should be replaced by std::ios_base::out and std::ios_base::in > > As this is clearly related to this bug I did not open a new one. Can confirm these symptoms.
Crap. Looks like I did not build with nsc enabled. Fix underway.
gustavo fixed this in 6211:eb3a5837f84b thanks gustavo.