Bug 645

Summary:	[patch] fixes for opening stats file with OMNeT++
Product:	ns-3	Reporter:	Tom Henderson <tomh>
Component:	general	Assignee:	Joe Kopena <tjkopena>
Status:	RESOLVED FIXED
Severity:	enhancement	CC:	andras, craigdo, faker.moatamri, mathieu.lacage, ns-bugs, rudolf.hornig, tjkopena
Priority:	P3
Version:	ns-3-dev
Hardware:	All
OS:	All
Attachments:	patch to fix

Description Tom Henderson 2009-07-24 10:21:25 UTC

Created attachment 538 [details]
patch to fix

Contributed by Andras Varga and Rudolf Hornig:

We found some issues when we opened the generated files in the OMNeT++
analysis tool, and we did some changes in the code to rectify them. Please
find attached the patch!

This patch is against version 4620:5d448b8564e5.

Changes in the contrib/stats folder:
- we have introduced a new base class called StatisticalSummary for
  MinMaxAvgTotalCalculator, so it can return the calculated values for
  the DataOutputCallback::OutputStatistic. This change allows the gathering
  and writing out of min/max/avg/total values for the same attribute. The
  output variable names are: name-count, name-min, name-max, name-total,
  name-sqrsum, name-stddev (the last two are unimplemented in the data
  collector)

- for the DataOutputCallback::OutputSingleton method. we have changed the
  signature and introduced a CONTEXT parameter, and the former KEY parameter
  is now used as the variable name. In general, now CONTEXT is used to
  specify where the statistics are gathered (i.e. it can be "." for global
  stats or can be like "node[1].wifi" or whatever naming schema is used).
  The KEY attribute is now used to code   the name of the statistic, like
  "length", "dropcount" etc.  When creating an a DataCollector, one should
  specify both the CONTEXT and the KEY. This change was necessary to
  generate a "more correct" omnet result file, because the unpatched version
  writes the variable name where the module path (CONTEXT) should be, 
  and module path info was not present at all.

- of course DataCalculators now have a SetContext method

In the examples folder:

- a new wifi-example-omnet.sh was added; it runs the same example but
  generates omnet output. (this file must be made executable in the repo)

- a minor modification to the wifi-example-db.sh was made and the SQL query
  was modified to reflect the fact that the variable names are now stored
  in the variable column instead of the name column)

Comment 1 Rudolf Hornig 2009-07-27 06:41:27 UTC

The new file format contains now a name and a context (i.e. where the value was collected) the format is:

scalar context name value

Statistics can be also gathered (count,sum,mean,min,max) is collected. Teh format is:
statistics context name
field stattype1 value1
field stattype2 value2



Example:
run run-100-1
attr experiment "wifi-distance-test"
attr strategy "wifi-default"
attr measurement "100"
attr description ""
attr "author" "tjkopena"

scalar . measurement "100"
scalar node[0] wifi-tx-frames 30
scalar node[1] wifi-rx-frames 30
scalar node[0] sender-tx-packets 30
scalar node[1] receiver-rx-packets 30
statistic node[0] tx-pkt-size
field count 30
field sum 1920
field mean 64
field min 64
field max 64
scalar . delay-count 30
scalar . delay-total 5889990
scalar . delay-average 196333
scalar . delay-max 196333
scalar . delay-min 196333

Comment 2 Mathieu Lacage 2009-08-13 03:50:57 UTC

I am curious: is there somewhere an official documentation for the omnet++ file format ?

Comment 3 Andras Varga 2009-08-13 04:19:04 UTC

(In reply to comment #2)

Yes, it's an appendix to the OMNeT++ documentation.
http://www.omnetpp.org/doc/omnetpp40/manual/usman.html#sec477

If there's interest, it can be expanded and turned into a standalone specification.

There's also an alternative implementation in Java:
http://www.omnetpp.org/omnetpp/doc_details/2194-jresultwriter-09

Comment 4 Andras Varga 2009-08-14 04:34:22 UTC

Some explanation to the patch.

The syntax for recording scalar values in the file is this:

scalar <moduleName> <statisticName> <value>

where moduleName is supposed to denote the component the statistic comes from, like "the 802.11 MAC module of host 12", and the statisticName should be something like "num pks received" or "avg delay". moduleName in OMNeT++ is in a dotted notation ("net.host[12].wifi.mac"), in ns3 it could be a name like "/NodeList/0/ApplicationList/0/$Sender/Tx" -- it doesn't really matter.

Global statistics can be recorded under the name of the root object, or using the abbreviation ".".

The current ns3 code generates lines like this:

scalar wifi-tx-frames count 30
scalar wifi-rx-frames count 30
scalar sender-tx-packets count 30
scalar receiver-rx-packets count 30

So the name is written where the moduleName should be, and the statisticName is "count". This should be like this:

scalar ??? wifi-tx-frames 30
^ moduleName should come here

The patch largely addresses this issue. moduleName (i.e. its ns3 equivalent, a hierarchical name) was not available in the original C++ code where the file was written, so we had to change the code to take it there. It became the "context" parameter.

The other change is how statistical summary info (like count, total, avg, min, max, stddev) are recorded. Currently it's like this in the ns3 code:

scalar tx-pkt-size count 30
scalar tx-pkt-size total 1920
scalar tx-pkt-size average 64
scalar tx-pkt-size max 64
scalar tx-pkt-size min 64

Where again, moduleName is missing, but if we introduce it, where to write count, total, min, max? There are two a ways. A simple solution is to treat them as independent statistics, like this:

scalar <moduleName> tx-pkt-size-count 30
scalar <moduleName> tx-pkt-size-total 1920
scalar <moduleName> tx-pkt-size-average 64
scalar <moduleName> tx-pkt-size-max 64
scalar <moduleName> tx-pkt-size-min 64

The file format offers a better solution, and we went for that:

statistic <moduleName> tx-pkt-size
field count 30
field sum 1920
field min 64
field max 64

This is why the StatisticalSummary interface was introduced in the code, made MinMaxAvgTotalCalculator implement it, and the OutputStatistic() method was added.

Hope this makes it easier to interpret the changes introduced by the patch.

Comment 5 Joe Kopena 2009-09-28 14:24:26 UTC

Hi all,

In some ways there are two parts to this discussion: The context, and
the statistical summary.  They're somewhat separate, but not totally.

I am not very familiar with Omnet++, and the documentation link above
is slightly different from the drafts used when this stat package was
originally written.  That said, I don't think the addition of the
context is obviously the best move, nor an obviously bad one.

For each recorded entry, there are arguably at least four pieces of
data: The source, the variable, the statistic, and the value.  For
example, source might be node[0], the variable tx-frames-size, the
statistic minimum, and the value 128.

Omnet++ gives three places to shove those four elements.  In the
original code, the source and the variable were presumed to be shoved
into the key field of OutputSingleton(), and therefore the module name
in Omnet terms.  The proposed patch shoves the variable and the
statistic into the variable field of OutputSingleton(), and therefore
the statistic name in Omnet terms.  In many respects I think the
differences are basically a wash.  Which one is easier to work with
depends entirely on what kind of data you are collecting, and what
tools and techniques you are applying to analyze it.

I would say that the primary shortcoming of the statistics package is
that it does not deal with this issue more flexibly, i.e., providing a
scheme to arbitrarily tag data.  The difficulties to overcome in doing
so are developing the appropriate framework to both present a common
API and shove that kind of flexible format into disparate formats.
For example, it might be easy to accomodate that in a DB, but it seems
like more choices would have to be made to work with Omnet.  In this
case, the choice was made as described in the above paragraph.

The reason the original code bundled the elements as described rather
than as in the proposed code was in part to avoid the problems with
the proposed StatisticalSummary interface.

Note that, as implemented, the StatisticalSummary introduces several
cases of multiple inheritance.  I may have missed something in the
past few months, but my previous understanding was that there was a
concerted effort in ns-3's design approach to avoid such structures.

More generally, the problem with the StatiscalSummary is that it seems
to obviate the rationale for structuring the data output mechanism as
it is.  The reason for the less-than-straightforward control structure
with its hierarchy of data collectors, callbacks, and inverse control
is to disconnect the output from the actual statistics being
collected.  The output format doesn't have to know whether it's a
simple average, a maximum, a confidence interval, etc.  It just takes
what the collector has to give, which the collector in turn identifies
for the analysis scripts via the variable field in OutputSingleton().

The proposed StatisticalSummary hardcodes the output formats to a
fixed set of statistics.  Granted, it includes a scheme to not
implement some statistics, somewhat awkwardly signalled via a NaN
result rather than a typing mechanism.  But it doesn't allow for
adding new statistics to be output without modifying all of the output
formats and their common interface.  That reduces the flexibility of
the scheme, and makes most of the structure unnecesary complication
that could be simplified if that sort of hardcoding is acceptable.

The actual patch is well done, fairly thorough, and adds some good
documentation and so on, but I think contains a few design decisions
that are not clearly better than what's there, and degrade some of the
features, so hopefully there are better ways to meet the objectives.

Thoughts?

Thx

Comment 6 Andras Varga 2009-09-28 18:39:33 UTC

(In reply to comment #5)

Joe, thanks for your comment. It was quite long and since I'm not sure I got
the message, I'll just reflect with some general thoughts here.

Let me begin by stating the obvious: The purpose of the particular ns3 feature
is to generate output files that are consistent with those generated by
OMNeT++, and can be processed with the OMNeT++ Analysis tool.

Now, it is not of particular interest to me what code is used to create
those result files. We have just noticed that the existing code generates
result files which are not entirely consistent with OMNeT++-generated files
(for whatever reason; my guess is that the file format spec we wrote wasn't
clear or detailed enough, and got misunderstood), and we wanted to improve
on that. The most constructive way seemed to be submitting a patch.

Obviously, files of the same format can be generated in several ways;
it really makes no difference to me what code or whose code makes it into
ns3, as long as the output is consistent with OMNeT++ files.Feel free to
use or not use the patch, change it or rewrite it from scratch -- anything is
fine with us, as long as the resulting code writes similar files as the patch.

As for the file format specification: we intend to improve it (clarify etc),
and make it available as a standalone spec (i.e. not part of the OMNeT++
manual). We believe the file format is stable and flexible enough so that
we won't need to touch it for years, and new stuff (if needed) can be added
without breaking backwards compatibility.

PS. a note about the "statistical summary" interface: a similar concept is
also part of the Apache Math library, see
http://commons.apache.org/math/api-1.0/org/apache/commons/math/stat/descriptive/StatisticalSummary.html

Comment 7 Andras Varga 2009-10-06 09:58:34 UTC

I noticed this bug got categorized as "Enhancement". I think that is misleading, because the currently generated files have problems, and cannot be loaded into the OMNeT++ result analyser. So this issue is a bug of some sort, and not an enhancement. Just my 2c.

Comment 8 Faker Moatamri 2009-11-05 10:06:36 UTC

I have been through the patch and I have 2 remarks:
- .project file should never be included in any patch or any repository, this file should always be kept private
- Why in wifi-example-db.sh you changed 'name' to 'variable'? Is there any reason why you did that?
Thanks

Comment 9 Andras Varga 2009-11-05 16:56:36 UTC

(In reply to comment #8)
> I have been through the patch and I have 2 remarks:
> - .project file should never be included in any patch or any repository, this
> file should always be kept private

In general, yes; however, here the .project file is somewhat useful for the end user as well. The OMNeT++ IDE is based on Eclipse, and IDE can only use the ANF file (part of the patch) if it is within an Eclipse project. Of course the end user could create the project himself/herself as well.

Of course you decide whether .project is useful enough to be checked in. We included it in the patch because it's easier to delete than create.

> - Why in wifi-example-db.sh you changed 'name' to 'variable'? Is there any
> reason why you did that?

This seems to be accidental, we didn't change the database output. Sorry about it.

Comment 10 Faker Moatamri 2009-11-06 03:37:17 UTC

> 
> In general, yes; however, here the .project file is somewhat useful for the end
> user as well. The OMNeT++ IDE is based on Eclipse, and IDE can only use the ANF
> file (part of the patch) if it is within an Eclipse project. Of course the end
> user could create the project himself/herself as well.
> 
> Of course you decide whether .project is useful enough to be checked in. We
> included it in the patch because it's easier to delete than create.
> 
Ok since your have a good explanation for that, I don't mind leaving the .project file in the patch.
> 
> This seems to be accidental, we didn't change the database output. Sorry about
> it.
> 
Please fix this and push the patch to the repository. If you don't have enough rights to push it into repository, send me the latest patch and I will apply it.
Thanks
Best regards
Faker Moatamri

Comment 11 Faker Moatamri 2009-11-11 06:50:34 UTC

Patch applied, changeset: 5502:04acce3f7133