Architecture of flowgrind
The flowgrind tool is loosely based on thrulay. Flowgrind focuses on testing TCP performance and has been designed with wireless multi-hop networks (WMHNs) in mind. Unlike many other network performance measurement tools, flowgrind can test against multiple servers at the same time. Blocks of data are transferred over each test connection, the so-called flows. Every flow has two endpoints, the source and the destination. Flowgrind supports not only unidirectional flows in either direction but also two-way flows. Flows can optionally be rate-limited, and various socket options can be applied individually to each flow. A test, or test run, consists of one or more flows that are configured to run during the test.
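Rate limiting can be pictured as pacing the block writes of a flow. The following Python sketch assumes a fixed block size and a target rate in bits per second; the function names are hypothetical and do not come from flowgrind's source, which may implement rate limiting differently.

```python
def block_interval(block_size_bytes: int, rate_bps: float) -> float:
    """Seconds between two block writes so that a flow stays at rate_bps."""
    if block_size_bytes <= 0 or rate_bps <= 0:
        raise ValueError("block size and rate must be positive")
    return block_size_bytes * 8 / rate_bps


def send_times(start: float, duration: float, block_size_bytes: int,
               rate_bps: float) -> list:
    """Times at which blocks would be written during [start, start + duration)."""
    step = block_interval(block_size_bytes, rate_bps)
    times = []
    n = 0
    while start + n * step < start + duration:
        times.append(start + n * step)
        n += 1
    return times
```

For example, 8192-byte blocks at 1 Mbit/s would be written roughly every 65.5 ms under this scheme.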
The parameters of each flow can be set individually for both directions. One important feature is that flows can be scheduled: each flow is assigned a duration and an initial delay. These scheduling options can be set separately for sending data from the source to the destination endpoint and for sending data in the reverse direction. For unidirectional flows, the duration in one direction is zero. After a test has started, no data is sent in a given direction of a flow until the respective initial delay has expired.
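The per-direction schedule can be modelled as a delay and a duration for each direction. This is a minimal Python sketch; the class and field names (`DirectionSchedule`, `forward`, `backward`) are hypothetical, and times are assumed to be seconds since the start of the test.

```python
from dataclasses import dataclass


@dataclass
class DirectionSchedule:
    delay: float     # initial delay in seconds after the test starts
    duration: float  # sending time in seconds; 0 means this direction is unused

    def is_active(self, t: float) -> bool:
        """True while data may be sent in this direction at test time t."""
        return self.duration > 0 and self.delay <= t < self.delay + self.duration


@dataclass
class FlowSchedule:
    forward: DirectionSchedule   # source -> destination
    backward: DirectionSchedule  # destination -> source

    def is_bidirectional(self) -> bool:
        """A flow is two-way only if both directions have a nonzero duration."""
        return self.forward.duration > 0 and self.backward.duration > 0
```

A unidirectional flow that starts sending after 2 s for 10 s would be `FlowSchedule(DirectionSchedule(2.0, 10.0), DirectionSchedule(0.0, 0.0))`.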
The reporting interval can be configured individually for each flow as well. Intervals can be set with millisecond precision, allowing very fine-grained reporting; coarse intervals on the order of seconds or minutes are possible as well. Besides goodput, flowgrind also measures the round-trip time (RTT) and inter-arrival time (IAT) of data blocks, which are essentially application-layer delays. In order not to influence the actual test, flowgrind transmits the block acknowledgements and IATs over a separate connection, termed the reply connection, which can optionally take a different route through the network than the test data.
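The two block-level delays can be computed from timestamps. The Python sketch below assumes that send and acknowledgement times are taken on the sender, arrival times on the receiver, and that the samples of one reporting interval are summarized as minimum, mean and maximum; the function names are hypothetical, and flowgrind's actual summary statistics may differ.

```python
from statistics import mean


def block_rtts(send_times, ack_times):
    """RTT of each block: time from sending it until its acknowledgement
    arrives back at the sender (both timestamps taken on the sender)."""
    return [ack - sent for sent, ack in zip(send_times, ack_times)]


def interarrival_times(arrival_times):
    """Gaps between consecutive block arrivals at the receiver."""
    return [b - a for a, b in zip(arrival_times, arrival_times[1:])]


def summarize(samples):
    """(min, mean, max) of one reporting interval, or None if it had no samples."""
    if not samples:
        return None
    return min(samples), mean(samples), max(samples)
```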
In addition, flowgrind can report many TCP-specific performance metrics, such as the size of the TCP congestion window. It uses the TCP_INFO socket option to obtain this information from the Linux kernel. All metrics collected by flowgrind are measured once, at the end of every reporting interval.
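For illustration, the same socket option can be queried from Python on Linux. This sketch is not flowgrind's code: it decodes only the leading fields of the kernel's `struct tcp_info` as laid out in `linux/tcp.h`, where `tcpi_rtt` is given in microseconds and `tcpi_snd_cwnd` counts segments. The dictionary keys are shortened field names chosen for this example.

```python
import socket
import struct

# Leading part of Linux's struct tcp_info: eight bytes of small fields and
# bitfields (state, ca_state, ...), then 21 unsigned 32-bit fields up to and
# including tcpi_reordering.
_TCP_INFO_FMT = "8B21I"
_TCP_INFO_FIELDS = (
    "rto", "ato", "snd_mss", "rcv_mss", "unacked", "sacked", "lost",
    "retrans", "fackets", "last_data_sent", "last_ack_sent",
    "last_data_recv", "last_ack_recv", "pmtu", "rcv_ssthresh", "rtt",
    "rttvar", "snd_ssthresh", "snd_cwnd", "advmss", "reordering",
)


def read_tcp_info(sock: socket.socket) -> dict:
    """Query TCP_INFO on a connected TCP socket (Linux only)."""
    size = struct.calcsize(_TCP_INFO_FMT)
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, size)
    values = struct.unpack(_TCP_INFO_FMT, raw[:size])
    info = {"state": values[0]}          # first byte is tcpi_state
    info.update(zip(_TCP_INFO_FIELDS, values[8:]))
    return info
```

Calling `read_tcp_info(sock)["snd_cwnd"]` at the end of each reporting interval would yield one congestion-window sample per interval, matching the sampling described above.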
Most performance measurement tools use a client-server architecture, which has several drawbacks. In particular, it is very difficult to perform complex tests with many parallel flows between an arbitrary set of nodes in a network.
To solve this problem, flowgrind uses a different architecture. It is split into two parts, the flowgrind daemon and the controller, which communicate through remote procedure calls (RPCs). The controller is responsible for setting up tests and printing the results, whereas the daemons perform the actual tests.
The use of RPC makes it possible to create flows between any two nodes in the WMHN that run the daemon. Tests are set up using the flowgrind controller, which does not participate in the test itself and thus does not need to be part of the tested network. The test results are transmitted via RPC as well, so that they are aggregated in one location for further analysis.
The flowgrind daemon has been designed to multiplex all flows in a single thread. Tests have shown that the process scheduler of the operating system can lead to inaccurate results when more than one thread or process is used to handle multiple flows. Restricting the processing of flows to a single thread avoids these fairness issues.
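Single-threaded multiplexing can be illustrated with Python's standard `selectors` module, used here as a stand-in for whatever event loop flowgrind implements in C. Local socket pairs take the place of real test connections, and the function name and parameters are hypothetical. The point is that one thread serves every flow in the order in which its sockets become ready, so no OS scheduling decision between threads is involved.

```python
import selectors
import socket


def run_flows(num_flows: int, blocks_per_flow: int, block_size: int) -> list:
    """Serve several flows from a single thread; return bytes received per flow.

    Each flow is modelled as a local socket pair: the thread writes blocks on
    one end and reads them on the other, always handling whichever socket the
    selector reports as ready.
    """
    sel = selectors.DefaultSelector()
    block = b"x" * block_size
    total = blocks_per_flow * block_size
    pending = [total] * num_flows   # bytes each flow still has to send
    received = [0] * num_flows      # bytes each flow has received so far
    sockets = []
    for i in range(num_flows):
        tx, rx = socket.socketpair()
        for s in (tx, rx):
            s.setblocking(False)
            sockets.append(s)
        sel.register(tx, selectors.EVENT_WRITE, ("send", i))
        sel.register(rx, selectors.EVENT_READ, ("recv", i))
    try:
        while any(r < total for r in received):
            for key, _events in sel.select(timeout=1.0):
                role, i = key.data
                try:
                    if role == "send":
                        # Slicing past the end is safe; the last write may be short.
                        sent = key.fileobj.send(block[: pending[i]])
                        pending[i] -= sent
                        if pending[i] == 0:
                            sel.unregister(key.fileobj)
                    else:
                        received[i] += len(key.fileobj.recv(65536))
                except BlockingIOError:
                    continue  # spurious readiness; retry on the next round
    finally:
        sel.close()
        for s in sockets:
            s.close()
    return received
```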
XML-RPC in flowgrind
Flowgrind consists of two separate programs, the flowgrind daemon and the controller, also called the flowgrind client. The job of the daemons is to perform the actual tests. Therefore, a daemon has to be started on every node in the WMHN that is an endpoint of a flow. The controller, on the other hand, only needs to run on a single machine. That machine does not need to participate in the actual test, nor does it need to be in the same network as the daemons. The controller parses the flow parameters given by the user and communicates with the daemons over RPC to set up the flows. Once all daemons have been configured, the controller signals the daemons to start the test. Finally, the controller periodically gathers the reporting data from the daemons and displays it to the user.
The RPC technology used by flowgrind is XML-RPC. It is relatively simple and offers sufficient performance. Flowgrind uses the xmlrpc-c library, which offers both RPC client and server functionality.
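To show what an XML-RPC call looks like on the wire, the following sketch uses Python's standard `xmlrpc.client` module in place of xmlrpc-c. The method names come from the text below; the members of the flow-parameter struct are hypothetical placeholders, not flowgrind's actual parameter names.

```python
import xmlrpc.client

# Encode a call to get_version, which takes no arguments.
request = xmlrpc.client.dumps((), methodname="get_version")
print(request)

# Encode a call carrying a struct of flow parameters (hypothetical members).
flow_params = {"duration": 10.0, "delay": 0.0, "block_size": 8192}
setup_call = xmlrpc.client.dumps((flow_params,), methodname="add_flow_destination")

# The receiving side decodes the XML back into native values.
params, method = xmlrpc.client.loads(setup_call)
print(method, params[0])
```

Every call is a small XML document sent over HTTP, which keeps the protocol easy to debug at the cost of some encoding overhead.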
Running a test consists of multiple steps that are executed sequentially. In the first step, the controller parses the parameters given by the user and builds a list of hosts that are endpoints of a flow, namely the flow sources and destinations. The distinction between source and destination is only needed for setting up the flows and identifying the endpoints; it is independent of the actual tests. Once the test connections have been set up, there is no longer any difference between a source and a destination endpoint from the daemon's point of view. Apart from connection establishment, every endpoint has exactly the same capabilities.
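A minimal sketch of building the host list, assuming the parsed flows have already been reduced to (source, destination) host pairs; the function name is hypothetical. Each host appears once, however many flows it takes part in, so that each daemon can be contacted once in the checks that follow.

```python
def collect_endpoints(flows):
    """Hosts that are an endpoint of at least one flow, in first-seen order.

    `flows` is a list of (source, destination) host-name pairs.
    """
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(host for pair in flows for host in pair))
```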
After parsing the parameters, the controller uses the get_version RPC method to check that a compatible daemon is running on all involved endpoints. This method simply returns the version number of the daemon. Next, the controller queries all daemons with the get_status method, which returns the number of flows already set up and whether a test is in progress. The controller only continues if all nodes are idle. This ensures that multiple independent test runs are not accidentally performed in parallel on the same nodes.
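The two checks can be sketched as follows. This is an illustration under stated assumptions, not flowgrind's logic: compatibility is taken to mean an exact version match, and get_status() is assumed to return a dict with the hypothetical keys "num_flows" and "started". Any object exposing the two methods works as a proxy, for instance an `xmlrpc.client.ServerProxy` pointing at a daemon.

```python
class PreflightError(RuntimeError):
    """Raised when a daemon is incompatible or already busy."""


def preflight(daemons, expected_version):
    """Check every daemon before any flow is set up.

    `daemons` maps host names to RPC proxies exposing get_version() and
    get_status(). Versions are checked for all hosts first, then idleness.
    """
    for host, proxy in daemons.items():
        version = proxy.get_version()
        if version != expected_version:
            raise PreflightError(f"{host}: incompatible daemon version {version!r}")
    for host, proxy in daemons.items():
        status = proxy.get_status()
        # A daemon with configured flows or a running test is not idle.
        if status["num_flows"] > 0 or status["started"]:
            raise PreflightError(f"{host}: daemon is not idle")
```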
Following these preliminaries, the flows are set up one by one. First, the destination endpoint is prepared using the add_flow_destination method, with all the flow parameters needed by the daemon as arguments. Upon success, the port numbers for the test and reply connections are returned. These port numbers, along with the flow parameters for the other endpoint, are then passed to the source using add_flow_source. The source endpoint then establishes the reply connection and, unless configured otherwise, the data connection as well. The reply connection is used to send the data block acknowledgements and block inter-arrival times to the other endpoint. Alternatively, establishment of the data connection can be deferred until the scheduled start time of the flow. Once all flows have been set up successfully, the controller instructs all daemons to start the test using start_flows.
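The ordering of these calls can be sketched in Python. The method names are those given above, but their argument lists and return values are assumptions for this sketch: add_flow_destination is taken to return a dict with the hypothetical keys "data_port" and "reply_port", start_flows is called without arguments, and the keys of each flow dict are hypothetical too.

```python
def setup_flows(daemons, flows):
    """Set up each flow one by one, then start the test on all daemons.

    `daemons` maps host names to RPC proxies; each flow is a dict with the
    hypothetical keys "source", "destination", "source_params" and
    "destination_params".
    """
    for flow in flows:
        # 1. Prepare the destination; it reports where it is listening.
        dst = daemons[flow["destination"]]
        ports = dst.add_flow_destination(flow["destination_params"])
        # 2. Hand the ports to the source, which then opens the reply
        #    connection and, unless deferred, the data connection.
        src = daemons[flow["source"]]
        src.add_flow_source(flow["source_params"],
                            ports["data_port"], ports["reply_port"])
    # 3. Only after every flow is in place are the daemons told to start.
    for proxy in daemons.values():
        proxy.start_flows()
```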
While the test runs, each daemon periodically generates and stores a report for each flow, according to the flow's reporting interval. These reports contain the data measured during that particular reporting interval. Once a flow finishes, either successfully or due to an error, a final report is generated with metrics over the entire test duration. Periodically, the controller contacts each daemon and gathers the reports using the get_reports method. The controller then prints the reports on the screen or logs them to a file. There is a limit on the number of reports a daemon can temporarily store. If the controller cannot gather the reports quickly enough, for example because the reporting interval is too short or because of connection issues, excess reports are lost.
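The bounded report store can be sketched as a small buffer that the daemon fills and a get_reports-style call drains. The class name and capacity handling are hypothetical, and the text does not say which reports are lost when the limit is hit; this sketch assumes the newest ones are discarded and counts them.

```python
from collections import deque


class ReportBuffer:
    """Bounded per-daemon report store, drained by the controller."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.reports = deque()
        self.dropped = 0  # reports lost because the buffer was full

    def add(self, report) -> None:
        # Assumption: when full, the newest report is discarded.
        if len(self.reports) >= self.capacity:
            self.dropped += 1
        else:
            self.reports.append(report)

    def get_reports(self) -> list:
        """Return and remove all stored reports."""
        drained = list(self.reports)
        self.reports.clear()
        return drained
```

If the controller polls less often than capacity multiplied by the reporting interval, `dropped` grows, which mirrors the loss described above.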
Apart from the methods already presented, there are some convenience methods to influence running tests: stop_flow and stop_flows stop a given flow or all flows, respectively. A small helper tool exists that simply stops all flows on a given node. It can be used to abort a running test when the controller has terminated unexpectedly.
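A helper of this kind could be as small as a single RPC call. The sketch below uses Python's `xmlrpc.client` rather than flowgrind's own client code; it assumes that stop_flows takes no arguments, and the daemon URL, including host, port and path, is a hypothetical placeholder that has to match how the daemon is actually reachable.

```python
import xmlrpc.client


def stop_all(daemon_url: str) -> None:
    """Ask one daemon to stop all of its flows via the stop_flows RPC method.

    daemon_url is a hypothetical address such as "http://node1.example:5999/".
    """
    with xmlrpc.client.ServerProxy(daemon_url) as proxy:
        proxy.stop_flows()
```

Invoked once per node, for example `stop_all("http://node1.example:5999/")`, this would abort a test left running by a crashed controller.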
After test completion or termination, the daemons uninitialize all flows and return to an idle state. The daemons can then be used for further test runs. Besides the controller, other programs could use the RPC API of the daemons as well. One possibility would be a status web page that queries all nodes to find out whether the daemon is running and whether it is active. Such a page could then offer an option to reset selected nodes or daemons.
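The query side of such a status page might look like the following sketch. It assumes get_status() returns a dict with a hypothetical boolean "started" key, and it treats network errors and XML-RPC faults as an unreachable daemon; the function name and state labels are illustrative only.

```python
import xmlrpc.client


def poll_daemons(daemons):
    """Classify each node as "active", "idle" or "unreachable".

    `daemons` maps host names to RPC proxies, e.g. xmlrpc.client.ServerProxy.
    """
    states = {}
    for host, proxy in daemons.items():
        try:
            status = proxy.get_status()
        except (OSError, xmlrpc.client.Error):
            # Connection refused, timeout, HTTP or XML-RPC fault.
            states[host] = "unreachable"
            continue
        states[host] = "active" if status["started"] else "idle"
    return states
```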


