This paper describes and analyses measurements made operating XMPP over HF Radio, using HF modems and a simulated radio link. We look at measurements operating directly over STANAG 5066, and operating over IP.
The measurements show that good performance is achieved over HF using STANAG 5066 for a wide range of parameters. Operation over IP over HF gives good results in some situations, but is not generally recommended.
HF Radio is important because it provides beyond line of sight communication. In some situations it is the only communications option, and in others it is an important backup to Satcom. More information on HF Radio is given in the white paper HF Radio and Network Centric Warfare.
HF Radio is a very constrained channel with awkward operational characteristics. Special protocols are needed to get useful performance. STANAG 5066 provides a key application interface described in the white paper STANAG 5066: The Standard for Data Applications over HF Radio.
XMPP operation over STANAG 5066 is described in the Isode white paper M-Link Support for XMPP over Constrained Networks. Key features:
- Use of STANAG 5066 SIS Protocol to connect the XMPP Server to STANAG 5066 Server.
- Use of STANAG 5066 Reliable Connection Oriented Protocol (RCOP).
- Use of ARQ communication in support of RCOP.
STANAG 5066 Test Setup
The diagram above shows the test setup. The HF Modems (RM6) and STANAG 5066 Server (RC66) are provided by Isode partner RapidM. The two modems are connected directly to each other by an audio link. A photograph of the setup and more details are given in the white paper STANAG 5066 Performance Measurements over HF Radio.
The Isode M-Link servers connect to the STANAG 5066 Servers using the STANAG 5066 SIS (Subnet Interface Service) protocol, which provides a flexible approach to separate applications from the data link layer.
The above diagram shows the protocol stack used between the two servers. The two XMPP servers communicate using XEP-0361: Zero Handshake Server to Server Protocol and XEP-0365: Server to Server communication over STANAG 5066 ARQ. This maps onto STANAG 5066 RCOP (Reliable Connection Oriented Protocol), which in turn uses the ARQ services of the STANAG 5066 Data Link layer. The tests used the STANAG 4539 waveform with a short interleaver.
The tests are driven by a pair of special XMPP test clients that can provide latency and throughput tests.
We used a basic latency test where one client sends a message with variable payload to its peer. The peer responds and sends back a message of the same size, and the test continues back and forward. We measure the time for each round trip. This traffic load is intended to look at the effect of 1:1 chat with messages going back and forth.
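The shape of this latency test can be sketched in a few lines of Python. This is only an illustration of the measurement loop, not the test client used for the results in this paper: the `exchange` callable is a hypothetical stand-in for "send a message to the peer and wait for its same-size reply", and the function names are assumptions made for the example.

```python
import time
from typing import Callable, List, Optional, Tuple


def measure_round_trips(exchange: Callable[[bytes], bytes],
                        payload_size: int,
                        rounds: int) -> List[float]:
    """Time `rounds` request/response exchanges with a fixed payload size.

    `exchange` is a hypothetical callable: it sends the payload to the peer
    and blocks until the peer's reply (expected to be the same size) arrives.
    """
    payload = b"x" * payload_size
    times = []
    for _ in range(rounds):
        start = time.monotonic()
        reply = exchange(payload)
        times.append(time.monotonic() - start)
        if len(reply) != payload_size:
            raise ValueError("peer reply size does not match request size")
    return times


def summarize(times: List[float]) -> Tuple[float, Optional[float]]:
    """Report the first round trip separately from the average of the rest."""
    first, rest = times[0], times[1:]
    average = sum(rest) / len(rest) if rest else None
    return first, average
```

Reporting the first round trip separately from the later ones matches the way the results are presented in the rest of this paper.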
STANAG 5066 and Short Messages
The application level performance of tests with very small amounts of data over STANAG 5066 can be unexpected (see the ping results in STANAG 5066 Performance Measurements over HF Radio). In particular, the way that times vary with link speed can be surprising. Interleavers, the mechanism used to reduce the impact of burst errors, lengthen minimum transmit times, and their characteristics vary with speed.
A basic set of measurements was done at 1200 bits/sec with no payload. 1200 is 'mid range' HF, and a typical operational speed. With all tests, the first round trip was longer than subsequent ones. There are three components of this initial overhead:
- Establishing a STANAG 5066 soft link (always used for ARQ).
- Initializing the XMPP peer agreement.
- Initializing presence for the pair of clients.
From comparison with RCOP performance measurements, it is clear that the first of these is the largest overhead, and that the second two do not significantly increase the overhead.
|Round Trip||Time (secs)|
Subsequent round trips are shorter. There is some variance in the time taken. This round trip time is closely tied to the modem and STANAG 5066 turnaround times. The XMPP application is not adding a significant overhead; this will become clear from the subsequent payload tests.
At 1200 bits per second, the protocol overhead from zero handshake XMPP plus RCOP is about 30 bytes per message. This would add about 0.2 seconds in each direction, which is negligible here. The latency is dominated by the performance of the underlying system.
The base measurements were repeated at different speeds, with results shown below. 9600 is the fastest HF speed for standard systems. Round trip times are slightly longer than at 1200, which we believe to be primarily due to the interleaver.
|Link Speed (bits/sec)||First Round Trip (secs)||Average Subsequent Round Trip (secs)|
At 75 bits/second, which is the slowest HF speed, the data size of the PDUs does have an impact on the round trip times. The compressed PDUs for the minimal messages are around 30 bytes, which is about 3.2 seconds overhead in each direction. So for subsequent round trips this overhead is a factor but does not dominate the time.
The initial XMPP exchanges move about 1 kByte of data, which will take about 100 seconds, so this is a significant overhead at this speed. Measurements suggest that this is a significant factor in the longer first round trip. For slow links, it is important that the XMPP peer agreement is maintained over a long time (which does not require any data to flow) so that this overhead will only be incurred very infrequently. For this low speed it may also be desirable to filter out all XMPP presence exchanges, so that only messages with user data are sent over the link.
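The transfer times quoted above (30 bytes at 1200 and at 75 bits/sec, and about 1 kByte at 75 bits/sec) follow from simple arithmetic on size and link speed. The sketch below reproduces them; the function name is an assumption made for the example, 1 kByte is taken as 1000 bytes, and the calculation deliberately ignores interleaver, framing and turnaround delays, which the measurements show to dominate at higher speeds.

```python
def serialization_time(size_bytes: int, link_bits_per_sec: int) -> float:
    """Seconds needed to clock `size_bytes` onto the link.

    Pure size/speed arithmetic: no interleaver, framing or turnaround delay.
    """
    return size_bytes * 8 / link_bits_per_sec


# Per-message protocol overhead of about 30 bytes at 1200 bits/sec.
print(serialization_time(30, 1200))          # 0.2

# The same 30 bytes at 75 bits/sec.
print(serialization_time(30, 75))            # 3.2

# Initial XMPP exchanges of about 1 kByte at 75 bits/sec.
print(round(serialization_time(1000, 75)))   # 107
```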
|Payload Size (bytes)||First Round Trip (secs)||Average Subsequent Round Trip (secs)|
The next set of tests added payload data of increasing size. It is interesting to note that quite reasonable amounts of data can be added without significantly increasing the round trip time. This reflects the performance that might be seen by a pair of users exchanging short messages.
At 1200 bits per second, 100 bytes will be transferred in 0.7 seconds. For a round trip, there are two transfers. The payload is text encoded in the XML (a hex dump of random data), and can be compressed to about 51% of its size with the DEFLATE algorithm. These numbers fit closely with the differences in round trip times from 512 to 1024 and from 1024 to 2048. For smaller payloads, there is no clear pattern in the timing, and it is likely there is a more complex interaction with STANAG 5066. The increases in the first round trip are larger. This is almost certainly a consequence of the way DEFLATE works by reference to earlier data, so it is less effective initially.
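The compressibility of a hex dump of random data can be checked with Python's standard `zlib` module, which implements DEFLATE. This is a rough sketch rather than a reproduction of the test client: the payload generator and function names are assumptions, each payload is compressed on its own (so there is no earlier data to refer back to), and `zlib.compress` adds a small wrapper of its own, so short payloads will show a somewhat worse ratio than the long-run figure.

```python
import random
import zlib


def hex_payload(n_chars: int, seed: int = 0) -> bytes:
    """Return `n_chars` ASCII hex characters encoding pseudo-random bytes."""
    rng = random.Random(seed)
    raw = bytes(rng.getrandbits(8) for _ in range(n_chars // 2))
    return raw.hex().encode("ascii")


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size, at maximum compression."""
    return len(zlib.compress(data, 9)) / len(data)


for size in (512, 1024, 2048):
    print(size, round(deflate_ratio(hex_payload(size)), 2))
```

Each hex character carries only four bits of information in an eight-bit character, which is why the ratio tends towards roughly one half as the payload grows.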
|Typing Delay (secs)||First Round Trip (secs)||Average Round Trip (secs)||Corrected Average (secs)|
With user to user conversations, there is usually a delay while the user composes a message. We ran modified tests to simulate this, by inserting a delay before the response on the second test client. We found that for short delays (1-3 seconds) the round trip time was not noticeably affected. This suggests that some of the base time arises in a way that is not affected by a small delay.
For a delay of five seconds, the round trip time increases markedly. The reason for this is that at around 5 seconds of 'no activity' the soft link between the two servers is closed down. At this point RapidM collision avoidance techniques are applied, and these lead to an additional delay before a connection is re-established. The implications of this, and approaches that might be taken, are discussed later.
Although XMPP is not designed for bulk data transfer, it is useful to measure throughput with constrained networks. The above tests were done at 1200 bits/second. The payload is a hex dump of random data, which will compress to about 51% using DEFLATE. The utilization numbers here allow for this compression, and so are a reasonable reflection of protocol overhead. For larger payload values, these numbers seem reasonable, given that XMPP is not a bulk transfer protocol, and STANAG 5066 will give around 15% overhead.
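One way to compute a utilization figure that allows for compression is sketched below. This is an assumed definition for illustration, not necessarily the exact calculation behind the reported numbers: payload bits are scaled by the compressed fraction (0.51 here, from the DEFLATE figure above) and divided by the bits the link could have carried in the elapsed time. The function name, parameters and the sample values are hypothetical and are not measurements from this paper.

```python
def utilization(payload_bytes: int,
                messages: int,
                elapsed_secs: float,
                link_bits_per_sec: int,
                compressed_fraction: float = 0.51) -> float:
    """Fraction of link capacity used by compressed payload data.

    Assumed definition: compressed payload bits / (elapsed time * link speed).
    """
    compressed_bits = payload_bytes * messages * compressed_fraction * 8
    return compressed_bits / (elapsed_secs * link_bits_per_sec)


# Hypothetical example values: ten 1024 byte payloads in 60 seconds at 1200 bits/sec.
print(round(utilization(1024, 10, 60.0, 1200), 2))
```

Counting compressed rather than raw payload bits keeps the figure below 100% and makes it a measure of protocol overhead rather than of compression gain.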
Throughput measurements were made for 1024 byte payload at the three test link speeds. Good utilization is achieved at all speeds, with utilization increasing somewhat with higher link speeds.
Measurements over IP
The application setup is as for the STANAG 5066 tests, with the M-Link servers operating over TCP/IP. So the applications are not directly aware of HF being used. IP is configured to use an HF Radio Subnet, using STANAG 5066 IP Clients, with IP Client software provided by NATO.
The end to end communication between the XMPP servers is TCP/IP. The XMPP server has no direct visibility of the use of HF or STANAG 5066.
Two XMPP server to server protocols are used. These are described in the Isode white paper M-Link Support for XMPP over Constrained Networks:
- The standard XMPP Server to Server protocol ('XMPP S2S').
- XEP-0361: Zero Handshake Server to Server Protocol
|Link Speed (bits/sec)/Round Trip Type||STANAG 5066||Zero Handshake S2S||XMPP S2S|
|9600 First||21.1||41.7||Not Measured|
|9600 Average||13.9||14.9||Not Measured|
The basic latency measurements were repeated over IP using XEP-0361, and for XMPP S2S at 1200. Notes on the observations:
- No IP measurements were made at 75 bits/sec as it is not possible to establish a TCP connection at this speed. This and other general notes on use of IP applications over HF are given in the Isode white paper Performance Measurements of Applications using IP over HF Radio.
- At 1200 and 9600 bits/second the stable round trip times are slightly higher when operating over IP. This reflects similar low level utilization with some overhead.
- For XEP-0361, the time taken for the first round trip is about double that for STANAG 5066 (and the second round trip is also quite a bit slower than the average).
- For standard XMPP S2S, first round trip is about 20 times slower. We did not measure at 9600, but expect a similar result. This is due to the overhead of TCP startup, noted in the white paper referenced above.
|Link Speed (bits/sec)||STANAG 5066||Zero Handshake S2S|
Throughput tests were made using 1024 byte payload. Notes on the results:
- At 75 bits/second, it is not possible to establish a TCP connection over HF.
- At 1200 bits/second, quite reasonable throughput was established. It took around 3 minutes to reach this level of efficiency, so for moderate volume exchanges (which would be more typical of XMPP use), this level of efficiency would not be achieved.
- At 9600 bits per second, 11% efficiency was achieved after 47 minutes. The reason for this is discussed in detail in the white paper referenced above. In order to get high efficiency, the TCP window needs to 'open up', and this only happens very slowly.
Operation over STANAG 5066
Good performance (considering HF fundamental characteristics) is observed over the full range of HF speeds, for both latency and throughput tests. It is expected that this operation will co-exist well with other applications using STANAG 5066, and will degrade well in the event of errors and change of bandwidth or other operational characteristics.
Where responses come back with delays greater than 3 seconds, additional delays were noted. This is discussed in more detail below.
Server vs. Client Access
The approach to HF radio is to operate server to server. Given the performance characteristics at the application level, it is clear that this architecture is sound, and that clients should not be directly exposed to HF performance.
ARQ vs. non-ARQ
All of the tests use ARQ at the STANAG 5066 level. This is desirable for support of variable bandwidth and for error handling. However, there are delays associated with soft link establishment that could be avoided if a non-ARQ mapping was used. This could be a sensible approach, in conjunction with error handling at the application level.
With the current RapidM non-ARQ collision avoidance, the additional delays would give significantly worse performance for typical XMPP use. However, with an improved collision avoidance approach, such as that offered by STANAG 5066 ed3, use of a non-ARQ mapping may be worth considering.
Soft Link Management
The round trip delays associated with responses after 5 seconds are tied to STANAG 5066 soft link management. This is compounded by the RapidM collision avoidance techniques, which add an overhead when a connection is established shortly after one is terminated. There is an architectural difficulty with use of ARQ in a multi-node system. It may be helpful to keep the soft link established if one of the nodes is going to transmit again. However, the soft link needs to be released if other nodes have data. With the simplex nature of HF, there is no way to know which is best.
The current M-Link implementation uses the default soft link management provided by RapidM. A useful optimization Isode is considering is to add explicit soft link management into M-Link (STANAG 5066 provides this). This will enable M-Link to hold the soft link open for longer (switching back and forth between the two ends). For a two node system, this might be done for quite a long period of time. For a multi-node system, an intermediate value may be a useful trade-off.
IP vs. STANAG 5066
There are some situations where the performance obtained operating over IP is reasonable, and comparable to that obtained operating directly over STANAG 5066. In general, performance is much worse. Specific notes:
- Operation over IP is only possible for higher HF speeds.
- Connection establishment with standard XMPP S2S is very slow, and XEP-0361: Zero Handshake Server to Server Protocol is significantly preferable when operating over IP.
- Reasonable performance over IP relies on keeping connections open for long periods of time. This means that use of IP should be avoided where this is not possible (e.g., very poor network conditions). It is also a problem when using ARQ operation in a multi-node network, as only one pair can be active at a time.
- Operation over IP is poor for bulk data transfer, and this will also cause problems when XMPP is co-existing with bulk applications such as messaging.
- Operation over STANAG 5066 will handle network errors and performance variation much better.
This paper has given performance results for M-Link operation over HF Radio. Performance results for direct operation over STANAG 5066 are good, and this approach is recommended.