STANAG 5066 Annex U defines an IP Client specification.

This whitepaper looks at requirements for IP applications over HF. It then analyses the Annex U IP Client specification and reports a set of measurements made using a pre-release version of Isode’s Icon-PEP product.

Finally, it presents overall analysis and recommendations to NATO. While the IP Client specification has some utility, this analysis suggests that it is a sub-optimal approach for general provision of IP services over HF.


What is IP Client

IP Client provides a mechanism for transferring IP packets over an HF subnet following the classic IP architecture shown above. It achieves this by a simple mapping of IP packets onto the STANAG 5066 Unidata service.

Requirements for IP Services over HF

In order to analyse the suitability of a solution, the requirements for IP services need to be understood. This section sets out some notes on IP requirements, against which IP Client can be measured.

Key Applications: Messaging and XMPP

Some mission-critical applications define mappings directly onto STANAG 5066. There are a number of options for messaging protocols, as set out in the Isode whitepaper [Messaging Protocols for HF Radio]. These protocols can achieve better than 90% link utilization, as described in the Isode whitepaper [Measuring Performance of Messaging Protocols for HF Radio].

The approach for XMPP is described in the Isode whitepaper [Operating XMPP over HF Radio and Constrained Networks]. This approach has been demonstrated in UK MoD trials to work well down to 300 bps, with low latency and high reliability.

These optimized protocols obviate the requirement to operate these services using IP Client, which would be significantly less efficient.

TCP and Web Services

TCP is the dominant protocol for applications running over the Internet, and it is clearly important to operate a range of standard and special protocols over TCP (e.g., Database Access).

HTTP (Web) operates over TCP. Web is important in its own right, particularly for access from Mobile Units to shore. Many standard and most modern proprietary protocols operate over HTTP, rather than directly over TCP.

Support of TCP and Web applications is a key target for operation over HF.

UDP Protocols

UDP is the other primary transport layer protocol over IP. This section considers UDP protocols that might be used.

DNS

DNS (Domain Name System) is the major protocol that operates over UDP. It is important for Mobile Units to look up domain names of shore services. Mobile Unit domain information will typically be available on shore side services, so the reverse direction is less important.

SNMP

SNMP is the other major standard application that operates over UDP. However, SNMP is of decreasing importance in modern deployments. Modern application management is usually Web-based, and core network management has shifted to NETCONF (which operates over TCP) in conjunction with YANG. SNMP over HF is not expected to be a high priority.

QUIC

QUIC is a new protocol, originally developed by Google, for fast Web access using UDP.

Other UDP Applications

Streaming media is covered separately. Some specialized peer-to-peer applications make use of UDP. It is unclear whether these (or any other UDP) applications are of interest to operate over HF.

It seems likely that there are only a limited number of standard UDP applications of interest.

General Purpose Multicast Protocols

General purpose multicast over the Internet remains largely a research topic; in practice, IP applications are deployed point to point. Specialized multicast protocols are used on a single subnetwork (e.g., ARP – Address Resolution Protocol). There does not appear to be benefit in operating such protocols over HF.

Custom Military Protocols & Multicast

We understand that a number of custom military protocols operate over UDP. These are oriented at high latency and multicast networks. Isode would be interested to get more information on such protocols, to validate operation over HF.

IP Routing Protocols

IP routing protocols operate over IP in various ways (over TCP, over UDP, or directly over IP), depending on the choice of routing protocol.

In many HF configurations, Mobile Units will have very simple IP configurations (typically a single subnet). The sensible way to support this is with static routing. In this sort of configuration, there is no requirement to run routing protocols over the HF link.

For faster HF networks and nodes with complex configurations, dynamic routing may make sense. If such requirements exist, it would be sensible to review routing and IP requirements in detail.

Voice, Video and Streaming Media

Traditionally, Voice has been deployed over HF by dedicated use of the link.

Streaming Video has been used to demonstrate IP operation over WBHF. This shows the link capacity, with the received video delayed and buffered. While this can be made to work, it does not seem to be a sensible operational approach. It would be preferable to record the video and then transfer it as a file (e.g., as an email attachment). This will make more efficient use of the link and avoid issues of the stream being impacted by outages or reduced performance.

Interactive Voice and Video conferencing are attractive services for the target audience. Superficial analysis suggests that operation of VoIP, Video and other IP streaming media services over HF using IP Client is not going to work in a useful manner. This is not considered further in this paper. Deployment of Voice and Data simultaneously over HF, and basic Video over HF, are topics that Isode believes warrant serious investigation.

ICMP

ICMP (Internet Control Message Protocol) is an infrastructure protocol that is important for IP operation. ICMP Ping is also useful for testing. ICMP needs to be supported, although generally traffic levels will be low.

Other IP Applications

There are other protocols that operate over IP, some of which are used for streaming media and routing (covered separately). We are not aware of any other protocols over IP that may be useful to operate over HF.

Specification Review

IP Client is a simple and straightforward specification. IP packets are mapped directly onto STANAG 5066 Unidata, and use of either ARQ or non-ARQ services is allowed. Some specific observations are made in the following subsections.

IP Client Overhead

IPv4 has a per-packet header overhead of 20-60 bytes (IPv6 has a fixed 40-byte header, but its different extension mechanism may add further overhead). Although this is not particularly large for what IP is doing, it is a significantly larger overhead than is typical for protocols that are optimized for HF. TCP headers add a further 20-60 bytes, which applies to both data and control packets.
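To make this overhead concrete, the Python sketch below calculates what fraction of each TCP/IPv4 packet is application payload. It uses only the header sizes quoted above and ignores IPv6 extension headers and link framing. The payload sizes are illustrative: a pure ACK, an 8-byte segment, and a 1460-byte segment, which is what fits in a 1500-byte packet with minimal headers.

```python
# Header sizes (bytes) quoted in the text; link framing and IPv6
# extension headers are deliberately ignored in this sketch.
IPV4_HEADER_MIN, IPV4_HEADER_MAX = 20, 60
TCP_HEADER_MIN, TCP_HEADER_MAX = 20, 60


def payload_efficiency(payload_bytes: int, ip_header: int, tcp_header: int) -> float:
    """Fraction of a TCP/IP packet's bytes that carry application payload."""
    return payload_bytes / (payload_bytes + ip_header + tcp_header)


for payload in (0, 8, 1460):  # 0 bytes = a pure TCP ACK
    best = payload_efficiency(payload, IPV4_HEADER_MIN, TCP_HEADER_MIN)
    worst = payload_efficiency(payload, IPV4_HEADER_MAX, TCP_HEADER_MAX)
    print(f"{payload:5d}-byte payload: {worst:.1%} to {best:.1%} is payload")
```

A pure ACK carries no payload, so all of its 40 to 120 header bytes are overhead. With an 8-byte payload, at most about 17% of the packet is data.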

ARQ and non-ARQ

IP Client allows operation using both ARQ and non-ARQ services.

In Order

STANAG 5066 allows specification of data to be delivered “In Order”.

MTU Size

The MTU of IP packets over HF must be small enough for the complete PDU to fit into a single STANAG 5066 Unidata (maximum size 2048 bytes). Routing configuration must ensure this, and path MTU discovery must be supported.

TTL

For each IP hop, TTL (Time To Live) must be decremented. Technically, IPv4 should decrement TTL according to hop time. In practice, the modern convention (and IPv6 usage) is to decrement by 1. The approach should be documented in the specification. Decrementing by hop time would be difficult to calculate and would almost certainly lead to operational problems.

Measurements

Test Architecture

The above architecture is used to measure performance. The components are as follows:

  1. HF Network simulated by Isode’s MoRaSky tool.
  2. Icon-5066 (Isode STANAG 5066 product) provides STANAG 5066 service.
  3. Icon-PEP (Isode product) provides IP Client. It connects two IP Routers to provide an IP Subnet service over HF.
  4. LH router connects to test host.
  5. RH router connects to test host and Internet.

The following tests can be run from the LH Host:

  1. ICMP Ping to any node on the RHS.
  2. DNS lookup to the Internet using the nslookup tool.
  3. Web Browsing to the Internet.
  4. TCP Measurements using a special tool running on each of the test hosts.

STANAG 4539 was used for speeds of 9600 bps and below. STANAG 5069 with 48 kHz bandwidth was used for the 240 kbps tests. All tests used short interleaver.

Use of ARQ

Measurements with TCP show clearly that use of ARQ gives the best results. For most services, reliability is needed, and ARQ is the sensible way to achieve it. For this reason, most measurements make use of ARQ. Where measurements do not use ARQ, this is explicitly noted.

Ping

Ping tests were performed at a range of speeds using ARQ. Five pings were sent at each speed, spaced so that only one ping was outstanding at a time.

Speed         Avg         Min         Max
240,000 bps   6.8 secs    5.4 secs    7.8 secs
9600 bps      25.0 secs   12.9 secs   33.1 secs
1200 bps      12.6 secs   8.12 secs   16.5 secs
300 bps       29.2 secs   15.8 secs   41.9 secs

Notes on these results:

  1. Initially, 75 bps was planned as the lowest speed, but it was not possible to get this to work at all, and it was not clear why pings failed at this speed. Therefore, 300 bps was used as the bottom speed. 150 bps was not tested.
  2. It is clear that ICMP Ping, which makes very simple use of IP, can work over a wide range of speeds.
  3. As noted below, it proved difficult to eliminate all “background” IP traffic from the test system. There was a low level of IP traffic being generated in addition to the test traffic. We believe that this had the following impact on test results:
    1. It is likely the reason that tests at 75 bps did not work.
    2. It meant that a CAS-1 soft link was permanently open, so the test traffic did not incur the overhead of establishing a CAS-1 soft link.
    3. The interaction with this traffic is a likely explanation for the wide variation of response times.
  4. The 9600 bps results seem slow in the context of the other results. The reason for this is unclear.
  5. The results show that basic use of IP Client is viable across a wide range of HF and WBHF speeds.

The following test was made with non-ARQ:

Speed         Avg         Min         Max
9600 bps      14.8 secs   9.9 secs    16.4 secs

This showed a response time that was better than ARQ at the same speed (an average of 14.8 seconds, compared with 25.0 seconds), but broadly in line with the ARQ results across the speed range. The improvement may be due to non-ARQ not requiring a strictly alternating direction of transmission.

DNS

DNS (Domain Name System) measurements were made using the standard nslookup tool to look up the isode.com domain from the public Google DNS server (8.8.8.8), accessed over the test HF network. Response times were measured with a manual stopwatch.

DNS needs reliable response, so ARQ mapping was chosen.

Speed         Response Time
240,000 bps   5 secs
9600 bps      11 secs
1200 bps      14 secs
300 bps       21 secs

Notes:

  1. DNS worked across the speed range with acceptable response times in line with ping results. The times are dominated by turnaround effects, rather than data transfer.
  2. Apart from the fastest speed, response times are significantly longer than the typical DNS retry timer (5 seconds on Linux, 1 second on Windows). This means that queries and responses will be sent over the network several times, which is undesirable, particularly at lower speeds.

Web Services

Some Web Browsing was done using the Firefox browser to examine behaviour over the HF link. The goal was a general purpose test to examine viability of Web browsing over HF.

The example.com site is a very simple Web page that does not use HTTPS. It loaded in 26 seconds at 240,000 bps. At 9600 bps, access to example.com with the Firefox browser timed out after 3 minutes. When access was retried, the page loaded in 1 minute.

The isode.com site is a typical Web site, which is fast under normal access. At 240,000 bps it partially displayed after 10 minutes, and was partially usable at that point. It fully loaded in 23 minutes, including a “cookie dialogue”.

Observations of the underlying STANAG 5066 dialogue indicate that the HF network is being used very inefficiently. These results suggest that IP Client is not going to be viable for general purpose Web browsing. The following analysis of TCP indicates that there may be some special situations where operation of HTTP over IP Client will be viable.

Web Browsing with QUIC

Web browsing to google.com was tried with Chrome, which will use QUIC. This timed out, which suggests that further work will be needed to make QUIC work over IP Client.

TCP

TCP performance was measured with a special test tool that opens a connection and transfers a specified volume of data to a responding tool. The responding tool measures latency and throughput, based on the time from the initial TCP request to all of the data being delivered. Data volumes are given in kilobytes (kB), where 1 kB = 1,000 bytes. Data transfers of up to 10 MB were made.
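The Utilization figures in the following tables appear to match payload bits divided by the product of total transfer time and nominal link speed. This is our reading of the data, not a definition documented for the test tool. The Python sketch below applies that assumed formula.

```python
def utilization(size_kb: float, total_time_s: float, link_bps: float) -> float:
    """Payload bits delivered as a fraction of nominal link capacity.

    Assumed definition, inferred from the tables rather than taken from the
    test tool. size_kb follows the document's convention: 1 kB = 1,000 bytes.
    """
    payload_bits = size_kb * 1_000 * 8
    return payload_bits / (total_time_s * link_bps)


print(f"{utilization(1_000, 1033, 9_600):.1%}")  # 1,000 kB at 9600 bps
print(f"{utilization(10, 677, 300):.1%}")        # 10 kB at 300 bps
```

For example, a 1,000 kB transfer that took 1033 seconds at 9600 bps gives about 0.807, which matches the reported 80.7%.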

Size & Speed Analysis

The following tests were made over clear link for varying speeds and data sizes.

Speed      Size (kB)   Utilization   Connect Time   Total Time   Final Data Latency
240 kbps   1           0.3%          11.5 secs      11.5 secs    0.0 secs
240 kbps   10          2.2%          6.2 secs       14.9 secs    8.3 secs
240 kbps   100         8.4%          12.6 secs      39.8 secs    9.4 secs
240 kbps   1,000       26.6%         12.7 secs      125 secs     64.6 secs
240 kbps   10,000      7.2%          3.8 secs       4660 secs    1119 secs
9600 bps   1           2.2%          37.6 secs      38.7 secs    4.1 secs
9600 bps   10          17.5%         26.9 secs      47.6 secs    24.6 secs
9600 bps   100         56.2%         22.0 secs      148 secs     130 secs
9600 bps   1,000       80.7%         27.9 secs      1033 secs    566 secs
9600 bps   10,000      Failed        -              -            -
1200 bps   1           17.2%         30.9 secs      38.7 secs    9.5 secs
1200 bps   10          60.3%         19.2 secs      111 secs     95.3 secs
1200 bps   100         74.9%         19.3 secs      890 secs     548 secs
300 bps    1           32.2%         48.7 secs      82.9 secs    55.1 secs
300 bps    10          39.4%         110 secs       677 secs     603 secs

Notes:

  1. For larger transfers, throughput at narrowband speeds is quite reasonable.
  2. The time to establish the TCP connection was a key factor in lower throughput for the smaller transfers. TCP connect times are in general slow.
  3. The default TCP window size is around 100 kB. This means that, at low speeds, most of the data for transfer gets buffered in the OS of the initiating system.
  4. Where there is no data loss, the window size slowly increases over time, and so more data is buffered in the OS.
  5. Data latencies are very high at the end of transfers. This has the potential to cause application problems.
  6. At 240 kbps, the TCP window size is too small for optimum throughput. This leads to poorer performance at this speed.
  7. An attempted transfer of 10 MB at 9600 bps failed, with a TCP write timing out on the sending side. This was likely an effect of the increasing latency. At this point, approximately 3.7 MB had been transferred with utilization of 84%, and the IP queue was approaching 1 MB of data.
  8. After about 1 MB has been transferred at 240 kbps, TCP behaviour leads to both inefficient use of the link and retransmissions. The following table records performance over the 10,000 kB transfer, showing the degradation after that point.
Time        Bytes Transferred   Latency     Utilization
26 secs     86,904              1 sec       10%
44 secs     260,680             13 secs     19%
60 secs     463,400             17 secs     25%
89 secs     724,040             38 secs     27%
127 secs    1,049,840           60 secs     27%
203 secs    1,440,800           115 secs    23%
303 secs    1,896,920           179 secs    20%
454 secs    2,483,360           284 secs    18%
703 secs    3,200,120           463 secs    15%
1,094 secs  4,112,360           720 secs    12%
1,741 secs  5,350,400           1,107 secs  10%
2,735 secs  6,783,920           1,364 secs  8%
3,904 secs  8,217,440           1,572 secs  7%
4,589 secs  9,650,960           1,048 secs  7%
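A bandwidth-delay product calculation can be used to check note 6. The Python sketch below compares the window of roughly 100 kB with the amount of data that must be in flight to keep the link busy. As an assumption, it uses the average ping round trip times measured earlier: 6.8 seconds at 240 kbps and 25.0 seconds at 9600 bps. During bulk transfers, latency grows well beyond these values, so the real shortfall at 240 kbps is likely to be larger.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link busy."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


WINDOW_BYTES = 100_000  # approximate default TCP window from note 3

# (link speed in bps, assumed RTT in seconds taken from the ping averages)
for link_bps, rtt_s in ((240_000, 6.8), (9_600, 25.0)):
    ceiling = min(link_bps, window_limited_bps(WINDOW_BYTES, rtt_s))
    print(
        f"{link_bps} bps: BDP {bdp_bytes(link_bps, rtt_s):,.0f} bytes, "
        f"window allows at most {ceiling / link_bps:.0%} of the link"
    )
```

At 240 kbps, the bandwidth-delay product is about 204 kB, roughly twice the window. The window alone therefore caps utilization at about half the link. At 9600 bps, the product is about 30 kB, so the window is not the bottleneck.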

Analysis of TCP over ARQ Behaviour

This section considers what is happening “under the hood”. This analysis assumes familiarity with the protocols being used. At the TCP level, this is what is happening:

  1. -> SYN. TCP Connection initiated.
  2. <- SYN + ACK. Response comes back.
  3. -> ACK + 8 bytes of data, sometimes with an additional 1,000 bytes of data. This is the final part of the three-way handshake, and some data is sent at the same time.
  4. <- ACK (or ACK + ACK). Data acknowledged.
  5. -> Two 1500 byte data packets. TCP window being ramped up.
  6. <- Two ACKs.
  7. -> Four 1500 byte data.
  8. <- Four ACKs
  9. -> Eight 1500 byte data
  10. <- Eight ACKs

This shows the core flow. Some of the data gets repeated due to timeouts, such as the initial SYNs.

These transfers are mapped onto ARQ data. Once the connection is initiated, there is a very natural two-way flow. Data goes one way. On the return path, there are STANAG 5066 ARQ ACKs of the data and TCP ACKs of the data.

When ARQ data arrives, STANAG 5066 will respond immediately with a STANAG 5066 ACK which advances the window edge and reports on missing D_PDUs. Any queued data will be sent at the same time.

For the longer transmissions, TCP ACKs to the early IP packets will arrive in plenty of time and be queued up.

At the connection stage, the responses may arrive in time to be transmitted back, or they may miss the transmission. There is a minimum transmission duration of one second, so at higher speeds there is more time for the returning TCP data to be ready for the transmission. Note that one of the 240 kbps connects took 3.8 secs. Here, all of the data arrived in time to be transmitted immediately.

If the TCP data misses the transmission, the turn is at the wrong end, and transmission is delayed by the Annex K and S5066-EP6 procedure. In the test configuration, the base delay under good conditions is 6 seconds (twice the slot length). The test nodes have slots 2 and 3 in the CSMA configuration, so the delays are 9 and 12 seconds. If both directions are missed, this leads to a delay of about 25 seconds. If one direction is missed, the delay takes an intermediate value. This is why there is a high variance in connect delays.

The variance in delay is compounded by the presence of a small amount of IP traffic in addition to the test traffic.
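Enumerating the cases shows where the variance comes from. The Python sketch below takes the 9 and 12 second missed-turn delays from the test configuration and lists the extra handshake delay for each combination of directions hitting or missing a transmission. It is illustrative only. It assumes that each direction misses at most once, and it ignores transmission time. Transmission time presumably accounts for the gap between the 21 seconds computed here and the roughly 25 seconds observed.

```python
from itertools import product

# Extra wait (seconds) incurred when a node misses the current transmission
# and must wait for its CSMA slot; values from the test configuration.
MISSED_TURN_DELAY_S = {"slot 2 node": 9, "slot 3 node": 12}


def connect_delay_spread(delays: dict[str, int]) -> list[int]:
    """Possible extra delays when each direction either hits or misses."""
    names = list(delays)
    outcomes = set()
    for missed in product((False, True), repeat=len(names)):
        outcomes.add(sum(delays[n] for n, m in zip(names, missed) if m))
    return sorted(outcomes)


print(connect_delay_spread(MISSED_TURN_DELAY_S))
```

The four outcomes are 0, 9, 12 and 21 seconds. They correspond to neither direction, one direction, or both directions missing the transmission.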

There is a gradual ramp-up of window size. This will mean that the early STANAG 5066 transmissions will be short and sub-optimal. As the TCP window opens up, transmissions lengthen and better performance is achieved. At the slower speeds, this leads to the longer tests achieving good link utilization over the period of the test.
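Counting round trips gives an approximation of the ramp-up effect. The Python sketch below models an idealized slow start. It begins with two 1500-byte segments, as in the flow above, and doubles the number of segments each round trip until a window of about 100 kB is reached. This is a simplification: it ignores link rate, loss, delayed ACKs, and window growth beyond 100 kB. The function name is hypothetical.

```python
import math


def slow_start_rounds(total_bytes: int, segment_bytes: int = 1500,
                      initial_segments: int = 2,
                      max_window_bytes: int = 100_000) -> int:
    """Round trips needed to send total_bytes under an idealized slow start.

    Assumptions: every segment is acknowledged within one round trip, no
    loss, no delayed ACKs, link rate ignored, window capped at max_window_bytes.
    """
    segments_needed = math.ceil(total_bytes / segment_bytes)
    max_segments = max(1, max_window_bytes // segment_bytes)
    window = min(initial_segments, max_segments)
    sent = 0
    rounds = 0
    while sent < segments_needed:
        sent += min(window, segments_needed - sent)  # one window per round trip
        rounds += 1
        window = min(window * 2, max_segments)  # doubling until the cap
    return rounds


for size in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{size:>9,} bytes: {slow_start_rounds(size)} round trips")
```

Under these assumptions, a 100 kB transfer needs six round trips. With HF round trip times of many seconds, the round trips alone add a substantial delay before link speed becomes the limiting factor.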

For larger transmissions, the window continues to open. We believe that this leads to the sender timing out some transmissions, causing extraneous retransmissions and performance collapse. It is unclear how this could be prevented.

Non-ARQ measurements

The following table repeats the ARQ results at 9600 bps from the earlier table, for convenient comparison with the non-ARQ results.

Size (kB)   Utilization   Connect Time   Total Time   Final Data Latency
1           2.2%          37.6 secs      38.7 secs    4.1 secs
10          17.5%         26.9 secs      47.6 secs    24.6 secs
100         56.2%         22.0 secs      148 secs     130 secs
1,000       80.7%         27.9 secs      1033 secs    566 secs

The following table shows Non-ARQ results at 9600 bps.

Size (kB)   Utilization   Connect Time   Total Time   Final Data Latency
1           5.8%          13.4 secs      14.5 secs    4.1 secs
10          19.3%         20.5 secs      43.1 secs    26.7 secs
100         56.3%         20.0 secs      148 secs     60.9 secs
1,000       79.3%         16.1 secs      1051 secs    629 secs

Notes:

  1. Non-ARQ performance over a clear link is similar to ARQ performance.
  2. Faster TCP connect with non-ARQ leads to somewhat better non-ARQ throughput for small volumes.
  3. ARQ turnaround times are lower than non-ARQ (controlled in the test by STANAG 5066 Annex K and S5066-EP3 slotted behaviour). This leads to relative ARQ performance improvements for higher volumes.

Tests with Link Errors

Error rate tests were made at 9600 bps with 100 kB of data transferred, using a range of bit error rates (BER) on the link.

Error Rate   ARQ Utilization (in order)   ARQ Utilization (any order)   Non-ARQ Utilization   STANAG 5066 ARQ Utilization
Clear        56.2%                        58.4%                         56.3%                 93%
BER 10⁻⁶     53.9%                        57.6%                         46.6%                 91%
BER 10⁻⁵     40.0%                        44.7%                         34.9%                 70%
BER 10⁻⁴     24.7%                        23.6%                         Not viable            50%

Notes:

  1. The TCP connect time is highly variable (10 to 40 seconds), and this has a significant impact on the exact throughput numbers.
  2. ARQ utilization changes broadly in line with the underlying STANAG 5066 link utilization. This reflects that STANAG 5066 ARQ is providing reliability, so TCP behaviour remains unchanged.
  3. Non-ARQ performance falls off more rapidly. Non-ARQ does not provide reliability, so reliability is provided by the TCP mechanisms. These are not optimized for HF, so they do not work as well.
  4. The TCP reliability mechanisms failed completely at a BER of 10⁻⁴.
  5. Tests for ARQ were done with "in order" selected and not selected. We had anticipated that this setting would make little difference, but it appears that better results are obtained when "in order" is not selected.
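Note 2 can be made more concrete by dividing each TCP over ARQ utilization by the matching STANAG 5066 ARQ utilization. The Python sketch below does this for the "in order" column of the table above. The values are copied from the table, and the helper name is our own.

```python
# Utilization (%) copied from the error-rate table: TCP over IP Client using
# ARQ with "in order" selected, and the underlying STANAG 5066 ARQ link.
RESULTS = {
    "Clear": (56.2, 93.0),
    "BER 1e-6": (53.9, 91.0),
    "BER 1e-5": (40.0, 70.0),
    "BER 1e-4": (24.7, 50.0),
}


def tcp_share(tcp_pct: float, s5066_pct: float) -> float:
    """Fraction of the STANAG 5066 ARQ utilization that TCP achieves."""
    return tcp_pct / s5066_pct


for condition, (tcp_pct, s5066_pct) in RESULTS.items():
    print(f"{condition:<9} {tcp_share(tcp_pct, s5066_pct):.2f}")
```

From a clear link down to a BER of 10⁻⁵, the ratio stays between roughly 0.57 and 0.60. At 10⁻⁴ it is about 0.49.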

These measurements suggest clearly that use of ARQ is going to be the best approach for TCP over IP Client, as it deals better with HF errors than non-ARQ.

Our analysis here has led to the choice in Icon-PEP not to use "in order" by default, although it can be configured. We believe this choice will be particularly helpful if multiple TCP streams are being handled, as STANAG 5066 "in order" applies across all data.

Notes on IP Client for Target Applications

The following sections provide some analysis and notes arising from the measurements made.

Multicast Applications

ACP 142, which is Isode’s only multicast application, is best operated directly over STANAG 5066.

Custom military applications may support multicast. These need to operate over non-ARQ service. Measurements would be needed to determine the optimum number of retransmissions.

Although STANAG 5066 does allow In Order to be used with non-ARQ, Isode strongly recommends against this, as the loss of any data blocks delivery of subsequent data. Isode recommends that STANAG 5066 be modified to not allow this combination.

ARQ

It is clear that services such as DNS need reliable data, and that ARQ is a better way to achieve this than relying on DNS repeat transmissions. Similarly, it is desirable not to lose ICMP messages.

Measurements with TCP show clearly that use of ARQ is the preferable choice, due to its better behaviour under error and loss conditions. For most target applications, ARQ appears to be the best choice.

IP MTU Size & IP Fragmentation

The IP MTU needs to be small enough so that all IP packets can fit into a 2048 byte STANAG 5066 maximum Unidata size. Given ARQ transmission, keeping IP MTU as large as possible is desirable to minimize relative overhead of IP and TCP headers. An MTU of 1500, which is the standard Ethernet size, seems a sensible choice.

We found in tests that applications (running on the LAN) would select much larger IP MTU sizes, which were then fragmented to go over HF. We view use of IP fragmentation as undesirable. For IPv6, fragmentation by routers is not allowed. Icon-PEP constrains the MTU size to the specification and supports MTU discovery. However, if an application uses a larger MTU, the router we used performs fragmentation, which appears difficult to avoid in practice. IP fragmentation did not appear to significantly affect performance.
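The Python sketch below shows what a router does with an oversize IPv4 packet. It calculates fragment sizes using the RFC 791 rule that every fragment except the last carries a multiple of 8 data bytes. It assumes a 20-byte header with no options, an HF link MTU of 1500 bytes, and a clear Don't Fragment bit. The 4,000-byte and 9,000-byte packet sizes are hypothetical. Each fragment then fits into a single 2048-byte Unidata, but each carries its own IP header, and the original packet can only be reassembled if every fragment arrives.

```python
def ipv4_fragment_sizes(packet_len: int, mtu: int = 1500,
                        header_len: int = 20) -> list[int]:
    """Total length of each fragment when an IPv4 packet exceeds the MTU.

    Sketch assumptions: a fixed 20-byte header (no options) on every
    fragment, and the Don't Fragment bit is clear so the router may fragment.
    """
    if packet_len <= mtu:
        return [packet_len]
    data_left = packet_len - header_len
    # Non-final fragments carry a multiple of 8 data bytes (RFC 791).
    per_fragment = (mtu - header_len) // 8 * 8
    sizes = []
    while data_left > 0:
        chunk = min(per_fragment, data_left)
        sizes.append(header_len + chunk)
        data_left -= chunk
    return sizes


UNIDATA_MAX = 2048  # STANAG 5066 maximum Unidata size from the text
for packet_len in (4_000, 9_000):  # hypothetical oversize packets
    frags = ipv4_fragment_sizes(packet_len)
    print(packet_len, frags, all(size <= UNIDATA_MAX for size in frags))
```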

IP Client Queuing

When the STANAG 5066 SIS service applies flow control to IP Client, the IP Client can choose to queue or discard incoming IP packets.

Discarding packets is problematic for TCP performance. TCP retransmission is controlled by the RTO (Retransmission Timeout), which is derived from smoothed measurements of the RTT (Round Trip Time) and its variation, as specified in RFC 6298. The RTT is very large for HF, so the RTO is also large, which means that retransmissions will take a long time and lead to delays. Retransmission will also lead to a reduction of the TCP window size, which will reduce throughput.
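For reference, the Python sketch below implements the standard RTO calculation from RFC 6298. It uses smoothing gains of 1/8 and 1/4, a variation multiplier of 4, and a 1-second minimum. The 60-second upper bound is an assumption. RFC 6298 lets implementations impose a maximum provided it is at least 60 seconds, and real TCP stacks differ. The RTT samples in the usage loop are illustrative, not measurements.

```python
class RtoEstimator:
    """TCP retransmission timeout estimator following RFC 6298."""

    ALPHA = 1 / 8   # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4    # gain for the RTT variation (RTTVAR)
    K = 4           # variation multiplier
    MIN_RTO_S = 1.0

    def __init__(self, clock_granularity_s: float = 0.001,
                 max_rto_s: float = 60.0):
        self.g = clock_granularity_s
        self.max_rto_s = max_rto_s  # assumed cap; RFC 6298 requires >= 60 s
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample

    def on_rtt_sample(self, r: float) -> float:
        """Update the estimator with a new RTT measurement r (seconds)."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO_S), self.max_rto_s)
        return self.rto


est = RtoEstimator()
for sample in (10.0, 10.0, 60.0):  # illustrative RTT samples in seconds
    print(f"RTT {sample:5.1f} s -> RTO {est.on_rtt_sample(sample):5.1f} s")
```

With these samples, the RTO is 30 seconds, then 25 seconds, and then reaches the 60-second cap after a single 60-second sample. Once queueing delay exceeds the maximum RTO that a stack applies, data that is merely delayed, not lost, will inevitably be retransmitted. This would be consistent with the retransmissions seen at 240 kbps, where latency reached hundreds of seconds.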

When there is no data loss, window size increases. We observed that most of this increased window size is handled by OS buffering. The IP Client implementation needs to consider how to handle STANAG 5066 flow control. It is clear that IP packet discard should be avoided for the reasons noted above.

However, large IP queues are also problematic. This is particularly the case when delays lead to application timeouts, because transmitting the queued IP packets after the timeout will then cause further problems.

Isode’s basic strategy with Icon-PEP is to queue everything, to avoid the problems of discard. There are then configurable limits on queue size and on the age of the oldest queued IP packet. When a limit is exceeded, the entire queue is discarded. At this point, the application over IP will need to deal with the loss. Concentrating all of the losses at a single point minimizes the number of such resets. Recovery will typically be handled by the TCP layer, but UDP applications such as DNS will generally retransmit if there is no response.
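The Python sketch below illustrates this queue-then-flush policy. It is not Icon-PEP code. The class name, the limit values, and the choice to check limits on every enqueue and dequeue are assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedPacket:
    data: bytes
    enqueued_at: float  # seconds, from any monotonic clock


class FlushOnLimitQueue:
    """Queue IP packets; discard the whole queue when a limit is exceeded."""

    def __init__(self, max_bytes: int, max_age_s: float):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self._packets: deque[QueuedPacket] = deque()
        self._bytes = 0
        self.flushes = 0  # number of times the whole queue was discarded

    def __len__(self) -> int:
        return len(self._packets)

    def _enforce_limits(self, now: float) -> None:
        too_big = self._bytes > self.max_bytes
        too_old = (bool(self._packets)
                   and now - self._packets[0].enqueued_at > self.max_age_s)
        if too_big or too_old:
            # Drop everything at once so applications see a single loss event.
            self._packets.clear()
            self._bytes = 0
            self.flushes += 1

    def enqueue(self, data: bytes, now: float) -> None:
        """Queue a packet while flow control prevents sending."""
        self._packets.append(QueuedPacket(data, now))
        self._bytes += len(data)
        self._enforce_limits(now)

    def dequeue(self, now: float) -> bytes | None:
        """Return the next packet when flow control allows sending, else None."""
        self._enforce_limits(now)
        if not self._packets:
            return None
        packet = self._packets.popleft()
        self._bytes -= len(packet.data)
        return packet.data
```

Because the whole queue is discarded in one step, a TCP connection or DNS client experiences one burst of loss rather than a steady trickle of drops.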

CAS-1 Soft Link Failure

Under poor conditions and extended fades, the CAS-1 soft link that under-pins ARQ data transfer will break. The STANAG 5066 server will reject all queued data.

Icon-PEP’s strategy is based on the likelihood that CAS-1 failures will often last long enough to cause application failure. So Icon-PEP will not retransmit rejected messages, and it will also discard all queued IP packets for the STANAG 5066 peer. This approach forces recovery up to the level above IP. If TCP is the layer above IP Client, it will handle this recovery, but performance is likely to be significantly degraded.

Spurious IP Traffic

The test configuration was set up with one node as a Mobile Unit, with default routing to the “shore” system. This is desirable, as it will lead to correct behaviour of IP applications on the Mobile Unit. However, a good deal of unsolicited IP traffic appeared over the HF link. This included:

  • Routers sending ICMP traffic to validate link up state.
  • Applications on the (default CentOS 7) Linux server sending UDP packets to all sorts of IP addresses.

This traffic was surprisingly difficult to eliminate. On a fast link it would go unnoticed, but it has a significant impact on a slow HF link.

Traffic can be eliminated by any of the following:

  • System configuration to not generate the traffic.
  • Router configuration.
  • Icon-PEP configuration.

This needs to be considered when setting up an IP system operating over HF. IP connectivity between networks is designed to be “always up” and this would impact operation if the intent is “on demand” behaviour.
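The Python sketch below illustrates the kind of rule that can be applied in router or IP Client configuration. It forwards a packet only if the destination falls within an allow-list of shore prefixes and is not multicast or link-local. The function and the prefixes are hypothetical: 8.8.8.8 is the DNS server used in the tests, and 192.0.2.0/24 is a documentation range. Real router or Icon-PEP configuration syntax will differ, and a deployment that needs multicast applications would relax the multicast check.

```python
import ipaddress

# Hypothetical allow-list: the public DNS server used in the tests plus a
# documentation prefix standing in for shore-side services.
ALLOWED_PREFIXES = [ipaddress.ip_network(p) for p in ("8.8.8.8/32", "192.0.2.0/24")]


def should_forward(dst: str, allowed=ALLOWED_PREFIXES) -> bool:
    """Return True if a packet to `dst` should be sent over the HF link."""
    addr = ipaddress.ip_address(dst)
    if addr.is_multicast or addr.is_link_local:
        return False  # local chatter that rarely needs to cross HF
    return any(addr in net for net in allowed)


for dst in ("8.8.8.8", "192.0.2.10", "224.0.0.251", "203.0.113.7"):
    print(dst, should_forward(dst))
```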

Performance Summary

The following applications are not considered, as Isode views them as unsuitable for IP over HF:

  1. Routing Protocols. Static routing is seen as the best option for IP over HF setups.
  2. Voice and Streaming Media protocols.

Notes on performance for the following applications:

  1. ICMP.
    1. Traffic levels are expected to be low, and IP Client is suitable.
    2. ARQ mapping recommended.
  2. DNS.
    1. ARQ mapping recommended.
    2. IP Client performance acceptable at higher speeds.
    3. Duplication of queries and responses undesirable.
  3. QUIC
    1. This does not work out of the box.
    2. If this becomes an important requirement, further investigation is needed.
  4. UDP Point to Point Applications.
    1. Minimal requirements for standard applications.
    2. Possible military requirements.
    3. ARQ mapping likely to be best.
    4. IP Client likely to be suitable.
  5. Multicast UDP Applications.
    1. Possible military requirements.
    2. Non-ARQ mapping needed.
    3. IP Client likely to be suitable.
  6. TCP & HTTP.
    1. ARQ mapping is preferable.
    2. IP Client not suitable for general purpose Web browsing.
    3. IP Client sub-optimal in many situations.
    4. IP Client can be fragile, even where performance seems reasonable.

Recommendations to NATO

Based on this analysis and measurements, Isode makes the following recommendations to NATO:

  1. Standardization of a protocol to support TCP PEP is a high priority. Use of TCP and HTTP applications over HF is a top priority, and IP Client is a sub-optimal choice for this. Use of SLEP (SIS Layer Extension Protocol), specified in S5066-APP3, appears to be a good direction.
  2. An optimized approach for DNS using a DNS-PEP over STANAG 5066 is desirable, as there is a clear need for DNS support. Duplicate traffic is undesirable for HF. This work is aligned with current IETF activity on different bearers for DNS.