XMPP, the Internet Standard eXtensible Messaging and Presence Protocol, is the open standard for Instant Messaging (IM), Group Chat and Presence services. XMPP is widely used for military deployments, where operation over constrained and degraded networks is often essential, particularly for tactical operation.

Radio and Satellite networks often have constrained bandwidth, high latency and difficult operational characteristics. HF Radio, which is the primary alternative to Satellite for Beyond Line of Sight (BLOS) communication, has particularly awkward characteristics. This paper looks at the problems of deploying XMPP over such networks and shows how XMPP can be effectively deployed in such environments. It describes standards that have been developed to support constrained operation and how these are supported in Isode’s M-Link products.


Why XMPP over Constrained Networks

XMPP, the Internet Standard eXtensible Messaging and Presence Protocol, is being widely adopted by military organizations and others who make use of constrained network communications. XMPP is an important building block for handling 1:1 instant messaging, multi-user chat, and presence. XMPP provides this in an open standards framework, which supports security, extensibility and distributed deployment. It is highly desirable to deploy one communication system that can support any environment.

Although military communication increasingly uses high speed networks, there are many deployed situations where slower networks, including SATCOM, VHF, UHF and HF radio are important. HF Radio is of particular importance, as it is the only generally viable Beyond Line of Sight (BLOS) alternative to Satcom, and necessary for situations where Satcom cannot be used or as a backup to Satcom.

Why XMPP?

XMPP is the protocol family of choice for military networks for one simple reason: Standardization. It enables interconnection of heterogeneous components, and integration of partner networks from other countries. In particular:

  • The standard client/server protocol enables integration of users on a wide variety of systems, from specialized deployed units to office systems at HQ.
  • The standard server/server protocol enables easy peer system integration.
  • It standardizes security, unlike older chat protocols.

XMPP is a rich protocol family, with high functionality and security capabilities. It supports the core services of IM, Group Chat, and Presence. It also supports advanced capabilities, such as geo-location shared by extended presence, and is a communications platform suitable to support future applications.

Why XMPP for Tactical Networks?

Modern tactical communication has a complex mix of requirements:

  • Deployed units with a variety of communication links.
  • Participants from multiple countries working closely together.
  • Close involvement of field HQ.
  • Involvement of remote personnel, for example to provide specialist advice, or legal involvement with decisions to engage.

The Instant Messaging family of services is a useful and important component of tactical communications providing three core services.

Multi-User Chat

Multi-User Chat (MUC) is a central service for military communication. If data is being provided, it makes sense to share it so that all interested parties can see it. For example, it will enable external strategists or lawyers to observe communication in real time, and provide input as appropriate. It often makes sense to share information in the field, for example a group of ships jointly working out who will target what and how. MUC is an important operational capability.

One-to-One Chat

There will be situations when using 1:1 IM to send or exchange short messages will be more effective than formal messaging or voice communication:

  • When communication links have capacity to send data but not voice.
  • In very noisy situations where voice cannot be heard.
  • In situations where absolute silence must be maintained.
  • To provide information from a location where typing is easy (e.g., field HQ) to field locations in order to provide information that can complement voice.

Presence

Information on online presence can be useful in support of other communication and supplements both 1:1 chat and MUC. Extended presence (additional information associated with presence) can also enable useful sharing. In particular, geo-location can be supported as extended presence, enabling presence as a means of location tracking.

Radio & Satellite Constraints on Tactical Networks

Tactical communication needs to use data communication links of widely varying speed and quality. It is important to be able to gain the benefits of fast networking when it is available to support a range of modern applications. However, it is also important to be able to use slower links, when they are the only option available. As well as speed, latency and reliability are important characteristics that impact applications using data communications links. Key network technologies are:

  • Satellite. Modern satellite systems provide bandwidth of 1 Mbps and higher, although many deployed systems are much slower (e.g., 4800 bps). Geostationary satellites have a latency of about 0.5 seconds. It is quite common to chain multiple satellite links, giving greater end to end latency.
  • Line Of Sight Radio (VHF). VHF Radio is widely used in tactical communications. Data links usually operate at 9600 bps (single VHF channel). Multiple channels can be combined to give full duplex communication and higher data rates. Although the physical latency is low, for a standard half duplex link, the low data rate will lead to turnaround times of half a second or more.
  • Line Of Sight Radio (UHF and faster). Higher frequency radio will provide higher bandwidth than VHF. Different bands give different operational characteristics, ranges and opportunities for deployment. All are restricted to line of sight communications.
  • Beyond Line Of Sight Radio (HF). Traditional HF Radio provides data rates from 75-9600 bps.  The newer wideband HF standards (STANAG 5069) increase this up to 480 kbps.  Data rates can be highly variable, with high and erratic error rates. HF will generally be used with half duplex operation and turnaround time is typically a few seconds.

In many deployments, data communication links are shared between multiple applications. Link capacity may be partitioned, to ensure that specific applications do not take more than an allotted share of the bandwidth. This may reduce available bandwidth for a specific application to considerably less than the physical limit.

Client/Server vs Server/Server for Constrained Links

The following diagram shows two options for providing XMPP service over a slow link to a client using standard XMPP protocols. In XMPP a client connects to a single server, and then there are direct server to server connections to support communication with clients on other servers.

In the first option (Client/Server), the client connects to its server over a slow link. In the second option (Server/Server), the client is local to its server (fast) and the server communicates with other XMPP servers over a shared slow link.

This paper will examine these options and show that Server/Server is preferable and should be used where possible.

XMPP Protocol Performance

This section gives an overview of the protocol overhead associated with XMPP. The XMPP protocol uses an XML text encoding. The following example message is from the XMPP standard:

<message from='juliet@example.com'
         to='romeo@example.net'
         xml:lang='en'>
  <body>Art thou not Romeo, and a Montague?</body>
</message>

A minimal message such as this example will have an overhead of around 100 bytes. Typical XMPP clients will use more features, leading to a typical operational overhead of 200 bytes per message. XMPP can use compression, and both the protocol and typical text messages compress well, so a typical message with 100 bytes of text and 200 bytes of protocol might compress to 100 bytes. The time to transfer such a message is negligible on all but the very slowest links.
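
As a rough illustration of this claim, the following Python sketch computes the serialization time of a 100 byte compressed message at several link speeds mentioned in this paper. It is an estimate under simplifying assumptions: latency, half duplex turnaround and lower layer framing are ignored, and the function name is purely illustrative.

```python
def transfer_seconds(size_bytes: int, link_bps: float) -> float:
    """Time to serialize size_bytes onto a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps


# 100 bytes is the compressed size of the typical message discussed above.
# Link speeds are taken from figures quoted elsewhere in this paper.
for bps in (75, 2400, 9600, 1_000_000):
    print(f"{bps:>9} bps: {transfer_seconds(100, bps):.3f} s")
```

Only at the lowest HF data rates does the per-message transfer time become noticeable; on the other links it is a fraction of a second.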

Once connected, the standard XMPP client to server and server to server protocols are fully asynchronous. This means that XMPP will work well for high latency networks.

However, there are additional considerations on connection initialization.   There is exchange of about 10 kBytes of data for both client/server and server/server connections.   This is trivial on a fast network but can be significant on some target constrained networks. 

Client/Server connections have additional start-up overhead, typically in the range 20-50 kBytes of data.  This is associated with various necessary or desirable interactions to synchronize the client with the server. Sophisticated shared state between client and server is to be expected in a modern IM protocol.

A second consideration is that XMPP protocol initialization uses about ten sequential handshakes.   For high latency networks, this is problematic.  It leads to operational problems with Satellite networks and slow radio links. There is ongoing XMPP standardization work to reduce this number of handshakes, but this is not expected to be available for some while.

For server to server (S2S) communications, it is very common to have two connections, which doubles the overhead. This can be avoided using XEP-0288: "Bidirectional Server-to-Server Connections", which allows S2S operations using a single connection.

The overhead for messages to group chat is similar. The main difference is that when a client sends a message to a remote MUC room, the same message will come back again (from the room), so the link is used twice.

Presence updates are a similar size to messages (200-300 bytes). One of these will be received whenever a roster member changes status. When the client changes status, one will be sent and then returned back from the server.

XMPP Compression

XMPP provides compression using the DEFLATE algorithm. This can be applied in one of two ways:

  • Directly with the XMPP protocol.
  • Within TLS (Transport Layer Security).

The compression effect is the same, but compression within TLS increases startup overheads. The figures quoted previously apply when neither TLS nor compression is in effect. DEFLATE works well for XMPP because:

  • XMPP is a text encoded protocol, and DEFLATE will give an immediate benefit for typical traffic.
  • XMPP has a regular structure, and common elements are often repeated. DEFLATE optimizes for this by reference to data transmitted, and will give substantial compression as use increases. For example, if a peer user is changing presence status between a small number of values, the same packets will be used to report this change, and DEFLATE will give very high compression.

It is worth considering how much compression is provided. The DEFLATE specification in RFC 1951 notes that "English text usually compresses by a factor of 2.5 to 3" (i.e., to 33-40% of the original size). Given that IM and MUC traffic is the primary user data carried by XMPP, this is a reasonable estimate of the compression to expect for user data.

XMPP protocol also compresses effectively. Ad hoc measurements of a short lived connection suggest that typical presence updates will compress from 100 to 50 bytes, and typical message overhead will compress from 300 bytes to 120 bytes.
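
The effect of stream compression on repeated stanzas can be explored with a small Python sketch using the standard `zlib` module. This is an illustration only: the presence stanzas and addresses are made up for the example, the function name is hypothetical, and the sizes produced will not match the ad hoc measurements quoted above.

```python
import zlib

# Two illustrative presence stanzas that a peer might alternate between.
PRESENCE_AWAY = (
    "<presence from='juliet@example.com/balcony' to='romeo@example.net'>"
    "<show>away</show></presence>"
).encode()
PRESENCE_ONLINE = (
    "<presence from='juliet@example.com/balcony' to='romeo@example.net'/>"
).encode()


def compressed_sizes(stanzas):
    """Compress stanzas on ONE DEFLATE stream; return bytes emitted per stanza."""
    comp = zlib.compressobj()
    sizes = []
    for stanza in stanzas:
        # Z_SYNC_FLUSH forces the stanza out so the peer can decode it now,
        # while keeping the history needed to compress later stanzas.
        out = comp.compress(stanza) + comp.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(out))
    return sizes


sizes = compressed_sizes([PRESENCE_AWAY, PRESENCE_ONLINE] * 5)
print("uncompressed:", len(PRESENCE_AWAY), len(PRESENCE_ONLINE))
print("compressed per stanza:", sizes)
```

The first stanza on the stream compresses only modestly, while later repetitions shrink to a handful of bytes because DEFLATE can refer back to data already sent.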

XMPP Design and Scaling

When looking at XMPP performance numbers in the context of very slow networks, it might appear that XMPP has poor optimization. It is worth considering the broad characteristics and design goals of XMPP:

  • XMPP is designed to provide an extensible communications and information publishing infrastructure. XML is a natural choice to achieve this, and provides an extensible approach that can be easily used in many environments. Although XML is not very compact, the data sizes are small on modern networks, particularly in comparison with voice, video and other data in wide use.
  • On a modern network, XMPP's network usage is very light.
  • XMPP clients are generally developed to provide "best service" to the user. There is no need for focus on optimizing network traffic.
  • The hard problem for a distributed or federated IM system is support of presence. Message switching load scales in a natural manner, with load proportional to usage. With presence, there is a need to update many clients over the network for each status change. Care needs to be taken to ensure that this scales well, and the XMPP design has taken considerable care on this point. This is described in "Interdomain Presence Scaling Analysis for the Extensible Messaging and Presence Protocol (XMPP)".

Client/Server Deployment

With this basic understanding of XMPP performance in place, we can consider the performance of XMPP Client/Server interaction over a medium speed network of 28 kbits per second (3.5 kbytes per second) with 1 second latency. This will give client start-up times limited by:

  • Network speed.  If there is 50 kBytes of data, this will lead to a transfer time of around 15 seconds.
  • Latency. Ten handshakes on start-up, each a round trip of about 2 seconds, will lead to a delay of around 20 seconds.
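
The arithmetic behind these two figures can be reproduced with a short Python sketch. It assumes that 1 kByte is 1000 bytes, that the quoted latency is one-way (so each handshake costs two latencies), and that handshakes do not overlap with data transfer. The function name is illustrative.

```python
def startup_estimate(data_bytes, link_bps, one_way_latency_s, handshakes):
    """Return (transfer_seconds, handshake_seconds) for connection start-up."""
    transfer = data_bytes * 8 / link_bps
    # Each sequential handshake waits for a full round trip.
    handshake_delay = handshakes * 2 * one_way_latency_s
    return transfer, handshake_delay


transfer, delay = startup_estimate(
    data_bytes=50_000, link_bps=28_000, one_way_latency_s=1.0, handshakes=10
)
print(f"transfer: {transfer:.1f} s, handshakes: {delay:.1f} s")
```

Changing the parameters to a slower link or a higher latency shows how quickly the start-up time grows on more constrained networks.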

This will be a poor user experience. Whether or not this is acceptable will depend significantly on the stability of the underlying network.   If the network is very stable, XMPP client/server connections will be long-lived and the impact of this delay will not be significant.  However, if the network fails regularly, leading to reconnections, then even at medium speed client/server performance will be unacceptable.

At lower speeds and higher latencies, which will often need to be addressed, the initial connection time will become much larger.

This means that operating client/server over slower networks will often not be viable.

Point to Point Deployment over Constrained Networks

The second approach is to put the constrained network between a pair of XMPP servers, which is Isode’s recommended approach.

A core benefit of this architecture is that it isolates the XMPP Client from the constrained link and gives good local service to the client.   The client/server protocol will be quick, and the basic user interaction will be good.

The second point of this architecture is that the server/server protocol can be optimized for the slow link.  There are several reasons why this is preferable to optimizing the client/server protocol:

  • There are many more XMPP client implementations.  By addressing constrained network support at the server level, choice of XMPP client is not restricted.
  • It would be hard for an XMPP client to hide slowness from the user.
  • Because of shared state between client and server, much more data needs to be exchanged between client/server than server/server.
  • Server/Server optimization provides a framework to optimize for multi-user chat communication.

The following sections look at approaches to optimize constrained link communication between servers.

Zero Handshake Protocol

The communication between a pair of XMPP servers is essentially a flow of stanzas in each direction. A stanza is an element of data on an XMPP stream, which can be "message", "presence", or "iq" (information query).

XMPP uses one or two TCP connections to support this. The use of two connections is a historical consequence of dial-back, and a single connection is possible by use of XEP-0288: “Bidirectional Server-to-Server Connections”.

Setup of each connection involves a number of handshakes and data exchange. These include:

  • TCP handshakes.
  • TLS handshakes – use of TLS for server to server connection is recommended.
  • SASL handshakes for authentication.
  • XMPP stream binding.

This leads to ten or more end to end handshakes, and exchange of several kilobytes of data. For a low latency network, this overhead is minimal, particularly given that server to server connections will generally be very long lived (days or weeks).

For a constrained network, this overhead is a big problem, particularly where network reliability means that long lived connections will generally not be possible. For some networks, the number of handshakes is a particularly severe problem.

Isode's approach for M-Link is to use "XEP-0361: Zero Handshake Server to Server Protocol" to reduce data volumes and remove all handshakes at the XMPP level. XEP-0361 uses three approaches to improve performance over constrained links:

  • Use of a single (bi-directional) stream.
  • Configuration of options at both ends of the connection (using peering controls) to avoid the need for negotiation at run time. This saves both data volume and handshakes.
  • Full pipelining of the remaining stanzas. This means that when a connection is initiated there will be a sequence of initializing stanzas followed by messages. There is no requirement for any returned data before sending starts. This achieves "zero handshake" at the stanza level.

This stanza level exchange is abstracted, so that it can be mapped onto multiple transports.

The base transport is TCP, which is a good choice for Satcom links. There will be a single TCP handshake to establish the TCP connection, and then data can flow without further handshakes. TCP will optimize use of the available link bandwidth.

XEP-0361 may be operated over TLS. M-Link supports this to provide data confidentiality with an option to use peer authentication using X.509 strong authentication. With many constrained networks these services will be provided at the data link or network layer, and there is no functional requirement to provide them at the application layer. TLS adds some protocol overhead, and with current versions of TLS the handshaking will add significant latency. The new TLS 1.3 significantly improves this.

Compression

Minimizing data transferred is important, and so use of compression is desirable.

Standard XMPP compression is used with Isode's optimized server to server protocol. This compression is stream based and uses the DEFLATE algorithm. A key benefit of this compression approach is that it is adaptive to both the protocol used (e.g., the XMPP protocol options and XML namespaces used) and to user messages and addresses exchanged. This means that a very high level of compression can be achieved in many situations.

DEFLATE references previous data in the stream, and it becomes more effective for larger data sets (or longer use in the case of a stream). This means that for compression to be effective connections between servers must be reasonably long lived. M-Link operates to achieve this. Where TCP is used, it is important that the underlying network is sufficiently reliable to hold the connection open for extended periods and that the overhead of TCP keepalives is not an issue. TCP will generally be a good choice for Satcom.

Presence Caching

A key benefit of XMPP is to provide up to date information on a user's presence status. To support this, XMPP servers exchange presence information. An important optimization in support of a low bandwidth link is for a server to cache presence values, so that if this gets requested (e.g., by a client logging on) then this can be handled locally rather than making a query to the remote server. Presence updates are pushed (not polled) and so if done correctly, this caching will still lead to clients being given correct information.
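
A minimal Python sketch of this caching idea is shown below. It is not M-Link's implementation: the class and method names are hypothetical, and a real server would also deal with multiple resources per user, subscriptions and cache invalidation when the link to the peer is lost.

```python
from typing import Dict, Optional


class PresenceCache:
    """Remembers the last presence stanza pushed by the remote server per JID."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def on_remote_presence(self, jid: str, stanza: str) -> None:
        # Called whenever the remote server pushes a presence update.
        self._last[jid] = stanza

    def lookup(self, jid: str) -> Optional[str]:
        # Answers a local request (e.g. a client logging on) without
        # sending anything over the constrained link.
        return self._last.get(jid)


cache = PresenceCache()
cache.on_remote_presence(
    "romeo@example.net",
    "<presence from='romeo@example.net/orchard'><show>away</show></presence>",
)
print(cache.lookup("romeo@example.net"))
```

Because updates are pushed, the cached value is replaced as soon as the remote user's status changes, so local lookups stay current while the link is up.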

Traffic Filtering

All of the changes described so far optimize performance without impacting the XMPP service provided to the client. Traffic filtering removes data, and so will modify the service provided to the end user. With traffic filtering there will be a trade-off between service and performance. Removing traffic and information of low value to the user will improve performance for high value data. The details of the filtering and the trade-off will vary, and traffic filtering is likely to be used aggressively for very slow networks, and less for somewhat faster networks.

Filtering options available are:

  • Removal of selected types of message (or other stanza).
  • Removal of selected elements from messages (message folding).
  • Removal of selected elements from presence stanzas (presence folding).

Removal of messages seems a drastic measure, but can be helpful. A class of message that it generally makes sense to remove is "chat state notifications". These give real time notification as to user "state" and in particular if the user is typing. Chat state notifications lead to client indications such as "Joe is typing". It will often be desirable to save the network overhead of these messages for a constrained network.

A more extreme filtering that M-Link offers is to remove all presence messages ('Joe is online'/'Joe is away from his desk'). This would reduce the communication to an exchange of user messages, and there would be no setting or update of user presence. Clients would need to be chosen that are appropriate for this type of deployment, as some will expect presence updates.

Another option would be filtering of IQ (information query) stanzas, which clients use to gather information and negotiate protocol features and extensions. There are a number of protocol features and extensions which are unsuitable for use in constrained networks, such as features allowing Audio/Video streams to be established over XMPP. Many of these features and extensions can be disabled via IQ filtering. M-Link provides flexible controls to filter traffic.

XMPP is an extensible message protocol, and a wide range of XMPP applications and services use this extension mechanism. Extensions and additions to a message are clearly identifiable in the XML of an XMPP stanza. M-Link allows extensions and message elements in general to be removed. This is called "message folding". Message folding can be specified either as a list of elements allowed (i.e., everything else will be stripped) or as an explicit list of elements to strip. Possible uses of this:

  • Work out the list of fields that are operation critical, and then strip out everything else.
  • Remove specific fields that are known to be not required, for example security labels.
  • Remove the HTML variant of a message (which some clients insert) and leave only the simple text version.
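
The allow-list and strip-list forms of message folding can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration of the idea only, not M-Link's configuration or implementation. The function name and the example stanza are assumptions, and only direct children of the stanza are considered.

```python
import xml.etree.ElementTree as ET

XHTML_IM = "{http://jabber.org/protocol/xhtml-im}html"
BODY = "{jabber:client}body"

MESSAGE = (
    "<message xmlns='jabber:client' from='juliet@example.com' "
    "to='romeo@example.net'>"
    "<body>Art thou not Romeo, and a Montague?</body>"
    "<html xmlns='http://jabber.org/protocol/xhtml-im'>"
    "<body xmlns='http://www.w3.org/1999/xhtml'>"
    "<p>Art thou not <em>Romeo</em>, and a Montague?</p></body></html>"
    "</message>"
)


def fold_message(stanza_xml, allow=None, strip=None):
    """Remove child elements of a stanza.

    allow: if given, keep only children whose qualified tag is listed.
    strip: if given (and allow is not), remove children whose tag is listed.
    Tags use ElementTree's '{namespace}name' notation.
    """
    root = ET.fromstring(stanza_xml)
    for child in list(root):
        if allow is not None:
            if child.tag not in allow:
                root.remove(child)
        elif strip is not None and child.tag in strip:
            root.remove(child)
    # ElementTree may rewrite namespace prefixes; the XML stays equivalent.
    return ET.tostring(root, encoding="unicode")


folded = fold_message(MESSAGE, strip={XHTML_IM})
print(len(MESSAGE), "->", len(folded), "characters")
```

Passing `allow={BODY}` instead of a strip list gives the "operation critical fields only" variant, where anything not explicitly listed is removed.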

M-Link provides an equivalent "presence folding" mechanism for Presence stanzas. Presence can be used to convey additional information, which may or may not be needed. Presence folding allows presence messages to be reduced to a simple online/offline status, with no additional information. Things that might be stripped include:

  • Information on Avatars.
  • Additional presence information such as "extended away".

Multi-User Chat & FMUC

Multi-User Chat (MUC) is often an important service in a constrained bandwidth environment, and it introduces a number of performance and reliability problems. These problems and Isode’s approach to solving them using Federated MUC (FMUC) are described in the whitepaper [Federated Multi-User Chat: Efficient and Resilient Operation over Slow and Unreliable Networks].

Optimised Support for HF Radio (STANAG 5066)

XEP-0361 operates over TCP/IP, which will work well for Satcom and (relatively) fast radio links. However, it will not work so well for slower radio links and will be particularly bad for HF radio. The reasons for this are explained in the Isode white paper "Performance Measurements of Applications using IP over HF Radio". The most significant problem is that the TCP windowing mechanism for flow control interacts badly with the very long HF turnaround times.

The solution is to use STANAG 5066, which provides a standard approach for integrating applications to run over HF Radio. Use of STANAG 5066 directly by the application is key to getting good performance over HF Radio, and this is what has been done in M-Link. STANAG 5066 is often used with VHF and UHF radio, and will give useful performance increases here too.

The mapping of XEP-0361 onto STANAG 5066 is standardized in "XEP-0365: Server to Server communication over STANAG 5066 ARQ". Key capabilities:

  • The application level protocol is a sequence of stanzas. The same mapping needs to be done for each direction, noting that there is no application level handshaking.
  • Stanzas that are ready to transmit are grouped together. This will optimize throughput and maximise compression, possibly at the expense of latency. Experience with HF suggests that it is generally sensible to optimize for throughput.
  • The resulting group of stanzas is transferred as a single block using STANAG 5066 RCOP (Reliable Connection Oriented Protocol), which as the name suggests reliably transfers the data to the peer XMPP server.
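
The benefit of grouping stanzas before compression can be illustrated with the Python sketch below. It does not implement the XEP-0365 wire format or the RCOP transfer; it only compares the compressed size of stanzas flushed one at a time with the same stanzas grouped into one block. The stanza contents and function names are made up for the example.

```python
import zlib
from typing import Sequence


def size_sent_individually(stanzas: Sequence[bytes]) -> int:
    """Total compressed bytes when each stanza is flushed on its own."""
    comp = zlib.compressobj()
    return sum(
        len(comp.compress(s) + comp.flush(zlib.Z_SYNC_FLUSH)) for s in stanzas
    )


def size_sent_as_block(stanzas: Sequence[bytes]) -> int:
    """Compressed bytes when all ready stanzas are grouped into one block."""
    comp = zlib.compressobj()
    return len(comp.compress(b"".join(stanzas)) + comp.flush(zlib.Z_SYNC_FLUSH))


# Five illustrative stanzas that are ready to transmit at the same time.
READY = [
    (
        "<message from='juliet@example.com' to='romeo@example.net'>"
        f"<body>report {i}</body></message>"
    ).encode()
    for i in range(5)
]

print("individually:", size_sent_individually(READY), "bytes")
print("as one block:", size_sent_as_block(READY), "bytes")
```

Grouping avoids the per-flush overhead of each small transmission, at the cost of holding stanzas until the block is sent, which is the throughput versus latency trade-off described above.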

This approach optimizes point to point operation over HF Radio.

Multicast and EMCON

Many slow networks use underlying broadcast transmission, and it is desirable that the application can make use of this. A related problem is that it is desirable to support end points in radio silence (EMCON or Emission Control). This means operation without acknowledgements. The architecture for this is shown above. This architecture is not currently supported by any protocols or products, but looks to be a sensible future direction.

SATCOM Case Study

An Isode customer needed to use Iridium satellite communication, which provided data rates of 2400 bits/second and latency that could be several seconds. TCP connections over these links failed quite regularly. Use of a client/server XMPP architecture led to connection establishment times of 6-8 minutes, which was not acceptable.

Shifting to a server/server architecture, with M-Link using the XEP-0361 zero handshake protocol, reduced this reconnect time to around 30 seconds, most of which was associated with lower level link setup.

Use of traffic filtering and Federated MUC provided additional performance improvements and resilience.

Conclusions

XMPP is an important technology for supporting military tactical communication. It is useful directly, and as a basis for interoperable situational awareness systems. The protocol has good functionality, extensibility and scaling characteristics and can be deployed directly over fast and medium speed networks.

This paper has looked at the requirements for XMPP operation over constrained networks and has shown the benefits of operating server to server over the constrained link.  It has described key technologies implemented in Isode’s M-Link product:

  • XEP-0361 Zero Handshake server to server protocol
  • Compression
  • Presence and Message filtering
  • Federated Multi-User Chat
  • Operation over HF, using STANAG 5066 and XEP-0365