Summary

This paper describes and analyses measurements made operating XMPP over a slow IP link with variable delay, simulating Satcom. The results are applicable to the use of XMPP with any constrained IP network. The paper compares measurements of standard XMPP and Isode's optimized server to server protocol; comparison measurements with IRC (Internet Relay Chat) are also given.

XMPP & Optimized XMPP

XMPP, the Internet Standard eXtensible Messaging and Presence Protocol, has a number of performance issues when used with constrained networks. These are described in the Isode white paper M-Link Support for XMPP over Constrained Networks, referenced here as "The White Paper".

The deployment approach recommended in The White Paper is to always operate server to server over a constrained link. Therefore, this paper makes server to server measurements of:

  1. The standard XMPP Server to Server protocol ('XMPP S2S').
  2. Isode’s optimized XMPP Server to Server protocol ('Optimized S2S').

Optimized S2S carries the full XMPP service, and so there is no loss of service to the end user. The key differences of Optimized S2S relative to XMPP S2S are:

  1. Use of a single TCP connection (rather than two).
  2. Reduction of the amount of data needed to establish a connection.
  3. Removal of all protocol handshakes to establish a connection ('zero handshake').

Note that both XMPP S2S and Optimized S2S use compression.

This paper measures the effect of these differences. A deployed tactical chat system using constrained networks is also likely to use additional capabilities of M-Link described in The White Paper to reduce traffic volume, so that the end user benefits both from the raw performance improvement and from the reduced load.

Test Setup

The test setup uses two XMPP Servers that can be connected either by XMPP S2S or by Optimized S2S. Latency and Throughput tests are driven by a specialized XMPP test client.

The constrained link between the two servers was provided by a serial line, with IP operating over PPP (Point to Point Protocol). A configurable delay was added at the IP driver level on the servers, and the link speed was configured on the serial link.
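By way of illustration, this kind of fixed delay can be configured on a Linux host using the netem queueing discipline. The sketch below is an assumption about tooling rather than a description of the actual test harness; it assumes root privileges and a PPP interface named ppp0.

  import subprocess

  def set_link_delay(interface: str, delay_ms: int) -> None:
      # Add a fixed one-way delay to all packets leaving the interface,
      # using the Linux netem queueing discipline (requires root).
      subprocess.run(
          ["tc", "qdisc", "replace", "dev", interface, "root",
           "netem", "delay", f"{delay_ms}ms"],
          check=True,
      )

  # Simulate the minimum one-way latency of a geostationary hop:
  set_link_delay("ppp0", 500)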

Two link speeds were used for the tests:

  1. 2400 bits/second. Although many Satcom systems now provide data at much higher rates, slow Satcom links are still widely used; for example, the Iridium system provides data at 2400 bits/second. 2400 bits/second was chosen as a realistic 'slow' rate and was used for most of the tests.
  2. 19200 bits/second. Some latency tests were made at 19200 bits/second, chosen as a relatively low speed that is nevertheless fast enough for latency (rather than data rate) to be the dominant performance factor in startup.

Three values of latency were used in the tests:

  1. 0 seconds. This gives a reference point, and helps show delay that is due to data transfer.
  2. 0.5 seconds. This corresponds to the minimum latency for a geostationary satellite, associated with the signal time to the satellite and back.
  3. 1 second. In practice, additional latency is likely for Satcom systems, and some Radio systems may have even higher latency. This value is used to illustrate the effect of higher latency.

Latency Measurement

The first set of tests measures latency, with a test setup that is similar to a pair of users exchanging messages. Client 1 sends a message to Client 2. Client 2 immediately sends a message back to Client 1. Client 1 then responds, giving an ongoing flow of messages.
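As an illustration of the measurement loop, the sketch below times a sequence of ping-pong exchanges over a plain TCP connection to an echo peer. It is a simplified stand-in for the specialized XMPP test client, not the actual test code.

  import socket
  import time

  def measure_round_trips(host: str, port: int, payload: bytes, rounds: int) -> list[float]:
      # Send the payload, wait for it to be echoed back in full, and
      # record the round trip time of each exchange.
      times = []
      with socket.create_connection((host, port)) as sock:
          for _ in range(rounds):
              start = time.monotonic()
              sock.sendall(payload)
              received = b""
              while len(received) < len(payload):
                  received += sock.recv(4096)
              times.append(time.monotonic() - start)
      return times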


(Optimized) Round Trip Time in Secs with 2 Byte Payload

The graph shows a typical result of the latency test, measuring the round trip time (from client 1, to client 2, and back to client 1). Observations on the graph:

  1. The round trip time is very stable, with little variation. It is this stable number that is used in subsequent tables as the round trip time.
  2. The first round trip is always higher than the stable value.
  3. The second round trip is sometimes slightly higher than the stable value.

There are two start-up conditions:

  1. Where there is no XMPP server to server connection between the two processes. In this scenario, the first ping causes the server to server connection to be established, which allows the overhead of connection setup to be measured. This setup overhead is examined in detail later.
  2. Where the server to server connection is already in place. When a test is run, there is no need to establish a connection. However, system processes may have been idle before the test; the start of the test causes processes to load, leading to some additional delay in the first few round trips as processes 'page in'.

Payload (bytes)   XMPP S2S (secs)   Optimized S2S (secs)   Theoretical (secs)
2                 1.2               0.7                    0.013
100               1.85              1.4                    0.66
200               2.3               1.85                   1.33
500               3.75              3.2                    3.33

Round Trip Time with Varying Payload

The latency test uses a configurable amount of user data (random binary, which will not compress) in order to look at the effect of varying the amount of data carried. The tests were done on a link of 2400 bits/second, with no added delay. For comparison, a Linux ICMP ping (which carries around 50 bytes of data) takes 0.8 seconds, slightly longer than the fastest XMPP ping, which carries less data.

Measurements of the stable round trip time were made for varying payload sizes, for both XMPP S2S and Optimized S2S. A theoretical time is also given, calculated as the time needed to transfer the payload in both directions at 2400 bits/second.
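The theoretical figure can be reproduced directly, as the following sketch shows; it computes the time for the payload to cross the 2400 bits/second link once in each direction, ignoring all protocol and network overhead.

  def theoretical_rtt(payload_bytes: int, link_bps: int = 2400) -> float:
      # The payload crosses the link once in each direction;
      # protocol and network overheads are ignored.
      return 2 * payload_bytes * 8 / link_bps

  for payload in (2, 100, 200, 500):
      print(payload, round(theoretical_rtt(payload), 3))
  # 2 -> 0.013, 100 -> 0.667, 200 -> 1.333, 500 -> 3.333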

The stable round trip times for Optimized S2S were consistently 0.5 seconds faster. We believe this is a consequence of there being only one TCP connection: a possible explanation is that data and acknowledgements are carried in the same packets, which could not happen with two TCP connections.

The round trip times seen are not expected to have a significant impact operationally, and should not noticeably slow down chat between human users.

Delay (secs)   XMPP S2S (secs)   Optimized S2S (secs)
0              1.85              1.4
5              6.4 (1.4)         6.4 (1.4)
10             11.4 (1.4)        11.4 (1.4)
15             16.4 (1.4)        16.4 (1.4)

Round Trip Time with Varying Typing Delay

A variant of the test has Client 2 wait for a period before responding, to simulate a real user taking some time to read a message and respond to it. A 100 byte payload was used. The results show the effect of various delays; the time in brackets is the round trip time with the delay deducted.

For all results except XMPP S2S with no delay, the value after the delay is deducted is the same (1.4 seconds). The effect that causes the additional delay for XMPP S2S as a consequence of using two TCP connections goes away when this delay is added.

Network Latency   XMPP S2S Stable   Optimized S2S Stable   XMPP S2S First   Optimized S2S First
(secs)            (secs)            (secs)                 (secs)           (secs)
0                 1.2               0.7                    36.3             2.9
0.5               2.3               1.8                    57               4.9
1                 3.3               2.8                    78.5             6.9

Round Trip Time with Varying Network Latency (2400 bits/sec)

Measurements were then made with varying network latency (the delay for data to make a single traversal of the network). This measures the stable value and the first transfer where no connection is in place between the two XMPP servers. This is at 2400 bits/sec with a 2 byte payload.

The stable round trip time for each latency value increases by twice the network latency, for both XMPP S2S and Optimized S2S. This is to be expected, as a round trip requires two network traversals. These numbers again show the 0.5 second difference between XMPP S2S and Optimized S2S.

The first round trip takes longer, due to the need to establish a connection between the two XMPP servers. For Optimized S2S, there is just over two seconds of extra traffic for the first round trip, to handle server to server initialization. As latency increases, this value goes up by four times the network latency. This is to be expected, as two network traversals are needed to establish the TCP connection, and two more to establish the server to server connection and to carry the data. These figures clearly show that the 'zero handshake' protocol is working, and that the overhead for initializing a server to server connection is low.
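A simple linear model, fitted to the zero latency measurements above, captures this behaviour for Optimized S2S:

  def stable_rtt(latency: float) -> float:
      # Two network traversals per round trip, on a 0.7s base.
      return 0.7 + 2 * latency

  def first_rtt(latency: float) -> float:
      # TCP setup (two traversals) plus server to server setup and
      # data (two more traversals), on top of ~2.2s of setup traffic.
      return 2.9 + 4 * latency

  for latency in (0.0, 0.5, 1.0):
      print(stable_rtt(latency), first_rtt(latency))
  # stable: 0.7, 1.7, 2.7  (measured: 0.7, 1.8, 2.8)
  # first:  2.9, 4.9, 6.9  (measured: 2.9, 4.9, 6.9)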

For standard XMPP S2S, setup is much slower, even with zero latency, because of the amount of data that needs to be exchanged. This increases significantly with latency, due to the number of handshakes needed.

Network Latency   XMPP S2S Stable   Optimized S2S Stable   XMPP S2S First   Optimized S2S First
(secs)            (secs)            (secs)                 (secs)           (secs)
0                 0.16              0.10                   4.6              0.4
0.5               1.15              1.10                   25.8             2.4
1                 2.15              2.10                   46.9             4.4

Round Trip Time with Varying Network Latency (19200 bits/sec)

The same tests were run with the link at 19,200 bits/second, and the results are shown above. The goal of this test was to show performance effects when latency is the determining factor.

For stable round trip times, the pattern is the same as for 2400, with the absolute time at zero latency very low. Optimized S2S is very slightly faster.

For Optimized S2S startup, the pattern is as before, with zero latency startup down to 0.4 seconds.

For standard XMPP S2S, zero latency startup is reduced to 4.6 seconds. It can be seen that this increases by 40 times the latency value. This is due to two connections being established sequentially, with ten handshakes each.
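The factor of 40 follows directly from the handshake count, as the following worked check shows; the predicted values are a close match to the measured first round trips.

  # Two connections are established sequentially, each needing ten
  # handshakes, and each handshake costs two network traversals.
  latency_multiplier = 2 * 10 * 2   # = 40 traversals

  # Predicted first round trip at 19200 bits/sec, anchored to the
  # measured 4.6 second zero-latency startup:
  for latency in (0.0, 0.5, 1.0):
      print(4.6 + latency_multiplier * latency)
  # 4.6, 24.6, 44.6  (measured: 4.6, 25.8, 46.9)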

Throughput Measurements

Payload (bytes)   Link Utilization
10                22% (15%)
100               46% (40.5%)
1024              70% (66.5%)
10240             74% (73%)

The second test measures throughput. Although XMPP is not generally used for bulk data transfer, these measurements help to give a useful understanding of the characteristics of operating XMPP over a constrained link. The throughput test uses XMPP to carry a payload in stanzas of configurable size; the payload data is random binary, so that it will not compress. A stream of stanzas is then sent in one direction.

The above tests were measured at 2400 bits/second, with zero network latency. Initial throughput was less than the final value; this was particularly marked for small payloads. Throughput increased as the TCP window opened up, and the stable values were measured.

Throughput figures were very similar for XMPP S2S and Optimized S2S. This is to be expected, as the packets are the same. It was observed that a given connection would reach a stable throughput value, but that the value reached would vary between tests. The highest values recorded are shown here, with the lowest values in brackets. We believe this is an effect of TCP window negotiation, with connections reaching stable but different values in a non-deterministic manner. This effect was particularly marked for small payloads.

This analysis is supported by the fact that the stable rate seems to depend on which test was run previously (i.e., the starting TCP window size will have an effect on the final window size and throughput achieved).

The overhead for each XMPP stanza is around 28 bytes (after compression). This enables a calculation of the split between payload, XMPP protocol, and network (TCP, IP and PPP) overhead.
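The following sketch shows that calculation, using the highest measured link utilization values from the table above; whatever remains after the payload and the 28 byte XMPP overhead is attributed to network overhead.

  XMPP_OVERHEAD = 28  # bytes per stanza, after compression

  for payload, utilization in ((10, 0.22), (100, 0.46), (1024, 0.70), (10240, 0.74)):
      wire_bytes = payload / utilization               # bytes on the wire per stanza
      network = wire_bytes - payload - XMPP_OVERHEAD   # TCP, IP and PPP overhead
      print(payload, round(wire_bytes, 1), round(network, 1))
  # For the 10 byte payload the implied network overhead (~7.5 bytes) is
  # less than one TCP/IP header, consistent with several stanzas sharing
  # a single IP packet.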

If XMPP is to be used for data transfer, it is clear that a larger payload would be used; at larger sizes, XMPP is a reasonably efficient way to carry data. For small payloads, the XMPP overhead is high in percentage terms, but reasonable in absolute terms, and the data sizes are not a problem at 2400 bits/second. For very small payloads, the network overhead is lower, as multiple XMPP stanzas can be carried in a single IP packet.

Analysis

Is XMPP Fast Enough for 2400 bits/sec?

Once a connection is established, XMPP performance at 2400 bits/second with either low or high latency is good, and Optimized S2S gives a marginal further improvement. The overhead per message is small (28 bytes is typical), which becomes a minor factor when larger messages are sent. The delays for transferring messages are low, and would not have a significant impact on human use of XMPP.

For bulk data transfer (not a normal XMPP target), the protocol is reasonably efficient. Because both server to server protocols provide compression, in practice a higher throughput would be expected where data can be compressed (as the tests used data that would not compress).

The big performance issue is connection establishment. This takes at least 36 seconds at 2400 bits/second, and there is a startup delay of 40 times the network latency that will have an impact at any speed. For a high latency network, this could mean a startup time of several minutes.

For a very stable network, this slow startup is not a big problem: you start once and keep the connection open. However, this will often not be the case in environments where slow networks are used:

  • Slow networks are often unreliable, and so connections will break. After connection restore, it will often be desirable to send pending messages quickly, and a slow startup is not good.
  • End points may often be turned off. If equipment is turned on in order to send a message, the extra delay of establishing a connection is undesirable.

For these reasons, use of Optimized S2S, which substantially reduces connection startup time, is highly desirable.

Alternate Protocols

An alternative to using XMPP would be to use an entirely different protocol for constrained networks. It is clear that a highly optimized protocol could be slightly faster (e.g., the 28 byte per message overhead could be reduced). However, at 2400 bits/second this would give minimal operational improvement relative to Optimized S2S.

Given the benefits of using a single protocol for tactical and strategic nets, and the wide adoption and modern functionality of XMPP, there seems little reason to look at alternatives.

Internet Relay Chat (IRC)

We made measurements of IRC, a widely deployed distributed chat system, with an equivalent setup: two IRC servers, with a test client connected to each server. We used the ircu server deployed by the Undernet, chosen because it is well established, actively developed, and apparently popular (highest page rank on Google). Key comparative measurements are shown below.

                                     IRC                                 Optimized S2S
Per message overhead                 40 bytes                            28 bytes
Max message size                     512 bytes                           No Limit
Ping round trip (10 byte payload)    3.5 secs (with 6.5 second spikes)   0.7 secs
Ping round trip (100 byte payload)   4.5 secs (with 7 second spikes)     1.4 secs
Throughput (10 byte payload)         1.6%                                22%
Throughput (100 byte payload)        11%                                 46%
First ping time                      -                                   2.9 secs (1.5 secs)
Server startup                       16 secs                             -

Notes on the measurements:

  • The IRC protocol overhead is slightly higher, which would lead one to expect similar but slightly slower performance. We have subsequently found that some IRC servers use compression, which would lead to lower network overhead.
  • IRC has a relatively small max message size, which would constrain applications wanting to maximize throughput.
  • The round trip time for Optimized S2S was very stable. The IRC round trip time was less stable, with very sharp spikes for one or two messages every 25-30 messages.
  • Ping times and throughput rates are dramatically worse for IRC. The network trace showed surprisingly small TCP packets; this would have a performance cost, but does not explain the difference. The extra delays did not appear to be due to inefficient protocol or network use.
  • The IRC servers established their connection on startup, and this time was measured with a network trace. It took significantly longer than Optimized S2S connection setup (for which 1.5 seconds of the first ping time is associated with startup), but is better than standard XMPP S2S, which took 36.3 seconds.

These results are surprising. IRC has a reputation for high performance, and in particular high message switching rates. We had expected to see similar results to Optimized S2S, showing that XMPP is just as fast as IRC. Instead, we found that Optimized S2S was much faster than the Undernet IRC server. It is possible that our configuration was non-optimal or that a different IRC server would have given better results.

Server vs. Client Access

The architecture investigated here uses server to server protocols. A key point to note is that the XMPP client to server protocol is very similar to server to server. Start-up would exchange a similar amount of data, but since only one link is established, the basic start-up overhead would be half that of XMPP S2S (still much higher than Optimized S2S). Detailed observations:

  • For a long term stable link, ongoing performance would be similar to XMPP S2S.
  • Clients are likely to start more often than servers, so the overhead of connection setup would be incurred more often.
  • The slowness of client connection establishment would be visible to the user. This would be compounded by the client performing a roster update after connection establishment.
  • Introducing an optimized client/server protocol would constrain use to clients that supported the protocol.

This suggests that the server to server architecture is preferable.

Analysis and Conclusions

Our main conclusion is that, for operation at 2400 bits/second, the basic performance achieved by both standard XMPP S2S and Optimized S2S works well, for both low and high latency.

However, there is a significant performance overhead for establishing connections with standard XMPP S2S. In configurations where connections cannot be very long lived (e.g., where communications are intermittent), this overhead can have operational impact. This may be a particular problem because it will hit immediately after network connectivity is restored. This suggests a significant operational advantage in using Optimized S2S.

We also found that Optimized S2S gives significantly better performance than the IRC server we measured.