Summary

XMPP (the Internet Standard eXtensible Messaging and Presence Protocol) is being used for mission critical communication, where reliability is essential. Although use of XMPP can seem very reliable, a basic XMPP system has characteristics that are not reliable in some situations. This paper looks at situations where XMPP is not reliable, and discusses how to provide a reliable XMPP system, using advanced XMPP capabilities.

Where XMPP is Not Reliable

There are a number of ways in which components of an XMPP system can fail and so reduce reliability. The components to consider are:

  • XMPP Server. If an XMPP server fails, communication with the associated users will be prevented. Also, if the server fails while messages are in transit through the server, they will be lost.
  • Links. XMPP messages are sent down a TCP stream without any XMPP-level acknowledgement. This means that if a link fails (either between client and server or between two servers), messages may be lost without the sender being aware of it.
  • MUC Room/Domain. The system operating a MUC (Multi-User Chat) room or domain holding MUC rooms may fail. MUC failures are considered in more detail below.
  • XMPP Client. There could be a failure in an XMPP client involved in communication, or the hardware on which it is running.
  • User. There could be a failure by a participating user, for example not noticing a critical message.

The importance of each of these types of failure will depend on the target environment, risks, and criticality of messages.

Server Failure & Clustering

XMPP Clustering is where multiple servers support the same XMPP domain, so that any of the set of cluster servers can handle the domain. This has a number of advantages:

  1. In the event of server failure (short or long outage) there is another server in place to take over. Clients and servers connected to the failed server can simply reconnect to one of the other cluster nodes.
  2. It can help with network failure and network partition. This can be conscious partitioning (e.g., operating a pair of cluster nodes, with one internal and one in the DMZ with external access). By placing cluster nodes at separate network locations, it can also deal with temporary network outages, minimizing client/server failures by always connecting clients to local servers.
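The reconnect behaviour in point 1 can be sketched as follows. This is an illustration only, not any vendor's method: the host names and the `connect` callable are hypothetical stand-ins for a real XMPP connection attempt, and a deployed client would more likely obtain the node list from DNS SRV records than from a fixed list.

```python
from typing import Callable, Iterable, Optional

def connect_to_cluster(nodes: Iterable[str],
                       connect: Callable[[str], bool]) -> Optional[str]:
    """Try each cluster node in order and return the first that accepts."""
    for node in nodes:
        try:
            if connect(node):
                return node
        except OSError:
            # A network error is treated like a refusal: try the next node.
            continue
    return None  # no cluster node was reachable

# Hypothetical stand-in for a connection attempt: node1 is down.
def fake_connect(host: str) -> bool:
    if host == "node1.example.com":
        raise ConnectionRefusedError(host)
    return True

chosen = connect_to_cluster(
    ["node1.example.com", "node2.example.com"], fake_connect)
print(chosen)
```

Ordering the list so that the nearest node comes first gives the "connect to local servers" behaviour described in point 2.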

XMPP Clustering is not standardized, so clustering support is vendor specific. Support by some products is much better than others, and this capability should be examined with care when selecting an XMPP server for a high reliability deployment.

When an XMPP server crashes, messages will generally be lost. XMPP servers, unlike message switches, do not write messages to permanent storage before sending them on (much like an IP router). It is desirable that XMPP servers switch high volumes of messages very quickly, so this is an architectural decision (and not a design error). The consequence of this is that there are situations where messages can be lost on server failure, although in practice this will be very rare. Design changes could reduce risk of losing messages in transit (but not eliminate the risk), and these changes would have negative performance impact. Dealing with this (low risk) message loss is covered later, using end to end acknowledgements.

Link Failures & XEP-0198

XMPP messages, both between client and server and between two servers, flow down either one or two TCP streams with messages unacknowledged. This is an efficient approach, and appropriate for the high speed, low latency message switching provided by XMPP. A consequence of this design is that messages that have been "sent" may be lost if the network or destination fails, and the sender will be unaware of the loss. In deployments where all clients connect to a single server over a LAN, such message loss is very rare and so this protocol unreliability is not of operational concern. However, where connections are made over the Internet or less reliable links, some message loss is likely.

In environments where any message loss is of operational concern, or where network reliability is such that link failures are to be expected, the standard protocols do not provide sufficient reliability. The solution is XEP-0198: Stream Management. XEPs (XMPP Extension Protocols) provide standardized extensions to the core XMPP protocols. The diagram below shows how it works. Essentially, a sender may request an acknowledgement at any point, and the receiver responds with an acknowledgement of the messages it has handled so far. This is done on the normal XMPP streams, so no (synchronous) hand-shaking is introduced.

Where a message is not acknowledged, a server can automatically resend the message (over a new connection). This will often happen in conjunction with connection reset and re-establishment. This automatic resend is essential for reliability. For a client, an alternative strategy is to tell the user about unacknowledged messages and let the user take appropriate action, which may or may not be to resend the message.

This basic resend strategy solves the reliability problem, but it will also resend messages when only the acknowledgement was lost, which leads to message duplication. The solution to this is the "resume" option of XEP-0198, where the server records which messages have been successfully received. On resumption the sender can use this to determine which unacknowledged messages were actually received, and resend only the rest.
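The sender-side bookkeeping behind acknowledgement and resumption can be sketched as below. This is a simplified model under stated assumptions, not a protocol implementation: the class and method names are hypothetical, stanzas are plain strings, and the counter `h` stands for the "number of stanzas handled" value that XEP-0198 carries in its acknowledgements (counter wrap-around is ignored).

```python
from collections import deque

class StreamSender:
    """Sender-side bookkeeping for XEP-0198-style acknowledgements."""

    def __init__(self):
        self.acked = 0          # stanzas the peer has confirmed handling
        self.unacked = deque()  # stanzas sent but not yet confirmed

    def send(self, stanza):
        # Keep a copy of every stanza until the peer acknowledges it.
        self.unacked.append(stanza)

    def on_ack(self, h):
        """The peer reports that it has handled h stanzas in total."""
        newly_acked = min(h - self.acked, len(self.unacked))
        for _ in range(newly_acked):
            self.unacked.popleft()
        self.acked = h

    def on_resume(self, h):
        """On resumption, drop what did arrive and return what to resend."""
        self.on_ack(h)
        return list(self.unacked)

sender = StreamSender()
for body in ["m1", "m2", "m3"]:
    sender.send(body)
sender.on_ack(1)                 # "m1" was confirmed before the link failed
to_resend = sender.on_resume(2)  # the peer had in fact handled "m2" as well
print(to_resend)                 # ['m3']
```

Without the count supplied on resumption, the sender would have to resend both "m2" and "m3", and the recipient would see "m2" twice.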

End to End Acknowledgements

If you need end to end reliability, you need end to end acknowledgement. The reason for this is explained in "End-to-End Arguments in System Design".

There are two types of end to end acknowledgement that can be used with 1:1 XMPP messages:

  1. Delivery Report. This confirms that the message has reached the recipient’s client. It can be generated automatically, and will usually come back very quickly. This addresses failures in the recipient’s XMPP client and failures in intermediate servers and links.
  2. Read Receipt. This confirms that the message has been read by the recipient, and requires manual action by the recipient. This addresses recipient user-related failures.

Read receipts can be useful where it is critical that the recipient gets a specific message (e.g., an order to do something). They would be very awkward if used for large numbers of messages, so this needs to be an option reserved for messages where it is critical to be certain that the message has been read.

End to end acknowledgements are defined in XEP-0184: Message Receipts. The current version of the standard does not adequately differentiate between the two types of acknowledgement, and we anticipate that this will be addressed shortly in updates to this specification.
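As an illustration of the delivery report exchange, the sketch below builds a message that requests a receipt and the receipt a recipient's client would return. The `urn:xmpp:receipts` namespace and the `request` and `received` elements are those of XEP-0184; the function names, addresses and message ID are hypothetical, and the sender's address is passed in explicitly because the sketch does not model the "from" address that a server would add.

```python
import xml.etree.ElementTree as ET

RECEIPTS_NS = "urn:xmpp:receipts"

def message_with_request(to, msg_id, body):
    """Build a chat message that asks the recipient for a delivery receipt."""
    msg = ET.Element("message", {"to": to, "id": msg_id})
    ET.SubElement(msg, "body").text = body
    ET.SubElement(msg, f"{{{RECEIPTS_NS}}}request")
    return msg

def receipt_for(original, sender):
    """Build the receipt for `original`, or None if none was requested."""
    if original.find(f"{{{RECEIPTS_NS}}}request") is None:
        return None
    ack = ET.Element("message", {"to": sender})
    # The receipt names the ID of the message it acknowledges.
    ET.SubElement(ack, f"{{{RECEIPTS_NS}}}received",
                  {"id": original.get("id")})
    return ack

msg = message_with_request("juliet@example.com", "m-1", "Proceed to point B")
ack = receipt_for(msg, "romeo@example.com")
print(ET.tostring(msg, encoding="unicode"))
print(ET.tostring(ack, encoding="unicode"))
```

The originator matches the `id` in the receipt against its outstanding messages; a message with no matching receipt after some interval is a candidate for resend or for flagging to the user.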

It is useful to consider the relationship between delivery reports and XEP-0198. A simple view would suggest using delivery reports alone, as these deal with all of the failure cases addressed by XEP-0198. However, XEP-0198 is the more important of the two, and should be used whenever delivery reports are used. The basic reason is that most errors (typically those due to network failures) will be handled by XEP-0198; the additional errors that will be caught by delivery reports are very rare (e.g., a server crashing while a message is being switched). For many deployments, use of XEP-0198 without delivery reports will give sufficient reliability. Errors caught by XEP-0198 usually relate to connection problems, and these are best handled locally in conjunction with re-connect. With delivery reports, there is no mechanism to distinguish between a lost delivery report and a lost message (i.e., there is no equivalent of XEP-0198 resume), and so use of delivery reports alone will lead to a higher level of duplicate traffic.

Multi-User Chat (MUC)

Multi-User Chat (MUC) is the XMPP mechanism for communicating between multiple recipients and is often the preferred communication approach in mission critical environments. MUC introduces increased complexity over 1:1 chat, and a number of new failure modes. This is particularly the case when the MUC room is on a different server to the one used by the clients, as in the diagram above.

Resets & MUC Clustering

Both XMPP client and MUC room can fail without the other party knowing. Failure of the MUC room leads to particularly nasty problems, especially when the MUC room is running on a different server from the one to which the client is connected. Consider an active MUC room with many participants, whose MUC server fails. When the server restarts, an empty MUC room will be available. Locally connected clients will reconnect and join the MUC room. However, remote clients will be unaware of the change: they will believe they are still joined to the MUC room and will miss traffic until they reconnect.

Having clients poll the MUC would be very expensive, particularly for a MUC with a large number of members. Client approaches to solve this problem are at best partial solutions, so the best approach is to never have MUCs fail.

There are many reasons why a single server will not have 100% availability (hardware failure; software failure; software or system upgrade). A good way to achieve high MUC availability is therefore to cluster the MUC over several servers, keeping MUC state synchronized between the servers. This means that if one server fails, the MUC state will be maintained by another server, and so the MUC reset problem is avoided.

Ghosts

There is a related problem when an XMPP client, or the connection between an XMPP client and its server, fails. After a failure, the client will often try to automatically rejoin the room. Because the client did not leave the room cleanly, the MUC room may consider that the user is still present, and the rejoin will lead to the user being present a second time. The original MUC member is referred to as a “ghost”, as it remains in the MUC without really being there. Ghosts can also occur due to server failure and server to server link failure. A ghost will sometimes cause a client to get locked out of the room, because the MUC server refuses to let the client join when it believes the client is already in the room. These ghost problems can be reduced to a level where they are extremely rare by careful client and server implementation.

Reliable MUC communication

It is useful to consider MUC communication reliability in two stages: getting the message to the MUC and then onward distribution to MUC members from the MUC. Use of XEP-0198 on all links is desirable, as this will remove problems of message loss due to link failure.

Reliability to the MUC

When an XMPP client sends a message to the MUC, the MUC sends the message back to the user (as a MUC member), so the message originator can definitively know that the message has reached the MUC. The client can therefore detect failure to send to the MUC and take appropriate action, which might be to rejoin the MUC and retransmit the message, either automatically or after prompting the user.

Once the message has reached the MUC, it will be held on the server with the MUC, and managed reliably as part of the MUC history. In a clustered environment, this should be independent of the reliability of any single server in the cluster.
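A client can detect a failed send to the MUC by tracking which of its own messages have come back from the room. The sketch below is one possible approach, under the assumption that the room returns each message with the same ID the client assigned; the class name, method names and timeout value are hypothetical, and times are passed in as plain numbers to keep the example deterministic.

```python
class ReflectionTracker:
    """Track messages sent to a MUC until the room sends them back."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds to wait before treating as failed
        self.pending = {}       # message ID -> time the message was sent

    def sent(self, msg_id, now):
        self.pending[msg_id] = now

    def reflected(self, msg_id):
        # The room returned our message, so it definitely reached the MUC.
        self.pending.pop(msg_id, None)

    def overdue(self, now):
        """IDs of messages not returned by the room within the timeout."""
        return [m for m, t in self.pending.items()
                if now - t >= self.timeout]

tracker = ReflectionTracker(timeout=10)
tracker.sent("m-1", now=0)
tracker.sent("m-2", now=1)
tracker.reflected("m-1")         # the room sent "m-1" back to us
late = tracker.overdue(now=12)   # "m-2" has not come back in time
print(late)                      # ['m-2']
```

Messages reported by `overdue` are the ones for which the client might rejoin the room and retransmit, or prompt the user.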

Reliability from the MUC

There are two basic mechanisms to ensure reliability of messages sent to the MUC members:

  1. XEP-0198 will detect message loss.
  2. If a client is disconnected, its first action on reconnect will be to fetch MUC history (i.e., recent messages sent to the MUC). This will ensure that the client does not miss any messages, and so in combination with XEP-0198, good basic reliability for delivery beyond the MUC can be obtained.
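The history fetch in point 2 can be requested as part of rejoining the room. The sketch below builds a join presence that asks for everything sent since the client last saw a message, using the `history` element and its `since` attribute from XEP-0045 (Multi-User Chat). The function name, room address and nickname are hypothetical, and the client is assumed to have recorded the time of the last message it received.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MUC_NS = "http://jabber.org/protocol/muc"

def rejoin_presence(room_jid, nick, last_seen):
    """Build a join presence requesting history sent since `last_seen`."""
    pres = ET.Element("presence", {"to": f"{room_jid}/{nick}"})
    x = ET.SubElement(pres, f"{{{MUC_NS}}}x")
    # The timestamp is expressed in UTC, e.g. 2024-01-02T03:04:05Z.
    stamp = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(x, f"{{{MUC_NS}}}history", {"since": stamp})
    return pres

last_seen = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
pres = rejoin_presence("ops@conference.example.com", "watch", last_seen)
print(ET.tostring(pres, encoding="unicode"))
```

Because the timestamp has limited resolution, a client using this approach would still want to discard any history messages it has already displayed.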

Originator Acknowledgements

As noted for 1:1 communication, the only way to ensure reliability to all recipients is end to end acknowledgement: an acknowledgement back from each MUC member. This would lead to a lot of additional traffic. Tracking acks would need to be done by the originating client, which would be complex. Acks could be on delivery or on message read. An ack from all MUC members on message read might have value for a particularly critical message. There are no standards to provide this, but a specification could be developed, which would be similar to XEP-0184.
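To give a feel for the tracking an originating client would need, the sketch below records which MUC members have yet to acknowledge each message. Since no standard exists for this, everything here is hypothetical: the class, its methods, and the assumption that the client knows the room membership at the moment of sending.

```python
class MucAckTracker:
    """Track per-member acknowledgements for messages sent to a MUC."""

    def __init__(self):
        self.waiting = {}  # message ID -> members yet to acknowledge

    def sent(self, msg_id, members):
        # Snapshot the membership at send time.
        self.waiting[msg_id] = set(members)

    def ack(self, msg_id, member):
        """Record an ack; return True once every member has acknowledged."""
        outstanding = self.waiting.get(msg_id)
        if outstanding is None:
            return False  # unknown or already completed message
        outstanding.discard(member)
        if not outstanding:
            del self.waiting[msg_id]
            return True
        return False

    def missing(self, msg_id):
        """Members that have not yet acknowledged the message."""
        return sorted(self.waiting.get(msg_id, ()))

tracker = MucAckTracker()
tracker.sent("order-7", ["alice", "bob", "carol"])
tracker.ack("order-7", "bob")
still_waiting = tracker.missing("order-7")
print(still_waiting)  # ['alice', 'carol']
```

Even this small model shows where the complexity lies: members joining or leaving after the message is sent, and duplicate or late acks, all need a defined policy.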

Why M-Link?

Many of the reliability techniques described here are provided by the server, and Isode’s M-Link server supports all of the server side reliability techniques described in this paper. Two are of particular note.

Clustering

[Figure: clustering]

XMPP clustering is not standardized, and the techniques used vary between products that do clustering. M-Link uses a peer to peer cluster architecture, with all cluster nodes directly connected to each other, with updates synchronized to all cluster nodes. A common configuration will have all cluster nodes connected by a fast LAN.

[Figure: WAN clustering]

M-Link also supports wide area clustering, where cluster nodes are connected over "Internet" quality links. This is important to support reliability in a distributed survivable environment.

MUC clustering is also supported (most XMPP servers do not do this). This provides MUC reliability when a node fails, and prevents the reset problems described above. MUC distribution is also optimized so that messages are sent out to locally connected users and peers, which is particularly important when wide area clustering is used.

XEP-0198

M-Link supports XEP-0198 for both client/server and server/server communication. Currently this does not include resume, which will be added in a future release. XEP-0198 is important where message reliability is needed.

[Figure: XEP-0198 message acknowledgement]

This shows the benefits of client/server XEP-0198 support. Here the link between client and server has been broken, and a "throbber" appears on each unacknowledged message to warn the user that it has not been acknowledged. In slow network conditions, the throbber will vanish when the acknowledgement arrives.

[Figure: XEP-0198 message acknowledgement failure]

If the link to the server fails, the unacknowledged messages are clearly marked, and the user can decide what action to take.

Conclusions

This paper has described a number of techniques for providing a highly reliable XMPP service, some for client implementation and some for server implementation. Server clustering, including MUC clustering, and XEP-0198 support are particularly important.