This page describes the M-Link features designed to minimise the impact of server failures and link failures, so that the XMPP service remains reliable. M-Link offers LAN and WAN clustering to protect against server failures, and reliable messaging features to address the consequences of link failure.

Server Clustering

The core XMPP model is one server per domain. A single M-Link Server can support multiple domains, with delegated administration of users within each supported domain. XMPP Clustering is a technique to enable a single domain to be supported by multiple servers. Server Clustering is a feature of the M-Link User Server product. Clustering is not supported for M-Link IRC Gateway or M-Link Edge. M-Link Edge may be deployed in an active/active configuration, which does not need clustering.

XMPP clustering needs to synchronize 'state' between servers to ensure that messages are routed to the correct destinations and that presence information is correct. It is also important that information from the various services (Presence, Multi-User Chat (MUC), and Publish-Subscribe (PubSub)) is held on the local server where possible. For example, where MUC participants are spread across multiple servers, participant groups should be managed locally on each server, and messages should be sent directly to other local users without having to go to another server first. A related characteristic is that MUC and PubSub will continue to operate if any cluster node fails.
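The local-first delivery described above can be illustrated with a short Python sketch. It is not M-Link's implementation: the function name `plan_room_delivery`, the node names, and the mapping of participants to nodes are all assumptions made for the example. It shows only the idea that participants on the sender's node are served directly, while every other node receives one copy to fan out to its own participants.

```python
from collections import defaultdict


def plan_room_delivery(sender_node, occupants):
    """Split a room's participants into local and per-node remote groups.

    occupants maps a participant JID to the (hypothetical) name of the
    cluster node that participant is connected to.
    Returns (local_jids, {remote_node: jids}).
    """
    by_node = defaultdict(list)
    for jid, node in occupants.items():
        by_node[node].append(jid)

    # Participants on the sender's own node are delivered to directly,
    # without the message leaving that node.
    local = sorted(by_node.pop(sender_node, []))

    # Each remaining node needs only one copy of the message; it then
    # delivers to its own participants locally.
    remote = {node: sorted(jids) for node, jids in by_node.items()}
    return local, remote
```

With two participants on the sender's node and one on a second node, the sketch yields two direct local deliveries and a single copy forwarded to the other node.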

Isode's XMPP Clustering implementation is designed to work well in both LAN Clustering and Wide Area Clustering environments. While M-Link Clustering may be used with many nodes, Isode generally recommends deploying two-node clusters. Anyone considering a cluster of more than two nodes is strongly advised to review the design with Isode first.

Local Area Network (LAN) Clustering

In LAN Clustering, multiple clustered XMPP servers operate on a common, fast, highly reliable local network. Clustering in this environment is important for large deployments, as it enables servers to be added to support load levels greater than a single server can handle. This horizontal scaling is important for service providers and large enterprises. It also provides reliability, so that service can continue when a server goes down, whether through accidental failure or a planned shutdown.

Wide Area Network (WAN) Clustering

In Wide Area Clustering the XMPP servers are interconnected by links that may be slower and less reliable than a LAN. There are various scenarios where this is important:

  • Off-site operation of a server, so that service can continue in the event of site failure (Disaster Recovery).
  • Support of organizations with multiple sites, so that a server can be run at each site.
  • Support of a distributed military deployment with, for example, one server at HQ and another in the field.

Supporting Wide Area Clustering requires protocols and algorithms that cope with limited wide area network throughput, high latency, and periods where connectivity is lost. Servers need to be kept in sync, but operations should continue as well as possible when there are network failures.
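One general way to keep operating through an outage is to queue updates destined for the other server and replay them in order once the link returns. The Python sketch below illustrates that pattern only. It does not describe the protocol M-Link uses between cluster nodes, and the class `PeerLink` and its methods are hypothetical names.

```python
from collections import deque


class PeerLink:
    """Queue-and-replay link to a peer server (illustrative only)."""

    def __init__(self, send):
        self._send = send        # callable that transmits one update to the peer
        self.connected = True
        self.backlog = deque()   # updates waiting for the link to return

    def publish(self, update):
        """Send an update now, or queue it if the link is down."""
        if self.connected:
            try:
                self._send(update)
                return
            except ConnectionError:
                self.connected = False
        self.backlog.append(update)

    def reconnect(self):
        """Replay the backlog in order; return True once it is empty."""
        self.connected = True
        while self.backlog:
            try:
                self._send(self.backlog[0])
            except ConnectionError:
                self.connected = False
                return False
            # Remove an update only after it has been sent successfully.
            self.backlog.popleft()
        return True
```

Local operations never block on the peer in this sketch: `publish` returns immediately whether or not the link is up, and ordering is preserved because the backlog is replayed from the front.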

Having a server close to a client with good connectivity will give a fast and robust client experience. It is important that local traffic is optimized, and does not switch between servers except where needed. Handling traffic locally to a server without unnecessary switching is particularly important for Wide Area Clustering.

Reliable Messaging

There are a number of ways in which an XMPP service can become unreliable, usually involving a failure in one or more components of the service. For constrained network deployments, where link failures can be common, Isode's XMPP products (both the M-Link server and the Swift XMPP client) include capabilities that alert users to link failures and protect them from the consequences.

Message Acknowledgements

Users of messaging systems (email or instant messaging) operating over internet-quality links often assume, usually with justification, that a message has been delivered. Users in constrained networking environments, where link failures are common, cannot afford to make that assumption.

M-Link and Swift both support XEP-0198: Stream Management for message acknowledgements, which clearly show the status of messages and allow the user to decide on remedial action in the event of non-delivery.
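In XEP-0198, each side of a stream counts the stanzas it has handled; a peer asks for that count with an `<r/>` element and receives it in the `h` attribute of an `<a/>` element. The Python sketch below shows how a sender might use the count to tell which messages are still unacknowledged. It is a simplified illustration and not the code used in M-Link or Swift: the class `OutboundAckTracker` is a hypothetical name, and wrap-around of the counter and stream resumption are left out.

```python
from collections import deque

SM_NS = "urn:xmpp:sm:3"  # Stream Management namespace from XEP-0198


class OutboundAckTracker:
    """Tracks sent stanzas until the peer acknowledges them (illustrative)."""

    def __init__(self):
        self.acked_count = 0     # last 'h' value received from the peer
        self.unacked = deque()   # stanzas sent but not yet acknowledged

    def send(self, stanza):
        """Record a stanza as sent and awaiting acknowledgement."""
        self.unacked.append(stanza)

    def ack_request(self):
        """Build the element that asks the peer for its handled count."""
        return f"<r xmlns='{SM_NS}'/>"

    def handle_ack(self, h):
        """Process <a h='N'/> and return the stanzas it newly confirms."""
        newly_acked = h - self.acked_count
        if newly_acked < 0 or newly_acked > len(self.unacked):
            raise ValueError("peer acknowledged an impossible stanza count")
        confirmed = [self.unacked.popleft() for _ in range(newly_acked)]
        self.acked_count = h
        return confirmed

    def pending(self):
        """Stanzas a client could flag to the user as not yet confirmed."""
        return list(self.unacked)
```

If the link fails before an acknowledgement arrives, whatever `pending()` returns is the set of messages whose delivery is unknown, which is the information a client needs in order to offer the user a remedial action such as resending.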

Federated Multi-User Chat

In standard Multi-User Chat (MUC), a room is hosted on one server, and participants joining the room may be local to that server or may join via another server using standard XMPP server federation. A link failure between the two servers prevents users on the federated server from participating in the MUC.

M-Link supports XEP-0289: Federated MUC for Constrained Environments, which federates the provision of MUC, just as the distribution of XMPP servers federates the provision of 1:1 chat. More information on Federated Multi-User Chat can be found in the whitepaper [Federated Multi-User Chat: Efficient and Reliable Operation over Slow and Unreliable Networks].