Federated Multi-User Chat: Efficient and Resilient Operation over Slow and Unreliable Networks
XMPP (the Internet Standard eXtensible Messaging and Presence Protocol) Multi-User Chat (MUC) is normally provided by a single server, with clients accessing a MUC Room via their local XMPP servers. This standard approach gives performance and resilience problems when operating over constrained networks. This paper looks at how federating the MUC service can address these problems. Isode's approach to Federated MUC as implemented in the M-Link XMPP server is described in the context of evolving XMPP standards, and benefits of Federated MUC for purposes other than Constrained Networks are considered.
Isode whitepapers are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
MUC for Military Deployments
Multi-User Chat facilitates efficient information sharing between a group of users, and is of major importance for military deployments of instant messaging. It is used to support live operations, and key functions such as time sensitive targeting. Such deployments will often use constrained networks, which may be slow, high latency and unreliable. It is important that MUC performs well in these environments. Although the capabilities described in this paper are general purpose, they are of particular interest to military deployments.
1:1 Chat over Constrained Networks
Standard XMPP works reasonably well over constrained networks, but performs poorly on startup/reconnect, particularly for high latency networks. Isode has developed a number of capabilities to address this, including operation over HF Radio. In particular this paper looks at how M-Link operates over Satcom and HF Radio networks, and peer-to-peer protocol operations, showing how connections can be optimized by:
- Using protocols with a minimum number of handshakes, such as XEP-0361: Zero Handshake Server to Server Protocol.
- Reducing the amount of data exchanged on connection setup.
- Selectively filtering messages and data.
These techniques give highly optimized performance for 1:1 chat and presence exchange.
Performance and Resilience Issues with Standard MUC
Although MUC can use Zero Handshake Server to Server Protocol, MUC introduces a number of additional issues. A standard MUC room is associated with a single XMPP server. This room can be joined by clients on local or remote servers, as shown below.
This architecture leads to a number of problems when operating over a constrained network:
- If the link fails completely for a period, the users on servers other than the one where the MUC room is hosted are completely disconnected from the MUC. This can be a severe problem. Consider the scenario when some users are on a ship and others are on shore. When the ship/shore link fails, it is highly desirable to be able to continue "local" conversations while the link is down.
- When a message is sent to the MUC, a copy is sent to each MUC recipient over the slow link. Where there are multiple MUC members on a single remote server, this is inefficient.
- When a user connected by the slow link sends a message to the MUC, it will be sent over the link and then sent back again after the MUC expansion. This is inefficient.
- When a user joins a MUC, the user is given a history of messages. This is helpful, but will cause traffic over the slow link whenever a user joins the MUC from a remote server.
- When users join or leave a MUC or change presence status, this is reported to the MUC (and to all MUC members). This is useful information, but can lead to significant traffic which may be too costly for a slow link.
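The traffic cost of the centralized design can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a simplified model in which the room sits on the far side of the slow link from a group of remote occupants and every stanza costs the same.

```python
def centralized_link_stanzas(remote_occupants: int, sender_is_remote: bool) -> int:
    """Stanzas crossing the slow link for ONE message in a centralized room.

    Simplified model (an assumption, not a measurement): the room is hosted
    on the far side of the link from `remote_occupants` users.
    """
    if remote_occupants < 0:
        raise ValueError("remote_occupants must be non-negative")
    # The room sends one copy per remote occupant back over the link.
    outbound = remote_occupants
    # A remote sender must first push the message across the link to the room.
    inbound = 1 if sender_is_remote else 0
    return inbound + outbound


def join_history_stanzas(history_length: int, remote_joins: int) -> int:
    """History stanzas crossing the link when remote users join the room."""
    # Each remote join pulls its own copy of the history over the link.
    return history_length * remote_joins
```

For example, under these assumptions a message sent by one of five remote occupants crosses the link six times: once inbound, then five times outbound after the MUC expansion.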
This paper considers how all of these issues can be addressed.
Federated MUC Architecture
The core approach to solving the problems of standardized MUC is to federate the provision of MUC, just as the distribution of XMPP servers federates the provision of 1:1 chat and presence. The core architecture is shown below.
A Federated MUC (FMUC) room is provided on a set of servers, with a MUC room on each server participating to provide the FMUC room.
The MUC room on each server acts as a "standard" MUC room. This means that if a MUC server is (temporarily) separated from other servers in the federation, it can continue to serve the local users with local traffic. This federated architecture is resilient to network breaks.
The MUC rooms in the federation can be thought of as subscribing to each other. This enables messages and presence information to be exchanged, and gives the effect to users of the federated MUC of all of the local and remote users being in the MUC. So the user appearance of the federated MUC is essentially the same as for the centralized MUC. A consequence of this approach is that message and presence information is sent only once between a pair of MUCs, avoiding the additional link traffic of the centralized MUC.
M-Link supports the architecture illustrated above. MUC rooms are configured independently on each MUC server in the federation, and the rooms are then configured to subscribe to each other. Two benefits arise from this in all deployments:
- Messages and Presence information from the MUC are only sent once over the slow link (rather than once per client).
- MUC History is provided locally to clients, and so there is no requirement to share history over a slow link.
An FMUC Room can comprise two or more MUC rooms on separate servers connected as an acyclic graph, as shown above. Participants in each local MUC Room will have the effect of participating in the single Federated Room.
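The behaviour of rooms linked as an acyclic graph can be sketched with a toy model. This is not M-Link's implementation or the XEP-0289 protocol; the class and method names are hypothetical. It assumes each room delivers a message locally and forwards it once over every working link except the one it arrived on, which is sufficient to avoid loops because the graph is acyclic. The model does not resynchronize messages after a link is restored.

```python
from collections import defaultdict


class FederatedRoom:
    """Toy model of local MUC rooms linked as an acyclic graph."""

    def __init__(self):
        self.links = defaultdict(set)       # server -> neighbouring servers
        self.down = set()                   # frozenset({a, b}) per failed link
        self.delivered = defaultdict(list)  # server -> messages seen locally

    def link(self, a, b):
        """Record that the rooms on servers a and b subscribe to each other."""
        self.links[a].add(b)
        self.links[b].add(a)

    def set_link_down(self, a, b, is_down=True):
        edge = frozenset((a, b))
        if is_down:
            self.down.add(edge)
        else:
            self.down.discard(edge)

    def send(self, origin, text):
        """Deliver locally, then forward once per working link.

        Returns the number of inter-server transfers used.
        """
        transfers = 0
        stack = [(origin, None)]
        while stack:
            node, came_from = stack.pop()
            self.delivered[node].append(text)
            for peer in self.links[node]:
                # Never send back where the message came from, and skip
                # links that are currently down.
                if peer == came_from or frozenset((node, peer)) in self.down:
                    continue
                transfers += 1
                stack.append((peer, node))
        return transfers
```

With hypothetical servers "shore", "ship1" and "ship2" linked through "shore", a message from "ship1" uses two transfers regardless of how many users are in each local room. If the "shore" to "ship1" link is marked down, a message sent on "ship1" is still delivered to local users there and uses no transfers.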
Isode is working to develop a standardized protocol for Federated MUC as XEP-0289: Federated MUC for Constrained Environments.
Reporting Link Status
When MUCs and XMPP in general are deployed in a high speed reliable environment, servers will be accessible for a high percentage of the time. The core XMPP specifications are designed to work well when servers are available. Where XMPP is deployed in an environment where network failures are not unusual, this raises client issues.
In XMPP, user status is shown based on client availability. If the link between a pair of clients is broken, the user status is shown as offline. Where the link breaks between a pair of MUC Rooms participating in a FMUC Room, the only options currently available are to show the remote users as offline or to leave status unchanged. Neither option is good.
What is needed is a mechanism to show the link status, so that the server can display that the link is down in conjunction with the user's last known presence. Such a mechanism would be useful for 1:1 chat and particularly useful for FMUC. Isode is standardizing XEP-0310: Presence State Annotations, which allows this information to be shared. This would enable an XMPP client to make clear to the local user that there are connection problems with one or more remote users.
Consider a scenario where a shore-based user is participating in a MUC room federated between ship and shore. The shore-based user might see the MUC status of a ship-based user as "Link to Server ship.navy.mil is unavailable. Last known status for firstname.lastname@example.org was 'online'". This additional information will sometimes be very important.
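A client could combine link state with last known presence along the following lines. This is a minimal sketch of the display logic only: it does not show the XEP-0310 wire format, and the `RemoteOccupant` type, the `describe_status` function and the example address are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RemoteOccupant:
    jid: str                # address of the remote user (hypothetical example)
    server: str             # server hosting the remote MUC room
    last_known_status: str  # last presence received before any link break


def describe_status(occupant: RemoteOccupant, link_up: bool) -> str:
    """Build the status line a client might show for a remote occupant."""
    if link_up:
        # Link is working, so the last known status is the current status.
        return f"{occupant.jid} is '{occupant.last_known_status}'"
    # Link is down: report that fact together with the last known presence,
    # rather than showing the user as offline or leaving the status unchanged.
    return (
        f"Link to Server {occupant.server} is unavailable. "
        f"Last known status for {occupant.jid} was "
        f"'{occupant.last_known_status}'"
    )
```

The design point is that the annotation is additive: the last known presence is preserved and the link state is reported alongside it.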
Fast MUC Reconnect
When a user joins a MUC group, the client will ask the server for "MUC history". This usually works well, as the user will typically be joining after being offline for a fairly long period. If there is a network break and a client reconnects after a short gap, this mechanism is inefficient. It is a particular problem over a slow link, as it leads to a lot of unnecessary traffic. Isode has addressed this by developing a new specification, XEP-0311: Fast MUC Reconnect. Fast MUC Reconnect allows the client to say to the server "This was the last message I received; send me any history I need". This ensures that the client does not lose any messages, while avoiding unnecessary traffic.
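The server-side idea behind "this was the last message I received" can be sketched as a lookup in the room history. The function below is an illustration under stated assumptions, not the XEP-0311 protocol: it assumes each message has a unique identifier, and the name `history_since` and the `max_full_history` fallback limit are hypothetical.

```python
def history_since(history, last_seen_id, max_full_history=20):
    """Return only the history a reconnecting client is missing.

    history: list of (message_id, text) tuples, oldest first.
    last_seen_id: ID of the last message the client received, or None.
    """
    ids = [message_id for message_id, _ in history]
    if last_seen_id in ids:
        # Send just the messages that arrived after the one the client has.
        return history[ids.index(last_seen_id) + 1:]
    # Unknown or missing ID: fall back to the normal bounded history.
    start = max(0, len(history) - max_full_history)
    return history[start:]
```

A client that reconnects having already seen the latest message receives nothing, which is the saving over resending the full history after every short break.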
In an FMUC deployment, clients will usually have fast connections to the local MUC server, and so Fast MUC Reconnect is not of critical importance for client/server connections. However, MUC and MUC history are used between FMUC servers, and Fast MUC Reconnect can be used between a pair of FMUC servers. So Fast MUC Reconnect is an essential component of a constrained bandwidth FMUC deployment.
When a user participates in a MUC room, all of the other participants in the room are shown, along with their current presence status. This can be valuable contextual information. When a user changes status (e.g., from "available" to "away" when the user steps away from their terminal for a few minutes) this status change is communicated to all other MUC members. This causes network overhead.
Commonly a MUC room will have a limited number of active participants. Often the status of these active participants will be important (for example in the case of a MUC room being used for Time Sensitive Targeting). The value of sharing presence information for these users is high, and the network traffic is justified. Other users are simply tracking what is going on, and there is minimal value in sharing their presence status with other members of the MUC.
Isode will address this by providing an option to extend the current "Visitor" role of a MUC room member. Such a member will be shown in the room, but is not allowed to send messages. M-Link will provide an option to "hide" observers, so that normal MUC members will not see them. MUC room moderators will see them, so they can be controlled. A key benefit of this control is that it will enable presence traffic due to MUC Observers to be completely removed between a pair of FMUC servers.
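The intended visibility rule can be expressed as a simple filter. This is a sketch of the behaviour described above, not M-Link code; the role names used as strings and the function name `visible_occupants` are assumptions for illustration.

```python
def visible_occupants(occupants, viewer_role):
    """Return the nicknames a viewer would see in the room.

    occupants: dict mapping nickname -> role, where role is one of
    'moderator', 'participant' or 'observer' (labels assumed here).
    """
    if viewer_role == "moderator":
        # Moderators see everyone, so observers can still be controlled.
        return sorted(occupants)
    # Normal members do not see hidden observers.
    return sorted(nick for nick, role in occupants.items() if role != "observer")
```

Because observers are hidden from normal members, their presence changes need not be forwarded between FMUC servers for the benefit of those members.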
M-Link provides an option to eliminate all presence traffic between a pair of servers. This is extremely useful for very slow links (e.g. 75 bits/sec between submarines) but removes useful information. Unfortunately, MUC requires some presence support, so this mechanism as implemented will not work for FMUC.
We are adapting M-Link so that presence traffic can be minimized for FMUC. This will include removal of all bar the minimum necessary traffic, and an intermediate option to eliminate all presence traffic other than online/offline. This will ensure that MUC membership is shown in a conformant manner and users can see when other users leave and join a room. This approach will eliminate exchange of all other presence information. This will be useful where eliminating this traffic is judged to be more important than the information it provides.
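The intermediate option, forwarding only online/offline transitions, can be sketched as a filter over a stream of presence updates. The mode names, the status strings and the function `filter_presence` are hypothetical; the actual M-Link configuration is not described in this paper.

```python
def filter_presence(updates, mode="membership_only"):
    """Select the presence updates to forward over a constrained link.

    updates: iterable of (occupant, status) in arrival order, where status
    is a label such as 'available', 'away', 'dnd' or 'unavailable'.
    mode: 'full' forwards everything; 'membership_only' forwards only
    transitions between online and offline.
    """
    if mode == "full":
        return list(updates)
    if mode != "membership_only":
        raise ValueError(f"unknown mode: {mode}")

    forwarded = []
    online = set()
    for occupant, status in updates:
        is_online = status != "unavailable"
        was_online = occupant in online
        if is_online != was_online:
            # Collapse every online state to 'available' so that only
            # join/leave information crosses the link.
            forwarded.append((occupant, "available" if is_online else "unavailable"))
        if is_online:
            online.add(occupant)
        else:
            online.discard(occupant)
    return forwarded
```

In this model a user who joins, toggles between "available" and "away" several times and then leaves generates exactly two forwarded updates, so room membership stays accurate while status detail is dropped.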
Other uses of Federated MUC
An additional benefit of FMUC is that it can enable any MUC service to be distributed over multiple MUC servers. Reasons that this may be useful:
- Load sharing. For a very large MUC, FMUC will allow the distribution of messages to be shared by multiple MUC servers.
- Delegated administration. FMUC will allow a site to join a MUC, and then have local membership of the MUC managed locally.
- Hiding site membership. FMUC will enable a site to join a MUC and then to hide which users and how many users are members of the MUC at that site.
The core FMUC works in a completely symmetrical and distributed manner, which is highly efficient and supports operation over network breaks. One consequence of this is that MUC users in different locations may see messages arrive in a different order.
Where FMUC is deployed over fast reliable networks, it can be operated in a "Single Master" mode, where one FMUC node is treated as the master. All messages must flow to the master and then back to the other nodes. This mode increases traffic and prevents disconnected operation. However, the single master enables message ordering such that all clients will see messages in the same order. This mode of operation may be preferable for some deployments over fast networks.
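The ordering difference between the two modes can be shown with a deliberately small model. It assumes every node sends one message at the same instant, that a node shows its own message before any remote ones in the symmetric mode, and that the master sequences its own message first and then the others in a fixed arrival order. The function names are hypothetical.

```python
def symmetric_view(node, sends):
    """Order seen at `node` in the symmetric (distributed) mode.

    sends: dict mapping node name -> the message it sent at the same instant.
    """
    own = [sends[node]]
    # Remote messages arrive after the local one has been shown.
    remote = [msg for name, msg in sorted(sends.items()) if name != node]
    return own + remote


def single_master_view(master, sends):
    """Order seen at EVERY node in Single Master mode.

    The master sequences all messages, so each node sees the same list.
    Arrival order at the master is modelled as sorted by node name.
    """
    others = [msg for name, msg in sorted(sends.items()) if name != master]
    return [sends[master]] + others
```

With one message from "ship" and one from "shore", the symmetric mode yields a different order at each site, while Single Master mode gives both sites the master's order at the cost of routing every message through the master.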
Alternative Protocols to FMUC
There are two protocols similar to FMUC, referred to here as DMUC1 and DMUC2.
We consider these less suitable for the constrained network environment, broadly because they do not offer such effective traffic savings as FMUC.
DMUC1 has the following key differences to FMUC:
- Single level of distribution, with one "firsthost" and one or more "peerhosts".
- Protocol to set up relationship between firsthost and peerhosts.
- Client discovery of all peerhosts.
- No option to present all clients with MUC messages in same order.
DMUC2 has the following key differences to FMUC:
- Single MASTER and multiple SLAVEs connected directly or indirectly.
- Messages always presented to client in order (no option for single message transfer).
- Keepalive protocol between MASTER/SLAVE and SLAVE.
We anticipate that DMUC1 and DMUC2 will not be progressed, and that FMUC will become the preferred standard.
M-Link Implementation Status
M-Link R15.2 contains Isode's first implementation of FMUC that provides core XEP-0289 support including disconnected operation. For future updates and releases Isode plans to add:
- Presence State Annotations.
- Fast MUC Reconnect.
- Not showing MUC observers.
- Optimized presence filtering for MUC and FMUC.
We also expect to provide a version of the Swift XMPP Client that supports Presence State Annotations. The other FMUC capabilities do not need special client support.
This paper describes Federated MUC, and how this can significantly improve performance for MUC over Constrained Networks.