The core of any message switch is the set of messages that it is holding on disk. M-Switch products use a general queue format that allows efficient processing of messages in transit and provides highly flexible manipulation of messages for protocol and format conversion.
The message queue is managed by a Queue Manager (QMGR) process, which is responsible for scheduling the various channels while taking into account the resource management requirements of the Message Switch. Other processes provide the information needed by the QMGR to build and update its queue, so the QMGR never needs to read files on disk. This approach makes the QMGR extremely fast. The structure allows sophisticated cross-linking of information about queued messages, and thus enables the product as a whole to provide a very powerful scheduling mechanism which takes many factors into account.
The QMGR controls the dynamic operation of the whole product, scheduling other processes to run, and maintaining an overall view on the status of the message queue.
Queue Manager in depth
M-Switch message processing, subsequent to message arrival, is controlled by the Queue Manager. This processing includes transfer-out or delivery, and also other tasks for messages, such as content conversion, delivery report generation and removing messages from the on-disk queue.
M-Switch has a multi-process architecture, giving some significant advantages:
- It is more robust, as the different tasks which the MTA must perform are separated into appropriate modules. Failure in one part does not affect other processes.
- It is also more flexible, as arranging concurrent processing is straightforward, using operating system provided features.
A multi-process architecture can have some disadvantages. One issue is that the startup cost for a process can be large: there is both the operating system cost of creating a new process and the application cost of initializing it, and perhaps of creating inter-process communication links with other processes. Another issue arises if many processes compete for system resources; this can become very inefficient, as the operating system can spend a significant amount of time switching between process contexts.
Queue Manager tasks
The primary task of the Queue Manager is to start processes which perform the required functions on messages, and pass messages for processing to these processes. A single process can handle more than one message and can handle messages for transfer to different MTAs. It can close the association with one MTA and open an association with another.
Another Queue Manager function is the handling of non-permanent errors on message processing tasks: tasks that can be retried at some point in the future. The Queue Manager uses a 'back-off' strategy for this. On each failed attempt, the next attempt is scheduled for a time in the future. The time between attempts increases linearly with the number of attempts, up to a maximum time interval.
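The linear back-off with a cap can be sketched as follows. The function name, base interval and maximum are hypothetical values for illustration, not M-Switch defaults.

```python
def next_retry_delay(attempts, base=300, cap=3600):
    """Delay in seconds before the next attempt for a task that has
    failed 'attempts' times: grows linearly with the number of failed
    attempts, up to a maximum interval (here 5 min steps, 1 hour cap).
    Values are illustrative, not actual M-Switch settings."""
    return min(attempts * base, cap)
```

With these example values, the first retry waits 5 minutes, the second 10, and so on, until the delay reaches the one-hour ceiling and stays there.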
When errors occur on one peer MTA, this does not prevent the transfer of messages to other peer MTAs. Similarly, if a message-specific error arises for one message, this does not prevent the transfer of other messages to the same peer MTA.
The routing process assigns to each recipient of a message the next-hop MTA (or marks the recipient for local delivery) and a channel. The channel is the object which handles the appropriate protocol. A channel runs as a separate process started by the Queue Manager, and the Queue Manager can run more than one process for a given channel.
The Queue Manager organizes the messages in a tree structure. The different channels are at the top. Attached to each channel are the peer MTAs reached by that channel. Then attached to each MTA is a list of “Mlists”. An Mlist is the internal name given to the list of recipients for a particular message to be transferred to a particular peer MTA.
If a message has multiple recipients and the recipients are to be transferred to different peer MTAs, there are multiple Mlists. These are treated separately, as if they belonged to separate messages, so the Mlists can be processed in parallel. Whether parallel processing actually occurs depends upon the scheduling decisions made by the Queue Manager.
If there are several Mlists available of the same priority but for different MTAs on a given channel, the Queue Manager will choose for processing an Mlist for the MTA which has the smallest number of messages currently being processed.
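The channel/MTA/Mlist tree and the least-loaded selection rule can be modelled as below. The class and function names are hypothetical, chosen for illustration; they are not actual M-Switch data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Mta:
    """A peer MTA attached to a channel, with its queued Mlists and a
    count of messages currently being processed for it."""
    name: str
    mlists: list = field(default_factory=list)  # queued Mlists (FIFO)
    in_flight: int = 0                          # messages in processing

@dataclass
class Channel:
    """A channel at the top of the tree, with its peer MTAs below it."""
    name: str
    mtas: list = field(default_factory=list)

def pick_mlist(channel):
    """Among MTAs with queued Mlists (assumed equal priority), pick an
    Mlist from the MTA with the fewest messages currently being
    processed, as the text above describes."""
    candidates = [m for m in channel.mtas if m.mlists]
    if not candidates:
        return None
    mta = min(candidates, key=lambda m: m.in_flight)
    return mta.mlists.pop(0)
```

In this sketch an Mlist for the least-busy MTA is dequeued first, which spreads work across peer MTAs rather than piling parallel transfers onto one peer.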
If there are no processes running for a channel, then the Queue Manager will need to start such a process if there is at least one Mlist for that channel.
If there is at least one process running for a channel, but there are Mlists for the channel available for processing, then the Queue Manager has a choice: start a new channel process, or wait for one of the currently running channel processes to finish its current message so that it can then process another. This is the fundamental scheduling decision.
Rapid Processing vs. System Resource Use
It is apparent that there are trade-offs to be made here between the rapid processing of messages and the use of system resources. There are two extremes:
- One process per channel: This makes good use of system resources, but it means that messages for the channel are processed only serially.
- A process for each message: This is very inefficient, in that the competition for system resources is high, and the system may thrash. Also, if there are many messages for the same peer MTA, transferring them all in parallel can itself cause problems.
The optimum throughput for the MTA lies somewhere between these two extremes. There is a point beyond which increasing the number of channel processes yields no further gain, and may even decrease throughput as contention for various resources grows. However, the actual optimum is hard to calculate or discern. It depends significantly on a number of factors, including the character of the traffic (for instance, the number of messages for different MTAs and the size of the messages) and on factors such as the speed and latency of network links to peer MTAs.
Queue Manager Scheduling Decisions
The Queue Manager in M-Switch currently uses some simple heuristics for making the scheduling decisions. These were derived some time ago and have been well tested in operation, having been used in the MTA under a number of different workload patterns. This mechanism has proved robust.
Number of channel processes
The first control is over the overall number of channel processes which can be run (qmgr_maxchans). This has the effect of limiting possible thrashing. However, the limit on processes depends upon the message priority: the Queue Manager is prepared to run more processes when the priority of the available message is urgent than when it is non-urgent. The effect of this is that, even if no more processes would be started for a non-urgent or normal priority message, a new process can still be started when an urgent message arrives. The number of channel processes, and the number of slots to reserve for urgent (qmgr_reserve_urgent) and normal priority messages (qmgr_reserve_normal), can be configured by the MTA administrator.
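One plausible reading of the reserve scheme is sketched below. The option names mirror the configuration parameters above, but the numeric values and the exact arithmetic are assumptions for illustration, not the precise M-Switch algorithm.

```python
# Illustrative values only; real deployments configure these options.
qmgr_maxchans = 32        # overall channel-process limit
qmgr_reserve_urgent = 4   # slots usable only by urgent messages
qmgr_reserve_normal = 8   # slots usable by urgent and normal messages

def may_start_process(running, priority):
    """Return True if a new channel process may be started for a
    message of the given priority, given 'running' processes already
    active. Lower priorities see a lower effective limit, so some
    headroom is always left for more urgent traffic."""
    if priority == "urgent":
        limit = qmgr_maxchans
    elif priority == "normal":
        limit = qmgr_maxchans - qmgr_reserve_urgent
    else:  # non-urgent
        limit = qmgr_maxchans - qmgr_reserve_urgent - qmgr_reserve_normal
    return running < limit
```

With these example numbers, non-urgent messages stop getting new processes at 20, normal at 28, while an urgent arrival can still claim a process up to the full limit of 32.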
Start rate of processes
The second control is over the rate at which the Queue Manager starts new processes for a given channel. The idea here is that if the channel processes messages quickly, it is more efficient to have a few processes than to have many processes competing for resources. Two different rates apply here, expressed as time intervals from the time when a channel process was last started. The first time interval (qmgr_chan_time2) is a short time, and no new processes for the channel will be started within it. The second, longer time interval for preventing additional channel processes (qmgr_chan_time3) applies if the load on the channel is not great. Both of these times can be configured.
Single message control
The third control stops the Queue Manager continually starting and stopping channel processes for single messages, which would be very inefficient, given the system cost of process creation. This is done by another time interval (qmgr_chan_time1). If all processes for a channel are stopped, the Queue Manager will not start another process within this time. This gives time for messages to accumulate for the channel, which then can be processed more efficiently by a single process. Again, this time can be configured.
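The three time-interval controls described above can be combined into a single gate, sketched here. The interval values and the exact load test are assumptions for illustration; the real Queue Manager logic is more involved.

```python
# Illustrative values (seconds); all three are configurable in M-Switch.
qmgr_chan_time1 = 120  # min gap after ALL channel processes have stopped
qmgr_chan_time2 = 10   # short gap that always applies between starts
qmgr_chan_time3 = 60   # longer gap applied when channel load is light

def may_start_channel_process(now, last_start, last_all_stopped,
                              running, load_is_high):
    """Decide whether a new process may be started for a channel.
    'now', 'last_start' and 'last_all_stopped' are timestamps in
    seconds; 'running' is the count of live processes for the channel.
    This is a hypothetical sketch of the controls, not M-Switch code."""
    if running == 0 and now - last_all_stopped < qmgr_chan_time1:
        return False  # let messages accumulate rather than churn
    if now - last_start < qmgr_chan_time2:
        return False  # short interval always applies
    if not load_is_high and now - last_start < qmgr_chan_time3:
        return False  # longer interval when the channel load is light
    return True
```

The first test avoids starting and stopping a process per message; the other two throttle the growth in process count, more aggressively when the channel is lightly loaded.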
If the Queue Manager is running processes for different channels, and the total number of processes is restricted, there is a mechanism which attempts to balance the processes between the different channels. That is, if another channel has demand for processes, some channel processes will be stopped (once they have finished processing their current message).
The trade-off is between starting processing messages quickly (by decreasing the time between allowed channel process starts) and processing messages efficiently (by using a smaller number of processes to avoid system resource contention). This balance is best determined by tuning a system with the actual load.
Tuning the maximum number of processes is likewise best done in the light of actual circumstances. If large messages are transferred over slow links, the system will be less stressed by many processes (which spend most of their time waiting for the remote end to respond) than if short messages are transferred over fast links, when a smaller number of processes will be more efficient.
Basic queuing theory tells us that if the average load (i.e. the message arrival rate) does not exceed the system capacity, then the average queue size of messages which can be processed immediately will be small (there may be queuing of messages which cannot be processed as a result of, for example, failure to connect to a peer MTA). Under these circumstances, the choices made by the Queue Manager in fact have little effect. Messages are processed in a timely fashion.
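The queuing-theory claim can be made concrete with the standard M/M/1 result: for utilization rho = lambda/mu below 1, the mean number of messages in the system is rho/(1 - rho). The rates used here are hypothetical examples, not measurements of any MTA.

```python
def mean_in_system(arrival_rate, service_rate):
    """Mean number of messages in an M/M/1 queue: rho / (1 - rho),
    where rho = arrival_rate / service_rate must be below 1.
    A textbook formula applied to illustrative rates, not M-Switch data."""
    rho = arrival_rate / service_rate
    assert rho < 1, "overloaded: the queue grows without bound"
    return rho / (1 - rho)
```

At 50% utilization the average queue holds just one message, while at 90% it holds nine: a system running close to capacity is far more sensitive to arrival-rate spikes, which is when the Queue Manager's choices start to matter.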
The main circumstance in which the effect of the Queue Manager's choices is visible is when there is an abnormally high arrival rate. The Queue Manager then has to adjust to attempt to clear the resulting queue in a timely fashion. We believe the current mechanisms for this are reasonable and produce good results for average message throughput.