Isode server products (M-Vault, M-Switch, M-Store X.400
and M-Box) are deployed in a wide variety of situations, and the services
that depend on them usually have to be highly reliable. In some cases, a single
server provides a complete standalone service. In other systems there
are large numbers of Isode servers forming just one part of a complex
system.
Isode’s approach to server design and management is that the
products are building blocks, with maximum use of open standard protocols
for interconnection. Management is almost entirely client/server, as
discussed in the white paper "Isode Management Architecture: Client/Server
and Directory".
This combination of building blocks and client/server management means that
the approach to operational management needs to be considered as part of the
overall system design. This paper explains the approach Isode has taken and
the options it provides for building an operational system.
Overall Operator Monitoring

In most deployments, Isode servers will run day in and day out without
errors of any kind. When faults occur, they are often in response to
failures of other (or external) components. Taken together, these
two points lead most sites to choose a general purpose Management
Station, such as HP OpenView or BMC Patrol, which gives system operators
a single point from which to monitor a wide range of systems. Such
a tool has several advantages:
- One tool can be used to monitor many servers.
- Operator training costs are reduced.
- There is good flexibility in monitoring and notification options.
This top level approach is primarily used to ensure that the system
is running correctly, and most of the time it will be operating smoothly
and without error. In the event of a failure, the operator will switch
to using a specialised tool, or contact an appropriate expert.
A Management Station will typically monitor:
- Isode Servers, and in particular:
  - Up/down status. Knowing that the server is up and running is
    a top priority.
  - General "health" parameters are important as a secondary
    check. If there are large decreases (or increases) of activity,
    this is likely to indicate that there is a problem. Example parameters:
    - Number of open connections.
    - Protocol or operation response time.
    - Message throughput.
    - Message latency (M-Switch).
  - Faults & Events (discussed later).
- Local system resources associated with the Isode servers, as problems
  may be due to the underlying system. In particular:
  - System up/down.
  - Disk space available.
  - Processor usage.
- Network resources. The status of routers, switches and other network
  components is important, as failures often affect applications.
- Other components and applications within the total system being
  monitored.
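The secondary "health" check described above can be illustrated with a minimal Python sketch. It assumes the Management Station keeps a short history of one activity parameter (for example, message throughput) and flags a reading that departs from the recent average by more than a chosen fraction. The function name `activity_alert` and the 50% tolerance are illustrative assumptions, not part of any Isode or Management Station product.

```python
from statistics import mean


def activity_alert(history, current, tolerance=0.5):
    """Compare a reading with the mean of recent readings.

    Returns "low" or "high" when the reading deviates from the baseline
    by at least `tolerance` (a fraction of the baseline), otherwise None.
    All names and thresholds here are hypothetical.
    """
    if not history:
        return None  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    if baseline == 0:
        # Any activity on a normally idle parameter counts as an increase.
        return "high" if current > 0 else None
    change = (current - baseline) / baseline
    if change <= -tolerance:
        return "low"
    if change >= tolerance:
        return "high"
    return None
```

A real deployment would more likely express this as a threshold rule inside the Management Station itself; the sketch only shows the comparison being made.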
It is usually straightforward to monitor network and local system components
from any standard Management Station product. To monitor the up/down
status and general server health parameters, Isode uses SNMP (Simple
Network Management Protocol) and the Internet Standard MADMAN (Mail
and Directory Management) MIBs (Management Information Bases), which were
originally designed by Steve Kille (Isode CEO).
Use of SNMP is a good choice, as it is supported by most Management
Stations. It is important that this basic monitoring is done by the
Management Station polling the applications from time to time, as SNMP
uses an unreliable data transport (User Datagram Protocol) and servers
in severe difficulty should not be relied on to report errors. This
is exactly the function provided by MADMAN.
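The reason for polling can be shown with a small Python sketch. Because a single lost datagram should not be read as an outage, the sketch marks a server as down only after several consecutive polls go unanswered. The class `PolledServer`, the `probe` callable (standing in for an SNMP GET of a MADMAN variable) and the limit of three missed polls are all assumptions made for illustration; no real SNMP library is used.

```python
class PolledServer:
    """Tracks up/down status from the results of periodic polls."""

    def __init__(self, name, probe, max_missed=3):
        self.name = name
        self.probe = probe            # hypothetical callable; raises TimeoutError on no reply
        self.max_missed = max_missed  # consecutive misses before declaring "down"
        self.missed = 0
        self.polled = False
        self.last_value = None

    def poll(self):
        """Run one poll and return the resulting status."""
        self.polled = True
        try:
            self.last_value = self.probe()
            self.missed = 0
        except TimeoutError:
            # UDP is unreliable: one missing reply may just be a lost packet.
            self.missed += 1
        return self.status

    @property
    def status(self):
        if not self.polled:
            return "unknown"
        if self.missed == 0:
            return "up"
        if self.missed < self.max_missed:
            return "suspect"
        return "down"
```

The design point is that the station, not the server, decides when a server is down, so a server that has failed completely is still detected.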
Faults & Events

A central component of Isode's management architecture is the event
subsystem. Isode has an extensible list of events, each associated with
a "facility" which is a functional area of the product set.
Each event has an associated severity, as set out in the table below:
| Severity Level | Description | Example | Operator Intervention Required | Administrator Intervention Required |
| --- | --- | --- | --- | --- |
| Critical | A serious error has occurred, leading to total loss of service. | License file expired. | Requires immediate intervention. | As for operator. |
| Fatal | A serious error has occurred, which is likely to cause partial loss of service. | Running out of disk space. | Requires immediate inspection. | As for operator. |
| Error | An error has occurred, which may cause partial loss of service. The sub-system will usually recover from this without intervention. | Association Rejected to a remote MTA. | May require inspection. | May be appropriate to investigate repeated errors, or unusual error patterns. |
| Warning | Something unexpected has happened but is not causing a loss of service. | Protocol violation by remote system. | May be useful for operator to observe. | Administrator should perform non-urgent investigation. |
| Authfail | An authentication or authorisation failure. | LDAP Client authentication to server fails. | Operator may need to investigate. | Administrator should perform investigation of unusual warnings. |
| AuthOK | A successful authentication or authorization. | LDAP Client authentication to server succeeds. | Not usually useful for operator. | May be useful additional information for administrator. |
| Notice | Informative logging, recording major stages of operational processing. | Called service: smtp-external. | May be useful in monitoring low volume systems. | As for operator. |
| Information | Informative Logging, providing more detail than notice level. | Record each routing option reviewed. | Not appropriate. | May be useful to provide additional logging detail. |
| Detail | Informative logging at a more detailed level. | Log each X.400 checkpoint. | Not appropriate. | May be useful to provide additional logging detail. |
| Success | Informative logging, at a level similar to Detail. | Complete content conversion calculation. | Not appropriate. | May be useful to provide additional logging detail. |
| PDU | A logging option to record specific types of PDU (Protocol Data Unit). | Record LDAP Add PDUs | Not appropriate. | For use by experienced administrator. |
| Debug | Records information about progress and parameters within the program. | ckadr.c:360 normalised address OK (96). | Not appropriate. | Generally only useful when investigating complex problems in consultation with Isode support. |
When an event occurs, the Isode application will make a call to the
event system. The Isode event system will be configured to send this
event to zero or more event streams. Event streams are of several different
types:
- File. The event is written out to a file, in a regular format. This
  file may be used directly or viewed remotely with Isode’s Event
  Viewer program.
- Protocol streams, which send events by protocol. Isode supports three
  event protocols:
  - Syslog (the standard Unix event protocol).
  - Windows Events (the standard interface for events on Windows platforms).
  - SNMP. Use of Traps to send alerts (Planned for Q3 2006).
In a typical system, events at Authfail level and above will be recorded
in log files, so that they are available for operator and administrator
inspection. Critical, Fatal, and perhaps selected Error level events
will be fed by protocol to a Management Station, so that the operator
will be made aware of events that require urgent attention.
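The routing of events to streams can be sketched in Python as follows. The sketch assumes that the order of the severity table can be treated as a simple ranking, with Critical highest, and that each stream accepts every event at or above a configured level. That is a simplification: selecting individual Error events, as mentioned above, would need a finer filter. The names `Severity`, `Stream` and `dispatch` are hypothetical and do not reflect Isode's actual configuration or API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table, ranked lowest to highest (assumed order)."""
    DEBUG = 0
    PDU = 1
    SUCCESS = 2
    DETAIL = 3
    INFORMATION = 4
    NOTICE = 5
    AUTHOK = 6
    AUTHFAIL = 7
    WARNING = 8
    ERROR = 9
    FATAL = 10
    CRITICAL = 11


class Stream:
    """An event stream that keeps events at or above its threshold."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold
        self.received = []  # stands in for a file, syslog, etc.

    def accepts(self, severity):
        return severity >= self.threshold

    def send(self, severity, facility, text):
        self.received.append((severity.name, facility, text))


def dispatch(streams, severity, facility, text):
    """Send one event to every stream that accepts it; return their names."""
    delivered = []
    for stream in streams:
        if stream.accepts(severity):
            stream.send(severity, facility, text)
            delivered.append(stream.name)
    return delivered


# Hypothetical configuration matching the typical system described above:
# a log file from Authfail upwards, a protocol stream for Fatal and Critical.
streams = [Stream("file", Severity.AUTHFAIL), Stream("syslog", Severity.FATAL)]
```

With this configuration a Critical event reaches both streams, a Warning reaches only the file, and a Notice reaches no stream at all, which corresponds to the "zero or more event streams" wording.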
Detailed Application Monitoring
Use of a general purpose Management Station is ideal for top level
monitoring of the whole system, with a small amount of information on
each server. Configuring a general purpose management station to deal
with detailed management of a specific application would be a lot of
work, and produce a rather inadequate result. For this reason, Isode’s
management architecture makes use of application specific tools for more
detailed management.
Isode does this by providing "Management Consoles" for each server
product. These are DConsole (for M-Vault), MConsole (for M-Switch),
XMSConsole (for M-Store X.400) and a future product for M-Box. There
are two primary uses for these console products:
- In a "head up" display, operating in a fixed configuration
on a visible screen. The Consoles are all designed to operate in a
"monitor mode" which will show the current status of key
aspects of the servers being monitored (e.g., message queues for M-Switch
and replication agreements for M-Vault). This display will enable
changes from normal status to be noticed quickly and easily.
- For use by skilled operators and administrators to do advanced monitoring
and preliminary problem diagnosis. The Consoles allow an operator
to look in more detail at the server and to make operational changes
(e.g., to delete a message from a queue).

Configuration Changes
Isode applications make use of directory based configuration, and have
special tools for managing this configuration in the directory. In general,
configuration will be separate from operational monitoring, although there
is a clear interaction and in some situations configuration changes
will be made to address operational problems.
Reporting & Statistics
Generation of reports is not usually an operational task, but often
makes use of similar information. Isode’s approach to reports
on operational information is to record information in an audit database.
As well as supporting statistics, this information is used for operational
services such as message tracking, archive searching, and quarantine
management.
Supporting Service Level Agreements
Many operational systems operate according to service level agreements
(SLAs). Establishing an approach to conformance to SLAs is a task for
the system designer. It is important that Isode provide the right building
blocks to enable this to happen. Some specific things that Isode provides:
- Reporting and Statistics. SLAs will generally require showing that
targets have been achieved, and appropriate reports are key to achieving
this.
- Where operator action is needed, the Management Station configuration
will be key to meeting some types of agreement (e.g., to respond to
all outages within 5 minutes).
- Where an SLA has actions dependent on a complex combination of conditions
from various components, the Management Station is the natural place
to manage such SLAs.
- In order to support complex SLAs, it is important that Isode products
support appropriate underlying functions. We have worked to design
event structure and SNMP polling to provide appropriate infrastructure.
We are happy to extend our support to new events, where current coverage
does not fit a required SLA.
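As an illustration of the kind of report an SLA might need, the following Python sketch checks outage response times against the five-minute example target mentioned above. It assumes that each outage is recorded as a pair of timestamps (when it was detected and when an operator responded); the record format and the function name `response_compliance` are hypothetical and are not taken from Isode's audit database.

```python
from datetime import datetime, timedelta


def response_compliance(outages, target=timedelta(minutes=5)):
    """Return (fraction of outages answered within target, list of breaches).

    `outages` is a list of (detected, responded) datetime pairs.
    A response taking exactly `target` counts as within target.
    """
    breaches = [(d, r) for d, r in outages if r - d > target]
    if not outages:
        return 1.0, breaches  # nothing to respond to, so nothing was missed
    met = len(outages) - len(breaches)
    return met / len(outages), breaches
```

In practice the timestamps would come from the Management Station or from recorded events, and the target would be taken from the SLA itself.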
Conclusions
This paper has explained the operational management approach taken
by Isode.