There are many situations where it is useful in a directory service for directory data to be available in more than one directory server. This paper looks at three techniques for achieving this, and discusses when each is appropriate:
- Replication, using a directory replication protocol.
- Direct synchronization between a pair of directory servers.
- Indirect synchronization between two or more servers.
Replication has the best operational characteristics and lowest functionality. Indirect synchronization has the highest functionality and poorest operational characteristics. Direct synchronization is intermediate. The paper discusses the trade-offs between the approaches.
There are a number of reasons why data held in one directory server needs to be made available in another:
- Availability. When data is held in more than one server, a directory client does not depend on the availability or accessibility of any one server. It is good practice to have at least two directory servers to ensure high availability.
- Load sharing. Where there is very high load on the directory, replicating data between servers allows load to be shared.
- Locality. It is sometimes desirable to replicate data so that clients have access to a local or "reasonably local" copy, for example to provide high-bandwidth, low-latency access. For some directories, such as military directories, very high replication is important, so that data is always available in a local server and directory clients are protected from network failures.
- Getting at data in other servers. Some clients may only be able to access a local directory server. To make additional data available to these users, it needs to be replicated into the directory server that the clients access.
- Data Access Restrictions. Some data in a directory may be sensitive, and there is a need to selectively replicate data to another directory in the context of these restrictions. For example, selected data in a corporate directory may be made available in an external publicly accessible directory.
- Data Mapping. Data may need to be held in different formats in different directory servers. This may involve small changes in the representation of specific attributes, or differences in technology.
Different solutions are appropriate to meet these differing requirements.
This paper looks at three distinct approaches for moving data between directory servers:
- Directory Replication. Here data is moved directly between a pair of directory servers using a special protocol in which both servers actively participate.
- Direct Synchronization. Synchronization moves data between servers without any direct interaction between the directory servers. In direct synchronization, data is moved between a pair of servers by an intermediate process.
- Indirect Synchronization. With indirect synchronization an intermediate application (with its own database) is used, and two or more directory servers are synchronized by this central application.
The following sections describe each of these techniques in more detail.
Directory replication is where data is replicated directly between a pair of directory servers, with active participation of both servers, as illustrated above.
X.500 DISP (Directory Information Shadowing Protocol) is the only open standard for directory replication. It is supported by Isode's M-Vault product and by other directory servers. Some directory products support proprietary replication protocols. The rest of this section looks at characteristics of directory replication in terms of capabilities provided by DISP. Key replication features:
- Master-to-shadow data copying. In DISP terminology, data is transferred from a "supplier" to a "consumer".
- Secondary shadowing: data can be transferred onward from a shadow to further replicas.
- Shared state. Both ends of the replication are aware of an agreement and its state. In the event of inconsistencies, a "total update" can be performed automatically to ensure that the shadow copy is an accurate copy of the master.
- Incremental replication down to attribute value level, so that changes are replicated very efficiently.
- Option for on demand replication, to give immediate update of the shadow copy, or scheduled batched updates.
- Push or pull replication.
- Peer authentication and data integrity using digital signatures.
- Can replicate access control and "knowledge" information.
- Flexible replication of selected parts of the DIT (Directory Information Tree), including omission of sub-trees.
- Filtering of attributes based on object class.
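To illustrate what incremental replication down to attribute value level means, the Python sketch below computes the value-level differences between two versions of one entry. It assumes a hypothetical in-memory model (a dictionary of attribute types to sets of values) and a hypothetical `value_level_changes` helper; it is not DISP's wire format, which carries updates as ASN.1-encoded protocol structures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model of an entry: attribute type -> set of values.
Entry = Dict[str, Set[str]]


def value_level_changes(old: Entry, new: Entry) -> List[Tuple[str, str, str]]:
    """List the (operation, attribute, value) changes that turn `old` into `new`.

    Only values that actually differ are reported, so replacing one member of a
    large group yields one delete and one add instead of a copy of the entry.
    """
    changes: List[Tuple[str, str, str]] = []
    for attr in sorted(old.keys() | new.keys()):
        before = old.get(attr, set())
        after = new.get(attr, set())
        for value in sorted(before - after):
            changes.append(("delete", attr, value))
        for value in sorted(after - before):
            changes.append(("add", attr, value))
    return changes


old = {"cn": {"Staff"}, "member": {"cn=A", "cn=B", "cn=C"}}
new = {"cn": {"Staff"}, "member": {"cn=A", "cn=C", "cn=D"}}
print(value_level_changes(old, new))
# [('delete', 'member', 'cn=B'), ('add', 'member', 'cn=D')]
```

Sending only such value-level changes, rather than whole entries, is what keeps incremental replication cheap when entries carry large multi-valued attributes.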
Where it can be used, directory replication is the ideal approach as it is efficient and robust. It is ideal for replicating large directories and providing immediate updates to shadow copies.
There are situations where replication is insufficiently flexible and synchronization needs to be used. These include:
- Two servers do not support a common directory replication protocol
- Schema is different for the data being transferred. Replication requires both directory servers to have a common view of schema for the data being replicated.
- Data needs to be modified or renamed. Replication supports data filtering, but does not allow data to be changed.
In direct synchronization, a process copies data from one directory server into another one. The directory servers do not communicate with each other, and neither server is “aware” that data is being synchronized with the other server.
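One illustrative way to picture the intermediate process is the Python sketch below, which plans a single stateless "total update" pass. It works from two snapshots, each of which the process would obtain by reading one server on its own, so neither server is involved with the other. The tree model and the `plan_total_update` and `depth` helpers are hypothetical; no LDAP or DAP calls are shown, and a real tool would issue the planned operations against the copy server.

```python
from typing import Dict, List, Set, Tuple

Entry = Dict[str, Set[str]]  # attribute type -> set of values (illustrative model)
Tree = Dict[str, Entry]      # DN -> entry


def depth(dn: str) -> int:
    # Naive RDN count; assumes no escaped commas inside attribute values.
    return dn.count(",") + 1


def plan_total_update(master: Tree, copy_tree: Tree) -> List[Tuple[str, str]]:
    """Return (operation, DN) pairs that would make `copy_tree` match `master`.

    No state from earlier runs is used, so the pass can be repeated at any time.
    """
    plan: List[Tuple[str, str]] = []
    # Adds and modifies, parents before children.
    for dn in sorted(master, key=lambda d: (depth(d), d)):
        if dn not in copy_tree:
            plan.append(("add", dn))
        elif copy_tree[dn] != master[dn]:
            plan.append(("modify", dn))
    # Deletes, children before parents.
    for dn in sorted(copy_tree.keys() - master.keys(), key=lambda d: (-depth(d), d)):
        plan.append(("delete", dn))
    return plan


master_tree = {
    "o=Example": {"o": {"Example"}},
    "cn=Alice,o=Example": {"cn": {"Alice"}, "mail": {"alice@example.com"}},
}
copy_tree = {
    "o=Example": {"o": {"Example"}},
    "cn=Alice,o=Example": {"cn": {"Alice"}},
    "cn=Bob,o=Example": {"cn": {"Bob"}},
}
print(plan_total_update(master_tree, copy_tree))
# [('modify', 'cn=Alice,o=Example'), ('delete', 'cn=Bob,o=Example')]
```

Because the plan is derived purely from the current contents of both servers, rerunning it after a failure simply produces whatever corrections are still needed.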
Direct synchronization is widely used in directory deployments, and is typically implemented by custom-developed scripts. Isode's Sodium-Sync product, integrated with Isode's Sodium directory administration tool, implements direct synchronization. This section describes direct synchronization in terms of Sodium-Sync’s current and planned features:
- Support of directories using LDAP and X.500 DAP.
- No support for other types of directory or database. This is a product choice rather than an architectural limitation of direct synchronization. By restricting support to LDAP and DAP (both of which are based on the X.500 information model), a single GUI optimized for these protocols can present both directory servers participating in the synchronization. This enables the data on both sides, and the synchronization itself, to be managed from a single GUI.
- Digital signature based peer authentication (LDAP and DAP) and signed operations (DAP).
- Support of a file interface using LDIF (LDAP Data Interchange Format).
- Extensive data and entry filtering.
- Attribute mapping and transformation.
- Entry renaming, including modification of attributes whose values are directory names (e.g., Role Occupant).
- Data merge. Information can be added to the "copy" server, so that names and selected attributes are synchronized from the "master" server while other attributes are added directly to the "copy".
- Synchronize single entries (e.g., to copy a CRL attribute) or flexibly configured parts of the DIT.
- Total update. Because direct synchronization is stateless, a total update can always be executed to correct errors and to bring the copy up to date with the master. In most setups, it makes sense to perform regular total updates.
- Schedule of total update at regular times (e.g., daily) or at fixed intervals.
- Incremental update. Entry addition and modification can be synchronized efficiently using standard DAP/LDAP. Efficient synchronization of deletions needs non-standard features, since a standard search cannot return entries that no longer exist; most directory products provide capabilities to do this. Isode plans to provide optimized support for M-Vault and for Active Directory.
- Minimum "state" held. For total update, no state is needed. For incremental update, information on the last change made is needed. If this state is lost, a total update can be made.
- Support for multiple synchronizations from a single instance.
Direct synchronization provides a reasonably efficient way to transfer data flexibly in many scenarios, and multiple independent direct synchronizations can be used to handle data from many servers. However, there are situations where more flexibility is required, in particular:
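Several of the features listed above (entry filtering, attribute mapping, entry renaming including directory-name-valued attributes, and data merge) can be pictured as a small transformation step. The Python sketch below is a hypothetical illustration only: the suffixes, attribute map and helper names are invented for the example and are not Sodium-Sync configuration or APIs, and DN handling is simplified to plain string matching.

```python
from typing import Dict, Optional, Set, Tuple

Entry = Dict[str, Set[str]]  # attribute type -> set of values (illustrative model)

# Hypothetical configuration for one synchronization.
SOURCE_SUFFIX = "o=Corp"
TARGET_SUFFIX = "ou=Partners,o=Public"
ALLOWED_CLASSES = {"person", "organizationalRole"}  # entry filter
ATTRIBUTE_MAP = {  # source attribute -> target attribute; unlisted attributes are dropped
    "objectClass": "objectClass",
    "cn": "cn",
    "mail": "mail",
    "mobile": "telephoneNumber",
    "roleOccupant": "roleOccupant",
}
DN_VALUED = {"roleOccupant"}  # attributes whose values must be renamed too
MANAGED = set(ATTRIBUTE_MAP.values())  # target attributes owned by the synchronization


def rename(dn: str) -> str:
    """Move a DN from the source subtree to the target subtree.

    Simplified: plain string matching, ignoring DN case and escaping rules.
    """
    if dn == SOURCE_SUFFIX:
        return TARGET_SUFFIX
    suffix = "," + SOURCE_SUFFIX
    if not dn.endswith(suffix):
        raise ValueError(f"{dn} is outside {SOURCE_SUFFIX}")
    return dn[: -len(suffix)] + "," + TARGET_SUFFIX


def transform(dn: str, entry: Entry) -> Optional[Tuple[str, Entry]]:
    """Filter, map and rename one source entry; None means "do not synchronize"."""
    if not entry.get("objectClass", set()) & ALLOWED_CLASSES:
        return None
    mapped: Entry = {}
    for src_attr, dst_attr in ATTRIBUTE_MAP.items():
        if src_attr in entry:
            values = entry[src_attr]
            if src_attr in DN_VALUED:
                values = {rename(v) for v in values}
            mapped[dst_attr] = set(values)
    return rename(dn), mapped


def merge(existing: Entry, synced: Entry) -> Entry:
    """Replace managed attributes on the copy, keeping attributes added locally."""
    merged = {a: set(v) for a, v in existing.items() if a not in MANAGED}
    merged.update(synced)
    return merged


result = transform(
    "cn=Security Officer,o=Corp",
    {
        "objectClass": {"organizationalRole"},
        "cn": {"Security Officer"},
        "roleOccupant": {"cn=Alice,o=Corp"},
        "employeeNumber": {"1234"},  # not in ATTRIBUTE_MAP, so filtered out
    },
)
assert result is not None
dn, synced = result
print(dn)                      # cn=Security Officer,ou=Partners,o=Public
print(synced["roleOccupant"])  # {'cn=Alice,ou=Partners,o=Public'}
local = {"cn": {"Security Officer"}, "description": {"Added on the copy"}}
print(sorted(merge(local, synced)))  # ['cn', 'description', 'objectClass', 'roleOccupant']
```

Keeping the set of managed attributes explicit is what lets the master own names and selected attributes while the copy keeps attributes added locally.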
- Handling data from multiple heterogeneous data sources to merge data.
- Two-way synchronization where arbitrary changes to an entry may be made on either server.
The feature that distinguishes indirect synchronization is that directory servers synchronize independently with a central database, and not directly with each other. The diagram above broadly reflects the architecture of a number of directory synchronization (meta directory) products available on the market. This description applies to many products, although they align with it to varying degrees. MaXware’s DSE product is a good example of this model. Isode has worked with MaXware over many years, and DSE is used by Isode customers.
Key features of indirect synchronization are:
- There are multiple "channels" or "connectors", which integrate with a wide range of databases and directory servers, to enable unification of a variety of directories and directory-like databases (e.g., HR Databases, Relational Databases, Microsoft Exchange, Lotus Notes).
- Data can move in both directions.
- Because of the complex inter-relationships between the components, the "state" of the synchronization is a critical part of the overall system. This state is typically held in a database, which may be central or associated with each connector. In general, the synchronization cannot simply be reset by re-reading the directory servers being synchronized.
- Very flexible data mapping and transformation is central to the system's function.
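To make the role of central state concrete, the Python sketch below models a hypothetical hub for indirect synchronization: connectors push changes in, the hub joins records from different sources into one unified record, and each connector can read the unified view back out. The `Hub` class, its join rule (matching on a shared attribute such as an employee number) and the connector names are assumptions for illustration, not the design of DSE or any other product.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Record = Dict[str, str]  # simplified: single-valued attributes only


@dataclass
class Hub:
    """Hypothetical central store for indirect synchronization."""
    records: Dict[int, Record] = field(default_factory=dict)  # central id -> unified record
    joins: Dict[Tuple[str, str], int] = field(default_factory=dict)  # (connector, local id) -> central id
    next_id: int = 1

    def inbound(self, connector: str, local_id: str, attrs: Record, join_attr: str) -> int:
        """Merge a change from one connector into the matching central record."""
        key = (connector, local_id)
        central_id = self.joins.get(key)
        if central_id is None:
            # First sight of this source object: try to join on a shared attribute.
            value = attrs.get(join_attr)
            central_id = next(
                (cid for cid, rec in self.records.items()
                 if value is not None and rec.get(join_attr) == value),
                None,
            )
            if central_id is None:
                central_id = self.next_id
                self.next_id += 1
                self.records[central_id] = {}
            self.joins[key] = central_id
        self.records[central_id].update(attrs)
        return central_id

    def outbound(self, connector: str) -> Dict[str, Record]:
        """Unified view of every record already joined to `connector`."""
        return {
            local_id: dict(self.records[cid])
            for (conn, local_id), cid in self.joins.items()
            if conn == connector
        }


hub = Hub()
hub.inbound("hr", "E1001", {"employeeNumber": "1001", "title": "Engineer"}, join_attr="employeeNumber")
hub.inbound("ldap", "cn=Alice,o=Example", {"employeeNumber": "1001", "mail": "alice@example.com"},
            join_attr="employeeNumber")
print(hub.outbound("ldap"))
# {'cn=Alice,o=Example': {'employeeNumber': '1001', 'title': 'Engineer', 'mail': 'alice@example.com'}}
```

In this sketch the `joins` table is the synchronization state. If it were lost, the hub could only rebuild correspondences by re-matching attribute values, which is unreliable when join attributes are missing, duplicated or have changed; this illustrates why such a system cannot simply be reset from the connected directories.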
This type of system provides high flexibility to achieve any synchronization desired. Setup of such systems will often require professional services from the synchronization vendor.
Directory replication provides robust and very efficient transfer of data from supplier to shadow copy, and should be used where possible. It is ideal for systems replicating data for performance and availability, and where update propagation times must be minimized.
Indirect synchronization provides a mechanism for highly flexible integration of data from a wide range of sources.
Direct synchronization offers an intermediate capability. With a product such as Isode’s Sodium-Sync, synchronizing data between LDAP and DAP servers is efficient and easy to set up.