Many enterprises are now looking to deploy directories to support LDAP services, usually based on X.500 infrastructure. Typically such enterprises have already deployed relational databases, which often hold and provide access to corporate information. It is important to understand the relationship between these two services.
This whitepaper explains the differences between directories and relational databases, and shows how the two can be used within the enterprise. It shows that there are areas of overlap, and describes how to deal with these.
Both directories and relational databases are types of database. This section looks at the different characteristics of these databases. Both have mechanisms for handling schema and the structure of information, and both are suitable for systematically organized data.
The key characteristics of a relational database are:
- Objects have complex relationships to each other, which are key to the way a relational database works. Queries can be based on complex relationships between objects.
- Relational databases support sophisticated transaction-based updates, and provide update tools that make use of these.
- The database is essentially centralized. In some cases, there is limited replication to give copies of the entire database. This is a practical consequence of the first and second characteristics.
- The schema is entirely application/user defined. This is important, as a relational database is a general purpose tool that can be used to deal with a very wide range of problems. Generally, multiple databases do not share a common schema.
The first characteristic is the key benefit of a relational system, and one that a directory cannot provide. For example, if a corporate database holds information about people, and each entry records both the person's manager and location, it would be natural and efficient in a relational database to find 'all of the people whose managers are located in New York'. If the information were stored in a directory, this sort of query would be relatively inefficient: you would search to find all of the managers located in New York, and then search to find the people managed by those managers. This can be done in two operations, but it places load on the client, which must combine the results, and it is likely to be slow if there are a lot of managers.
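To make the contrast concrete, here is a minimal Python sketch that uses the standard-library `sqlite3` module for the relational side and an in-memory list as a stand-in for the directory. The sample people, the DN suffix `o=Example` and the helper `directory_search` are hypothetical; `manager` and `l` (locality) are used as the directory attribute names. The relational answer is a single self-join; the directory answer needs two searches, with the client combining the results in between.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical sample data: (name, manager's name, location).
PEOPLE: List[Tuple[str, Optional[str], str]] = [
    ("Alice", None, "New York"),
    ("Bob", "Alice", "London"),
    ("Carol", "Alice", "Paris"),
    ("Dave", "Bob", "New York"),
    ("Erin", "Dave", "London"),
]

def relational_query(rows) -> List[str]:
    """One SQL statement: a self-join relates each person to their manager."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE people (name TEXT PRIMARY KEY, manager TEXT, location TEXT)")
    db.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    result = db.execute(
        "SELECT p.name FROM people AS p "
        "JOIN people AS m ON p.manager = m.name "
        "WHERE m.location = ? ORDER BY p.name",
        ("New York",),
    ).fetchall()
    db.close()
    return [name for (name,) in result]

def directory_search(entries, predicate) -> List[dict]:
    """Hypothetical stand-in for a single LDAP search: entries matching a filter."""
    return [e for e in entries if predicate(e)]

def directory_query(rows) -> List[str]:
    """Two searches, with the client joining the results in between."""
    # Each entry is independent; the link to the manager is just a DN-valued attribute.
    entries = [
        {"dn": f"cn={n},o=Example", "cn": n,
         "manager": f"cn={m},o=Example" if m else None, "l": loc}
        for n, m, loc in rows
    ]
    # Search 1, e.g. (l=New York): everyone in New York, i.e. the candidate managers.
    ny_dns = {e["dn"] for e in directory_search(entries, lambda e: e["l"] == "New York")}
    # Search 2: the client builds an OR filter from every DN returned above, e.g.
    # (|(manager=cn=Alice,o=Example)(manager=cn=Dave,o=Example)).
    staff = directory_search(entries, lambda e: e["manager"] in ny_dns)
    return sorted(e["cn"] for e in staff)

print(relational_query(PEOPLE))  # ['Bob', 'Carol', 'Erin']
print(directory_query(PEOPLE))   # ['Bob', 'Carol', 'Erin']
```

Both give the same answer, but the filter in the second directory search grows with the number of managers found by the first, and the client does the joining work that the relational database does internally.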
The key characteristics of a directory are:
- It can be provided in a highly distributed manner.
- Objects are essentially independent in the directory, and linked into a hierarchy. It is this independence which enables straightforward distributed provision.
- There is a fixed core schema for naming basic types of objects and managing them in a hierarchy. This common core is key to making a directory 'hang together'.
- The schema for individual objects in the directory is highly flexible and extensible.
The first characteristic is the key benefit of a directory, and one that a relational database cannot provide. Where centralized provision does not make sense, a directory is essential.
A consequence of the common core schema is that generic applications can rely on it. This makes sense for functions that will be the same in many organizations.
Solving Problems in the Enterprise
This section considers problems where the structure of information makes a pure WWW solution unsuitable and looks at the choices between directory and database.
Problems where only Database is suitable
Whenever there is a need to perform sophisticated analysis on data and the relationships between elements of data are non-trivial, a relational database is a good choice and a directory would not work. Examples:
- Accounting System.
- Enterprise Resource Planning System.
Problems where only Directory is suitable
There are two classes of situation where a directory is suitable and a relational database is not. The first is where distributed provision is essential. For example, if an enterprise made up of various autonomous units wishes to provide a structured information service, a directory is the only viable option.
The second class of situation is where open access to the directory, using an open protocol and a common core schema, enables clients from multiple vendors to be integrated around a common directory core. Examples of this are:
- Provision of address book functionality in mail clients.
- Support of message routing for a messaging infrastructure.
- Support of an X.509 based Public Key Infrastructure.
Where either could be used: The Overlap
There are some problems that could be solved by either a relational database or a directory. The most important of these is provision of corporate white and yellow pages functionality. Although this is seen as the key target for directory, building this type of service on a relational database gives the same benefits of data structuring, and users can access it via the WWW.
Why the Overlap can be a problem
In some cases the overlap is not really a problem. An organization can choose to solve a problem by use of directory or relational database and it does not really matter which option they choose. For example, if a small company has a need to store information about its customers, either technology would work fine.
Problems with the overlap occur where an enterprise needs to use both technologies. For example, white pages information may need to be in the directory, to support lookup from LDAP clients, and also in a relational database that forms an integral part of an Enterprise Resource Planning system. In particular, there is the problem of keeping both systems consistent as updates are made. The rest of this white paper discusses how to deal with this.
Dealing with the Overlap
This section discusses how to deal with the overlap. It focuses on the white pages service, as this is the major function that needs to be dealt with. The analysis also applies to other functions in the overlap.
The discussion here considers target solutions without legacy systems and does not address migration from legacy data, although some of the techniques discussed are relevant to it.
Use of database only is not a viable option, because of the requirement for LDAP clients to access white pages data.
Use of directory only may be a good option in many organizations, as for most core uses the directory provides good white pages functionality. It will not suffice where there is a desire to use a relational database for broader management and analysis of corporate information. Typically, there will be a corporate database strategy that reflects this goal.
A simple answer to the overlap is to run both services independently, duplicating data and management. The major problems with this are the duplicated effort needed to maintain the data correctly and the operational inconsistencies that will arise.
A superficially attractive option is to produce one system that will do both. It is not possible to build a useful relational database on top of a directory. Building a directory on top of a relational database, so that a single relational store serves both relational applications and LDAP clients, seems more promising.
The major problem with achieving this is that the directory and relational information models are significantly different. To gain satisfactory performance and meet directory functionality, the directory portion needs to be structured to meet directory needs, typically in a manner that can only be managed by the directory. This problem appears to be inherent to any solution built on top of a relational database, although a database vendor might be able to make a satisfactory hybrid solution. Isode's view is that the basic characteristics of the systems are sufficiently different that there will not be a viable hybrid solution.
Gatewaying (cross access)
Rather than duplicating information by synchronization, it is possible to access it 'on demand' in the other system. For example, a relational database could go to the directory to find a certificate rather than holding a copy itself. This approach avoids storing the same data twice. However, attributes that are only available in a remote system will typically not be available for searching (in a directory) or for joins (in a relational database). This approach may therefore be useful for 'second order' attributes, which never need to be searched on in the system that does not store them.
Another solution is to master each piece of data in one location and to synchronize it between the two services. An extreme case of synchronization is to master all of the data in one service and to synchronize all of it into the other.
Isode's view is that synchronization will be the primary technique for co-existence. In the short term, this will primarily be a means of populating directories from relational databases.
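A minimal Python sketch of this cross access, under stated assumptions: `make_fake_directory` is a hypothetical in-memory stand-in for reading one attribute of one directory entry (in practice an LDAP base-object search), and the `employees` table and DNs are illustrative. The relational side stores only each person's name and DN; the certificate is fetched from the directory only when a record is requested.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical: reads one attribute of one directory entry, given its DN.
DirectoryReader = Callable[[str, str], Optional[bytes]]

def make_fake_directory() -> DirectoryReader:
    """In-memory stand-in for the directory, keyed by DN."""
    store = {
        "cn=Alice,o=Example": {"userCertificate": b"<certificate for Alice>"},
        "cn=Bob,o=Example": {},
    }
    def read(dn: str, attr: str) -> Optional[bytes]:
        return store.get(dn, {}).get(attr)
    return read

def build_demo_db() -> sqlite3.Connection:
    """Relational side: holds names and DNs, but no certificates."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, dn TEXT)")
    db.executemany("INSERT INTO employees VALUES (?, ?)",
                   [("Alice", "cn=Alice,o=Example"), ("Bob", "cn=Bob,o=Example")])
    return db

def get_employee(db: sqlite3.Connection, read_directory: DirectoryReader,
                 name: str) -> Optional[dict]:
    """Combine local columns with an attribute fetched on demand."""
    row = db.execute("SELECT name, dn FROM employees WHERE name = ?", (name,)).fetchone()
    if row is None:
        return None
    emp_name, dn = row
    # Cross access: the certificate lives only in the directory, so it is read
    # at request time. SQL on this database cannot filter or join on it.
    return {"name": emp_name, "dn": dn,
            "certificate": read_directory(dn, "userCertificate")}
```

Because `userCertificate` never reaches the `employees` table, a SQL query cannot select rows by it or join on it, which is why this pattern suits only 'second order' attributes.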
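The core of one-way synchronization, with the relational database as master, can be sketched in Python. The sketch assumes a hypothetical mapping from rows to DNs (`uid=<id>,ou=People,o=Example`) and to the attributes `uid`, `cn` and `telephoneNumber`. It compares the entries the database says should exist with what the directory currently holds, and plans add, modify and delete operations; it does not talk to a real directory, and a production tool would also need to handle schema mapping, errors and partial failures.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, str]
Op = Tuple[str, str, Entry]  # (operation, DN, attributes)

def row_to_entry(row: dict) -> Tuple[str, Entry]:
    """Hypothetical mapping from a relational row to a DN and directory attributes."""
    dn = f"uid={row['id']},ou=People,o=Example"
    return dn, {"uid": str(row["id"]), "cn": row["name"], "telephoneNumber": row["phone"]}

def plan_sync(rows: List[dict], directory: Dict[str, Entry]) -> List[Op]:
    """One-way sync: the database is master; plan changes to make the directory match."""
    wanted = dict(row_to_entry(r) for r in rows)
    ops: List[Op] = []
    for dn, attrs in sorted(wanted.items()):
        current = directory.get(dn)
        if current is None:
            ops.append(("add", dn, attrs))
        else:
            # Compare only attributes the database masters; directory-only
            # attributes (e.g. a certificate) are left untouched.
            changed = {k: v for k, v in attrs.items() if current.get(k) != v}
            if changed:
                ops.append(("modify", dn, changed))
    # Entries with no corresponding row in the master are removed.
    for dn in sorted(directory.keys() - wanted.keys()):
        ops.append(("delete", dn, {}))
    return ops
```

Limiting each modify to the attributes the database supplies is one way of letting each attribute be mastered in a single place while the directory keeps attributes of its own.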
Isode's products are for provision of a directory service. The key elements are LDAP/X.500 Enterprise Directory Server, Enterprise Directory Management and Web to LDAP/X.500 Access Server.
Isode provides a scripting toolkit for loading data into and dumping data from the directory. This toolkit can be used to manage simple synchronization configurations.
Isode recommends a partner product, the Maxware Directory Data Manager (MDDM), for more complex synchronization. MDDM supports ODBC and flat file access, and provides graphical configuration to help control synchronization. It supports incremental access to optimize transfer performance.
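The idea of incremental access can be sketched generically in Python. This is not MDDM's actual mechanism, which is not described here; the sketch assumes a hypothetical `change_seq` column that the application increments on every write, and a watermark remembered between synchronization runs, so that each run fetches only the rows changed since the previous one.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str, int]  # (id, name, phone, change_seq)

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical source table; change_seq is bumped by the application on each write."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees "
               "(id INTEGER PRIMARY KEY, name TEXT, phone TEXT, change_seq INTEGER)")
    db.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [(1, "Alice", "100", 1), (2, "Bob", "200", 2), (3, "Carol", "300", 3)],
    )
    return db

def changed_since(db: sqlite3.Connection, watermark: int) -> Tuple[List[Row], int]:
    """Fetch only rows changed after `watermark`; return them and the new watermark."""
    rows = db.execute(
        "SELECT id, name, phone, change_seq FROM employees "
        "WHERE change_seq > ? ORDER BY change_seq",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][3] if rows else watermark
    return rows, new_watermark

if __name__ == "__main__":
    db = build_demo_db()
    rows, mark = changed_since(db, 0)      # first run transfers all three rows
    db.execute("UPDATE employees SET phone = '201', change_seq = 4 WHERE id = 2")
    rows, mark = changed_since(db, mark)   # second run transfers only Bob's row
    print(rows, mark)                      # [(2, 'Bob', '201', 4)] 4
```

A deleted row simply stops appearing in the results, so a watermark on its own cannot detect deletions; a complete solution also needs a record of deletions or an occasional full comparison.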
- The key strength of directory is distributed provision.
- The key strength of a relational database is the ability to make complex queries about the relations between objects.
- Enterprises will mix and match directory and relational databases for different problems.
- Enterprises will need directory for white pages and other functions.
- Where white pages information is needed in a relational database, use of a directory synchronization product is the best approach.