Purpose
Many enterprises are now looking to deploy directories to support
LDAP services, usually based on X.500 infrastructure. Typically, such
enterprises have already deployed relational databases, which also
hold information and make it available to users. It is important to
understand the relationship between these two services.
This white paper explains the differences between directories and
relational databases, and shows how the two can be used within the
enterprise.
It shows that there are areas of overlap, and describes how to deal
with these.
Technology Comparison
Directories and relational databases are both types of database. This
section looks at their different characteristics. Both have mechanisms
for dealing with the schema and structure of information, and both
are suitable for data that is systematically organized.
Relational Database
The key characteristics of a relational database are:
- Objects have complex relationships to each other, which is key
to the way a relational database works. Queries can be based on complex
relationships between objects.
- Relational databases support sophisticated transaction-based updates,
and provide update tools that make use of these.
- The database is essentially centralized. In some cases, there is
limited replication to give copies of the entire database. This is
a practical consequence of the first and second characteristics.
- The schema is entirely application/user defined. This is important,
as a relational database is a general purpose tool, which can be
used to deal with a very wide range of problems. Generally, multiple
databases do not share a common schema.
The first characteristic is the key benefit of a relational system,
and it cannot be provided by a directory. For example, if a corporate
database holds information about people, and each entry holds both
manager and location, it would be natural and efficient in a relational
database to find 'all of the people whose managers are located in
New York'. If the information were stored in a directory, this sort
of query would be relatively inefficient. The client would search to
find all of the managers located in New York, and then search to find
the people managed by those managers. This could be done in two
operations, but it places load on the client and is likely to be slow
if there are a lot of managers.
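The contrast can be sketched in a few lines of Python. This is an
illustration only: the sample data, the PEOPLE list and the function
names are invented here, sqlite3 stands in for a relational database,
and a plain list of entries stands in for a directory (a real client
would issue LDAP searches instead).

```python
import sqlite3

# Invented sample data: (id, name, location, manager_id).
PEOPLE = [
    (1, "Alice", "New York", None),
    (2, "Bob", "London", None),
    (3, "Carol", "London", 1),
    (4, "Dave", "New York", 2),
    (5, "Erin", "Paris", 1),
]


def managed_by_city_relational(city):
    """Relational style: one query, with the join resolved by the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, name TEXT, location TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", PEOPLE)
    rows = conn.execute(
        "SELECT p.name FROM people AS p "
        "JOIN people AS m ON p.manager_id = m.id "
        "WHERE m.location = ? ORDER BY p.name",
        (city,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


def directory_search(entries, predicate):
    """Stand-in for one directory search over independent entries."""
    return [entry for entry in entries if predicate(entry)]


def managed_by_city_directory(city):
    """Directory style: two searches, with the client combining the results."""
    entries = [
        {"id": i, "name": n, "location": loc, "manager": m}
        for i, n, loc, m in PEOPLE
    ]
    # Operation 1: find the entries located in the city.
    in_city = directory_search(entries, lambda e: e["location"] == city)
    manager_ids = {e["id"] for e in in_city}
    # Operation 2: find entries whose manager is one of those found above.
    # The filter grows with the number of candidate managers.
    managed = directory_search(entries, lambda e: e["manager"] in manager_ids)
    return sorted(e["name"] for e in managed)
```

Both functions return the same names for a given city. The difference
is where the work happens: the join runs inside the database, while
the directory-style version carries the intermediate result back to
the client and builds a second search from it.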
Directory
The key characteristics of a directory are:
- It can be provided in a highly distributed manner.
- Objects are essentially independent in the directory, and linked
into a hierarchy. It is this independence which enables straightforward
distributed provision.
- There is a fixed core schema for naming basic types of objects
and managing them in a hierarchy. This common core is key to making
a directory 'hang together'.
- The schema for separate objects in the directory is highly flexible
and extensible.
The first characteristic is the key benefit of a directory, and it
cannot be provided by a relational database. Where centralized
provision does not make sense, a directory is essential.
A consequence of the core schema is that generic applications can
rely on it. This makes sense for functions that will be the same in
many organizations.
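As a rough sketch of what 'independent objects linked into a hierarchy'
means in practice, the Python below models entries keyed by their
distinguished names. The entries, the ENTRIES table and the helper
names are invented for illustration, and the name handling is
deliberately naive: it assumes no escaped commas in name components.

```python
# Invented sample entries keyed by distinguished name (DN).
# Each entry stands alone; its position in the hierarchy comes from its name.
ENTRIES = {
    "c=US": {"objectClass": ["country"]},
    "o=Example,c=US": {"objectClass": ["organization"]},
    "ou=Sales,o=Example,c=US": {"objectClass": ["organizationalUnit"]},
    "ou=Research,o=Example,c=US": {"objectClass": ["organizationalUnit"]},
    "cn=Jane Doe,ou=Sales,o=Example,c=US": {
        "objectClass": ["person"],
        "sn": ["Doe"],
        # Entries of the same class may carry different optional attributes.
        "telephoneNumber": ["+1 555 0100"],
    },
}


def parent_dn(dn):
    """Return the parent's DN, or None for a top-level entry.

    Naive: splits at the first comma, so escaped commas are not handled.
    """
    _, separator, rest = dn.partition(",")
    return rest if separator else None


def subtree(entries, base_dn):
    """Return the DNs at or below base_dn, sorted for stable output."""
    base = base_dn.lower()
    suffix = "," + base
    return sorted(
        dn for dn in entries
        if dn.lower() == base or dn.lower().endswith(suffix)
    )
```

Because an entry's place in the tree is carried by its name rather
than by references between records, a subtree such as
"ou=Sales,o=Example,c=US" can be held on a separate server without
changing the entries elsewhere.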
Solving Problems in the Enterprise
This section considers problems where the structure of information
makes a pure WWW solution unsuitable and looks at the choices between
directory and database.
Problems where only Database is suitable
Whenever there is a need to perform sophisticated analysis on data
and the relationship between elements of data is non-trivial, a relational
database is a good choice and a directory would not work. Examples:
- Accounting System.
- Enterprise Resource Planning System.
Problems where only Directory is suitable
There are two classes of situation where a directory is suitable and
a relational database is not. The first is where distributed provision
is essential. For example, if an enterprise structured as various
autonomous units wishes to provide a structured information service,
a directory is the only viable option.
The second class of situation is where open access to the directory,
using an open protocol and a common core schema, enables integration
of clients from multiple vendors around a common directory core.
Examples of this are:
- Provision of address book functionality in Mail Clients.
- Support of message routing for a messaging infrastructure.
- Support of an X.509 based Public Key Infrastructure.
Where either could be used: The Overlap
Some problems could be solved by either a relational database or a
directory. The most important of these is provision of corporate white
and yellow pages type functionality. Although this is seen as the key
target for directories, building this type of service onto a relational
database gives the same benefits of data structuring, and user access
can be via WWW.
Why the Overlap can be a problem
In some cases the overlap is not really a problem. An organization
can choose to solve a problem by use of directory or relational database
and it does not really matter which option they choose. For example,
if a small company has a need to store information about its customers,
either technology would work fine.
Problems with the overlap occur where an enterprise has requirements
to use both technologies. For example, white pages information
may need to be in the directory in order to support information lookup
from LDAP clients, and also in a relational database forming an integral
part of an Enterprise Resource Planning system. In particular, it
is difficult to keep both systems consistent as updates are made.
The rest of this white paper discusses how to deal with this.
Dealing with the Overlap
This section discusses how to deal with the overlap. It focuses on
the white pages service problem, as this is the major function that
needs to be dealt with. The analysis also applies to other functions
in the overlap.
The consideration here is of target solutions without legacy systems,
and does not consider issues of migration from legacy data, although
some of the techniques discussed here are relevant.
Database only
Use of database only is not a viable option, because of the requirement
for LDAP clients to access white pages data.
Directory only
Use of directory only may be a good option in many organizations,
as for most core uses, the directory provides good white pages functionality.
Where it will not suffice is if there is a desire to use a relational
database for broader management and analysis of corporate information.
Typically, there will be a corporate database strategy that reflects
this goal.
Duplication
A simple answer to the overlap is to run both services independently
and duplicate data and management. The major problems with this are
the duplicated effort needed to maintain the data correctly, and the
operational inconsistencies that will arise.
Integrated Service
A superficially attractive option is to produce one system that
does both. It is not possible to build a useful relational database
on top of a directory. Building a directory on top of a relational
database seems more promising. The following seems attractive:

The major problem with achieving this is that the directory and relational
information models are significantly different. In order to gain satisfactory
performance and provide directory functionality, the directory portion
needs to be structured to meet directory needs, typically in a manner
that can only be managed by the directory. This problem appears to
be inherent to a solution built on top of a relational database,
although it is possible that a database vendor might be able to make
a satisfactory hybrid solution. Isode's view is that the basic
characteristics of the systems are sufficiently different that there
will not be a viable hybrid solution.
Gatewaying (cross access)
Rather than duplicating information by synchronization, it is possible
for one system to access data held in the other 'on demand'. For
example, a relational database could go to the directory to find a
certificate rather than holding a copy directly. This approach will
optimize data storage. However, attributes that are only available
in a remote system will typically not be available for searching
(directory) or joins (relational database). This approach may be
useful for 'second order' attributes, which do not need to be searched
in the system where they are not stored.
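A minimal sketch of the certificate example follows, assuming the
database row stores only the entry's distinguished name. The
EmployeeRow class, the function names and the in-memory FAKE_DIRECTORY
are hypothetical; a real deployment would replace the dictionary
lookup with an LDAP read.

```python
from dataclasses import dataclass

# Stand-in for the directory: DN -> attributes. The certificate bytes are
# a placeholder, not a real X.509 certificate.
FAKE_DIRECTORY = {
    "cn=Jane Doe,ou=Sales,o=Example,c=US": {
        "userCertificate": b"placeholder-certificate-bytes",
    },
}


@dataclass
class EmployeeRow:
    """A database row that holds a reference (the DN), not the certificate."""
    employee_id: int
    name: str
    directory_dn: str


def fetch_attribute(directory, dn, attribute):
    """Fetch one attribute on demand; return None if the entry or value is absent."""
    entry = directory.get(dn)
    if entry is None:
        return None
    return entry.get(attribute)


def certificate_for(row, directory):
    """Resolve the certificate for a row at the time it is needed."""
    return fetch_attribute(directory, row.directory_dn, "userCertificate")
```

The certificate is never stored in the database, so it cannot take
part in a join or a WHERE clause there. That is acceptable only
because nothing on the database side needs to search on it.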
Synchronization
Another solution is to treat each item of data as mastered in one
service, and to synchronize it to the other. An extreme case of
synchronization is to master all of the data in one service, and to
synchronize all of the data into the other service.
Isode's view is that synchronization will be the primary technique
for co-existence. In the short term, this will primarily be a means
of populating a directory from relational databases.
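One simple way to populate a directory from a relational database is
to export rows as LDIF, the standard text format for directory
entries, and load the result with the directory's import tools. The
sketch below is illustrative only: the staff table, the BASE_DN value
and the function names are assumptions, and it assumes values are
plain ASCII with no characters that would need LDIF base64 encoding
or DN escaping.

```python
import sqlite3

# Hypothetical location in the directory tree for the exported entries.
BASE_DN = "ou=People,o=Example,c=US"


def demo_database():
    """Build an in-memory stand-in for the corporate database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, surname TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [("Jane Doe", "Doe", "+1 555 0100"), ("Ali Khan", "Khan", None)],
    )
    return conn


def row_to_ldif(name, surname, phone):
    """Format one row as an LDIF record (no escaping or base64 handling)."""
    lines = [
        f"dn: cn={name},{BASE_DN}",
        "objectClass: person",
        f"cn: {name}",
        f"sn: {surname}",
    ]
    if phone:
        # Optional attribute: emitted only when the database has a value.
        lines.append(f"telephoneNumber: {phone}")
    return "\n".join(lines) + "\n"


def export_ldif(conn):
    """Export all staff rows as LDIF records separated by blank lines."""
    rows = conn.execute("SELECT name, surname, phone FROM staff ORDER BY name")
    return "\n".join(row_to_ldif(*row) for row in rows)
```

This is a full export with the database as master. It does not handle
deletions or changed rows; an incremental approach would need to track
which rows changed since the last run.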
Isode Products
Isode's products are for provision of a directory service. The key
elements are LDAP/X.500 Enterprise Directory Server, Enterprise Directory
Management and Web to LDAP/X.500 Access Server.
Isode provides a scripting toolkit for loading data into and dumping
data from the directory. This toolkit can be used to manage simple
synchronization configurations.
Isode recommends a partner product, the Maxware Directory Data Manager
(MDDM), for more complex synchronization. MDDM supports ODBC and flat
file access, and provides graphical configuration to help control synchronization.
It supports incremental access to optimize transfer performance.
Conclusions
- The key strength of directory is distributed provision.
- The key strength of a relational database is the ability to make
complex queries about the relations between objects.
- Enterprises will mix and match directory and relational databases
for different problems.
- Enterprises will need directory for white pages and other functions.
- Where white pages information is needed in a relational database,
use of a directory synchronization product is the best approach.