OpenDJ : Visualizing the Replication Topology

My coworker Chris Ridd has spent a bit of spare time writing a small utility that parses OpenDJ monitoring information and extracts the details of the replication topology. Feed the output to a graphing tool and here's the result (based on anonymized data from one of our biggest customers):

ReplTopo

This is a worldwide deployment with many directory services spread across 4 regions and 8 fully connected replication services. Each directory service is connected to a single replication server, but can fail over in a matter of seconds, preferring replication servers in the same region.

If you want to give it a try on your own replication topology, it's simple. The tool is open source and part of the OpenDJ utilities that Chris has pushed to GitHub. Just feed it the output of an ldapsearch on cn=monitor, as shown below.
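For example, assuming a server listening on ldap1.example.com port 1389 with the default directory manager account (the hostname, port, and password here are just placeholders, not values mandated by the tool), you could capture the monitoring data like this:

$ bin/ldapsearch -h ldap1.example.com -p 1389 \
  -D "cn=Directory Manager" -w secret12 \
  -b "cn=monitor" "(objectclass=*)" > monitor.ldif

Then feed monitor.ldif (or pipe the output directly) to the utility; check the project's README on GitHub for the exact options and attributes it expects.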

Disabling Replication in OpenDJ 2.4.

Enabling replication between multiple instances of the OpenDJ LDAP directory server is pretty simple and straightforward. You can check for yourself in the Replication chapter of the Administration Guide.

But fully disabling replication can be tricky with OpenDJ 2.4, mostly because of a known issue with the dsreplication disable --disableAll command (OPENDJ-249): it throws a javax.naming.CommunicationException when removing the contents of “cn=admin data”.

We are fixing this issue in OpenDJ 2.5, but for those who have deployed OpenDJ 2.4 and want to know how to fully remove all references to a replica in the topology, here are the steps to manually disable replication:

Note: all these steps should be done using ldapmodify, or with an LDAP browser such as OpenDJ Control-Panel's Manage Entry or Apache Directory Studio.

  1. For each replica to be disabled, connect to it on the admin port (4444) and:
    1. MANDATORY: set the “ds-cfg-enabled” property to “false” in “cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config” (see the ldapmodify example after this list)
    2. OPTIONAL: recursively remove the entries beneath “cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config” using individual delete operations. Note that the configuration backend does not support the sub-tree delete control, so this has to be done iteratively. This step is also not mandatory, since replication was fully disabled in the previous step
    3. MANDATORY: remove each entry beneath “cn=Servers,cn=admin data”, keeping the base entry itself. I find the easiest way to do this is to perform a sub-tree delete and then add back the base entry
    4. OPTIONAL: remove (purge) unused instance keys from beneath “cn=instance keys,cn=admin data” *except* own key. This step is really independent of replication: administrators should periodically purge unused instance keys anyway when they are sure that they are no longer needed (e.g. used for signing backups, etc)
    5. MANDATORY: delete “uniqueMember” in “cn=all-servers,cn=Server Groups,cn=admin data”
  2. On one of the remaining enabled replicas, connect to it via the admin port and:
    1. MANDATORY: remove each disabled server beneath “cn=Servers,cn=admin data”
    2. OPTIONAL: remove (purge) each disabled instance key beneath “cn=instance keys,cn=admin data” (see 1.4)
    3. MANDATORY: remove each disabled server from uniqueMember in “cn=all-servers,cn=Server Groups,cn=admin data”
    4. MANDATORY: get list of all remaining servers from “cn=all-servers,cn=Server Groups,cn=admin data”
  3. For each of the remaining enabled replicas obtained in step 2.4, connect to it via the admin port and:
    1. MANDATORY: remove each disabled server (rsPort) from ds-cfg-replication-server in “cn=replication server,cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config”
    2. MANDATORY: remove each disabled server (rsPort) from ds-cfg-replication-server in “cn=*,cn=domains,cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config”
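As an illustration of step 1.1, here is roughly what the ldapmodify operation could look like against the replica being disabled. The hostname, admin port, and password are placeholders, so adjust them to your deployment:

$ bin/ldapmodify -h ldap2.example.com -p 4444 --useSSL --trustAll \
  -D "cn=Directory Manager" -w secret12 <<EOF
dn: cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config
changetype: modify
replace: ds-cfg-enabled
ds-cfg-enabled: false
EOF

The same pattern (changetype: modify or changetype: delete in the LDIF) applies to the other mandatory steps above.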

OpenDJ: Quick Replication setup

OpenDJ Servers Replication

As we develop OpenDJ, we spend a lot of time testing, whether it's a new feature or a fix to an existing one. We usually write some unit tests to validate the code, and then some functional tests to check the feature from a "user" point of view. While the unit tests are typically run against a single server, the functional or integration tests are run with configurations that match our customers' deployments. And one constant in every OpenDJ directory service deployment I'm aware of is that the service is made of two or more OpenDJ directory servers with Multi-Master Replication enabled between them.

Setting up Multi-Master Replication with OpenDJ is quite easy and I’m going to demonstrate it here:

Let's assume we want to install 2 OpenDJ servers on the following hosts: ldap1.example.com and ldap2.example.com. For simplicity, and because we avoid running tests with root privileges, we will configure the servers to use ports 1389 and 1636 for LDAP and LDAPS respectively.

On ldap1.example.com

$ unzip OpenDJ-2.4.2.zip
$ cd OpenDJ-2.4.2
$ ./setup -i -n -b "dc=example,dc=com" -d 20 -h ldap1.example.com -p 1389 \
  --adminConnectorPort 4444 -D "cn=Directory Manager" -w "secret12" -q -Z 1636 \
  --generateSelfSignedCertificate

Do the same on ldap2.example.com, with the same parameters except for the -h option, which should be ldap2.example.com.

Now, you have 2 instances of OpenDJ configured and running with 20 sample entries in the suffix “dc=example,dc=com”. Let’s enable replication:

$ bin/dsreplication enable --host1 ldap1.example.com --port1 4444 \
  --bindDN1 "cn=directory manager" \
  --bindPassword1 secret12 --replicationPort1 8989 \
  --host2 ldap2.example.com --port2 4444 --bindDN2 "cn=directory manager" \
  --bindPassword2 secret12 --replicationPort2 8989 \
  --adminUID admin --adminPassword password --baseDN "dc=example,dc=com" -X -n

And now make sure they both have the same data:

$ bin/dsreplication initialize --baseDN "dc=example,dc=com" \
  --adminUID admin --adminPassword password \
  --hostSource ldap1.example.com --portSource 4444 \
  --hostDestination ldap2.example.com --portDestination 4444 -X -n

For my daily tests I've put the commands in a script that deploys 2 servers, enables replication between them, and initializes them, all on a single machine (using different ports for LDAP and LDAPS).
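For illustration, here's a minimal sketch of what such a script can look like; the directory names, ports, and passwords are my assumptions for the example, not the exact script I use:

#!/bin/sh
# Deploy two OpenDJ 2.4.2 instances on one machine, then replicate and initialize them.
for i in 1 2; do
  unzip -q OpenDJ-2.4.2.zip -d instance$i
  instance$i/OpenDJ-2.4.2/setup -i -n -b "dc=example,dc=com" -d 20 \
    -h localhost -p ${i}389 --adminConnectorPort ${i}444 \
    -D "cn=Directory Manager" -w secret12 -q -Z ${i}636 \
    --generateSelfSignedCertificate
done
# Enable replication between the two instances (replication ports 8989 and 8990).
instance1/OpenDJ-2.4.2/bin/dsreplication enable \
  --host1 localhost --port1 1444 --bindDN1 "cn=directory manager" \
  --bindPassword1 secret12 --replicationPort1 8989 \
  --host2 localhost --port2 2444 --bindDN2 "cn=directory manager" \
  --bindPassword2 secret12 --replicationPort2 8990 \
  --adminUID admin --adminPassword password --baseDN "dc=example,dc=com" -X -n
# Initialize the second instance with the data from the first one.
instance1/OpenDJ-2.4.2/bin/dsreplication initialize --baseDN "dc=example,dc=com" \
  --adminUID admin --adminPassword password \
  --hostSource localhost --portSource 1444 \
  --hostDestination localhost --portDestination 2444 -X -n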

Now if you want to add a 3rd server to the replication topology, install and configure it like the first two. Then join it to the topology by repeating the last 2 commands above, replacing ldap2.example.com with the hostname of the 3rd server. Need a 4th one? Repeat again, keeping ldap1.example.com as the server of reference.
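For example, assuming the 3rd server is ldap3.example.com (a hypothetical hostname) and was set up with the same parameters as the first two:

$ bin/dsreplication enable --host1 ldap1.example.com --port1 4444 \
  --bindDN1 "cn=directory manager" --bindPassword1 secret12 --replicationPort1 8989 \
  --host2 ldap3.example.com --port2 4444 --bindDN2 "cn=directory manager" \
  --bindPassword2 secret12 --replicationPort2 8989 \
  --adminUID admin --adminPassword password --baseDN "dc=example,dc=com" -X -n

$ bin/dsreplication initialize --baseDN "dc=example,dc=com" \
  --adminUID admin --adminPassword password \
  --hostSource ldap1.example.com --portSource 4444 \
  --hostDestination ldap3.example.com --portDestination 4444 -X -n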

Configuring Replication Groups: A small but important new feature of OpenDS 2.0

Opends2.0

I'm mostly done with the series of posts about the new features of the latest release of OpenDS, the open source LDAPv3 directory service. Yesterday, Mathieu, the developer behind Assured Replication, reminded me of a small but important new feature of OpenDS in the area of replication: the ability to configure Replication Groups.

A replication group is a simple way to relate replicated OpenDS directory servers together. It's useful when there are more than 2 replicated servers, when the replicated servers are spread across different data-centers, or to distinguish primary servers from secondary servers.

Replication groups are identified by a group ID, a unique number assigned to a replication domain on a directory server and to replication servers. Group IDs determine how a directory server domain connects to an available replication server. From the list of configured replication servers, a directory server first tries to connect to a replication server that has the same group ID as its own. Only if no replication server with a matching group ID is available does the directory server connect to a replication server with a different group ID.

In practice, this lets you prioritize how replication traffic flows between servers. With multiple data-centers, it's preferable that all directory servers in a data-center connect to replication servers in the same data-center, and that a directory server only connects to a remote replication server when no local replication server is available.



Note that when configuring replication with OpenDS 2.0 and the dsreplication utility, the replication server and the directory server are configured in the same process, and thus on the same host. It will therefore be very rare for a replication server to be down while its directory server is still running.



The figure below is an illustration of 2 Replication Groups, one for each data center.

OpenDS 2.0 Replication Groups with multiple data-centers

Now, how do you configure a replication group?

A replication group is configured on each directory server and replication server that should be part of the same group.

On the directory server, the replication group is configured per replication domain (i.e. per replicated suffix).

First identify the replication domain

$ bin/dsconfig -D "cn=directory manager" -j /tmp/passwdfile -n -s list-replication-domains --provider-name "Multimaster Synchronization"

cn=admin data (domain 29167)

cn=schema (domain 9674)

dc=example,dc=com (domain 14741)

Then set the group ID for the domain

$ bin/dsconfig -D "cn=directory manager" -j /tmp/passwdfile -n set-replication-domain-prop \
  --provider-name "Multimaster Synchronization" \
  --domain-name "dc=example,dc=com (domain 14741)" --advanced --set group-id:5

For the replication server

$ bin/dsconfig -D "cn=directory manager" -j /tmp/passwdfile -n set-replication-server-prop \
  --provider-name "Multimaster Synchronization" --advanced --set group-id:5

Repeat this on the other directory servers and replication servers that should be part of the same group.

Note that there is a default group, with group ID 1.

Configuring replication groups has some impact when using Assured Replication, since Assured Replication only works within a single group. So groups can be used to limit the impact of network latency when using Assured Replication, or to constrain changes so that they stay strongly consistent within a single data-center.

You can find more information about replication groups in the Replication Architecture reference manual and in the Replication section of the Administration Guide.


Assured Replication: A New Feature of OpenDS 2.0

OpenDS 2.0 has just been released and there are several new and exciting features in it.

To me, the biggest innovation in this release is "Assured Replication", an extension to the loosely consistent multi-master replication feature that brings tighter consistency of data between replicas. "Assured Replication" should not be mistaken for a fully synchronous, transactional replication mechanism: a change is not applied transactionally to a set of, or all, replicas in the topology. With "Assured Replication", the response to an LDAP modification is delayed until the change has been received or applied by other servers, in a best-effort mode. It provides a greater assurance that a change is not lost even if the server that received it crashes.

Opends Assured Replication with Safe Data level 2

Assured Replication can function in 2 modes:

  • Safe Data Mode: an update must be propagated to a defined number of replication servers before a response is returned to the client. So even if the directory server or its replication server is stopped, the change remains available to all other replicas.
  • Safe Read Mode: an update must be propagated to all directory servers in the domain before a response for the update is returned to the client.

Of course, for both modes, it's possible to configure a timeout interval to prevent LDAP clients from waiting indefinitely if some servers are not available.

Configuring Assured Replication is pretty straightforward but cannot be done when setting up replication itself. So the first step is to configure Multi-Master Replication for a domain with dsreplication.

$ bin/dsreplication enable --host1 localhost --port1 5444 --bindDN1 'cn=directory manager' \
  --bindPassword1 secret12 --replicationPort1 8989 \
  --host2 localhost --port2 6444 --bindDN2 'cn=directory manager' \
  --bindPassword2 secret12 --replicationPort2 8990 \
  --adminUID admin --adminPassword secret12 --baseDN "dc=example,dc=com" -X -n

Establishing connections ….. Done.

Checking Registration information ….. Done.

Configuring Replication port on server localhost:5444 ….. Done.

Configuring Replication port on server localhost:6444 ….. Done.

Updating replication configuration for baseDN dc=example,dc=com on server localhost:5444 ….. Done.

Updating replication configuration for baseDN dc=example,dc=com on server localhost:6444 ….. Done.

Updating Registration configuration on server localhost:5444 ….. Done.

Updating Registration configuration on server localhost:6444 ….. Done.

Updating replication configuration for baseDN cn=schema on server localhost:5444 ….. Done.

Updating replication configuration for baseDN cn=schema on server localhost:6444 ….. Done.

Initializing Registration information on server localhost:6444 with the contents of server localhost:5444 ….. Done.

Initializing schema on server localhost:6444 with the contents of server localhost:5444 ….. Done.

Replication has been successfully enabled. Note that for replication to work you must initialize the contents of the base DN’s that are being replicated (use dsreplication initialize to do so).

$ bin/dsreplication initialize --baseDN "dc=example,dc=com" --adminUID admin --adminPassword secret12 \
  --hostSource localhost --portSource 5444 --hostDestination localhost --portDestination 6444 -X -n

Initializing base DN dc=example,dc=com with the contents from localhost:5444:

23 entries processed (100 % complete).

Base DN initialized successfully.

See /var/folders/SH/SHFsRjymHtqiZ4GxPNZERU++Fwk/-Tmp-/opends-replication-737929812662715818.log for a detailed log of this operation.

$ bin/dsreplication status -h localhost -p 5444 --adminUID admin --adminPassword secret12 -X

dc=example,dc=com - Replication Enabled
=======================================

Server         : Entries : M.C. (1) : A.O.M.C. (2) : Port (3) : Security (4)
---------------:---------:----------:--------------:----------:-------------
localhost:5444 : 23      : 0        : N/A          : 8989     : Disabled
localhost:6444 : 23      : 0        : N/A          : 8990     : Disabled

Now that replication is set up, we can enable the Assured Replication mode, using the dsconfig utility. For this, on each of the OpenDS directory servers, we first need to retrieve the full name of the replication domain.

$ bin/dsconfig -D cn=directory\ manager -w secret12 -n -s list-replication-domains --provider-name "Multimaster Synchronization"

cn=admin data (domain 29167)
cn=schema (domain 9674)
dc=example,dc=com (domain 14741)

$ bin/dsconfig -D cn=directory\ manager -w secret12 -n set-replication-domain-prop \
  --provider-name "Multimaster Synchronization" --domain-name "dc=example,dc=com (domain 14741)" \
  --advanced --set assured-type:safe-data --set assured-sd-level:2

Note that the replication domain name (the domain number) has a different value on each server, so you have to repeat these 2 commands on each instance.

Setting the Safe Data assured level to 2 means that the server will make sure the change has been received by at least 2 replication servers before returning the response to the LDAP client's update request.
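For comparison, switching the same domain to Safe Read mode would look something like the command below. This is only a sketch: it reuses the domain name from above and assumes the advanced assured-timeout property (with a duration value such as 2s) for the client wait limit, so verify the exact property names and syntax with dsconfig's interactive mode or the documentation:

$ bin/dsconfig -D cn=directory\ manager -w secret12 -n set-replication-domain-prop \
  --provider-name "Multimaster Synchronization" \
  --domain-name "dc=example,dc=com (domain 14741)" --advanced \
  --set assured-type:safe-read --set assured-timeout:2s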

From a client's point of view, there should be no difference, except that the server might take a little longer to return the response to an update request. In our measurements, we found that the response time increased by 25% with Safe Data Level 2, which seems like a lot, but honestly, when the response time is on the order of 2ms, it's hard to notice!

You can find more information about Assured Replication in the OpenDS 2.0 documentation wiki, both in the overview of the OpenDS Replication Architecture and in the Replication Administration Guide, and more specifically in the Assured Replication Administration Guide.
