More about OpenDJ support for JSON attribute values

In a previous post, I introduced the new JSON syntax, JSON query and matching rules that are delivered as part of the OpenDJ LDAP directory server. Today, I will give more insight into how to customise the syntax and tune the matching rules for smarter and more efficient indexing, and I will highlight some best practices for using the JSON syntax.

JSON Syntax Validation

When an attribute is defined with the JSON syntax, the server validates that each value is compliant with the JSON RFC. OpenDJ offers a few options to relax some of these constraints. To change the settings of the syntax, you must run dsconfig in advanced mode (dsconfig --advanced).

>>>> Configure the properties of the Core Schema

Property                                                     Value(s)
----------------------------------------------------------------------
1)  allow-attribute-types-with-no-sup-or-syntax              true
2)  allow-zero-length-values-directory-string                false
3)  disabled-matching-rule                                   NONE
4)  disabled-syntax                                          NONE
5)  enabled                                                  true
6)  java-class                                               org.opends.server.schema.CoreSchemaProvider
7)  json-validation-policy                                   strict
8)  strict-format-certificates                               true
9)  strict-format-country-string                             true
10) strict-format-jpeg-photos                                false
11) strict-format-telephone-numbers                          false
12) strip-syntax-min-upper-bound-attribute-type-description  false

?)  help
f)  finish - apply any changes to the Core Schema
c)  cancel
q)  quit

Enter choice [f]: 7


>>>> Configuring the "json-validation-policy" property

Specifies the policy that will be used when validating JSON syntax values.

Do you want to modify the "json-validation-policy" property?

1)  Keep the default value: strict
2)  Change it to the value: disabled
3)  Change it to the value: lenient

?)  help
q)  quit

Enter choice [1]:

Strict is the default mode.

Disabled means that the server will not try to validate the content of a JSON value.

Lenient means that it will validate the JSON value, but tolerate comments, single quotes and unquoted control characters.

JSON Matching Rule and Indexing

Like any attribute in the OpenDJ server, attributes with a JSON syntax can be indexed.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
  set-backend-index-prop --backend-name userRoot \
  --index-name json --set index-type:equality

By default, the server actually indexes each field of all JSON values. If the values are large and complex, indexing results in a lot of disk I/O, possibly impacting the performance of write operations.

If you know which fields of the JSON values will be queried by the client applications, you can optimise the index by specifying the JSON fields that are indexed. You do this by creating a new custom schema provider for the JSON query matching rule. You can choose to override the default JSON query matching rule (as illustrated below), which will affect all JSON attributes, or you can create a new rule (with a new name and OID).

In the example below, the custom schema provider overrides the default caseIgnoreJsonQueryMatch, and only indexes the JSON field _id and the field name with all its subfields.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider --provider-name "Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.1 \
 --set indexed-field:_id \
 --set "indexed-field:name/**" 

When you override the default matching rule, or define a new one, you need to rebuild the indexes of all the attributes that use it.
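To make the cost argument concrete, here is a minimal Java sketch (not OpenDJ code) that flattens a JSON object into field paths and keeps only those selected by indexed-field patterns such as _id and name/**. JSON values are modelled as nested Maps to avoid a JSON library, arrays are not expanded, and the pattern semantics are an assumption: an exact path matches itself and a trailing /** matches any path below the prefix. Check the OpenDJ schema provider documentation for the real rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: which JSON field paths would produce index keys,
// with and without an indexed-field restriction.
class JsonIndexFields {

    // Recursively collects leaf field paths such as "name/first".
    static void flatten(String prefix, Map<String, Object> obj, List<String> out) {
        for (Map.Entry<String, Object> e : obj.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "/" + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                flatten(path, child, out);
            } else {
                out.add(path); // strings, numbers and (unexpanded) arrays are leaves
            }
        }
    }

    // Assumed semantics: "a/**" matches any path strictly under "a/";
    // any other pattern must equal the path.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String base = pattern.substring(0, pattern.length() - 2); // keeps the trailing "/"
            return path.startsWith(base);
        }
        return pattern.equals(path);
    }

    // An empty pattern list models the default: every field is indexed.
    static List<String> indexedPaths(Map<String, Object> obj, List<String> patterns) {
        List<String> all = new ArrayList<>();
        flatten("", obj, all);
        if (patterns.isEmpty()) {
            return all;
        }
        List<String> kept = new ArrayList<>();
        for (String p : all) {
            for (String pat : patterns) {
                if (matches(pat, p)) {
                    kept.add(p);
                    break;
                }
            }
        }
        return kept;
    }

    static Map<String, Object> sample() {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "Babs");
        name.put("last", "Jensen");
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Grenoble");
        address.put("zip", "38000");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "bjensen");
        doc.put("name", name);
        doc.put("address", address);
        doc.put("note", "free text");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(indexedPaths(sample(), List.of()));
        System.out.println(indexedPaths(sample(), List.of("_id", "name/**")));
    }
}
```

With the sample value, the default behaviour yields six indexed paths, while the restricted configuration keeps only _id, name/first and name/last.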

Best Practices

Support for JSON attributes in OpenDJ is very new, but we can already recommend how best to use them.

First, use the JSON syntax for single-valued attributes. Indexing is designed to associate values with entries, not with individual values. Because JSON query indexes are built for all fields of the JSON objects, with a multi-valued attribute an entry will be returned if a query matches all fields, even when those fields are in different objects.
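The pitfall above can be shown with a toy model in Java. It is an assumption-laden sketch, not OpenDJ internals: a per-field index maps "field=value" keys to entry IDs, and one entry holds two JSON values of a multi-valued attribute. Intersecting the per-field entry sets selects uid=alice for a query that none of her values satisfies on its own, whereas evaluating each value separately selects only uid=bob.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (not OpenDJ internals): an index keyed by "field=value" that maps
// to entry IDs, showing why multi-valued JSON attributes can mislead lookups.
class MultiValuedJsonIndex {

    // entry ID -> JSON values of one attribute, each modelled as a flat field map
    static final Map<String, List<Map<String, String>>> ENTRIES = Map.of(
        "uid=alice", List.of(
            Map.of("type", "home", "city", "Paris"),
            Map.of("type", "work", "city", "London")),
        "uid=bob", List.of(
            Map.of("type", "home", "city", "London")));

    // Every field of every value points back to the entry, not to the value.
    static Map<String, Set<String>> buildIndex() {
        Map<String, Set<String>> index = new HashMap<>();
        ENTRIES.forEach((id, values) -> {
            for (Map<String, String> v : values) {
                v.forEach((field, val) ->
                    index.computeIfAbsent(field + "=" + val, k -> new HashSet<>()).add(id));
            }
        });
        return index;
    }

    // Index-only answer: intersect the entry sets of each field condition.
    static Set<String> candidates(Map<String, String> query) {
        Map<String, Set<String>> index = buildIndex();
        Set<String> result = null;
        for (Map.Entry<String, String> c : query.entrySet()) {
            Set<String> ids = index.getOrDefault(c.getKey() + "=" + c.getValue(), Set.of());
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Set.of() : result;
    }

    // Per-value answer: an entry matches only if ONE value satisfies all conditions.
    static Set<String> exactMatches(Map<String, String> query) {
        Set<String> result = new HashSet<>();
        ENTRIES.forEach((id, values) -> {
            for (Map<String, String> v : values) {
                if (v.entrySet().containsAll(query.entrySet())) {
                    result.add(id);
                    break;
                }
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> query = Map.of("type", "home", "city", "London");
        System.out.println("index candidates:  " + candidates(query));
        System.out.println("per-value matches: " + exactMatches(query));
    }
}
```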

The JSON syntax is handy for storing complex JSON objects in a single attribute and querying them through any field. However, the larger the values, the greater the impact on the directory server's performance. Because all JSON fields are indexed by default, the more fields there are, the more expensive indexing becomes. Also, because the JSON objects are LDAP attribute values, the only way to change one is to replace it with a new one (or to delete the value and add a new one, which sends even more bytes). There is no patch operation on a value. Finally, OpenDJ stores all attributes of an entry in a single database record, so any change to the entry requires the whole entry to be written again.

As we've seen above, OpenDJ provides a way to customise the JSON query matching rules and the JSON fields that are indexed. We suggest that you use this capability to optimise the indexing of JSON objects for the queries run by the client applications.

If you plan to store different kinds of JSON objects in an OpenDJ directory service, define different attributes with the JSON syntax, and use a custom JSON query matching rule per attribute. For example, let's assume you will have entries that are persons with an address attribute with the JSON syntax, and other entries that represent OAuth2 tokens, whose main token attribute has the JSON syntax. You should define an address attribute and a token attribute, both with the JSON syntax but each with its own matching rule, as below.

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.998 NAME 'address'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1
  EQUALITY caseIgnoreJsonAddressQueryMatch SINGLE-VALUE )

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'token'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1 
  EQUALITY caseIgnoreJsonTokenQueryMatch SINGLE-VALUE )

where the matching rules are defined as follows:

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Address Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonAddressQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.998

and

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Token Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonTokenQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.999 \
 --set indexed-field:token_type \
 --set indexed-field:expires_at \
 --set indexed-field:access_token

Note that, due to an issue with OpenDJ 4.0.0-SNAPSHOT nightly builds, you need to restart the server after defining a new schema provider for it to take effect.

OpenDJ: Monitoring Unindexed Searches…

OpenDJ, the open source LDAP directory service, uses indexes to optimise search queries. When a search query doesn't match any index, the server cursors through the whole database to return the entries, if any, that match the search filter. These unindexed queries can require a lot of resources: I/O, CPU… To reduce resource consumption, OpenDJ rejects unindexed queries by default, except for the root DNs (i.e. cn=Directory Manager).

In previous articles, I've talked about privileges for administrative accounts, and also about Analyzing Search Filters and Indexes.

Today, I’m going to show you how to monitor for unindexed searches by keeping a dedicated log file, using the traditional access logger and filtering criteria.

First, we’re going to create a new access logger, named “Searches” that will write its messages under “logs/search”.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-log-publisher \
    --set enabled:true \
    --set log-file:logs/search \
    --set filtering-policy:inclusive \
    --set log-format:combined \
    --type file-based-access \
    --publisher-name Searches

Then we define filtering criteria to restrict what is logged in that file: let's log only “search” operations that are marked as “unindexed” and take more than “5000” milliseconds.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-access-log-filtering-criteria \
    --publisher-name Searches \
    --set log-record-type:search \
    --set search-response-is-indexed:false \
    --set response-etime-greater-than:5000 \
    --type generic \
    --criteria-name Expensive\ Searches

Voila! Now, whenever a search request is unindexed and takes more than 5 seconds, the server logs the request to logs/search (in a single line), as below:

$ tail logs/search
[12/Sep/2016:14:25:31 +0200] SEARCH conn=10 op=1 msgID=2 base="dc=example,dc=com" scope=sub filter="(objectclass=*)" attrs="+,*" result=0 nentries=10003 unindexed etime=6542

This file can be monitored and used to trigger alerts to administrators, or simply used to collect and analyse the filters that result in unindexed requests, in order to better tune the OpenDJ indexes.
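As a starting point for such monitoring, here is a minimal Java sketch that reads logs/search and groups unindexed searches by filter, with a count and the maximum etime. The class name is hypothetical, and the regular expressions assume the key=value layout of the sample line above rather than a formal specification of the OpenDJ log format.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenDJ): summarises the unindexed searches
// found in a combined-format access log such as logs/search.
class UnindexedSearchReport {

    // Assumes the layout of the sample line: filter="..." and etime=NNN.
    private static final Pattern FILTER = Pattern.compile("filter=\"([^\"]*)\"");
    private static final Pattern ETIME = Pattern.compile("\\betime=(\\d+)");

    // Returns filter -> {count, max etime} for SEARCH lines flagged "unindexed".
    static Map<String, long[]> summarise(List<String> lines) {
        Map<String, long[]> stats = new TreeMap<>();
        for (String line : lines) {
            if (!line.contains(" SEARCH ") || !line.contains(" unindexed ")) {
                continue;
            }
            Matcher f = FILTER.matcher(line);
            Matcher e = ETIME.matcher(line);
            if (!f.find() || !e.find()) {
                continue; // unexpected layout: skip rather than guess
            }
            long[] s = stats.computeIfAbsent(f.group(1), k -> new long[2]);
            s[0]++;
            s[1] = Math.max(s[1], Long.parseLong(e.group(1)));
        }
        return stats;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java UnindexedSearchReport.java logs/search
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        summarise(lines).forEach((filter, s) ->
            System.out.printf("%s count=%d max-etime=%dms%n", filter, s[0], s[1]));
    }
}
```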

Note that it is sometimes better to leave some requests unindexed, when the cost of indexing them outweighs the benefits of the index: for example, when these requests are infrequent, run by specific administrators for reporting purposes, and expected to return a lot of entries. In that case, a best practice is to set up a dedicated replica for administration and run these expensive requests there. It is also better if the client applications are tuned to expect these requests to take a long time.

Learning Curve

A few years ago, I had the pleasure of working with Rajesh Rajasekharan at Sun. He was an efficient trainer on Sun products, especially on Sun Directory Server. He recently joined ForgeRock and has started a series of blog posts and screencasts on ForgeRock products, especially OpenDJ, but not only!

If you are getting started with the products or want to see demos of them, there's no better place to go than the “Learning Curve”.

 

About auditing LDAP operations…

Many years ago, when I started working on LDAP directory services, we needed some auditing of the operations occurring on the server. So the server had an “Access” log, which contained one message when an operation was received and one when the result was returned to the client, including the processing time on the server side (the etime parameter). On Netscape and Sun directory servers, the etime was measured in seconds. This format allowed us to detect requests that were taking a long time, or that were started but never finished.

In OpenDJ, we switched the etime resolution to milliseconds, and there's also an option to set it to nanoseconds. Yet even with millisecond resolution, a number of log entries still have an etime value of 0. The truth is that the server is faster, but so are the machines and processors.

At a rate of 50,000 operations per second (which can easily be sustained on my laptop), writing two messages per operation generates a lot of data to write to disk. That's why we introduced a new access log format, not well advertised I must say, in OpenDJ 2.6.0. To enable the new format, use the following dsconfig command:

dsconfig set-log-publisher-prop -h localhost -p 4444 -X -n \
 -D "cn=directory manager" -w password \
 --publisher-name File-Based\ Access\ Logger  --set log-format:combined

And now, instead of two lines per operation, there is a single one.

Before:

[23/Feb/2015:08:56:31 +0100] SEARCH REQ conn=0 op=4 msgID=5 base="cn=File-Based Access Logger,cn=Loggers,cn=config" scope=baseObject filter="(objectClass=*)" attrs="1.1"
[23/Feb/2015:08:56:31 +0100] SEARCH RES conn=0 op=4 msgID=5 result=0 nentries=1 etime=0
[23/Feb/2015:08:56:31 +0100] SEARCH REQ conn=0 op=5 msgID=6 base="cn=File-Based Access Logger,cn=Loggers,cn=config" scope=baseObject filter="(objectClass=*)" attrs="objectclass"
[23/Feb/2015:08:56:31 +0100] SEARCH RES conn=0 op=5 msgID=6 result=0 nentries=1 etime=0

After, in combined mode:

[23/Feb/2015:13:00:28 +0100] SEARCH conn=48 op=8215 msgID=8216 base="dc=example,dc=com" scope=wholeSubtree filter="(uid=user.1)" attrs="ALL" result=0 nentries=1 etime=0
[23/Feb/2015:13:00:28 +0100] SEARCH conn=60 op=10096 msgID=10097 base="dc=example,dc=com" scope=wholeSubtree filter="(uid=user.6)" attrs="ALL" result=0 nentries=1 etime=0

The benefits of enabling the combined log format are multiple: less data is written to disk for each operation and fewer I/O operations are involved, resulting in better overall throughput for the server. It also lets you keep more history of operations with the same volume of log files.
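To see how the two formats relate, here is a hedged Java sketch of an offline converter that pairs REQ and RES lines by their conn, op and msgID values and emits one line per operation, shaped like the combined examples above. It is not an OpenDJ tool, it ignores other line types (such as CONNECT), and keeping the request timestamp is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical offline converter: pairs REQ/RES lines of the default access log
// into single lines shaped like the combined format.
class CombineAccessLog {

    // [timestamp] OPERATION REQ|RES conn=N op=N msgID=N rest...
    private static final Pattern LINE = Pattern.compile(
        "^(\\[[^\\]]*\\]) (\\S+) (REQ|RES) (conn=-?\\d+ op=\\d+ msgID=\\d+)(.*)$");

    static List<String> combine(List<String> lines) {
        Map<String, Matcher> pending = new HashMap<>(); // requests awaiting a result
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // CONNECT, DISCONNECT and other lines are ignored here
            }
            String key = m.group(4);
            if (m.group(3).equals("REQ")) {
                pending.put(key, m);
            } else {
                Matcher req = pending.remove(key);
                if (req != null) {
                    // Assumption: request timestamp, then request fields, then result fields.
                    out.add(req.group(1) + " " + req.group(2) + " " + key
                            + req.group(5) + m.group(5));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java CombineAccessLog.java logs/access
        combine(Files.readAllLines(Path.of(args[0]))).forEach(System.out::println);
    }
}
```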

Do you think that OpenDJ 3.0 access log files should use the combined format by default ?

API Protection with OpenIG: Controlling access by methods

Usually, one of the first things you want to do when securing APIs is to allow only specific calls to them. For example, you want to make sure that some URLs can only be read, or that others accept PUT but not POST.
OpenIG, the Open Identity Gateway, has everything you need to do this out of the box, using a DispatchHandler in which you express the allowed methods as a condition.
The configuration for the upcoming OpenIG 3.1 version would look like this:

 {
     "name": "MethodFilterHandler",
     "type": "DispatchHandler",
     "config": {
         "bindings": [
         {
             "handler": "ClientHandler",
             "condition": "${exchange.request.method == 'GET' or exchange.request.method == 'HEAD'}",
             "baseURI": "http://www.example.com:8089"
         },
         {
             "handler": {
                 "type": "StaticResponseHandler",
                 "config": {
                     "status": 405,
                     "reason": "Method is not allowed",
                     "headers": {
                         "Allow": [ "GET", "HEAD" ]
                     }
                 }
             }
         }]
     }
 }

This is pretty straightforward, but if you want to allow another method, you need to update both the condition and the rejection headers. And when you have multiple APIs with different methods that you want to allow or deny, you need to repeat this block of configuration or write a much more complex condition expression.

But there is a simpler way, leveraging the scripting capabilities of OpenIG.
Create a file named MethodFilter.groovy under your .openig/scripts/groovy directory, with the following content:

/**
 * The contents of this file are subject to the terms of the Common Development and
 * Distribution License 1.0 (the License). You may not use this file except in compliance with the
 * License.
 * Copyright 2014 ForgeRock AS.
 * Author: Ludovic Poitou
 */
import org.forgerock.openig.http.Response

/*
 * Filters requests that have the allowedmethods supplied using a
 * configuration like the following:
 *
 * {
 *     "name": "MethodFilter",
 *     "type": "ScriptableFilter",
 *     "config": {
 *         "type": "application/x-groovy",
 *         "file": "MethodFilter.groovy",
 *         "args": {
 *             "allowedmethods": [ "GET", "HEAD" ]
 *         }
 *     }
 * }
 */

if (allowedmethods.contains(exchange.request.method)) {
    // Call the next handler. This returns when the request has been handled.
    next.handle(exchange)
} else {
    exchange.response = new Response()
    exchange.response.status = 405
    exchange.response.reason = "Method not allowed: (" + exchange.request.method +")"
    exchange.response.headers.addAll("Allow", allowedmethods)
}

And now in all the places where you need to filter specific methods for an API, just add a filter to the Chain as below:

{
    "heap": [
        {
            "name": "MethodFilterHandler",
            "type": "Chain",
            "config": {
                "filters": [
                    {
                        "type": "ScriptableFilter",
                        "config": {
                            "type": "application/x-groovy",
                            "file": "MethodFilter.groovy",
                            "args": {
                                "allowedmethods": [ "GET", "HEAD" ]
                            }
                        }
                    }
                ],
                "handler": "ClientHandler"
            }
        }
    ],
    "handler": "MethodFilterHandler",
    "baseURI": "http://www.example.com:8089"
}

This solution lets you filter different methods for different APIs with a simple configuration element, the “allowedmethods” field, for greater reusability.

About LDAP Syntaxes and backward compatibility…

In the LDAP information model, a syntax constrains the structure and format of attribute values. OpenDJ defines and implements a large number of syntaxes (you can discover them by reading the ldapSyntaxes attribute from the cn=Schema entry).

Occasionally, we receive enquiries about an obscure, non-standard syntax, often in the form of “I'm having an error importing schema from this or that legacy directory server”, with an error message that ends with “No such syntax is configured for use in the Directory Server”.

Because syntaxes constrain the structure and format of attribute values, they are implemented as code, specifically Java code in OpenDJ. It's possible to implement new syntaxes by implementing the org.opends.server.api.AttributeSyntax abstract class, and installing the classes or the JAR in the OpenDJ classpath. But it's often easier and more convenient to define a syntax by configuration, and OpenDJ offers 3 ways to do so. In terms of backward compatibility, I will only focus on the 2 main ones, by substitution and by pattern (the third one lets you define an enumeration of values).

With OpenDJ, you can define a new syntax by configuration, delegating the constraints to an already implemented syntax. A simple example is the URI syntax (which was defined in some very old schemas with the OID 1.3.6.1.4.1.4401.1.1.1). A URI is really an ASCII string, so for values with the URI syntax it might be sufficient to verify that all characters are pure ASCII. The standard syntax for ASCII strings is IA5 String, aka 1.3.6.1.4.1.1466.115.121.1.26.

ldapSyntaxes: ( 1.3.6.1.4.1.4401.1.1.1 DESC 'URI' X-SUBST '1.3.6.1.4.1.1466.115.121.1.26' )

Insert the above line in the schema LDIF file before the attributeTypes, and you’re done.

The other option is to define the syntax as a pattern, using regular expressions. This is better when you want to enforce additional constraints on a URI, for example verifying that it is an LDAP URI.

ldapSyntaxes: ( 999.999.999.1 DESC 'LDAP URI Syntax' X-PATTERN '^ldap://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' )
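Before loading such a syntax, it can help to try the regular expression on sample values. Here is a small Java sketch using java.util.regex; it assumes the whole value must match (Matcher.matches()), which may not be exactly how OpenDJ applies X-PATTERN, so treat the results as indicative.

```java
import java.util.regex.Pattern;

// Tries the X-PATTERN expression of the 'LDAP URI Syntax' definition on sample values.
class LdapUriPatternCheck {

    static final Pattern LDAP_URI = Pattern.compile(
        "^ldap://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Assumption: the whole value must match; the server may apply it differently.
    static boolean isValid(String value) {
        return LDAP_URI.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "ldap://ldap.example.com:389/dc=example,dc=com",
            "http://www.example.com/",
            "ldap://host.example.com/?"
        };
        for (String v : samples) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```

With matches(), a value ending in “?” is rejected, because the last character class of the pattern does not include it.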

So the next time you are trying to import some legacy schema to the OpenDJ directory server, and you have an error due to missing syntaxes, you know what to do to quickly resolve the problem.

Tips: Do not index virtual attributes in OpenDJ

OpenDJ, the open source LDAP directory service in Java, offers some interesting services to reduce and optimize the size and usage of data.

One of them is the Virtual Attribute feature, which allows certain attributes and values to be computed as needed, based either on some of the server internals or on other attributes. OpenDJ ships with a number of virtual attributes by default: entryDN, entryUUID, etag, governingStructureRule, hasSubordinates, isMemberOf, numSubordinates, password expiration time (ds-pwp-password-expiration-time), structuralObjectClass, subschemaSubentry, …

Since these attributes are virtual, and thus not stored as part of the entries in the database backend, you must not define any index for them. When possible, the virtual attribute provider makes use of the default system indexes (for example, entryDN uses the DN index), but most of the time these attributes are only meant to be read and consumed.

If you do configure an index for one of the virtual attributes, the server will repeatedly report that the index is degraded, with an error message similar to the following:

[21/Jan/2013:09:16:07 +0000] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_entryDN is currently operating in a degraded state and must be rebuilt before it can be used

Some searches may then fail to return entries, so you must delete this index for the server to behave properly.

Tips: resource limits in OpenDJ

Photo by Scallop Holden http://www.flickr.com/photos/scallop_holden/

OpenDJ, the open source LDAP directory service in Java, defines a few global resource limits to prevent client connections or operations from abusing the server's resources. These limits are:

  • the maximum number of entries returned to a search request (size-limit, default is 1000),
  • the maximum amount of time to spend returning results to a client (time-limit, default is 60 seconds),
  • the maximum number of entries to look through while processing a search request (lookthrough-limit, default is 5000),
  • the maximum amount of time a connection can sit idle before the server disconnects it (idle-time-limit, default is unlimited).

There are default values for all of these limits in the global configuration, but they can also be set on a per-user basis. The global limits are read or set using dsconfig:

$ bin/dsconfig get-global-configuration-prop -p 4444 -X -n -h localhost \
 -D cn=directory\ manager -w secret12
Property : Value(s)
--------------------------------------:------------------------
bind-with-dn-requires-password : true
default-password-policy : Default Password Policy
disabled-privilege : -
entry-cache-preload : false
etime-resolution : milliseconds
idle-time-limit : 0
lookthrough-limit : 5000
max-allowed-client-connections : 0
max-psearches : unlimited
proxied-authorization-identity-mapper : Exact Match
reject-unauthenticated-requests : false
return-bind-error-messages : false
save-config-on-successful-startup : true
size-limit : 1000
smtp-server : -
time-limit : 60 s
writability-mode : enabled

The per-user limits use different LDAP attribute names and can be read or set directly in users' entries, or through collective attributes. The Directory Manager entry has such specific limits set, so that everything is unlimited.

$ bin/ldapsearch -D "cn=directory manager" -w secret12 -p 1389 -X -b "cn=config" \
  '(objectClass=inetOrgPerson)' ds-rlim-time-limit ds-rlim-size-limit \
  ds-rlim-lookthrough-limit ds-rlim-idle-time-limit
dn: cn=Directory Manager,cn=Root DNs,cn=config
ds-rlim-lookthrough-limit: 0
ds-rlim-time-limit: 0
ds-rlim-idle-time-limit: 0
ds-rlim-size-limit: 0

If you decide to change the default global settings, for example the idle-time-limit, to force idle connections to be closed by the server after some time (often a smaller time than the settings of the load-balancer in between your applications and the OpenDJ servers), please remember that you might also want to change the limit for “cn=Directory Manager”, especially if your client applications are connecting with Directory Manager credentials.
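The interaction between global and per-user limits can be sketched as follows, assuming, as the Directory Manager entry above suggests, that a ds-rlim-* value in the user's entry takes precedence over the global property and that 0 means unlimited. This is a hypothetical Java model, not OpenDJ's code.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical model of limit resolution: a per-user ds-rlim-* value, when
// present, overrides the global setting, and 0 means "no limit".
class EffectiveLimits {

    // Global defaults as shown by get-global-configuration-prop above.
    static final Map<String, Long> GLOBAL = Map.of(
        "size-limit", 1000L,
        "time-limit", 60L,          // seconds
        "lookthrough-limit", 5000L,
        "idle-time-limit", 0L);     // 0 = unlimited

    // Global property name -> per-user operational attribute.
    static final Map<String, String> USER_ATTR = Map.of(
        "size-limit", "ds-rlim-size-limit",
        "time-limit", "ds-rlim-time-limit",
        "lookthrough-limit", "ds-rlim-lookthrough-limit",
        "idle-time-limit", "ds-rlim-idle-time-limit");

    // Returns an empty OptionalLong when the effective limit is unlimited.
    static OptionalLong effective(String limit, Map<String, Long> userEntry) {
        Long perUser = userEntry.get(USER_ATTR.get(limit));
        long value = (perUser != null) ? perUser : GLOBAL.get(limit);
        return value == 0 ? OptionalLong.empty() : OptionalLong.of(value);
    }

    public static void main(String[] args) {
        Map<String, Long> directoryManager = Map.of(
            "ds-rlim-size-limit", 0L, "ds-rlim-time-limit", 0L,
            "ds-rlim-lookthrough-limit", 0L, "ds-rlim-idle-time-limit", 0L);
        Map<String, Long> appUser = Map.of(); // no per-user overrides
        System.out.println(effective("size-limit", directoryManager));
        System.out.println(effective("size-limit", appUser));
    }
}
```

Under these assumptions, setting a global idle-time-limit leaves Directory Manager unaffected, since its ds-rlim-idle-time-limit of 0 still wins.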

Cache strategy for OpenDJ LDAP directory server

System administrators who are familiar with legacy LDAP directory servers know that one of the keys to the best performance is caching the data. With Sun Directory Server or OpenLDAP, there are 3 levels of caching: the filesystem level, the database level and the entry level. The filesystem level cache is managed by the OS and cannot be controlled by the application. Using the filesystem cache is good when the directory server is the only process on the machine, and/or for initial performance. The database level cache allows faster read or write operations, and also includes the indexes. The latter, the entry cache, is the highest-level cache and usually the one that provides the best performance, as it requires the least processing from the server to return entries to the applications, and it has the least contention.

OpenDJ has a different design for its database and core server, and thus the caching strategy needs to be different.

By default, OpenDJ has a database cache enabled, and 3 different kinds of entry caches, all disabled. There are 3 entry caches because they are implemented for different needs and access patterns, but all of them share a specific filter to select which entries to cache, and some settings controlling how much memory to use. During our stress and performance tests, we noticed that using an entry cache for all accessed entries added a lot of pressure on the garbage collector, and also caused more garbage collection in the old generation, often leading to either memory fragmentation or more frequent full GCs (also known as “stop the world” GCs). This resulted in less consistent response times and lower overall throughput.

So, we recommend that you favor the database cache, and do not setup an entry cache, except for specific needs (and do not try to activate all 3 entry caches, this may lead to some really strange behavior).

The default setting in OpenDJ 2.4 is that 10% of the JVM heap space is used for the database cache. With OpenDJ 2.5 (soon to be released), we have bumped the default to 50% of the heap space. If you're tuning the heap size and make it larger than 2GB, we recommend that you keep that 50% ratio, or even increase it if the heap size exceeds 3GB.

If you do have a few specific entries that are accessed very often, like large static groups that are constantly used by applications for ACIs or group membership, then the entry cache becomes handy, and you want to set a filter so that only these specific entries are cached.

For example, if you want to cache at most 5 entries, that are groupOfNames, you can use the following dsconfig command:

bin/dsconfig set-entry-cache-prop --cache-name FIFO \
 --set include-filter:\(objectclass=GroupOfNames\) \
 --set max-entries:5 --set max-memory-percent:90 --set enabled:true \
 -h localhost -p 4444 -D "cn=Directory Manager" -w secret12 -X -n
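Conceptually, the command above asks for a bounded, first-in-first-out cache that only admits entries matching the include filter. The following Java sketch models that behaviour with a LinkedHashMap in insertion order; it is an illustration under those assumptions, not OpenDJ's FIFO entry cache implementation, and entries are reduced to their set of object classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Conceptual model only: a bounded FIFO cache admitting entries that pass an include filter.
class FifoEntryCache<V> {
    private final int maxEntries;
    private final Predicate<V> includeFilter;
    private final Map<String, V> entries;

    FifoEntryCache(int maxEntries, Predicate<V> includeFilter) {
        this.maxEntries = maxEntries;
        this.includeFilter = includeFilter;
        // accessOrder=false keeps insertion order, so the eldest entry is evicted first.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > FifoEntryCache.this.maxEntries;
            }
        };
    }

    void put(String dn, V entry) {
        if (includeFilter.test(entry)) { // entries rejected by the filter are never cached
            entries.put(dn, entry);
        }
    }

    V get(String dn) {
        return entries.get(dn);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        // Mirrors include-filter:(objectclass=groupOfNames) and max-entries:5.
        FifoEntryCache<Set<String>> cache = new FifoEntryCache<>(5,
            ocs -> ocs.stream().anyMatch(oc -> oc.equalsIgnoreCase("groupOfNames")));
        for (int i = 0; i < 7; i++) {
            cache.put("cn=group" + i + ",ou=groups,dc=example,dc=com", Set.of("top", "groupOfNames"));
        }
        cache.put("uid=user.0,ou=people,dc=example,dc=com", Set.of("top", "inetOrgPerson"));
        System.out.println(cache.size());
    }
}
```

After seven groups and one person, the cache holds five entries: the two oldest groups were evicted and the person was never admitted.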

Otherwise, you're better off running with no entry cache. OpenDJ read performance is such that the directory server can respond to tens of thousands, if not hundreds of thousands, of searches per second with an average response time in the order of a millisecond. This should be good enough for most applications!

OpenDJ: Extensible indexes for Internationalization.

While on the subject of indexes, we recently had a discussion with one of our users who complained about long response times with some language-specific search filters such as (cn:fr.6:=*John*).

These extended filters rely on I18N Collation matching rules and indexes that I’ve described in an old post for OpenDS.

It turned out that he had defined the matching rule for the index, and rebuilt it, but had missed an important part: the index-type did not include “extensible”.

The proper command to create an extensible index is the following:

dsconfig set-local-db-index-prop --backend-name userRoot --index-name cn \
 --set index-extensible-matching-rule:fr.6 \
 --add index-type:extensible \
 --hostname localhost --port 4444 \
 --bindDN cn=Directory\ Manager --bindPassword ****** \
 -X -n

fr.6 is the shortcut for the French substring collation matching rule, whose full OID is 1.3.6.1.4.1.42.2.27.9.4.76.1

Note that if you don't specify the extensible index-type, the server will not build the index for the extensible matching rule. The use of the index-type is consistent with the other types of index (equality and so on), and allows you to disable and re-enable extensible indexes without having to re-enter all the OIDs.
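To get a feel for what a collation matching rule compares, here is a small Java sketch based on the JDK's java.text.Collator at PRIMARY strength, where accents and case are ignored. It is only an analogy: OpenDJ's collation matching rules and their extensible indexes are implemented differently, and the naive containsFr helper is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Analogy for French collation matching using the JDK Collator (not OpenDJ code).
class FrenchCollationMatch {

    private static final Collator FR = Collator.getInstance(Locale.FRENCH);
    static {
        // PRIMARY strength ignores accent and case differences.
        FR.setStrength(Collator.PRIMARY);
        FR.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
    }

    static boolean equalsFr(String a, String b) {
        return FR.compare(a, b) == 0;
    }

    // Naive "substring" check: does any slice of the value collate equal to the
    // assertion? Quadratic and fine for a demo; an index works very differently.
    static boolean containsFr(String value, String sub) {
        for (int i = 0; i < value.length(); i++) {
            for (int j = i + 1; j <= value.length(); j++) {
                if (equalsFr(value.substring(i, j), sub)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(equalsFr("H\u00e9l\u00e8ne", "helene"));
        System.out.println(containsFr("Jean-Fran\u00e7ois Dupr\u00e9", "dupre"));
    }
}
```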

OpenDJ Tips: More on troubleshooting indexes and search performances

In a previous post, I talked about analyzing search filters and indexes. Matt added in a comment that OpenDJ has another means of understanding how indexes are used in a search. Here's a detailed post.

The OpenDJ LDAP directory server supports a “magic” operational attribute that allows an administrator to get from the server information about the processing of indexes for a specific search query: debugsearchindex.

If the attribute is included in the requested attributes of a search operation, the server does not return the matching entries, but a single result entry with a fixed distinguished name and a single-valued debugsearchindex attribute that contains information about the index processing, including the number of candidate entries per filter component, the overall number of candidates, and whether any or all of the search is indexed.

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(mail=user.*)(cn=*Denice*))" debugsearchindex
Password for user 'cn=Directory Manager': *******
dn: cn=debugsearch
debugsearchindex: filter=(&(mail=user.*)[INDEX:mail.substring][COUNT:2000](cn=*Denice*)[INDEX:cn.substring][COUNT:1])[COUNT:1] final=[COUNT:1]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "objectclass=*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(objectClass=*)[NOT-INDEXED] scope=wholeSubtree[COUNT:2007] final=[COUNT:2007]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "mail=user.1*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(mail=user.1*)[INDEX:mail.substring][COUNT:1111] scope=wholeSubtree[COUNT:2007] final=[COUNT:1111]

Note that OpenDJ sometimes optimizes the query and uses an index other than the regular one. For example, it might use the equality index for an initial substring filter. The index used during the search does appear in the debugsearchindex attribute. Also, once the candidate set has been narrowed down to very few entries, it stops using indexes and evaluates the remaining entries directly, as in the example below:

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(cn=Denice*)(mail=user.9*))" debugsearchindex
Password for user 'cn=Directory Manager':
dn: cn=debugsearch
debugsearchindex: filter=(&(cn=Denice*)[INDEX:cn.equality][COUNT:1])[COUNT:1] final=[COUNT:1]
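When checking many queries, it can be convenient to read debugsearchindex values programmatically. The following Java sketch is hypothetical and relies only on the markers visible in the samples above ([INDEX:...], [NOT-INDEXED], final=[COUNT:n]); the format may differ between OpenDJ versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for debugsearchindex values, based on the samples above.
class DebugSearchIndexReader {

    private static final Pattern FINAL_COUNT = Pattern.compile("final=\\[COUNT:(\\d+)\\]");
    private static final Pattern INDEX = Pattern.compile("\\[INDEX:([^\\]]+)\\]");

    // Final number of candidate entries.
    static long finalCount(String value) {
        Matcher m = FINAL_COUNT.matcher(value);
        if (!m.find()) {
            throw new IllegalArgumentException("no final=[COUNT:n] in: " + value);
        }
        return Long.parseLong(m.group(1));
    }

    // True if any filter component was evaluated without an index.
    static boolean hasUnindexedComponent(String value) {
        return value.contains("[NOT-INDEXED]");
    }

    // Index names (e.g. "cn.substring") in the order they appear.
    static List<String> indexesUsed(String value) {
        List<String> names = new ArrayList<>();
        Matcher m = INDEX.matcher(value);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String v = "filter=(objectClass=*)[NOT-INDEXED] scope=wholeSubtree[COUNT:2007] final=[COUNT:2007]";
        System.out.println("final=" + finalCount(v) + " unindexed=" + hasUnindexedComponent(v));
    }
}
```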

Disabling Replication in OpenDJ 2.4.

Enabling replication between multiple instances of the OpenDJ LDAP directory server is pretty simple and straightforward. You can check for yourself in the Replication chapter of the Administration Guide.

But fully disabling replication can be tricky with OpenDJ 2.4, mostly because of a known issue with the dsreplication disable --disableAll command: OPENDJ-249: Doing dsreplication disable --disableAll is throwing a javax.naming.CommunicationException when removing contents of “cn=admin data”.

We are fixing this issue in OpenDJ 2.5, but for those who have deployed OpenDJ 2.4 and want to know how to fully remove all references to a replica in the topology, here are the steps to manually disable replication:

Note: all these steps should be done using ldapmodify, or an LDAP browser such as the OpenDJ Control-Panel's Manage Entry or Apache Directory Studio.

  1. For each replica to be disabled connect to it on the admin port (4444) and:
    1. MANDATORY: set the “ds-cfg-enabled” property to “false” in “cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config”
    2. OPTIONAL: recursively remove the entries beneath “cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config” using individual delete operations. Note that the configuration backend does not support the sub-tree delete control, so this has to be done iteratively. This step is not mandatory, since replication was already fully disabled in the previous step.
    3. MANDATORY: remove each entry beneath “cn=Servers,cn=admin data” except the entry itself. I find the easiest way to do this is to perform a sub-tree delete and then add back the base entry
    4. OPTIONAL: remove (purge) unused instance keys from beneath “cn=instance keys,cn=admin data” *except* own key. This step is really independent of replication: administrators should periodically purge unused instance keys anyway when they are sure that they are no longer needed (e.g. used for signing backups, etc)
    5. MANDATORY: delete “uniqueMember” in “cn=all-servers,cn=Server Groups,cn=admin data”
  2. On one of the remaining enabled replicas, connect to it via the admin port and:
    1. MANDATORY: remove each disabled server beneath “cn=Servers,cn=admin data”
    2. OPTIONAL: remove (purge) each disabled server's instance key beneath “cn=instance keys,cn=admin data” (see 1.4)
    3. MANDATORY: remove each disabled server from uniqueMember in “cn=all-servers,cn=Server Groups,cn=admin data”
    4. MANDATORY: get the list of all remaining servers from “cn=all-servers,cn=Server Groups,cn=admin data”
  3. For each of the remaining enabled replicas obtained in step 2.4, connect to it via the admin port and:
    1. MANDATORY: remove each disabled server's replication server address (host:rsPort) from ds-cfg-replication-server in “cn=replication server,cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config”
    2. MANDATORY: remove each disabled server's replication server address (host:rsPort) from ds-cfg-replication-server in “cn=*,cn=domains,cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config”
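
Since several of these steps are plain LDAP modify and delete operations, one option is to generate LDIF for them and feed it to ldapmodify. The Python sketch below is a hypothetical helper, not an OpenDJ tool: modify_record builds one modify record (for example for steps 1.1, 1.5 and 3.1), and leaf_first_deletes orders delete records deepest entry first, which is what step 1.2 needs because the configuration backend cannot delete a sub-tree in one operation. The host:port value in the example calls is made up; use the values from your own topology.

```python
from __future__ import annotations

MMS = "cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config"


def split_dn(dn: str) -> list[str]:
    """Split a DN into RDNs on commas that are not escaped with a backslash."""
    rdns, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ",":
            rdns.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    rdns.append("".join(current).strip())
    return rdns


def leaf_first_deletes(dns: list[str]) -> str:
    """LDIF delete records, deepest DN first, so children are removed before parents."""
    ordered = sorted(dns, key=lambda dn: len(split_dn(dn)), reverse=True)
    return "\n".join(f"dn: {dn}\nchangetype: delete\n" for dn in ordered)


def modify_record(dn: str, op: str, attr: str, values: list[str]) -> str:
    """One LDIF modify record. op is "replace", "add" or "delete";
    deleting with no values removes the whole attribute."""
    lines = [f"dn: {dn}", "changetype: modify", f"{op}: {attr}"]
    lines += [f"{attr}: {value}" for value in values]
    return "\n".join(lines + ["-"]) + "\n"


if __name__ == "__main__":
    # Step 1.1, on the replica being disabled.
    print(modify_record(MMS, "replace", "ds-cfg-enabled", ["false"]))
    # Step 1.5, on the replica being disabled.
    print(modify_record("cn=all-servers,cn=Server Groups,cn=admin data",
                        "delete", "uniqueMember", []))
    # Step 3.1, on each remaining replica; "old-replica.example.com:8989" is hypothetical.
    print(modify_record("cn=replication server," + MMS, "delete",
                        "ds-cfg-replication-server", ["old-replica.example.com:8989"]))
```

Handling escaped commas matters here because replication domain entries are named after the replicated suffix, so their RDN itself contains an escaped comma.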

An important tuning flag for OpenDJ with a 64-bit JVM…

If you’re running OpenDJ with a 64-bit JVM and a heap smaller than 32GB, be aware of the need to explicitly set the -XX:+UseCompressedOops option (unless you want to disable it).

Compressed oops is supported and enabled by default in Java SE 6u23 and later, when running a 64-bit JVM with a value of -Xmx lower than 32GB. You can find more information about Compressed Oops in Java technical notes here: http://download.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

However, to properly estimate the occupancy of the DB cache and tune the cache eviction threads, the OpenDJ internal database needs to take the compressed oops option into account, and for this it relies on the JVM option being set explicitly. If the option is not explicitly set, the database may consider the cache full when it is not and start cache eviction too early, resulting in less than optimal performance.

So, with a 64-bit JVM, make sure you add the -XX:+UseCompressedOops option to the start-ds line in the config/java.properties file. Then run bin/dsjavaproperties and restart OpenDJ to benefit from the new settings.
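
If you manage several instances, this edit can be scripted. The Python sketch below appends -XX:+UseCompressedOops to the start-ds line of a java.properties text, but only when that line sets -Xmx below 32GB and does not already mention UseCompressedOops (so an explicit -XX:-UseCompressedOops is left alone). The property key start-ds.java-args is an assumption based on the usual layout of config/java.properties; check your own file, and still run bin/dsjavaproperties and restart the server afterwards.

```python
from __future__ import annotations

import re

FLAG = "-XX:+UseCompressedOops"
LIMIT = 32 * 1024**3  # compressed oops only apply to heaps below about 32GB
# -Xmx4g, -Xmx2048m, -Xmx1073741824; the lookahead rejects unknown unit letters.
XMX_RE = re.compile(r"-Xmx(\d+)([kKmMgG]?)(?!\w)")
UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(java_args: str) -> int | None:
    """Maximum heap in bytes from -Xmx, or None when no -Xmx is set.
    The JVM honours the last -Xmx, so the last match wins."""
    matches = XMX_RE.findall(java_args)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * UNITS[unit.lower()]


def add_compressed_oops(properties_text: str, key: str = "start-ds.java-args") -> str:
    """Return the properties text with FLAG appended to the `key` line when appropriate."""
    out = []
    for line in properties_text.splitlines():
        if line.startswith(key + "=") and "UseCompressedOops" not in line:
            heap = heap_bytes(line)
            if heap is not None and heap < LIMIT:
                line = f"{line} {FLAG}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, a line start-ds.java-args=-server -Xmx4g becomes start-ds.java-args=-server -Xmx4g -XX:+UseCompressedOops, while a line without -Xmx is left untouched because the sketch cannot tell what heap the JVM will pick.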

LDAP: Matching against the current time in OpenDJ

In LDAP, attributes have different syntaxes. The one used to indicate date and time is GeneralizedTime, a string representation of the date and time, typically expressed in GMT. For example, when an entry is modified, the server maintains the modifyTimestamp attribute and sets a value like 20110825120001Z (for 2011, Aug 25, 12:00:01 GMT).

LDAP client applications often have to search for entries based on these date and time attributes, whether to find the entries that have been modified or whose password was changed recently… The way it is typically done is the following: get the current date from the system, add or subtract some fixed time (for example if you want to know the entries modified in the last 10 minutes), transform the result into a GeneralizedTime string, and use that string in a search filter: (modifyTimestamp>=20110825130000Z). If the application repeats that search a minute later, it has to recompute the value again, and again…
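
For reference, this is roughly what the client-side approach looks like in Python: format a UTC datetime as GeneralizedTime and build the filter from it. It is a minimal sketch that only produces the common YYYYMMDDHHMMSSZ form; the now parameter exists only so the result can be reproduced.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def generalized_time(moment: datetime) -> str:
    """Format an aware datetime as GeneralizedTime in UTC, e.g. 20110825120001Z."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def modified_since_filter(minutes: int, now: datetime | None = None) -> str:
    """LDAP filter matching entries modified during the last `minutes` minutes."""
    now = now or datetime.now(timezone.utc)
    return f"(modifyTimestamp>={generalized_time(now - timedelta(minutes=minutes))})"


print(modified_since_filter(10, datetime(2011, 8, 25, 13, 10, tzinfo=timezone.utc)))
# (modifyTimestamp>=20110825130000Z)
```

Every repetition of the search has to call modified_since_filter again to get a fresh value, which is exactly the chore described above.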

Ideally, what application writers would like is to express the filter as something like (modifyTimestamp>=${CurrentTime} - 10 min). However, this is not compliant with LDAP. The proper way to solve this is to use extensible matching rules, and for that purpose we’ve added two “relative time” matching rules to OpenDJ, the open source LDAP directory services for Java: one for “less than” and one for “greater than”.

matchingrules: ( 1.3.6.1.4.1.26027.1.4.6 NAME ( 'relativeTimeLTOrderingMatch' 'relativeTimeOrderingMatch.lt' )
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.24 )
matchingrules: ( 1.3.6.1.4.1.26027.1.4.5 NAME ( 'relativeTimeGTOrderingMatch' 'relativeTimeOrderingMatch.gt' )
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.24 )

The way the matching rules work is pretty simple: (attribute:MatchingRule:=Offset), where the offset is a signed integer followed by its unit: s for seconds, m for minutes, h for hours, d for days or w for weeks.

You can read such a filter as “is Attribute greater than (or less than) CurrentTime +/- Offset”.

(lastLoginTime:1.3.6.1.4.1.26027.1.4.6:=-4w) will match all entries that have a lastLoginTime value smaller than the Current Time minus 4 weeks, i.e. all entries whose lastLoginTime is more than 4 weeks old.

(pwdExpirationTime:1.3.6.1.4.1.26027.1.4.5:=5d) will match all entries that have pwdExpirationTime greater than the Current Time plus 5 days, i.e. all entries that will expire in more than 5 days.
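
To make these semantics concrete, here is a small Python emulation of the two matching rules as described above: parse the signed offset, add it to the current time, and compare the attribute value with the result. It is only a client-side model for reasoning about filters, not how OpenDJ implements them. Using a strict comparison at the exact boundary is an assumption, and only the plain YYYYMMDDHHMMSSZ GeneralizedTime form is parsed.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

OFFSET_RE = re.compile(r"([+-]?\d+)([smhdw])")
UNIT_NAMES = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_offset(offset: str) -> timedelta:
    """'-4w' -> minus 4 weeks, '5d' -> 5 days, '0s' -> zero."""
    match = OFFSET_RE.fullmatch(offset)
    if match is None:
        raise ValueError(f"invalid relative time offset: {offset!r}")
    return timedelta(**{UNIT_NAMES[match.group(2)]: int(match.group(1))})


def parse_generalized_time(value: str) -> datetime:
    """Parse the plain YYYYMMDDHHMMSSZ form only (no fractions, no numeric offsets)."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def relative_time_match(value: str, offset: str, greater: bool,
                        now: datetime | None = None) -> bool:
    """Emulate (attr:relativeTimeGTOrderingMatch:=offset) when greater is True,
    or (attr:relativeTimeLTOrderingMatch:=offset) when greater is False."""
    now = now or datetime.now(timezone.utc)
    threshold = now + parse_offset(offset)
    moment = parse_generalized_time(value)
    return moment > threshold if greater else moment < threshold


now = datetime(2011, 8, 25, 12, 0, 1, tzinfo=timezone.utc)
# (lastLoginTime:1.3.6.1.4.1.26027.1.4.6:=-4w): last login more than 4 weeks ago?
print(relative_time_match("20110701000000Z", "-4w", greater=False, now=now))  # True
# (pwdExpirationTime:1.3.6.1.4.1.26027.1.4.5:=5d): expires in more than 5 days?
print(relative_time_match("20110827000000Z", "5d", greater=True, now=now))   # False
```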

The true benefit of these matching rules is actually in expressing policies in the OpenDJ server, for example granting or denying access based on an attribute with a GeneralizedTime syntax, such as last login time, pwdChangedTime, modifyTimestamp…

For example, imagine an auxiliary objectClass representing a service, with some specific attributes including an expiration date: validUntil. Now, you want to allow these attributes to be read only if the expiration date has not passed.

aci: (targetattr="serviceAttr1 || serviceAttr2")(targetfilter="(validUntil:1.3.6.1.4.1.26027.1.4.5:=0s)")
  (version 3.0; acl "Read Valid service attributes"; allow (read, search, compare)
  userdn="ldap:///all";)

As you can see, this is a good way to hide (deny access to) stale data in a directory server, and to simplify client applications that need to search for entries based on some generalizedTime attributes. For example, consider using these “relative time” matching rules for all your audit queries for expired or unused accounts.

Finally, remember that the OpenDJ directory server doesn’t allow unindexed searches by default. So you might also want to create an index for the “relative time” matching rules. That’s a two-step process:

Define the index

$ bin/dsconfig create-local-db-index --backend-name userRoot --set index-type:extensible \
 --set index-extensible-matching-rule:1.3.6.1.4.1.26027.1.4.5 \
 --set index-extensible-matching-rule:1.3.6.1.4.1.26027.1.4.6 \
 --index-name createTimestamp -h localhost -p 4444 \
 -D cn=Directory\ Manager -w secret12 -n -X

Rebuild the index

$ bin/rebuild-index -b dc=example,dc=com -i createTimestamp \
 -h localhost -p 4444 -D cn=directory\ manager -w secret12 -X

OpenDJ: Analyzing Search Filters and Indexes

LDAP directory services rely heavily on indexes to provide fast and accurate search results.

OpenDJ, the open source LDAP directory services for the Java platform, provides a number of tools to ensure indexes are used efficiently, or to optimize them for even better performance.

To start with, by default OpenDJ rejects all unindexed searches, unless the authenticated user has the privilege to perform them. Unindexed searches are rejected because they result in scanning the whole database, which consumes a lot of resources and time. There are legitimate uses of unindexed searches, though, and OpenDJ offers a way to control who can perform them through a privilege. To learn more about privileges and how to grant them, please check the Administration Guide or some of my previous posts.

When unindexed searches are completed, OpenDJ (starting with revision 7148 of the OpenDJ trunk, and therefore OpenDJ 2.5) logs the “Unindexed” keyword as part of the Search Response access log message. But the access log file can also be used to identify search operations that are not making optimal use of indexes: simply look for search responses returned with an etime (execution time) greater than the average.

The access log example below contains both an unusually high etime (expressed in ms) and the Unindexed tag.

[27/Jul/2011:20:27:27 +0200] SEARCH RES conn=0 op=1 msgID=2 result=0 nentries=10001 Unindexed etime=1846
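
A first pass over the access log can be automated. The Python sketch below reads SEARCH RES lines in the format shown above, keeps the connection and operation numbers, the etime and whether the Unindexed keyword is present, and reports the responses that are unindexed or slower than the average etime. The line format is inferred from this single example, and the plain arithmetic mean is a deliberately simple choice: a few very slow searches raise the average, so a percentile may serve you better on busy servers.

```python
from __future__ import annotations

import re
import sys

OP_RE = re.compile(r"SEARCH RES conn=(\d+) op=(\d+)")
ETIME_RE = re.compile(r"\betime=(\d+)")
UNINDEXED_RE = re.compile(r"\bUnindexed\b")


def suspicious_searches(lines) -> list[tuple[int, int, int, bool]]:
    """(conn, op, etime, unindexed) for search responses that are unindexed
    or slower than the average etime of all search responses read."""
    responses = []
    for line in lines:
        op = OP_RE.search(line)
        etime = ETIME_RE.search(line)
        if op and etime:
            responses.append((int(op.group(1)), int(op.group(2)),
                              int(etime.group(1)), bool(UNINDEXED_RE.search(line))))
    if not responses:
        return []
    average = sum(r[2] for r in responses) / len(responses)
    return [r for r in responses if r[3] or r[2] > average]


if __name__ == "__main__":
    # Hypothetical usage: python3 slow_searches.py <path to the access log>
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as log:
            for conn, op, etime, unindexed in suspicious_searches(log):
                tag = " Unindexed" if unindexed else ""
                print(f"conn={conn} op={op} etime={etime}{tag}")
```

The conn and op numbers let you go back to the matching SEARCH REQ line in the same log to see the base, scope and filter that caused the slow response.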

The verify-index command lets you check that no index is corrupted (i.e. no data is missing from indexes).

The rebuild-index command lets you build or rebuild an index that is corrupted or whose configuration has changed.

One of the tuning parameters of indexes is the index-entry-limit (known in Sun DSEE as the AllIDsThreshold): the maximum number of entries kept in an index record before the server stops maintaining that record and considers it more efficient to scan the whole database. For more information on the index entry limit, see Section 7.2.4, Changing Index Entry Limits, in the Indexing chapter of the Administration Guide.
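
The effect of the limit can be pictured with a toy model. The Python class below is a hypothetical, in-memory stand-in for an equality index, not OpenDJ's data structure: each key keeps the set of matching entry IDs until that set grows beyond the limit, after which the key is marked undefined and a lookup on it returns every entry as a candidate. Treating "more than limit" rather than "at limit" as the cut-off is an assumption of the sketch.

```python
from __future__ import annotations


class ToyIndex:
    """Hypothetical model of one equality index with an index entry limit."""

    def __init__(self, limit: int):
        self.limit = limit
        # key -> set of entry IDs, or None once the key exceeded the limit ("undefined")
        self.keys: dict[str, set[int] | None] = {}

    def add(self, key: str, entry_id: int) -> None:
        ids = self.keys.setdefault(key, set())
        if ids is None:
            return  # no longer maintained
        ids.add(entry_id)
        if len(ids) > self.limit:
            self.keys[key] = None  # drop the ID list, keep only the "undefined" marker

    def candidates(self, key: str, all_ids: set[int]) -> set[int]:
        """Entry IDs the index returns for key; an undefined key cannot narrow anything."""
        ids = self.keys.get(key, set())
        return set(all_ids) if ids is None else set(ids)

    def undefined_count(self) -> int:
        return sum(1 for ids in self.keys.values() if ids is None)


index = ToyIndex(limit=2)
for entry_id in (1, 2, 3):
    index.add("inetorgperson", entry_id)
index.add("groupofnames", 4)
print(index.undefined_count())                               # 1
print(index.candidates("groupofnames", {1, 2, 3, 4}))        # {4}
print(len(index.candidates("inetorgperson", {1, 2, 3, 4})))  # 4
```

In this model, raising limit to 3 would keep inetorgperson defined, at the cost of a larger ID set to store and update on every write.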

OpenDJ provides a static index analyzer which can help you understand how well the attributes are indexed, as well as tune the index entry limit. This tool is a function of the dbtest utility and is simply used as follows:

$ bin/dbtest list-index-status -n userRoot -b "dc=example,dc=com"

Index Name                 Index Type  JE Database Name                             Index Valid  Record Count  Undefined  95%  90%  85%
---------------------------------------------------------------------------------------------------------------------------------------
id2children                Index       dc_example_dc_com_id2children                true         2             0          0    0    0
id2subtree                 Index       dc_example_dc_com_id2subtree                 true         2             0          0    0    0
uid.equality               Index       dc_example_dc_com_uid.equality               true         2000          0          0    0    0
aci.presence               Index       dc_example_dc_com_aci.presence               true         0             0          0    0    0
ds-sync-conflict.equality  Index       dc_example_dc_com_ds-sync-conflict.equality  true         0             0          0    0    0
givenName.equality         Index       dc_example_dc_com_givenName.equality         true         2000          0          0    0    0
givenName.substring        Index       dc_example_dc_com_givenName.substring        true         5777          0          0    0    0
objectClass.equality       Index       dc_example_dc_com_objectClass.equality       true         6             0          0    0    0
member.equality            Index       dc_example_dc_com_member.equality            true         0             0          0    0    0
uniqueMember.equality      Index       dc_example_dc_com_uniqueMember.equality      true         0             0          0    0    0
cn.equality                Index       dc_example_dc_com_cn.equality                true         2000          0          0    0    0
cn.substring               Index       dc_example_dc_com_cn.substring               true         19407         0          0    0    0
sn.equality                Index       dc_example_dc_com_sn.equality                true         2000          0          0    0    0
sn.substring               Index       dc_example_dc_com_sn.substring               true         8147          0          0    0    0
telephoneNumber.equality   Index       dc_example_dc_com_telephoneNumber.equality   true         2000          0          0    0    0
telephoneNumber.substring  Index       dc_example_dc_com_telephoneNumber.substring  true         16506         0          0    0    0
ds-sync-hist.ordering      Index       dc_example_dc_com_ds-sync-hist.ordering      true         1             0          0    0    0
mail.equality              Index       dc_example_dc_com_mail.equality              true         2000          0          0    0    0
mail.substring             Index       dc_example_dc_com_mail.substring             true         7235          0          0    0    0
entryUUID.equality         Index       dc_example_dc_com_entryUUID.equality         true         2002          0          0    0    0

Total: 20

If an index contains a non-zero value (N) in the Undefined column, it means that N index keys have reached the index entry limit and are no longer maintained. This can be normal, for example with the objectClass equality index, where the vast majority of entries have the same object classes (top, person, organizationalPerson, inetOrgPerson). But for other attributes, such as cn, it may indicate that the index entry limit is too low.
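
On a server with many indexes, the Undefined column is easy to scan programmatically. The Python sketch below parses text in the list-index-status layout shown above (nine whitespace-separated columns per index row) and returns the indexes with undefined keys, ignoring objectClass.equality by default for the reason just given. The column layout is taken from this one sample; other versions of dbtest may format the report differently.

```python
from __future__ import annotations


def indexes_over_limit(report: str, ignore=("objectClass.equality",)) -> dict[str, int]:
    """Map index name -> Undefined count for rows with a non-zero Undefined column."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Index rows have nine columns: name, type, JE database, valid, record count,
        # undefined, 95%, 90%, 85%. Headers, separators and "Total:" do not.
        if len(fields) != 9 or fields[3] not in ("true", "false") or not fields[5].isdigit():
            continue
        name, undefined = fields[0], int(fields[5])
        if undefined > 0 and name not in ignore:
            flagged[name] = undefined
    return flagged
```

To use it, save the output of bin/dbtest list-index-status to a file (for example with a shell redirect) and pass the file's content to indexes_over_limit. On the sample report above, every Undefined value is 0, so it returns an empty dictionary.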

Finally, OpenDJ has an option to do a live analysis of search filters and how they use indexes. To enable live index analysis, simply enable it for the database backend that contains the data:

$ bin/dsconfig set-backend-prop --backend-name userRoot --set index-filter-analyzer-enabled:true \
 --set max-entries:50 -h localhost -p 4444 -D cn=Directory\ Manager -w ****** -n -X

The max-entries parameter specifies how many filter items are analyzed and kept in memory; only the last max-entries filters are kept. If there is a huge variety of requests against the directory service, you might want to increase that number. However, keep in mind that the analysis is kept in memory, and the higher the number, the larger the impact on the overall performance of the server.

We do not recommend that you leave the index analysis enabled all the time, especially in production. The index analyzer should be used to gather statistics over a flow of requests for a short period of time, and should be disabled afterwards to free the resources.

The result of the index analyzer can be retrieved under the cn=monitor suffix, more specifically as part of the database environment of the backend.

$ bin/ldapsearch -p 1389 -D cn=directory\ manager -w secret12  \
-b "cn=userRoot Database Environment,cn=monitor" '(objectclass=*)' filter-use

dn: cn=userRoot Database Environment,cn=monitor
filter-use: (uid=user.*) hits:1 maxmatches:20 message:
filter-use: (tel=*) hits:1 maxmatches:-1 message:presence index type is disabled
  for the tel attribute
filter-use: (objectClass=groupOfURLs) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfEntries) hits:1 maxmatches:0 message:
filter-use: (objectClass=person) hits:1 maxmatches:20 message:
filter-use: (objectClass=ds-virtual-static-group) hits:1 maxmatches:0 message:
filter-use: (aci=*) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfUniqueNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=ldapSubentry) hits:1 maxmatches:0 message:
filter-use: (objectClass=subentry) hits:1 maxmatches:0 message:

hits represents the number of times this filter was used, and maxmatches represents the maximum number of entries that were returned for that filter.
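
Because ldapsearch wraps long values (note the continuation line of the (tel=*) entry above), the filter-use values are easier to analyze after unfolding them. The Python sketch below unfolds LDIF continuation lines, parses each filter-use value with the hits/maxmatches/message layout shown above, and lists the filters that come with a message or a negative maxmatches, like (tel=*), which is reported with maxmatches:-1 because its presence index type is disabled. The value layout is inferred from this sample output only.

```python
from __future__ import annotations

import re

FILTER_USE_RE = re.compile(
    r"^filter-use: (?P<filter>\(.*\)) hits:(?P<hits>-?\d+) "
    r"maxmatches:(?P<maxmatches>-?\d+) message:(?P<message>.*)$"
)


def unfold_ldif(text: str) -> list[str]:
    """Join LDIF continuation lines (RFC 2849): a line starting with one space
    continues the previous line, minus that single space."""
    lines: list[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def filter_use(text: str) -> list[dict]:
    """Parsed filter-use values, most frequently used filters first."""
    stats = []
    for line in unfold_ldif(text):
        match = FILTER_USE_RE.match(line)
        if match:
            stats.append({
                "filter": match["filter"],
                "hits": int(match["hits"]),
                "maxmatches": int(match["maxmatches"]),
                "message": match["message"].strip(),
            })
    return sorted(stats, key=lambda s: s["hits"], reverse=True)


def needs_attention(stats: list[dict]) -> list[dict]:
    """Filters reported with an explanatory message or a negative maxmatches."""
    return [s for s in stats if s["message"] or s["maxmatches"] < 0]
```

Sorting by hits puts the most frequently used filters first, which is usually where an index change pays off most.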

Index analysis and tuning is not a simple task, and I recommend experimenting a lot with these tools in a test environment to understand how to get the best out of them. But, as you can see, OpenDJ provides you with all the tools you need to get the best performance out of your LDAP directory.