Explaining index-entry-limit in ForgeRock Directory Services / OpenDJ

A few years ago, I’ve explained the various resource limits in OpenDJ, the open source LDAP and REST directory server. A few months ago, someone read the post and asked on twitter about the index-entry-limit:

Screen Shot 2016-08-20 at 16.28.01

The index-entry-limit is probably the least understood parameter in the OpenDJ directory server, as was the AllIDThreshold in Sun Directory Server (and its siblings : Netscape Directory, Red Hat Directory, Oracle DSEE…). So before I dive in explaining what is this parameter, how it’s used and how it can be tuned, let me start with answering the question : how does index-entry-limit relate to other administrative limits ?

Answer: It doesn’t ! The index-entry-limit is an internal limit and does not really limits the results returned to clients. It just limits the resources consumed when processing indexes.

A Directory Server is a very specialized data-store based on the LDAP standard, and its primary goal was to be able to search and return user information such as email addresses or names and phone numbers, very quickly and for a large number of different clients. For that, the directory servers were designed to favor reads over writes, and read optimization was achieved through the use of indexes.

In LDAP, a search request (which can be used to read an entry or search for one or more through the whole database) contains a search filter. The filter may be simple or complex, and composed of one or more attribute value assertions.

A simple filter can be “(sn=Smith)”. Complex filters combine operators and different attributes : “(&(objectclass=Person)(|(sn=Smith)(cn=*Smith*)))” – find a person whose surname is smith or whose common name contains smith

When the ForgeRock Directory Server / OpenDJ receives a search request, it processes it in 2 phases. In the first phase, it analyzes the search filter, to identify which attributes are indexed, and then uses these indexes to build a list of possible candidates to return. If there are no indexed attributes or the list is too large, the server decides that the list is actually the whole database. Such search request is tagged as “unindexed” and the server verify if the authenticated user has the “unindexed-search” privilege before continuing. In the second phase, it reads all the candidates from the database, and assess the full filter to decide to return the entry to the client or not (subject to access controls).

ForgeRock DS / OpenDJ implements attribute indexes as reversed index. Meaning that for a specific attribute, we keep a pair of each unique value and a list of the entries that  contain that value. Because maintaining a large list of entries for each value of all indexed attributes may have a big cost, both in term of memory usage and disk I/O (think that when you add an entry in the Directory, all of its indexed attributes will need to be updated), we introduced a limit to the number of entries that an index record can contain: the index-entry-limit. For example, if the number of entries that contain the objectClass person exceeds the limit, then we mark the key as “full” and we consider that the list of candidates is actually the whole set of directory entries.  This saves us from updating and reading a very long record, allocating lots of data, to end up iterating through almost all entries. You might ask, so why having an index for objectClass then ? Well, in a directory server that contains millions of users, there are in fact very few entries that are not persons. These entries will have their objectClass values indexed, and searching for those entries will be very efficient thanks for the index.

The index-entry-limit is a limit of the number of entries that are contained in a single index record, per value of an attribute index. Its default value is 4000 and works for most medium to large scale deployments. So, why is it a configurable parameter, and when should you change it?

Because ForgeRock DS is used in many different environments with various use cases, and a great range of number of entries (some of our customers have over 100 millions entries in a directory service), we know that one size doesn’t fit all. But the default value works for most of the index usages. Also, the index-entry-limit can be set for each individual index, or for the whole backend (and this value applies to all indexes that don’t have a specific value). It is highly recommended that you only try to change the index-entry-limit of specific indexes, and not the backend default value.

In no case, should you increase the index-entry-limit to a value close to the total number of entries in the directory. This will undermine performances of both searches and updates, significantly increase the footprint of the data stored on disk.

There are few known cases where the index-entry-limit value should be changed (and equally cases where increasing the value will only consume more resources for no performance gain). Keep also in mind that when you change the index-entry-limit, you need to rebuild the indexes for which the limit was changed. So it’s not something that you want to do too often. And definitely not something that you need to adjust constantly.

Groups. When the server starts, it issues an internal search to find all group entries and cache them for better performances. The search is based on the ObjectClass attribute. If there are more than 4000 groups of one kind (the search is for GroupOfNames, GroupOfUniqueNames, GroupOfEntries, DynamicGroup and ds-virtual-static-group), the search will be unindexed and can take a long time to proceed. In that case, you should increase the index-entry-limit for the ObjectClass attribute, to a value just above the number of groups.

Members (or uniqueMembers). If you have more than 4000 static groups, and you know that some users are likely to be member of more than 4000 groups, then you should also increase the index-entry-limit for the member attribute (or uniqueMember) to a value just above the maximum number of group a user can be in, especially if you have enabled the Referential Integrity Plugin (that removes a user from groups when its entry is deleted).

Another typical use case for increasing the index-entry-limit is when you have millions of entries, and an attribute doesn’t have a flat distribution of values. Think about the surname of users. In a wide range of population, there are probably more “Smith” or “Lee” than “Washington”. Within 10M users, would there be more than 4000 “Lee”? If it’s possible, and the server receives searches with filters such as “(sn=Lee)”, then you should consider increasing the limit for the sn attribute.

Backendstat is the tool you want to use to verify the state of the index and whether some records have reached the index-entry-limit. For some attributes, such as ObjectClass, it is normal that the limit is reached. For others, such as sn, it’s probably something you want to check regularly.

The backendstat tool requires exclusive access to the database, and thus can only run against a server that is stopped (or a backup).

To list the indexes, use backendstat list-indexes:

$ backendstat list-indexes -b dc=example,dc=com -n userRoot

Index Name Raw DB Name Type Record Count
dn2id /dc=com,dc=example/dn2id DN2ID 10002
id2entry /dc=com,dc=example/id2entry ID2Entry 10002
referral /dc=com,dc=example/referral DN2URI 0
id2childrencount /dc=com,dc=example/id2childrencount ID2ChildrenCount 3
state /dc=com,dc=example/state State 18
uniqueMember.uniqueMemberMatch /dc=com,dc=example/uniqueMember.uniqueMemberMatch MatchingRuleIndex 0
mail.caseIgnoreIA5SubstringsMatch:6 /dc=com,dc=example/mail.caseIgnoreIA5SubstringsMatch:6 MatchingRuleIndex 31232
mail.caseIgnoreIA5Match /dc=com,dc=example/mail.caseIgnoreIA5Match MatchingRuleIndex 10000
aci.presence /dc=com,dc=example/aci.presence MatchingRuleIndex 0
member.distinguishedNameMatch /dc=com,dc=example/member.distinguishedNameMatch MatchingRuleIndex 0
givenName.caseIgnoreMatch /dc=com,dc=example/givenName.caseIgnoreMatch MatchingRuleIndex 8605
givenName.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/givenName.caseIgnoreSubstringsMatch:6 MatchingRuleIndex 19629
telephoneNumber.telephoneNumberSubstringsMatch:6 /dc=com,dc=example/telephoneNumber.telephoneNumberSubstringsMatch:6 MatchingRuleIndex 73235
telephoneNumber.telephoneNumberMatch /dc=com,dc=example/telephoneNumber.telephoneNumberMatch MatchingRuleIndex 10000
ds-sync-hist.changeSequenceNumberOrderingMatch /dc=com,dc=example/ds-sync-hist.changeSequenceNumberOrderingMatch MatchingRuleIndex 0
ds-sync-conflict.distinguishedNameMatch /dc=com,dc=example/ds-sync-conflict.distinguishedNameMatch MatchingRuleIndex 0
entryUUID.uuidMatch /dc=com,dc=example/entryUUID.uuidMatch MatchingRuleIndex 10002
sn.caseIgnoreMatch /dc=com,dc=example/sn.caseIgnoreMatch MatchingRuleIndex 10000
sn.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/sn.caseIgnoreSubstringsMatch:6 MatchingRuleIndex 32217
cn.caseIgnoreMatch /dc=com,dc=example/cn.caseIgnoreMatch MatchingRuleIndex 10000
cn.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/cn.caseIgnoreSubstringsMatch:6 MatchingRuleIndex 86040
objectClass.objectIdentifierMatch /dc=com,dc=example/objectClass.objectIdentifierMatch MatchingRuleIndex 6
uid.caseIgnoreMatch /dc=com,dc=example/uid.caseIgnoreMatch MatchingRuleIndex 10000

Total: 23

To check the status of the indexes and see which keys are full (i.e. exceeded the index-entry-limit), use backendstat show-index-status. Warning, this may take a long time.

$ backendstat show-index-status -b dc=example,dc=com -n userRoot
Index Name Raw DB Name Valid Confidential Record Count Over Entry Limit 95% 90% 85%
uniqueMember.uniqueMemberMatch /dc=com,dc=example/uniqueMember.uniqueMemberMatch true false 0 0 0 0 0
mail.caseIgnoreIA5SubstringsMatch:6 /dc=com,dc=example/mail.caseIgnoreIA5SubstringsMatch:6 true false 31232 12 0 0 0
mail.caseIgnoreIA5Match /dc=com,dc=example/mail.caseIgnoreIA5Match true false 10000 0 0 0 0
aci.presence /dc=com,dc=example/aci.presence true false 0 0 0 0 0
member.distinguishedNameMatch /dc=com,dc=example/member.distinguishedNameMatch true false 0 0 0 0 0
givenName.caseIgnoreMatch /dc=com,dc=example/givenName.caseIgnoreMatch true false 8605 0 0 0 0
givenName.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/givenName.caseIgnoreSubstringsMatch:6 true false 19629 0 0 0 0
telephoneNumber.telephoneNumberSubstringsMatch:6 /dc=com,dc=example/telephoneNumber.telephoneNumberSubstringsMatch:6 true false 73235 0 0 0 0
telephoneNumber.telephoneNumberMatch /dc=com,dc=example/telephoneNumber.telephoneNumberMatch true false 10000 0 0 0 0
ds-sync-hist.changeSequenceNumberOrderingMatch /dc=com,dc=example/ds-sync-hist.changeSequenceNumberOrderingMatch true false 0 0 0 0 0
ds-sync-conflict.distinguishedNameMatch /dc=com,dc=example/ds-sync-conflict.distinguishedNameMatch true false 0 0 0 0 0
entryUUID.uuidMatch /dc=com,dc=example/entryUUID.uuidMatch true false 10002 0 0 0 0
sn.caseIgnoreMatch /dc=com,dc=example/sn.caseIgnoreMatch true false 10000 0 0 0 0
sn.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/sn.caseIgnoreSubstringsMatch:6 true false 32217 0 0 0 0
cn.caseIgnoreMatch /dc=com,dc=example/cn.caseIgnoreMatch true false 10000 0 0 0 0
cn.caseIgnoreSubstringsMatch:6 /dc=com,dc=example/cn.caseIgnoreSubstringsMatch:6 true false 86040 0 0 0 0
objectClass.objectIdentifierMatch /dc=com,dc=example/objectClass.objectIdentifierMatch true false 6 4 0 0 0
uid.caseIgnoreMatch /dc=com,dc=example/uid.caseIgnoreMatch true false 10000 0 0 0 0
Total: 18
Index: /dc=com,dc=example/mail.caseIgnoreIA5SubstringsMatch:6
Over index-entry-limit keys: [.com] [@examp] [ample.] [com] [e.com] [exampl] [le.com] [m] [mple.c] [om] [ple.co] [xample]
Index: /dc=com,dc=example/objectClass.objectIdentifierMatch
Over index-entry-limit keys: [inetorgperson] [organizationalperson] [person] [top]

I hope this long article will help you better understand and tune your ForgeRock Directory Servers for search performances. Please let me know how it goes.

Better index troubleshooting with ForgeRock DS / OpenDJ

Many years ago, I wrote about troubleshooting indexes and search performances, explaining the magicdebugSearchIndex” operational attribute, that allows an administrator to get from the server information about the processing of indexes for a specific search query.

The returned value provides insights on the indexes that were used for a particular search, how they were used and how the resulting set of candidates was built, allowing an administrator to understand whether indexes are used optimally or need to be tailored better for specific search queries and filters, in combination with access logs and other tools such as backendstat.

In DS 6.5, we’ve made some improvements in the search filter processing and we’ve changed the format of the debugSearchIndex value to provide a better reporting of how indexes are used.

The new format is now JSON based, which allow to give it more structure and all could be processed programatically. Here are a few examples of output of the new debugSearchIndex attribute values.

$ bin/ldapsearch -h localhost -p 1389 -D "cn=directory manager" -b "dc=example,dc=com" "(&(cn=*Den*)(mail=user.19*))" debugsearchindex
Password for user 'cn=directory manager': *********

dn: cn=debugsearch
debugsearchindex: {"filter":{"intersection":[{"index":"mail.caseIgnoreIA5SubstringsMatch:6", "exact":"ser.19","candidates":111,"retained":111},{"index":"mail.caseIgnoreIA5SubstringsMatch:6", "exact":"user.1","candidates":1111,"retained":111},
{"filter":"(cn=*Den*)", "index":"cn.caseIgnoreSubstringsMatch:6",
"range":"[den,deo[","candidates":103,"retained":5}], "candidates":5},"final":5}

Let’s look at the debugSearchIndex value and interpret it:

{
"filter": {
"intersection": [
{
"index": "mail.caseIgnoreIA5SubstringsMatch:6",
"exact": "ser.19",
"candidates": 111,
"retained": 111
},
{
"index": "mail.caseIgnoreIA5SubstringsMatch:6",
"exact": "user.1",
"candidates": 1111,
"retained": 111
},
{
"filter": "(cn=*Den*)",
"index": "cn.caseIgnoreSubstringsMatch:6",
"range": "[den,deo[",
"candidates": 103,
"retained": 5
}
],
"candidates": 5
},
"final": 5
}

The filter had 2 components: (cn=*Den*) and (mail=user.19*). Because the whole filter is an AND, the result set is an intersection of several index lookups. Also, both substring filters, but one is a substring of 3 characters and the second one a substring of 7 characters. By default, substring indexes are built with substrings of 6 characters. So the filters are treated differently. The server optimises the processing of indexes so that it will try to first to use the queries that are the most effective. In the case above, the filter (mail=user.19*) is preferred. 2 records are read from the index, and that results in a list of 111 candidates. Then, the server use the remaining filter to narrow the result list. Because the string Den is shorter than the indexed substrings, the server scans a range of keys in the index, starting from the first key match “den” and stopping before the key that matches “deo”. This results in 103 candidates, but only 5 are retained because they were parts of the previous result set. So the result is 5 entries that are matching these filters.

Note the [den,deo[ notation is similar to mathematical Set representation where [ and ] indicate whether a set includes or excludes the boundaries.

Let’s take an example with an OR filter:

$ bin/ldapsearch -h localhost -p 1389 -D "cn=directory manager" -b "dc=example,dc=com" "(|(cn=*Denice*)(uid=user.19))" debugsearchindex
Password for user 'cn=directory manager': *********

dn: cn=debugsearch
debugsearchindex: {"filter":{"union":[{"filter":"(cn=*Denice*)", "index":"cn.caseIgnoreSubstringsMatch:6","exact":"denice","candidates":1}, {"filter":"(uid=user.19)", "index":"uid.caseIgnoreMatch","exact":"user.19","candidates":1}],"candidates":2},"final":2}

As you can see, the result is now a union of 2 exact match (i.e. reads of index keys), each resulting a 1 candidate.

Finally here’s another example, where the scope is used to attempt to reduce the candidate list:

$ bin/ldapsearch -h localhost -p 1389 -D "cn=directory manager" -b "ou=people,dc=example,dc=com" -s one "(mail=user.1)" debugsearchindex
Password for user 'cn=directory manager': *********

dn: cn=debugsearch
debugsearchindex: {"filter":{"filter":"(mail=user.1)","index":"mail.caseIgnoreIA5SubstringsMatch:6", "exact":"user.1","candidates":1111},"scope":{"type":"one","candidates":"[NOT-INDEXED]","retained":1111},"final":1111}

You can find more information and details about the debugsearchindex attribute in the ForgeRock Directory Services 6.5 Administration Guide.

OpenDJ: Monitoring Unindexed Searches…

FR_plogo_org_FC_openDJ-300x86OpenDJ, the open source LDAP directory services, makes use of indexes to optimise search queries. When a search query doesn’t match any index, the server will cursor through the whole database to return the entries, if any, that match the search filter. These unindexed queries can require a lot of resources : I/Os, CPU… In order to reduce the resource consumption, OpenDJ rejects unindexed queries by default, except for the Root DNs (i.e. for cn=Directory Manager).

In previous articles, I’ve talked about privileges for administratives accounts, and also about Analyzing Search Filters and Indexes.

Today, I’m going to show you how to monitor for unindexed searches by keeping a dedicated log file, using the traditional access logger and filtering criteria.

First, we’re going to create a new access logger, named “Searches” that will write its messages under “logs/search”.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-log-publisher \
    --set enabled:true \
    --set log-file:logs/search \
    --set filtering-policy:inclusive \
    --set log-format:combined \
    --type file-based-access \
    --publisher-name Searches

Then we’re defining a Filtering Criteria, that will restrict what is being logged in that file: Let’s log only “search” operations, that are marked as “unindexed” and take more than “5000” milliseconds.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-access-log-filtering-criteria \
    --publisher-name Searches \
    --set log-record-type:search \
    --set search-response-is-indexed:false \
    --set response-etime-greater-than:5000 \
    --type generic \
    --criteria-name Expensive\ Searches

Voila! Now, whenever a search request is unindexed and take more than 5 seconds, the server will log the request to logs/search (in a single line) as below :

$ tail logs/search
[12/Sep/2016:14:25:31 +0200] SEARCH conn=10 op=1 msgID=2 base="dc=example,
dc=com" scope=sub filter="(objectclass=*)" attrs="+,*" result=0 nentries=
10003 unindexed etime=6542

This file can be monitored and used to trigger alerts to administrators, or simply used to collect and analyse the filters that result into unindexed requests, in order to better tune the OpenDJ indexes.

Note that sometimes, it is a good option to leave some requests unindexed (the cost of indexing them outweighs the benefits of the index). If these requests are unfrequent, run by specific administrators for reporting reasons, and if the results are expecting to contain a lot of entries. If so, a best practice is to have a dedicated replica for administration and run these expensive requests. Also, it is better if the client applications are tuned to expect these requests to take a long time.

Tips: Do not index virtual attributes in OpenDJ

OpenDJ-300x100OpenDJ, the open source LDAP directory service in Java, offer some interesting services to reduce and optimize the size and usage of data.

One of them is the Virtual Attribute feature, which allow certain attributes and values to be computed as needed, either based on some of the server internals or other attributes. OpenDJ ships with a number of virtual attributes by default : entryDN, entryUUID, etag, gouverningStructureRule, hasSubordinate, isMemberOf, numSubordinate, password Expiration Time (ds-pwp-password-expiration-time), structuralObjectClass, subSchemaSubEntry, …

Since these attributes are virtual and thus not stored as part of the entries in the database backend, you must not define any index for them. When possible, the virtual attribute provider will make use of default system index (like entryDN uses the DN index), but most of the time, these attributes are for reading and consuming.

If you do configure an index for one of the virtual attribute, the server will repeatedly report that the index is degraded with an error message similar to the following :

[21/Jan/2013:09:16:07 +0000] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_entryDN is currently operating in a degraded state and must be rebuilt before it can be used

And then some seaches may fail to return entries. So you must delete this index to let the server behave properly.

OpenDJ: Extensible indexes for Internationalization.

While taming the subject of indexes, we recently had some discussion with one of our users who complained about long response times with some language specific search filter such as (cn:fr.6:=*John*).

These extended filters rely on I18N Collation matching rules and indexes that I’ve described in an old post for OpenDS.

It turned out that he had defined the matching rule for the index, and rebuilt it, but had missed an important part: the index-type did not include “extensible”.

The proper command to create an extensible index is the following :

dsconfig set-local-db-index-prop --backend-name userRoot --index-name cn \
 --set index-extensible-matching-rule:fr.6 \
 --add index-type:extensible \
 --hostname localhost --port 4444 \
 --bindDN cn=Directory\ Manager --bindPassword ****** \
 -X -n

fr.6 is the shortcut for the French substring collation matching rule which full OID is 1.3.6.1.4.1.42.2.27.9.4.76.1

Note that if you don’t specify the extensible index-type, the server will not build the index for the extensible matching rule. The use of the index-type is consistent with the other types of index, equality or else, and allows you to disable and re-enable extensible indexes without having to re-enter all  OIDs.

OpenDJ Tips: More on troubleshooting indexes and search performances

In a previous post I talked about analyzing search filters and indexes. Matt added in a comment that OpenDJ has another mean of understanding how indexes are used in a search. Here’s a detailed post.

The OpenDJ LDAP directory server supports a “magic” operational attribute that allows an administrator to get from the server information about the processing of indexes for a specific search query: debugsearchindex.

If the attribute is set in the requested attributes in a search operation, the server will not return all entries as expected, but a single result entry with a fixed distinguished name and a single valued attribute debugsearchindex that contains the information related to the index processing, including the number of candidate entries per filter component, the overall number of candidate, and whether any or all of the search is indexed.

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(mail=user.*)(cn=*Denice*))" debugsearchindex
Password for user 'cn=Directory Manager': *******
dn: cn=debugsearch
debugsearchindex: filter=(&(mail=user.*)[INDEX:mail.substring][COUNT:2000](cn=*Denice*)[INDEX:cn.substring][COUNT:1])[COUNT:1] final=[COUNT:1]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "objectclass=*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearchdebugsearchindex: filter=(objectClass=*)[NOT-INDEXED] scope=wholeSubtree[COUNT:2007] final=[COUNT:2007]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "mail=user.1*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(mail=user.1*)[INDEX:mail.substring][COUNT:1111] scope=wholeSubtree[COUNT:2007] final=[COUNT:1111]

Note that sometimes, OpenDJ tries to optimize the query and use some other index than the regular one for the query. For example, it might use the equality index for an initial substring filter. The index used during the search does appear in the debugsearchindex attribute. Also, once the result set has been narrowed down to very few entries, it will stop using index and evaluate directly the entry set, as for the example below:

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(cn=Denice*)(mail=user.9*))" debugsearchindex
Password for user 'cn=Directory Manager':
dn: cn=debugsearch
debugsearchindex: filter=(&(cn=Denice*)[INDEX:cn.equality][COUNT:1])[COUNT:1] final=[COUNT:1]

OpenDJ: Analyzing Search Filters and Indexes

LDAP directory services greatly rely on indexes to provide fast and accurate search results.

OpenDJ, the open source LDAP directory services for the Java platform, provides a number of tools to ensure indexes are efficiently used or to optimize them for even better performances.

To start with, OpenDJ rejects by default all unindexed searches, unless the authenticated user has the privilege to perform them. Unindexed searches are rejected because they result in scanning the whole database, which consumes lots of resources and time. There are legitimate uses of unindexed search though, and OpenDJ offers a way to control who can perform them through a privilege. To learn more about privileges, how to grant them, please check the Administration Guide or some of my previous posts.

When unindexed searches are completed, OpenDJ (starting with revision 7148 of the OpenDJ trunk, and therefore OpenDJ 2.5) does logs the “Unindexed” keyword as part of the Search Response access log message. But the access log file can also be used to identify search operations that are not making an optimal use of indexes. Simply check for those search responses that have been returned with an etime (execution time) greater than the average.

The access log example below contains both an unusually high etime (expressed in ms) and the Unindexed tag.

[27/Jul/2011:20:27:27 +0200] SEARCH RES conn=0 op=1 msgID=2 result=0 nentries=10001 Unindexed etime=1846

The verify-index command let you check that no index is corrupted (i.e. no data is missing from indexes).

The rebuild-index command let you build or rebuild an index that would be corrupted or had its configuration changed.

One of the tuning parameter of indexes is the index-entry-limit (which was known in Sun DSEE as the AllIDsThreshold), the maximum size of entries kept in an index record, before the server stop maintaining that record and consider it’s more efficient to scan the whole database. For more information on the index entry limit, check the Section 7.2.4 Changing Index Entry Limits of the Indexing chapter of the Administration Guide.

OpenDJ provides a static analyzer of indexes which can help to understand how well the attributes are indexed, as well as help to tune the index entry limit. This tool is a function of the dbtest utility and is simply used as follow:

$ bin/dbtest list-index-status -n userRoot -b "dc=example,dc=com"

Index Name Index Type JE Database Name Index Valid Record Count Undefined 95% 90% 85%

---------------------------------------------------------------------------------------------------------------------------------------
id2children                Index       dc_example_dc_com_id2children                true         2             0          0    0    0
id2subtree                 Index       dc_example_dc_com_id2subtree                 true         2             0          0    0    0
uid.equality               Index       dc_example_dc_com_uid.equality               true         2000          0          0    0    0
aci.presence               Index       dc_example_dc_com_aci.presence               true         0             0          0    0    0
ds-sync-conflict.equality  Index       dc_example_dc_com_ds-sync-conflict.equality  true         0             0          0    0    0
givenName.equality         Index       dc_example_dc_com_givenName.equality         true         2000          0          0    0    0
givenName.substring        Index       dc_example_dc_com_givenName.substring        true         5777          0          0    0    0
objectClass.equality       Index       dc_example_dc_com_objectClass.equality       true         6             0          0    0    0
member.equality            Index       dc_example_dc_com_member.equality            true         0             0          0    0    0
uniqueMember.equality      Index       dc_example_dc_com_uniqueMember.equality      true         0             0          0    0    0
cn.equality                Index       dc_example_dc_com_cn.equality                true         2000          0          0    0    0
cn.substring               Index       dc_example_dc_com_cn.substring               true         19407         0          0    0    0
sn.equality                Index       dc_example_dc_com_sn.equality                true         2000          0          0    0    0
sn.substring               Index       dc_example_dc_com_sn.substring               true         8147          0          0    0    0
telephoneNumber.equality   Index       dc_example_dc_com_telephoneNumber.equality   true         2000          0          0    0    0
telephoneNumber.substring  Index       dc_example_dc_com_telephoneNumber.substring  true         16506         0          0    0    0
ds-sync-hist.ordering      Index       dc_example_dc_com_ds-sync-hist.ordering      true         1             0          0    0    0
mail.equality              Index       dc_example_dc_com_mail.equality              true         2000          0          0    0    0
mail.substring             Index       dc_example_dc_com_mail.substring             true         7235          0          0    0    0
entryUUID.equality         Index       dc_example_dc_com_entryUUID.equality         true         2002          0          0    0    0

Total: 20

If an index contains a non zero value (N) in the undefined column, it means N index keys have reached the index entry limit and are no longer maintained. This can be normal, for example with the ObjectClass equality index, where the vast majority of entries will have the same objectclasses (top, Person, organizationalPerson, inetOrgPerson). But, for other attributes, such as cn, it may indicate that the index entry limit is too low.

Finally, OpenDJ has an option to do a live analysis of search filters and how they use indexes. To enable live index analysis, simply enable it for the database backend that contains the data :

dsconfig set-backend-prop --backend-name userRoot  --set index-filter-analyzer-enabled:true \
 --set max-entries:50 -h localhost -p 4444 -D cn=Directory\ Manager -w ****** -n -X

The max-entries parameter specifies how many filter items are being analyzed and kept in memory. Only the last max-entries will be kept. If there is a huge variety of requests against the directory service, you might want to increase the number. However, keep in mind that the analysis is kept in memory, and the higher the number the largest the impact on the overall performances of the server.

We do not recommend that you leave the index analysis enabled all the time, especially in production. The index analyzer should be used to gather statistics over a flow of requests for a short period of time, and should be disabled afterwards to free the resources.

The result of the index analyzer can be retrieved under the cn=monitor suffix, more specifically as part of the database environment of the backend.

$ bin/ldapsearch -p 1389 -D cn=directory\ manager -w secret12  \
-b "cn=userRoot Database Environment,cn=monitor" '(objectclass=*)' filter-use

dn: cn=userRoot Database Environment,cn=monitor
filter-use: (uid=user.*) hits:1 maxmatches:20 message:
filter-use: (tel=*) hits:1 maxmatches:-1 message:presence index type is disabled
  for the tel attribute
filter-use: (objectClass=groupOfURLs) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfEntries) hits:1 maxmatches:0 message:
filter-use: (objectClass=person) hits:1 maxmatches:20 message:
filter-use: (objectClass=ds-virtual-static-group) hits:1 maxmatches:0 message:
filter-use: (aci=*) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfUniqueNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=ldapSubentry) hits:1 maxmatches:0 message:
filter-use: (objectClass=subentry) hits:1 maxmatches:0 message:

hits represents the number of time this filter was used. the maxmatches represents the maximum number of entries that were returned for that filter.

Index analysis and tuning is not a simple task, and I recommend to play with these tools  a lot on a test environment to understand how to get the best out of them. But, as you can see, OpenDJ provides you with all the tools you need to get the best performances out of your LDAP directory.