Storing JSON objects in LDAP attributes…

jsonUntil recently, the only way to store a JSON object to an LDAP directory server, was to store it as string (either a Directory String i.e a sequence of UTF-8 characters, or an Octet String i.e. a blob of octets).

But now, in OpenDJ, the Open source LDAP Directory services in Java, there is now support for new syntaxes : one for JSON objects and one for JSON Query. Associated with the JSON query, a couple of matching rules, that can be easily customised and extended, have been defined.

To use the syntax and matching rules, you should first extend the LDAP schema with one or more new attributes, and use these attributes in object classes. For example :

dn: cn=schema
objectClass: top
objectClass: ldapSubentry
objectClass: subschema
attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'json'
SYNTAX 1.3.6.1.4.1.36733.2.1.3.1 EQUALITY caseIgnoreJsonQueryMatch SINGLE-VALUE )
objectClasses: (1.3.6.1.4.1.36733.2.1.2.999 NAME 'jsonObject'
SUP top MUST (cn $ json ) )

Just copy the LDIF above into config/schema/95-json.ldif, and restart the OpenDJ server. Make sure you use your own OIDs when defining schema elements. The ones above are samples and should not be used in production.

Then, you can add entries in the OpenDJ directory server like this:

$ ldapmodify -a -D cn=directory\ manager -w secret12 -h localhost -p 1389

dn: cn=bjensen,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
cn: bjensen
json: { "_id":"bjensen", "_rev":"123", "name": { "first": "Babs", "surname": "Jensen" }, "age": 25, "roles": [ "sales", "admin" ] }

dn: cn=scarter,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
cn: scarter
json: { "_id":"scarter", "_rev":"456", "name": { "first": "Sam", "surname": "Carter" }, "age": 48, "roles": [ "manager", "eng" ] }

The very nice thing about the JSON syntax and matching rules, is that OpenDJ understands how the values of the json attribute are structured, and it becomes possible to make specific queries, using the JSON Query syntax.

Let’s search for all jsonObjects that have a json value with a specific _id :

$ ldapsearch -D cn=directory\ manager -w secret12 -h localhost -p 1389 -b "dc=example,dc=com" -s sub "(json=_id eq 'scarter')"

dn: cn=scarter,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
json: { "_id":"scarter", "_rev":"456", "name": { "first": "Sam", "surname": "Carter" }, "age": 48, "roles": [ "manager", "eng" ] }
cn: scarter

We can run more complex queries, still using the JSON Query Syntax:

$ ldapsearch -D cn=directory\ manager -w secret12 -h localhost -p 1389 -b "dc=example,dc=com" -s sub "(json=name/first sw 'b' and age lt 30)"

dn: cn=bjensen,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
json: { "_id":"bjensen", "_rev":"123", "name": { "first": "Babs", "surname": "Jensen" }, "age": 25, "roles": [ "sales", "admin" ] }
cn: bjensen

For a complete description of the query  filter expressions, please refer to ForgeRock Common  REST (CREST) Query Filter documentation.

The JSON matching rule supports indexing which can be enabled using dsconfig against the appropriate attribute index. By default all JSON fields of the attribute are indexed.

In a followup post, I will give more advanced configuration of the JSON Syntax, detail how to customise the matching rule to index only specific JSON fields, and will outline some best practices with the JSON syntax and attributes.

OpenDJ: Monitoring Unindexed Searches…

FR_plogo_org_FC_openDJ-300x86OpenDJ, the open source LDAP directory services, makes use of indexes to optimise search queries. When a search query doesn’t match any index, the server will cursor through the whole database to return the entries, if any, that match the search filter. These unindexed queries can require a lot of resources : I/Os, CPU… In order to reduce the resource consumption, OpenDJ rejects unindexed queries by default, except for the Root DNs (i.e. for cn=Directory Manager).

In previous articles, I’ve talked about privileges for administratives accounts, and also about Analyzing Search Filters and Indexes.

Today, I’m going to show you how to monitor for unindexed searches by keeping a dedicated log file, using the traditional access logger and filtering criteria.

First, we’re going to create a new access logger, named “Searches” that will write its messages under “logs/search”.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-log-publisher \
    --set enabled:true \
    --set log-file:logs/search \
    --set filtering-policy:inclusive \
    --set log-format:combined \
    --type file-based-access \
    --publisher-name Searches

Then we’re defining a Filtering Criteria, that will restrict what is being logged in that file: Let’s log only “search” operations, that are marked as “unindexed” and take more than “5000” milliseconds.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-access-log-filtering-criteria \
    --publisher-name Searches \
    --set log-record-type:search \
    --set search-response-is-indexed:false \
    --set response-etime-greater-than:5000 \
    --type generic \
    --criteria-name Expensive\ Searches

Voila! Now, whenever a search request is unindexed and take more than 5 seconds, the server will log the request to logs/search (in a single line) as below :

$ tail logs/search
[12/Sep/2016:14:25:31 +0200] SEARCH conn=10 op=1 msgID=2 base="dc=example,
dc=com" scope=sub filter="(objectclass=*)" attrs="+,*" result=0 nentries=
10003 unindexed etime=6542

This file can be monitored and used to trigger alerts to administrators, or simply used to collect and analyse the filters that result into unindexed requests, in order to better tune the OpenDJ indexes.

Note that sometimes, it is a good option to leave some requests unindexed (the cost of indexing them outweighs the benefits of the index). If these requests are unfrequent, run by specific administrators for reporting reasons, and if the results are expecting to contain a lot of entries. If so, a best practice is to have a dedicated replica for administration and run these expensive requests. Also, it is better if the client applications are tuned to expect these requests to take a long time.

OpenDJ: Extensible indexes for Internationalization.

While taming the subject of indexes, we recently had some discussion with one of our users who complained about long response times with some language specific search filter such as (cn:fr.6:=*John*).

These extended filters rely on I18N Collation matching rules and indexes that I’ve described in an old post for OpenDS.

It turned out that he had defined the matching rule for the index, and rebuilt it, but had missed an important part: the index-type did not include “extensible”.

The proper command to create an extensible index is the following :

dsconfig set-local-db-index-prop --backend-name userRoot --index-name cn \
 --set index-extensible-matching-rule:fr.6 \
 --add index-type:extensible \
 --hostname localhost --port 4444 \
 --bindDN cn=Directory\ Manager --bindPassword ****** \
 -X -n

fr.6 is the shortcut for the French substring collation matching rule which full OID is 1.3.6.1.4.1.42.2.27.9.4.76.1

Note that if you don’t specify the extensible index-type, the server will not build the index for the extensible matching rule. The use of the index-type is consistent with the other types of index, equality or else, and allows you to disable and re-enable extensible indexes without having to re-enter all  OIDs.

OpenDJ Tips: More on troubleshooting indexes and search performances

In a previous post I talked about analyzing search filters and indexes. Matt added in a comment that OpenDJ has another mean of understanding how indexes are used in a search. Here’s a detailed post.

The OpenDJ LDAP directory server supports a “magic” operational attribute that allows an administrator to get from the server information about the processing of indexes for a specific search query: debugsearchindex.

If the attribute is set in the requested attributes in a search operation, the server will not return all entries as expected, but a single result entry with a fixed distinguished name and a single valued attribute debugsearchindex that contains the information related to the index processing, including the number of candidate entries per filter component, the overall number of candidate, and whether any or all of the search is indexed.

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(mail=user.*)(cn=*Denice*))" debugsearchindex
Password for user 'cn=Directory Manager': *******
dn: cn=debugsearch
debugsearchindex: filter=(&(mail=user.*)[INDEX:mail.substring][COUNT:2000](cn=*Denice*)[INDEX:cn.substring][COUNT:1])[COUNT:1] final=[COUNT:1]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "objectclass=*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearchdebugsearchindex: filter=(objectClass=*)[NOT-INDEXED] scope=wholeSubtree[COUNT:2007] final=[COUNT:2007]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "mail=user.1*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(mail=user.1*)[INDEX:mail.substring][COUNT:1111] scope=wholeSubtree[COUNT:2007] final=[COUNT:1111]

Note that sometimes, OpenDJ tries to optimize the query and use some other index than the regular one for the query. For example, it might use the equality index for an initial substring filter. The index used during the search does appear in the debugsearchindex attribute. Also, once the result set has been narrowed down to very few entries, it will stop using index and evaluate directly the entry set, as for the example below:

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(cn=Denice*)(mail=user.9*))" debugsearchindex
Password for user 'cn=Directory Manager':
dn: cn=debugsearch
debugsearchindex: filter=(&(cn=Denice*)[INDEX:cn.equality][COUNT:1])[COUNT:1] final=[COUNT:1]

OpenDJ: Analyzing Search Filters and Indexes

LDAP directory services greatly rely on indexes to provide fast and accurate search results.

OpenDJ, the open source LDAP directory services for the Java platform, provides a number of tools to ensure indexes are efficiently used or to optimize them for even better performances.

To start with, OpenDJ rejects by default all unindexed searches, unless the authenticated user has the privilege to perform them. Unindexed searches are rejected because they result in scanning the whole database, which consumes lots of resources and time. There are legitimate uses of unindexed search though, and OpenDJ offers a way to control who can perform them through a privilege. To learn more about privileges, how to grant them, please check the Administration Guide or some of my previous posts.

When unindexed searches are completed, OpenDJ (starting with revision 7148 of the OpenDJ trunk, and therefore OpenDJ 2.5) does logs the “Unindexed” keyword as part of the Search Response access log message. But the access log file can also be used to identify search operations that are not making an optimal use of indexes. Simply check for those search responses that have been returned with an etime (execution time) greater than the average.

The access log example below contains both an unusually high etime (expressed in ms) and the Unindexed tag.

[27/Jul/2011:20:27:27 +0200] SEARCH RES conn=0 op=1 msgID=2 result=0 nentries=10001 Unindexed etime=1846

The verify-index command let you check that no index is corrupted (i.e. no data is missing from indexes).

The rebuild-index command let you build or rebuild an index that would be corrupted or had its configuration changed.

One of the tuning parameter of indexes is the index-entry-limit (which was known in Sun DSEE as the AllIDsThreshold), the maximum size of entries kept in an index record, before the server stop maintaining that record and consider it’s more efficient to scan the whole database. For more information on the index entry limit, check the Section 7.2.4 Changing Index Entry Limits of the Indexing chapter of the Administration Guide.

OpenDJ provides a static analyzer of indexes which can help to understand how well the attributes are indexed, as well as help to tune the index entry limit. This tool is a function of the dbtest utility and is simply used as follow:

$ bin/dbtest list-index-status -n userRoot -b "dc=example,dc=com"

Index Name Index Type JE Database Name Index Valid Record Count Undefined 95% 90% 85%

---------------------------------------------------------------------------------------------------------------------------------------
id2children                Index       dc_example_dc_com_id2children                true         2             0          0    0    0
id2subtree                 Index       dc_example_dc_com_id2subtree                 true         2             0          0    0    0
uid.equality               Index       dc_example_dc_com_uid.equality               true         2000          0          0    0    0
aci.presence               Index       dc_example_dc_com_aci.presence               true         0             0          0    0    0
ds-sync-conflict.equality  Index       dc_example_dc_com_ds-sync-conflict.equality  true         0             0          0    0    0
givenName.equality         Index       dc_example_dc_com_givenName.equality         true         2000          0          0    0    0
givenName.substring        Index       dc_example_dc_com_givenName.substring        true         5777          0          0    0    0
objectClass.equality       Index       dc_example_dc_com_objectClass.equality       true         6             0          0    0    0
member.equality            Index       dc_example_dc_com_member.equality            true         0             0          0    0    0
uniqueMember.equality      Index       dc_example_dc_com_uniqueMember.equality      true         0             0          0    0    0
cn.equality                Index       dc_example_dc_com_cn.equality                true         2000          0          0    0    0
cn.substring               Index       dc_example_dc_com_cn.substring               true         19407         0          0    0    0
sn.equality                Index       dc_example_dc_com_sn.equality                true         2000          0          0    0    0
sn.substring               Index       dc_example_dc_com_sn.substring               true         8147          0          0    0    0
telephoneNumber.equality   Index       dc_example_dc_com_telephoneNumber.equality   true         2000          0          0    0    0
telephoneNumber.substring  Index       dc_example_dc_com_telephoneNumber.substring  true         16506         0          0    0    0
ds-sync-hist.ordering      Index       dc_example_dc_com_ds-sync-hist.ordering      true         1             0          0    0    0
mail.equality              Index       dc_example_dc_com_mail.equality              true         2000          0          0    0    0
mail.substring             Index       dc_example_dc_com_mail.substring             true         7235          0          0    0    0
entryUUID.equality         Index       dc_example_dc_com_entryUUID.equality         true         2002          0          0    0    0

Total: 20

If an index contains a non zero value (N) in the undefined column, it means N index keys have reached the index entry limit and are no longer maintained. This can be normal, for example with the ObjectClass equality index, where the vast majority of entries will have the same objectclasses (top, Person, organizationalPerson, inetOrgPerson). But, for other attributes, such as cn, it may indicate that the index entry limit is too low.

Finally, OpenDJ has an option to do a live analysis of search filters and how they use indexes. To enable live index analysis, simply enable it for the database backend that contains the data :

dsconfig set-backend-prop --backend-name userRoot  --set index-filter-analyzer-enabled:true \
 --set max-entries:50 -h localhost -p 4444 -D cn=Directory\ Manager -w ****** -n -X

The max-entries parameter specifies how many filter items are being analyzed and kept in memory. Only the last max-entries will be kept. If there is a huge variety of requests against the directory service, you might want to increase the number. However, keep in mind that the analysis is kept in memory, and the higher the number the largest the impact on the overall performances of the server.

We do not recommend that you leave the index analysis enabled all the time, especially in production. The index analyzer should be used to gather statistics over a flow of requests for a short period of time, and should be disabled afterwards to free the resources.

The result of the index analyzer can be retrieved under the cn=monitor suffix, more specifically as part of the database environment of the backend.

$ bin/ldapsearch -p 1389 -D cn=directory\ manager -w secret12  \
-b "cn=userRoot Database Environment,cn=monitor" '(objectclass=*)' filter-use

dn: cn=userRoot Database Environment,cn=monitor
filter-use: (uid=user.*) hits:1 maxmatches:20 message:
filter-use: (tel=*) hits:1 maxmatches:-1 message:presence index type is disabled
  for the tel attribute
filter-use: (objectClass=groupOfURLs) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfEntries) hits:1 maxmatches:0 message:
filter-use: (objectClass=person) hits:1 maxmatches:20 message:
filter-use: (objectClass=ds-virtual-static-group) hits:1 maxmatches:0 message:
filter-use: (aci=*) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfUniqueNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=ldapSubentry) hits:1 maxmatches:0 message:
filter-use: (objectClass=subentry) hits:1 maxmatches:0 message:

hits represents the number of time this filter was used. the maxmatches represents the maximum number of entries that were returned for that filter.

Index analysis and tuning is not a simple task, and I recommend to play with these tools  a lot on a test environment to understand how to get the best out of them. But, as you can see, OpenDJ provides you with all the tools you need to get the best performances out of your LDAP directory.