More about OpenDJ support for JSON attribute values

In a previous post, I introduced the new JSON syntax, JSON query and matching rules that are delivered as part of the OpenDJ LDAP directory server. Today, I will give more insight into how to customise the syntax, tune the matching rules for smarter and more efficient indexing, and highlight some best practices for using the JSON syntax.

JSON Syntax Validation

When defining an attribute with the JSON syntax, the server validates that the JSON value complies with the JSON RFC. OpenDJ offers a few options to relax some of the constraints of valid JSON. To change the settings of the syntax, you must use dsconfig --advanced.

>>>> Configure the properties of the Core Schema

Property Value(s)
 ----------------------------------------------------------------------
 1) allow-attribute-types-with-no-sup-or-syntax true
 2) allow-zero-length-values-directory-string false
 3) disabled-matching-rule NONE
 4) disabled-syntax NONE
 5) enabled true
 6) java-class org.opends.server.schema.CoreSchemaProvider
 7) json-validation-policy strict
 8) strict-format-certificates true
 9) strict-format-country-string true
 10) strict-format-jpeg-photos false
 11) strict-format-telephone-numbers false
 12) strip-syntax-min-upper-bound-attribute-type-description false

?) help
 f) finish - apply any changes to the Core Schema
 c) cancel
 q) quit

Enter choice [f]: 7


>>>> Configuring the "json-validation-policy" property

Specifies the policy that will be used when validating JSON syntax values.

Do you want to modify the "json-validation-policy" property?

1) Keep the default value: strict
 2) Change it to the value: disabled
 3) Change it to the value: lenient

?) help
 q) quit

Enter choice [1]:

Strict is the default mode.

Disabled means that the server will not try to validate the content of a JSON value.

Lenient means that the server validates the JSON value, but tolerates comments, single quotes, and unquoted control characters.
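
To change the policy non-interactively, a command along the following lines should work (a sketch: the set-schema-provider-prop subcommand and the "Core Schema" provider name are assumptions based on the menu above, and the connection details are placeholders):

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
  --advanced set-schema-provider-prop \
  --provider-name "Core Schema" \
  --set json-validation-policy:lenient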

JSON Matching Rule and Indexing

Like any attribute in the OpenDJ server, attributes with a JSON syntax can be indexed.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 set-backend-index-prop --backend-name userRoot \
 --index-name json --set index-type:equality

By default, the server indexes each field of all JSON values. If the values are large and complex, indexing results in a lot of disk I/O, possibly impacting performance for write operations.

If you know which fields of the JSON values will be queried by the client applications, you can optimise the index and specify which JSON fields are indexed. You do this by creating a new custom schema provider for the JSON query. You can either override the default JSON query matching rules (as illustrated below), which affects all JSON attributes, or create a new rule (with a new name and OID).

In the example below, the custom schema provider overrides the default caseIgnoreJsonQueryMatch rule, and only indexes the JSON field _id and the name field with all of its subfields.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider --provider-name "Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.1 \
 --set indexed-field:_id \
 --set "indexed-field:name/**" 

When you override the default matching rule, or define a new one, you need to rebuild the indexes for all attributes that make use of it.
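
For example, if the json attribute above is indexed in the userRoot backend holding dc=example,dc=com, the rebuild could look like this sketch (base DN, index name, and connection details are illustrative):

$ rebuild-index -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X \
  --baseDN dc=example,dc=com --index json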

Best Practices

Support for JSON attributes in OpenDJ is very new, but we can already recommend some ways to best use them.

The first recommendation is to use the JSON syntax for attributes that are single-valued. Indexing is designed to associate values with entries. Because JSON query indexes are built over all fields of all JSON values, a multi-valued attribute may cause an entry to be returned when a query matches all of its fields, even though the matched fields are spread across different JSON objects.

The JSON syntax is handy for storing complex JSON objects in a single attribute and querying them through any field. However, the larger the values, the bigger the impact on the directory server's performance. As, by default, all JSON fields are indexed, the more fields there are, the more expensive indexing becomes. Also, because JSON objects are LDAP attribute values, the only way to change a value is to replace it with a new one (or delete the value and add a new one, which involves even more bytes); there is no patch operation on the value. Finally, OpenDJ stores all attributes of an entry in a single database record, so any change to the entry requires writing the whole entry again.

As we've seen above, OpenDJ provides a way to customise the JSON queries and the JSON fields that are indexed. We suggest that you make use of this capability and optimise the indexing of JSON objects for the queries run by the client applications.
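
For reference, client applications query JSON values through an extensible match filter carrying a JSON query expression; a search using the matching rule above might look like this (a sketch: the attribute name, field, value, and connection details are illustrative, and the exact query expression grammar is described in the previous post):

$ ldapsearch -h localhost -p 1389 \
  -D "cn=Directory Manager" -w secret12 \
  -b "dc=example,dc=com" \
  "(json:caseIgnoreJsonQueryMatch:=_id eq 'bjensen')"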

If you plan to store different kinds of JSON objects in an OpenDJ directory service, define different attributes with the JSON syntax, and use a custom JSON query matching rule per attribute. For example, let's assume you will have entries that are persons, with an address attribute that has the JSON syntax, and other entries that represent OAuth2 tokens, whose main token attribute also has the JSON syntax. You should define an address attribute and a token attribute, both with the JSON syntax, but each with its own matching rule, as below.

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.998 NAME 'address'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1
  EQUALITY caseIgnoreJsonAddressQueryMatch SINGLE-VALUE )

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'token'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1 
  EQUALITY caseIgnoreJsonTokenQueryMatch SINGLE-VALUE )

where the matching rules are defined as follows:

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Address Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonAddressQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.998

and

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Token Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonTokenQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.999 \
 --set indexed-field:token_type \
 --set indexed-field:expires_at \
 --set indexed-field:access_token

Note that there is an issue with OpenDJ 4.0.0-SNAPSHOT (nightly builds): when you define a new Schema Provider, you need to restart the server for it to take effect.

OpenDJ: Monitoring Unindexed Searches…

OpenDJ, the open source LDAP directory service, makes use of indexes to optimise search queries. When a search query doesn't match any index, the server will cursor through the whole database to return the entries, if any, that match the search filter. These unindexed queries can require a lot of resources: I/O, CPU… In order to reduce resource consumption, OpenDJ rejects unindexed queries by default, except for the Root DNs (i.e. for cn=Directory Manager).

In previous articles, I've talked about privileges for administrative accounts, and also about Analyzing Search Filters and Indexes.

Today, I’m going to show you how to monitor for unindexed searches by keeping a dedicated log file, using the traditional access logger and filtering criteria.

First, we’re going to create a new access logger, named “Searches” that will write its messages under “logs/search”.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-log-publisher \
    --set enabled:true \
    --set log-file:logs/search \
    --set filtering-policy:inclusive \
    --set log-format:combined \
    --type file-based-access \
    --publisher-name Searches

Then we define Filtering Criteria that restrict what is logged in that file: let's log only "search" operations that are marked as "unindexed" and take more than 5000 milliseconds.

dsconfig -D cn=directory\ manager -w secret12 -h localhost -p 4444 -n -X \
    create-access-log-filtering-criteria \
    --publisher-name Searches \
    --set log-record-type:search \
    --set search-response-is-indexed:false \
    --set response-etime-greater-than:5000 \
    --type generic \
    --criteria-name Expensive\ Searches

Voila! Now, whenever a search request is unindexed and takes more than 5 seconds, the server will log the request to logs/search (in a single line) as below:

$ tail logs/search
[12/Sep/2016:14:25:31 +0200] SEARCH conn=10 op=1 msgID=2 base="dc=example,dc=com" scope=sub filter="(objectclass=*)" attrs="+,*" result=0 nentries=10003 unindexed etime=6542

This file can be monitored and used to trigger alerts to administrators, or simply used to collect and analyse the filters that result in unindexed requests, in order to better tune the OpenDJ indexes.

Note that it is sometimes a good option to leave some requests unindexed, when the cost of indexing them outweighs the benefits of the index: typically requests that are infrequent, run by specific administrators for reporting purposes, and whose results are expected to contain a lot of entries. In that case, a best practice is to have a dedicated replica for administration and run these expensive requests against it. It is also better if the client applications are tuned to expect these requests to take a long time.
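
If those reporting accounts are not Root DNs, remember that they also need the unindexed-search privilege; a minimal sketch with ldapmodify (the account DN and connection details are illustrative):

$ ldapmodify -h localhost -p 1389 -D "cn=Directory Manager" -w secret12
dn: uid=reporting-admin,ou=Admins,dc=example,dc=com
changetype: modify
add: ds-privilege-name
ds-privilege-name: unindexed-search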

Learning Curve

A few years ago I had the pleasure of working with Rajesh Rajasekharan at Sun. He was an efficient trainer on Sun products, especially Sun Directory Server. He recently joined ForgeRock and has started a series of blog posts and screencasts on ForgeRock products, especially OpenDJ, but not only!

If you are getting started with the products or want to see demos of them, there's no better place to be than on the "Learning Curve".

 

About auditing LDAP operations…

Many years ago, when I started working on LDAP directory services, we needed some auditing of the operations occurring on the server. So the server had an "Access" log, which contained one message when an operation was received and one when the response was returned to the client, the latter including the processing time on the server side (the etime parameter). On Netscape and Sun directory servers, the etime was measured in seconds. This format allowed us to detect requests that were taking a long time, or that were started but never finished.

In OpenDJ, we switched the etime resolution to milliseconds, but there's an option to set it to nanoseconds. Yet, even with millisecond resolution, there are still a number of log entries with an etime value of 0. The truth is that the server is faster, but so are the machines and processors.
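
For reference, the resolution is controlled in the server's global configuration; a command along these lines should switch it (a sketch, assuming the property is named etime-resolution):

dsconfig set-global-configuration-prop -h localhost -p 4444 -X -n \
 -D "cn=directory manager" -w password \
 --set etime-resolution:nanoseconds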

At a rate of 50,000 operations per second (which can easily be sustained on my laptop), having two messages per operation generates a lot of data to write to disk. That's why we introduced a new audit log format, not well advertised I must say, in OpenDJ 2.6.0. To enable the new format, use the following dsconfig command:

dsconfig set-log-publisher-prop -h localhost -p 4444 -X -n \
 -D "cn=directory manager" -w password \
 --publisher-name File-Based\ Access\ Logger  --set log-format:combined

And now, instead of having 2 lines per operation, there is a single one.

Before:

[23/Feb/2015:08:56:31 +0100] SEARCH REQ conn=0 op=4 msgID=5 base="cn=File-Based Access Logger,cn=Loggers,cn=config" scope=baseObject filter="(objectClass=*)" attrs="1.1"
[23/Feb/2015:08:56:31 +0100] SEARCH RES conn=0 op=4 msgID=5 result=0 nentries=1 etime=0
[23/Feb/2015:08:56:31 +0100] SEARCH REQ conn=0 op=5 msgID=6 base="cn=File-Based Access Logger,cn=Loggers,cn=config" scope=baseObject filter="(objectClass=*)" attrs="objectclass"
[23/Feb/2015:08:56:31 +0100] SEARCH RES conn=0 op=5 msgID=6 result=0 nentries=1 etime=0

After, in combined mode:

[23/Feb/2015:13:00:28 +0100] SEARCH conn=48 op=8215 msgID=8216 base="dc=example,dc=com" scope=wholeSubtree filter="(uid=user.1)" attrs="ALL" result=0 nentries=1 etime=0
[23/Feb/2015:13:00:28 +0100] SEARCH conn=60 op=10096 msgID=10097 base="dc=example,dc=com" scope=wholeSubtree filter="(uid=user.6)" attrs="ALL" result=0 nentries=1 etime=0

The benefits of enabling the combined log format are multiple: less data is written to disk for each operation and fewer I/O operations are involved, resulting in overall better throughput for the server. It also allows keeping more history of operations within the same volume of log files.

Do you think that OpenDJ 3.0 access log files should use the combined format by default?

API Protection with OpenIG: Controlling access by methods

Usually, one of the first things you want to do when securing APIs is to allow only specific calls to them. For example, you want to make sure that some URLs can only be read, or that you can call PUT but not POST on other ones.
OpenIG, the Open Identity Gateway, has everything you need to do this by default using a DispatchHandler, in which you express the methods that you want to allow as a condition.
The configuration for the upcoming OpenIG 3.1 version would look like this:

 {
     "name": "MethodFilterHandler",
     "type": "DispatchHandler",
     "config": {
         "bindings": [
         {
             "handler": "ClientHandler",
             "condition": "${exchange.request.method == 'GET' or exchange.request.method == 'HEAD'}",
             "baseURI": "http://www.example.com:8089"
         },
         {
             "handler": {
                 "type": "StaticResponseHandler",
                 "config": {
                     "status": 405,
                     "reason": "Method is not allowed",
                     "headers": {
                         "Allow": [ "GET", "HEAD" ]
                     }
                 }
             }
         }]
     }
 }

This is pretty straightforward, but if you want to allow another method, you need to update both the condition and the rejection headers. And when you have multiple APIs with different methods that you want to allow or deny, you need to repeat this block of configuration or build a much more complex condition expression.

But there is a simpler way, leveraging the scripting capabilities of OpenIG.
Create a file under your .openig/scripts/groovy named MethodFilter.groovy with the following content:

/**
 * The contents of this file are subject to the terms of the Common Development and
 * Distribution License 1.0 (the License). You may not use this file except in compliance with the
 * License.
 * Copyright 2014 ForgeRock AS.
 * Author: Ludovic Poitou
 */
import org.forgerock.openig.http.Response

/*
 * Filters requests, allowing only the HTTP methods listed in the
 * allowedmethods argument, supplied using a configuration like the following:
 *
 * {
 *     "name": "MethodFilter",
 *     "type": "ScriptableFilter",
 *     "config": {
 *         "type": "application/x-groovy",
 *         "file": "MethodFilter.groovy",
 *         "args": {
 *             "allowedmethods": [ "GET", "HEAD" ]
 *         }
 *     }
 * }
 */

if (allowedmethods.contains(exchange.request.method)) {
    // Call the next handler. This returns when the request has been handled.
    next.handle(exchange)
} else {
    exchange.response = new Response()
    exchange.response.status = 405
    exchange.response.reason = "Method not allowed: (" + exchange.request.method +")"
    exchange.response.headers.addAll("Allow", allowedmethods)
}

And now in all the places where you need to filter specific methods for an API, just add a filter to the Chain as below:

{
    "heap": [
        {
            "name": "MethodFilterHandler",
            "type": "Chain",
            "config": {
                "filters": [
                    {
                        "type": "ScriptableFilter",
                        "config": {
                            "type": "application/x-groovy",
                            "file": "MethodFilter.groovy",
                            "args": {
                                "allowedmethods": [ "GET", "HEAD" ]
                            }
                        }
                    }
                ],
                "handler": "ClientHandler"
            }
        }
    ],
    "handler": "MethodFilterHandler",
    "baseURI": "http://www.example.com:8089"
}
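
With this route in place, a request using a method that is not in the allowed list should be rejected by the script; for example (the host, port, and path are hypothetical, and the response shown is the approximate output expected from the script above):

$ curl -i -X POST http://openig.example.com:8080/orders

# expected (approximate) response:
# HTTP/1.1 405 Method not allowed: (POST)
# Allow: GET
# Allow: HEAD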

This solution allows filtering different methods for different APIs with a simple configuration element, the "allowedmethods" field, for greater reusability.

About LDAP Syntaxes and backward compatibility…

In the LDAP information model, a syntax constrains the structure and format of attribute values. OpenDJ defines and implements a large number of syntaxes (you can discover them by reading the ldapSyntaxes attribute from the cn=Schema entry).
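
For example, something like the following should list them (connection details are placeholders):

$ ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w secret12 \
  -b "cn=schema" -s base "(objectClass=*)" ldapSyntaxes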

But every now and then, we receive enquiries about an obscure, non-standard syntax, often in the form of "I'm having an error importing schema from this or that legacy directory server", with an error message that ends with "No such syntax is configured for use in the Directory Server".

As syntaxes constrain the structure and format of attribute values, they are implemented as code, specifically Java code in OpenDJ. It's possible to implement new syntaxes by implementing the org.opends.server.api.AttributeSyntax abstract class and installing the classes or the JAR in the OpenDJ classpath. But often, it's easier and more convenient to define a syntax by configuration, and OpenDJ offers 3 ways to define new syntaxes. In terms of backward compatibility, I will only focus on the 2 main ones, by substitution and by pattern (the 3rd one allows defining enumerations of values).

With OpenDJ, you can define a new syntax by configuration, delegating the constraints to an already implemented syntax. A simple example is the URI syntax (which was defined in some very old schemas with the OID 1.3.6.1.4.1.4401.1.1.1). A URI is really an ASCII string, and it might be sufficient, when accepting attributes with the URI syntax, to verify that all characters are pure ASCII. The standard syntax for ASCII strings is IA5String, aka 1.3.6.1.4.1.1466.115.121.1.26.

ldapSyntaxes: ( 1.3.6.1.4.1.4401.1.1.1 DESC 'URI' X-SUBST '1.3.6.1.4.1.1466.115.121.1.26' )

Insert the above line in the schema LDIF file before the attributeTypes, and you’re done.

The other option is to define the syntax as a pattern, using regular expressions. This can be better when you want to enforce additional constraints on a URI, for example, verifying that the URI is an LDAP one.

ldapSyntaxes: ( 999.999.999.1 DESC 'LDAP URI Syntax' X-PATTERN '^ldap://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' )
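
An attribute type can then reference this syntax by its OID; for instance (the attribute name and OID below are made up for illustration):

attributeTypes: ( 999.999.999.2 NAME 'ldapServiceURI'
  DESC 'A URI that must point to an LDAP service'
  EQUALITY caseExactIA5Match
  SYNTAX 999.999.999.1 SINGLE-VALUE )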

So the next time you are trying to import some legacy schema to the OpenDJ directory server, and you have an error due to missing syntaxes, you know what to do to quickly resolve the problem.

Tips: Do not index virtual attributes in OpenDJ

OpenDJ, the open source LDAP directory service in Java, offers some interesting features to reduce and optimize the size and usage of data.

One of them is the Virtual Attribute feature, which allows certain attributes and values to be computed as needed, based either on server internals or on other attributes. OpenDJ ships with a number of virtual attributes by default: entryDN, entryUUID, etag, governingStructureRule, hasSubordinates, isMemberOf, numSubordinates, password expiration time (ds-pwp-password-expiration-time), structuralObjectClass, subschemaSubentry, …
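
Most of these are operational attributes, so they are only returned when explicitly requested; for example (entry DN and connection details are illustrative):

$ ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w secret12 \
  -b "uid=bjensen,ou=People,dc=example,dc=com" -s base "(objectClass=*)" \
  isMemberOf etag entryUUID hasSubordinates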

Since these attributes are virtual, and thus not stored as part of the entries in the database backend, you must not define any index for them. When possible, the virtual attribute provider makes use of the default system indexes (for example, entryDN relies on the DN index), but most of the time these attributes are just meant to be read and consumed.

If you do configure an index for one of the virtual attributes, the server will repeatedly report that the index is degraded, with an error message similar to the following:

[21/Jan/2013:09:16:07 +0000] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_entryDN is currently operating in a degraded state and must be rebuilt before it can be used

And then some searches may fail to return entries. So you must delete this index to let the server behave properly.
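
For example, if an entryDN index was created by mistake on the userRoot backend, a command like this sketch should remove it (backend and index names are illustrative; on older 2.x servers the subcommand is delete-local-db-index rather than delete-backend-index):

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
  delete-backend-index --backend-name userRoot --index-name entryDN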