More about OpenDJ support for JSON attribute values

In a previous post, I introduced the new JSON syntax, JSON query and matching rules that are delivered as part of the OpenDJ LDAP directory server. Today, I will give more insights on how to customise the syntax, tune the matching rules for smarter and more efficient indexing, and I will highlight some best practices with using the JSON syntax.

JSON Syntax Validation

When defining an attribute with a JSON syntax, the server will validate that the JSON value is compliant with JSON RFC.  OpenDJ offers a few options to relax some of the constraints of a valid JSON. To change the settings of the syntax, you must use dsconfig --advanced.

>>>> Configure the properties of the Core Schema

Property Value(s)
 ----------------------------------------------------------------------
 1) allow-attribute-types-with-no-sup-or-syntax true
 2) allow-zero-length-values-directory-string false
 3) disabled-matching-rule NONE
 4) disabled-syntax NONE
 5) enabled true
 6) java-class org.opends.server.schema.CoreSchemaProvider
 7) json-validation-policy strict
 8) strict-format-certificates true
 9) strict-format-country-string true
 10) strict-format-jpeg-photos false
 11) strict-format-telephone-numbers false
 12) strip-syntax-min-upper-bound-attribute-type-description false

?) help
 f) finish - apply any changes to the Core Schema
 c) cancel
 q) quit

Enter choice [f]: 7


>>>> Configuring the "json-validation-policy" property

Specifies the policy that will be used when validating JSON syntax values.

Do you want to modify the "json-validation-policy" property?

1) Keep the default value: strict
 2) Change it to the value: disabled
 3) Change it to the value: lenient

?) help
 q) quit

Enter choice [1]:

Strict is the default mode.

Disabled means that the server will not try to validate the content of a JSON value.

Lenient means that it will validate the JSON value, but tolerate comments, single quotes and unquoted control characters.

JSON Matching Rule and Indexing

Like any attribute in the OpenDJ server, attributes with a JSON syntax can be indexed.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 set-backend-index-prop \--backend-name userRoot \
 --index-name json --set index-type:equality

By default, the server actually indexes each field of all JSON values. If the values are large and complex, indexing will  result in many disk I/O, possibly impacting performances for write operations.

If you know which fields of the JSON values will be queried for by the client applications, you can optimise the index and specify the JSON fields that are indexed. This is by creating a new custom schema provider for the JSON query. You can choose to overwrite the default JSON query matching rules (as illustrated below), and this will affect all JSON attributes, or you can choose to create a new rule (with a new name and OID).

In the example below, the custom schema provider overwrites the default caseIgnoreJsonQueryMatch, and only indexes the JSON fields _id and name with its subfields.

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider --provider-name "Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.1 \
 --set indexed-field:_id \
 --set "indexed-field:name/**" 

When you overwrite the default matching rule, or you define a new one, you need to rebuild the indexes for all attributes that are making use of it.

Best Practices

The support for JSON attributes in OpenDJ is very new, but yet, we can recommend how to best use them.

The first thing, is to use the JSON syntax for attributes that are single valued. Indexing is designed to associate values with entries. Because JSON query indexes are built for all fields of the JSON objects, an entry will be returned if a query matches all fields, even though they are in different objects.

The JSON syntax is handy to store complex JSON objects in a single attribute and query them, through any field. However, the larger the values, the  more impact on the directory server’s performances. As, by default, all JSON fields are indexed, the more fields, the more expensive will be indexing. Also, because the JSON objects are LDAP attributes, the only way to change a value is to replace the value with a new one (or delete the value and add a new one, which are operations with even more bytes). There is no patch operation on the value. Finally, OpenDJ stores all attributes of an entry in a single database record. So any change in the entry itself will require to write the whole entry again.

As we’ve seen above, OpenDJ proposes a way to customise the JSON queries and the JSON fields that are indexed. We suggest that you make use of this capability and optimise the indexing of JSON objects for the queries run by the client applications.

If you plan to store different kinds of JSON objects in an OpenDJ directory service, define different attributes with the JSON syntax, and use a custom JSON query per attribute. For example, lets assume you will have entries that are persons with an address attribute with a JSON syntax, and some other entries that represent OAuth2 tokens, and the token main attribute has a JSON syntax. You should define an address attribute and a token attribute, both with the JSON syntax, but their specific matching rules, like below.

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'address'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1
  EQUALITY caseIgnoreJsonAddressQueryMatch SINGLE-VALUE )

attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'token'
  SYNTAX 1.3.6.1.4.1.36733.2.1.3.1 
  EQUALITY caseIgnoreJsonTokenQueryMatch SINGLE-VALUE )

where the matching rules are defined as such:

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Address Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonAddressQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.998

and

$ dsconfig -h localhost -p 4444 \
  -D "cn=Directory Manager" -w secret12 -X -n \
 create-schema-provider \
 --provider-name "Token Json Schema" \
 --type json-schema --set enabled:true \
 --set case-sensitive-strings:false \
 --set ignore-white-space:true \
 --set matching-rule-name:caseIgnoreJsonTokenQueryMatch \
 --set matching-rule-oid:1.3.6.1.4.1.36733.2.1.4.999 \
 --set indexed-field:token_type \
 --set indexed-field:expires_at \
 --set indexed-field:access_token

Note that there is an issue with OpenDJ 4.0.0-SNAPSHOTS (nightly builds) and when you define a new Schema Provider, you need to restart the server to have it be effective.

Storing JSON objects in LDAP attributes…

jsonUntil recently, the only way to store a JSON object to an LDAP directory server, was to store it as string (either a Directory String i.e a sequence of UTF-8 characters, or an Octet String i.e. a blob of octets).

But now, in OpenDJ, the Open source LDAP Directory services in Java, there is now support for new syntaxes : one for JSON objects and one for JSON Query. Associated with the JSON query, a couple of matching rules, that can be easily customised and extended, have been defined.

To use the syntax and matching rules, you should first extend the LDAP schema with one or more new attributes, and use these attributes in object classes. For example :

dn: cn=schema
objectClass: top
objectClass: ldapSubentry
objectClass: subschema
attributeTypes: ( 1.3.6.1.4.1.36733.2.1.1.999 NAME 'json'
SYNTAX 1.3.6.1.4.1.36733.2.1.3.1 EQUALITY caseIgnoreJsonQueryMatch SINGLE-VALUE )
objectClasses: (1.3.6.1.4.1.36733.2.1.2.999 NAME 'jsonObject'
SUP top MUST (cn $ json ) )

Just copy the LDIF above into config/schema/95-json.ldif, and restart the OpenDJ server. Make sure you use your own OIDs when defining schema elements. The ones above are samples and should not be used in production.

Then, you can add entries in the OpenDJ directory server like this:

$ ldapmodify -a -D cn=directory\ manager -w secret12 -h localhost -p 1389

dn: cn=bjensen,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
cn: bjensen
json: { "_id":"bjensen", "_rev":"123", "name": { "first": "Babs", "surname": "Jensen" }, "age": 25, "roles": [ "sales", "admin" ] }

dn: cn=scarter,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
cn: scarter
json: { "_id":"scarter", "_rev":"456", "name": { "first": "Sam", "surname": "Carter" }, "age": 48, "roles": [ "manager", "eng" ] }

The very nice thing about the JSON syntax and matching rules, is that OpenDJ understands how the values of the json attribute are structured, and it becomes possible to make specific queries, using the JSON Query syntax.

Let’s search for all jsonObjects that have a json value with a specific _id :

$ ldapsearch -D cn=directory\ manager -w secret12 -h localhost -p 1389 -b "dc=example,dc=com" -s sub "(json=_id eq 'scarter')"

dn: cn=scarter,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
json: { "_id":"scarter", "_rev":"456", "name": { "first": "Sam", "surname": "Carter" }, "age": 48, "roles": [ "manager", "eng" ] }
cn: scarter

We can run more complex queries, still using the JSON Query Syntax:

$ ldapsearch -D cn=directory\ manager -w secret12 -h localhost -p 1389 -b "dc=example,dc=com" -s sub "(json=name/first sw 'b' and age lt 30)"

dn: cn=bjensen,ou=people,dc=example,dc=com
objectClass: top
objectClass: jsonObject
json: { "_id":"bjensen", "_rev":"123", "name": { "first": "Babs", "surname": "Jensen" }, "age": 25, "roles": [ "sales", "admin" ] }
cn: bjensen

For a complete description of the query  filter expressions, please refer to ForgeRock Common  REST (CREST) Query Filter documentation.

The JSON matching rule supports indexing which can be enabled using dsconfig against the appropriate attribute index. By default all JSON fields of the attribute are indexed.

In a followup post, I will give more advanced configuration of the JSON Syntax, detail how to customise the matching rule to index only specific JSON fields, and will outline some best practices with the JSON syntax and attributes.