Working with Indexes

Learn how to use different indexes efficiently by going through the ArangoDB Performance Course.

Index Identifiers and Handles

An index handle uniquely identifies an index in the database. It is a string and consists of the collection name and an index identifier separated by a /. The index identifier part is a numeric value that is auto-generated by ArangoDB.

A specific index of a collection can be accessed using its index handle or index identifier as follows:

db.collection.index("<index-handle>");
db.collection.index("<index-identifier>");
db._index("<index-handle>");

For example: Assume that the index handle, which is stored in the _id attribute of the index, is demo/362549736 and the index was created in a collection named demo. Then this index can be accessed as:

db.demo.index("demo/362549736");

Because the index handle is unique within the database, you can leave out the collection and use the shortcut:

db._index("demo/362549736");

An index may also be looked up by its name. Since names are only unique within a collection, rather than within the database, the lookup must also include the collection name when calling db._index():

db._index("demo/primary")
db.demo.index("primary")

Collection Methods

Listing all indexes of a collection

collection.getIndexes(withStats, withHidden)

collection.indexes(withStats, withHidden)

Returns an array of all indexes defined for the collection.

You can set the following parameters:

  • withStats (boolean, optional): whether to include index figures and estimates in the result. Default: false
  • withHidden (boolean, optional): whether to include hidden indexes in the result. Default: false

The indexes() method is an alias for getIndexes().

Note that _key implicitly has an index assigned to it.

Examples

Get the index definitions for a collection:

arangosh> db.test.ensureIndex({ type: "persistent", fields: [
........> "attribute", "secondAttribute.subAttribute"] });
arangosh> db.test.getIndexes();
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "attribute", 
    "secondAttribute.subAttribute" 
  ], 
  "id" : "test/73924", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249899624103936", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "id" : "test/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  }, 
  { 
    "cacheEnabled" : false, 
    "deduplicate" : true, 
    "estimates" : true, 
    "fields" : [ 
      "attribute" 
    ], 
    "id" : "test/73916", 
    "name" : "idx_1762249899622006784", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "persistent", 
    "unique" : true 
  }, 
  { 
    "cacheEnabled" : false, 
    "deduplicate" : true, 
    "estimates" : true, 
    "fields" : [ 
      "uniqueAttribute" 
    ], 
    "id" : "test/73920", 
    "name" : "idx_1762249899623055360", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "persistent", 
    "unique" : true 
  }, 
  { 
    "cacheEnabled" : false, 
    "deduplicate" : true, 
    "estimates" : true, 
    "fields" : [ 
      "attribute", 
      "secondAttribute.subAttribute" 
    ], 
    "id" : "test/73924", 
    "name" : "idx_1762249899624103936", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "persistent", 
    "unique" : false 
  } 
]

Get the index definitions for a collection, including figures and hidden indexes:

arangosh> db.coll.indexes(true, true);
Show execution results
Hide execution results
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "figures" : { 
      "memory" : 68, 
      "cacheInUse" : false, 
      "cacheSize" : 0, 
      "cacheUsage" : 0 
    }, 
    "id" : "coll/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  }, 
  { 
    "analyzer" : "identity", 
    "cleanupIntervalStep" : 2, 
    "commitIntervalMsec" : 1000, 
    "consolidationIntervalMsec" : 1000, 
    "consolidationPolicy" : { 
      "type" : "tier", 
      "segmentsBytesFloor" : 2097152, 
      "segmentsBytesMax" : 5368709120, 
      "segmentsMax" : 10, 
      "segmentsMin" : 1, 
      "minScore" : 0 
    }, 
    "features" : [ 
      "frequency", 
      "norm" 
    ], 
    "fields" : [ 
      { 
        "name" : "attr" 
      } 
    ], 
    "figures" : { 
      "numDocs" : 2, 
      "numLiveDocs" : 2, 
      "numSegments" : 1, 
      "numFiles" : 6, 
      "indexSize" : 727 
    }, 
    "id" : "coll/73940", 
    "includeAllFields" : false, 
    "name" : "inv-idx", 
    "primarySort" : { 
      "fields" : [ ], 
      "compression" : "lz4" 
    }, 
    "searchField" : false, 
    "sparse" : true, 
    "storedValues" : [ ], 
    "trackListPositions" : false, 
    "type" : "inverted", 
    "unique" : false, 
    "version" : 1, 
    "writebufferActive" : 0, 
    "writebufferIdle" : 64, 
    "writebufferSizeMax" : 33554432 
  }, 
  { 
    "analyzers" : [ 
      "identity" 
    ], 
    "fields" : { 
    }, 
    "figures" : { 
      "numDocs" : 2, 
      "numLiveDocs" : 2, 
      "numSegments" : 1, 
      "numFiles" : 6, 
      "indexSize" : 1239 
    }, 
    "id" : "coll/73944", 
    "includeAllFields" : true, 
    "storeValues" : "none", 
    "trackListPositions" : false, 
    "type" : "arangosearch", 
    "view" : "hD3E7CF1CCC5F/73942" 
  } 
]

Creating an index

Ensures that an index exists:

collection.ensureIndex(index-description)

Ensures that an index according to the index-description exists. A new index will be created if none exists with the given description.

Calling this method returns an index object. Whether or not the index object existed before the call is indicated in the isNewlyCreated return attribute.

The index-description input value must contain at least a type attribute. Other attributes may be necessary, depending on the index type.

  • type: can be one of the following values:
    • "persistent": persistent (array) index, including vertex-centric index
    • "inverted": inverted index
    • "ttl": time-to-live index
    • "fulltext": full-text index (deprecated from ArangoDB 3.10 onwards)
    • "geo": geo-spatial index, with one or two attributes
    • "zkd": multi-dimensional index (experimental)
  • fields: an array of attribute paths, containing the document attributes (or sub-attributes) to be indexed. Some indexes allow using only a single path, and others allow multiple. If multiple attributes are used, their order matters.

    The . character denotes sub-attributes in attribute paths. Attributes with literal . in their name cannot be indexed. Attributes with the name _id cannot be indexed either, neither as a top-level attribute nor as a sub-attribute.

    If an attribute path contains an [*] extension (e.g. friends[*].id), it means that the index attribute value is treated as an array and all array members are indexed separately. This is possible with persistent and inverted indexes.

  • storedValues: in indexes of type persistent and inverted, additional attributes can be stored in the index. These additional attributes cannot be used for index lookups or for sorting, but they can be used for projections. This allows an index to fully cover more queries and avoid extra document lookups. Non-existing attributes are stored as null values inside storedValues. The maximum number of attributes in storedValues is 32. It is not possible to create multiple indexes with the same fields attributes and uniqueness but different storedValues attributes. That means the value of storedValues is not considered in index creation calls when checking if an index is already present or needs to be created. In unique indexes, only the attributes in fields are checked for uniqueness, but the attributes in storedValues are not checked for their uniqueness.

  • name: can be a string. Index names are subject to the same character restrictions as collection names. If omitted, a name will be auto-generated so that it is unique with respect to the collection, e.g. idx_832910498.

    The purpose of user-defined index names is have easy-to-remember names to use in index hints in AQL queries. If no index hints are used, going with the auto-generated index names is fine.

  • sparse: can be true or false. You can control the sparsity for persistent indexes. The inverted, fulltext, and geo index types are sparse by definition.

  • unique: can be true or false and is supported by persistent indexes. By default, all user-defined indexes are non-unique. Only the attributes in fields are checked for uniqueness. Any attributes in from storedValues are not checked for their uniqueness.

  • deduplicate: can be true or false and is supported by array indexes of type persistent. It controls whether inserting duplicate index values from the same document into a unique array index will lead to a unique constraint error or not. The default value is true, so only a single instance of each non-unique index value will be inserted into the index per document. Trying to insert a value into the index that already exists in the index will always fail, regardless of the value of this attribute.

  • estimates: can be true or false and is supported by indexes of type persistent. This attribute controls whether index selectivity estimates are maintained for the index. Not maintaining index selectivity estimates can have a slightly positive impact on write performance. The downside of turning off index selectivity estimates will be that the query optimizer will not be able to determine the usefulness of different competing indexes in AQL queries when there are multiple candidate indexes to choose from. The estimates attribute is optional and defaults to true if not set. It will have no effect on indexes other than persistent (with hash and skiplist being mere aliases for the persistent index type nowadays).

  • cacheEnabled: can be true or false and is supported by indexes of type persistent. The attribute controls whether an extra in-memory hash cache is created for the index. The hash cache can be used to speed up index lookups. The cache can only be used for queries that look up all index attributes via an equality lookup (==). The hash cache cannot be used for range scans, partial lookups or sorting. The cache will be populated lazily upon reading data from the index. Writing data into the collection or updating existing data will invalidate entries in the cache. The cache may have a negative effect on performance in case index values are updated more often than they are read. The maximum size of cache entries that can be stored is currently 4 MB, i.e. the cumulated size of all index entries for any index lookup value must be less than 4 MB. This limitation is there to avoid storing the index entries of “super nodes” in the cache. cacheEnabled defaults to false and should only be used for indexes that are known to benefit from an extra layer of caching.

Also check the documentation of the specific index type for additional options.

Examples

arangosh> db.test.ensureIndex({ type: "persistent", fields: [ "a" ], sparse: true });
arangosh> db.test.ensureIndex({ type: "persistent", fields: [ "a", "b" ], unique: true });
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a" 
  ], 
  "id" : "test/73882", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249899524489216", 
  "selectivityEstimate" : 1, 
  "sparse" : true, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "test/73886", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249899525537792", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : true, 
  "code" : 201 
}

Dropping an index via a collection handle

Drops an index:

collection.dropIndex(index)

Drops the index. If the index does not exist, then false is returned. If the index existed and was dropped, then true is returned. Note that you cannot drop some special indexes (e.g. the primary index of a collection or the edge index of an edge collection).


collection.dropIndex(index-handle)

Same as above. Instead of an index an index handle can be given.

arangosh> db.example.ensureIndex({ type: "persistent", fields: ["a", "b"] });
arangosh> var indexInfo = db.example.getIndexes();
arangosh> indexInfo;
arangosh> db.example.dropIndex(indexInfo[0])
arangosh> db.example.dropIndex(indexInfo[1].id)
arangosh> indexInfo = db.example.getIndexes();
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "example/73755", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249899431165953", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "id" : "example/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  }, 
  { 
    "cacheEnabled" : false, 
    "deduplicate" : true, 
    "estimates" : true, 
    "fields" : [ 
      "a", 
      "b" 
    ], 
    "id" : "example/73755", 
    "name" : "idx_1762249899431165953", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "persistent", 
    "unique" : false 
  } 
]
false
true
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "id" : "example/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  } 
]

Load Indexes into Memory

Loads suitable indexes of this collection into memory:

collection.loadIndexesIntoMemory()

This function tries to cache index entries of this collection in main memory. Index lookups that can be served from the memory cache can be much faster than lookups not stored in the cache, so you can get a nice performance boost.

The function iterates over suitable indexes of the collection and stores the indexed values, not the entire document data, in memory. This is implemented for edge indexes only.

The function returns as soon as the index warmup has been scheduled. The index warmup may still be ongoing in the background, even after the function has already returned. As all suitable indexes are scanned, it may cause significant I/O activity and background load.

This function honors memory limits. If the indexes you want to load are smaller than your memory limit, this function guarantees that most index values are cached. If the index is larger than your memory limit, this function fills up values up to this limit and there is no way to control which indexes of the collection should have priority over others.

It is guaranteed at all times that the in-memory cache data is consistent with the stored index data.

arangosh> db.example.loadIndexesIntoMemory();
Show execution results
Hide execution results
{ 
  "result" : true 
}

Database Methods

Fetching an index by handle

Finds an index:

db._index(index-handle)

Returns the index with index-handle or null if no such index exists.

arangosh> db.example.ensureIndex({ type: "persistent", fields: [ "a", "b" ] });
arangosh> var indexInfo = db.example.getIndexes().map(function(x) { return x.id; });
arangosh> indexInfo;
arangosh> db._index(indexInfo[0])
arangosh> db._index(indexInfo[1])
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "example/68535", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249843473907712", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
[ 
  "example/0", 
  "example/68535" 
]
{ 
  "fields" : [ 
    "_key" 
  ], 
  "id" : "example/0", 
  "name" : "primary", 
  "sparse" : false, 
  "type" : "primary", 
  "unique" : true, 
  "code" : 200 
}
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "example/68535", 
  "name" : "idx_1762249843473907712", 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 200 
}

Dropping an index via a database handle

Drops an index:

db._dropIndex(index)

Drops the index. If the index does not exist, then false is returned. If the index existed and was dropped, then true is returned.


db._dropIndex(index-handle)

Drops the index with index-handle.

arangosh> db.example.ensureIndex({ type: "persistent", fields: [ "a", "b" ] });
arangosh> var indexInfo = db.example.getIndexes();
arangosh> indexInfo;
arangosh> db._dropIndex(indexInfo[0])
arangosh> db._dropIndex(indexInfo[1].id)
arangosh> indexInfo = db.example.getIndexes();
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "example/74552", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249899907219457", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "id" : "example/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  }, 
  { 
    "cacheEnabled" : false, 
    "deduplicate" : true, 
    "estimates" : true, 
    "fields" : [ 
      "a", 
      "b" 
    ], 
    "id" : "example/74552", 
    "name" : "idx_1762249899907219457", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "persistent", 
    "unique" : false 
  } 
]
false
true
[ 
  { 
    "fields" : [ 
      "_key" 
    ], 
    "id" : "example/0", 
    "name" : "primary", 
    "selectivityEstimate" : 1, 
    "sparse" : false, 
    "type" : "primary", 
    "unique" : true 
  } 
]

Revalidating whether an index is used

So you’ve created an index, and since its maintenance isn’t for free, you definitely want to know whether your query can utilize it.

You can use explain to verify that a certain index is used:

arangosh> var explain = require("@arangodb/aql/explainer").explain;
arangosh> db.example.ensureIndex({ type: "persistent", fields: [ "a", "b" ] });
arangosh> explain("FOR doc IN example FILTER doc.a < 23 RETURN doc", {colors: false});
Show execution results
Hide execution results
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "a", 
    "b" 
  ], 
  "id" : "example/68549", 
  "isNewlyCreated" : true, 
  "name" : "idx_1762249843480199169", 
  "selectivityEstimate" : 1, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "code" : 201 
}
Query String (47 chars, cacheable: true):
 FOR doc IN example FILTER doc.a < 23 RETURN doc

Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode          0     - FOR doc IN example   /* persistent index scan, index scan + document lookup */    
  5   ReturnNode         0       - RETURN doc

Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  6   idx_1762249843480199169   persistent   example      false    false    false      100.00 %   [ `a`, `b` ]   [  ]            (doc.`a` < 23)

Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2

43 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00018

(If you omit colors: false you will get nice colors in ArangoShell.)