Full-text search index

Full-text indexes are powered by the Apache Lucene indexing and search library, and can be used to index nodes and relationships by string properties. A full-text index allows you to write queries that match within the contents of indexed string properties. By contrast, the range and text indexes described in previous sections can only perform limited matching on strings: exact, prefix, substring, or suffix matches. A full-text index instead tokenizes the indexed string values, so it can match terms anywhere within the strings. How the indexed strings are tokenized and broken into terms is determined by the analyzer that the full-text index is configured with. For instance, the swedish analyzer knows how to tokenize and stem Swedish words, and will avoid indexing Swedish stop words. The complete list of stop words for each analyzer is included in the result of the db.index.fulltext.listAvailableAnalyzers procedure.
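As a minimal sketch of how the stop words for one analyzer could be inspected, the query below calls the procedure mentioned above and filters for the swedish analyzer. The yielded column names (analyzer, description, stopwords) are an assumption based on recent Neo4j versions and may differ in older releases.

```cypher
// List the Swedish analyzer together with its stop words.
// The column names are an assumption; calling the procedure on its own,
// without YIELD, shows the columns that are actually available.
CALL db.index.fulltext.listAvailableAnalyzers()
YIELD analyzer, description, stopwords
WHERE analyzer = 'swedish'
RETURN analyzer, description, stopwords
```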

Full-text indexes:

  • support the indexing of both nodes and relationships.

  • support configuring custom analyzers, including analyzers that are not included with Lucene itself.

  • can be queried using the Lucene query language.

  • can return the score for each result from a query.

  • are kept up to date automatically, as nodes and relationships are added, removed, and modified.

  • will automatically populate newly created indexes with the existing data in a store.

  • can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.

  • are a projection of the store, and can only index nodes and relationships by the contents of their properties.

  • include only property values of types STRING or LIST<STRING>.

  • can support any number of documents in a single index.

  • are created, dropped, and updated transactionally, and are automatically replicated throughout a cluster.

  • can be accessed via Cypher® procedures.

  • can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This makes it possible to keep the slow Lucene writes out of the performance-critical commit process, thus removing the main bottlenecks for Neo4j write performance.

At first sight, the construction of full-text indexes can seem similar to regular indexes. However, there are some notable differences. In contrast to other indexes, a full-text index can be:

  • applied to more than one label.

  • applied to more than one relationship type.

  • applied to more than one property at a time (similar to a composite index) but with an important difference: while a composite index applies only to entities that match the indexed label and all of the indexed properties, a full-text index will index entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.

For information on how to configure full-text indexes, refer to Operations Manual → Indexes to support full-text search.

Full-text search procedures

Full-text indexes are managed through commands and used through built-in procedures. See Operations Manual → Procedures for a complete reference.

The commands and procedures for full-text indexes are listed in the table below:

Usage | Procedure/Command | Description
Create full-text node index. | CREATE FULLTEXT INDEX ... | Create a node full-text index for the given labels and properties.
Create full-text relationship index. | CREATE FULLTEXT INDEX ... | Create a relationship full-text index for the given relationship types and properties.
List available analyzers. | db.index.fulltext.listAvailableAnalyzers | List the available analyzers that the full-text indexes can be configured with.
Use full-text node index. | db.index.fulltext.queryNodes | Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score.
Use full-text relationship index. | db.index.fulltext.queryRelationships | Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score.
Drop full-text index. | DROP INDEX ... | Drop the specified index.
Eventually consistent indexes. | db.index.fulltext.awaitEventuallyConsistentIndexRefresh | Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes.
Listing all full-text indexes. | SHOW FULLTEXT INDEXES | Lists all full-text indexes, see the SHOW INDEXES command for details.
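As a short illustration of the listing command in the table above, the following sketch narrows the output to a few columns. The column names are assumed to match those of the SHOW INDEXES command in recent Neo4j versions; running SHOW FULLTEXT INDEXES on its own returns the default columns.

```cypher
// Show each full-text index with the labels or relationship types
// and the properties it covers, plus its configuration.
SHOW FULLTEXT INDEXES
YIELD name, entityType, labelsOrTypes, properties, options
RETURN name, entityType, labelsOrTypes, properties, options
```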

Create and configure full-text indexes

Full-text indexes are created with the CREATE FULLTEXT INDEX command. An index can be given a unique name when created (or get a generated one), which is used to reference the specific index when querying or dropping it. A full-text index applies to a list of labels or a list of relationship types, for node and relationship indexes respectively, and then a list of property names.

More details about the syntax descriptions can be found here.

Table 1. Syntax for creating full-text indexes
Command Description
CREATE FULLTEXT INDEX [index_name] [IF NOT EXISTS]
FOR (n:LabelName["|" ...])
ON EACH "[" n.propertyName[, ...] "]"
[OPTIONS "{" option: value[, ...] "}"]

Create a full-text index on nodes.

CREATE FULLTEXT INDEX [index_name] [IF NOT EXISTS]
FOR ()-"["r:TYPE_NAME["|" ...]"]"-()
ON EACH "[" r.propertyName[, ...] "]"
[OPTIONS "{" option: value[, ...] "}"]

Create a full-text index on relationships.

It is considered best practice to give the index a name when it is created. This name is needed for both dropping and querying the index. If the index is not explicitly named, it will get an auto-generated name.

The index name must be unique among all indexes and constraints.

Index provider and configuration settings can be specified using the OPTIONS clause. The supported settings are fulltext.analyzer and fulltext.eventually_consistent. The fulltext.analyzer setting specifies which analyzer to use when indexing and querying; use the db.index.fulltext.listAvailableAnalyzers procedure to see which analyzers are available. The fulltext.eventually_consistent setting, when set to true, makes the index eventually consistent, so that updates from committing transactions are applied in a background thread.

The command is optionally idempotent. By default, it throws an error if an attempt is made to create the same index twice. With IF NOT EXISTS, no error is thrown and nothing happens if an index with the same name, schema, or both already exists. An error may still be thrown if a constraint with the same name exists.
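The idempotent form can be sketched as follows. The index name movieTitles is hypothetical and chosen only for illustration; the Movie label and title property are the ones used in the examples in this section.

```cypher
// Running this statement a second time is a no-op rather than an error,
// because an index with the same name and schema already exists.
// movieTitles is a hypothetical index name.
CREATE FULLTEXT INDEX movieTitles IF NOT EXISTS
FOR (n:Movie)
ON EACH [n.title]
```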

Example 1. CREATE FULLTEXT INDEX

For instance, suppose we have a movie with a title:

Query
CREATE (m:Movie {title: "The Matrix"}) RETURN m.title
Table 2. Result
m.title

"The Matrix"

And we create a full-text index on the title and description properties of movies and books:

Query
CREATE FULLTEXT INDEX titlesAndDescriptions FOR (n:Movie|Book) ON EACH [n.title, n.description]
Result
Added 1 index.

Then our movie node from above will be included in the index, even though it only has one of the indexed labels, and only one of the indexed properties:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score
RETURN node.title, node.description, score
Table 3. Result
node.title node.description score

"The Matrix"

<null>

0.13076457381248474

Rows: 1

The same is true for full-text indexes on relationships. Though a relationship can only have one type, a relationship full-text index can index multiple types, and all relationships will be included that match one of the relationship types, and at least one of the indexed properties.

The CREATE FULLTEXT INDEX command takes an optional clause, called OPTIONS. It has two parts, the indexProvider and the indexConfig. The provider can only have the default value, 'fulltext-1.0'. The indexConfig is a map from strings to strings and booleans, and can be used to set index-specific configuration settings.

The fulltext.analyzer setting can be used to configure an index-specific analyzer. The possible values for the fulltext.analyzer setting can be listed with the db.index.fulltext.listAvailableAnalyzers procedure.

The fulltext.eventually_consistent setting, if set to true, will put the index in an eventually consistent update mode. This means that updates will be applied in a background thread "as soon as possible", instead of during transaction commit like other indexes.

Example 2. CREATE FULLTEXT INDEX
Query
CREATE FULLTEXT INDEX taggedByRelationshipIndex FOR ()-[r:TAGGED_AS]-() ON EACH [r.taggedByUser]
OPTIONS {
  indexConfig: {
    `fulltext.analyzer`: 'url_or_email',
    `fulltext.eventually_consistent`: true
  }
}

In this example, an eventually consistent relationship full-text index is created for the TAGGED_AS relationship type, and the taggedByUser property, and the index uses the url_or_email analyzer. This could, for instance, be a system where people are assigning tags to documents, and where the index on the taggedByUser property will allow them to quickly find all of the documents they have tagged. Had it not been for the relationship index, one would have had to add artificial connective nodes between the tags and the documents in the data model, just so these nodes could be indexed.

Result
Added 1 index.
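Because this index is eventually consistent, a query issued right after a write may not see the latest updates yet. A minimal sketch of waiting for the background refresh and then querying the index is given below. The search value user@example.com is a hypothetical example, and how it is tokenized depends on the configured analyzer.

```cypher
// Block until pending updates have been applied to
// eventually consistent full-text indexes.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();

// Then query the relationship index (run as a separate statement).
// 'user@example.com' is a hypothetical value of the taggedByUser property.
CALL db.index.fulltext.queryRelationships("taggedByRelationshipIndex", "user@example.com")
YIELD relationship, score
RETURN relationship.taggedByUser, score
```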

Query full-text indexes

Full-text indexes will, in addition to any exact matches, also return approximate matches to a given query. Both the property values that are indexed and the queries to the index are processed through the analyzer, so that the index can find results that do not match exactly. The score that is returned alongside each result entry represents how well the index thinks that entry matches the given query. The results are always returned in descending score order, with the best matching result entry first.

Example 3. Query full-text

To illustrate, in the example below, we search our movie database for "Full Metal Jacket", and even though there is an exact match as the first result, we also get three other less interesting results:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
Table 4. Result
node.title score

"Full Metal Jacket"

1.411118507385254

"Full Moon High"

0.44524085521698

"Yellow Jacket"

0.3509605824947357

"The Jacket"

0.3509605824947357

Rows: 4
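If the less relevant results are not wanted, one option is to filter on the returned score. The sketch below assumes a cut-off of 1.0, which is an arbitrary illustrative threshold. Lucene scores depend on the indexed data, so a suitable value has to be chosen per data set.

```cypher
// Keep only results above an (arbitrary) score threshold
// and cap the number of rows returned.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
WHERE score > 1.0
RETURN node.title, score
LIMIT 10
```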

Full-text indexes are powered by the Apache Lucene indexing and search library. This means that we can use Lucene’s full-text query language to express what we wish to search for. For instance, if we are only interested in exact matches, then we can quote the string we are searching for.

Example 4. Query full-text
Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", '"Full Metal Jacket"') YIELD node, score
RETURN node.title, score

When we put "Full Metal Jacket" in quotes, Lucene only gives us exact matches.

Table 5. Result
node.title score

"Full Metal Jacket"

1.411118507385254

Rows: 1

Lucene also allows us to use logical operators, such as AND and OR, to search for terms.

Example 5. Query full-text
Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full AND metal') YIELD node, score
RETURN node.title, score

Only the Full Metal Jacket movie in our database has both the words full and metal.

Table 6. Result
node.title score

"Full Metal Jacket"

1.1113792657852173

Rows: 1
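For comparison, a sketch of the same search using OR is shown below. It should match any movie whose indexed properties contain at least one of the two terms, much like the unquoted multi-word search in Example 3. The rows returned depend on the data in the index.

```cypher
// Match entries containing either term; entries containing both
// would be expected to score higher.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full OR metal') YIELD node, score
RETURN node.title, score
```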

It is also possible to search for only specific properties, by putting the property name and a colon in front of the text being searched for.

Example 6. Query full-text
Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'description:"surreal adventure"') YIELD node, score
RETURN node.title, node.description, score
Table 7. Result
node.title node.description score

"Metallica Through The Never"

"The movie follows the young roadie Trip through his surreal adventure with the band."

0.2615291476249695

Rows: 1

A complete description of the Lucene query syntax can be found in the Lucene documentation.
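Two further features of the Lucene query syntax are sketched below: a trailing wildcard and a fuzzy term. Both are part of Lucene's classic query syntax, but how they interact with a given analyzer (for example, one that applies stemming) is not covered here, so treat the queries as illustrative.

```cypher
// Wildcard: terms in the title property that start with "matri".
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'title:matri*') YIELD node, score
RETURN node.title, score;

// Fuzzy: terms within a small edit distance of the misspelled "jaket"
// (run as a separate statement).
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'jaket~') YIELD node, score
RETURN node.title, score
```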

Handling of Text Array properties

If the indexed property contains a text array, each element of this array is analyzed independently and all produced terms are associated with the same property name. This means that when querying such an indexed node or relationship, there is a match if any of the array elements match the query. For scoring purposes, the full-text index treats it as a single-property value, and the score will represent how close the query is to matching the entire array.
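Example 7 below queries an index named reviews whose definition is not shown in this section. One plausible setup, assuming the reviews are stored as a LIST<STRING> property called reviews on Movie nodes, could look like this:

```cypher
// Assumed setup: store two reviews as a list of strings on the movie node.
MATCH (m:Movie {title: "The Matrix"})
SET m.reviews = ["The best movie ever.", "The movie is nonsense."];

// Hypothetical definition of the index named reviews
// (run as a separate statement).
CREATE FULLTEXT INDEX reviews FOR (n:Movie) ON EACH [n.reviews]
```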

Example 7. Text Array properties

For example, both of the following queries match the same node while referring to different elements:

Query
CALL db.index.fulltext.queryNodes('reviews', 'best') YIELD node, score
RETURN
  node.title AS title,
  node.reviews AS reviews,
  score
Table 8. Result
title reviews score

'The Matrix'

['The best movie ever.', 'The movie is nonsense.']

0.13076457381248474

Rows: 1

Query
CALL db.index.fulltext.queryNodes("reviews", 'nonsense') YIELD node, score
RETURN
  node.title AS title,
  node.reviews AS reviews,
  score
Table 9. Result
title reviews score

'The Matrix'

['The best movie ever.', 'The movie is nonsense.']

0.13076457381248474

Rows: 1

Drop full-text indexes

A full-text index, whether on nodes or on relationships, is dropped by using the same command as for other indexes, DROP INDEX.

Example 8. DROP INDEX

In the following example, we will drop the taggedByRelationshipIndex that we created previously:

Query
DROP INDEX taggedByRelationshipIndex
Result
Removed 1 index.
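In scripts that may run more than once, the drop can be made idempotent with IF EXISTS, in the same way IF NOT EXISTS works for creation. A minimal sketch:

```cypher
// Does nothing, instead of throwing an error,
// if the index has already been dropped.
DROP INDEX taggedByRelationshipIndex IF EXISTS
```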