Full-text search index
Full-text indexes are powered by the Apache Lucene indexing and search library, and can be used to index nodes and relationships by string properties.
A full-text index allows you to write queries that match within the contents of indexed string properties.
For instance, the range and text indexes described in previous sections can only perform limited matching on strings; exact, prefix, substring, or suffix matches.
A full-text index will instead tokenize the indexed string values, so it can match terms anywhere within the strings.
How the indexed strings are tokenized and broken into terms, is determined by what analyzer the full-text index is configured with.
For instance, the swedish analyzer knows how to tokenize and stem Swedish words, and will avoid indexing Swedish stop words.
The complete list of stop words for each analyzer is included in the result of the db.index.fulltext.listAvailableAnalyzers
procedure.
Full-text indexes:
-
support the indexing of both nodes and relationships.
-
support configuring custom analyzers, including analyzers that are not included with Lucene itself.
-
can be queried using the Lucene query language.
-
can return the score for each result from a query.
-
are kept up to date automatically, as nodes and relationships are added, removed, and modified.
-
will automatically populate newly created indexes with the existing data in a store.
-
can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.
-
are a projection of the store, and can only index nodes and relationships by the contents of their properties.
-
include only property values of types
STRING
orLIST<STRING>
. -
can support any number of documents in a single index.
-
are created, dropped, and updated transactionally, and is automatically replicated throughout a cluster.
-
can be accessed via Cypher® procedures.
-
can be configured to be eventually consistent, in which index updating is moved from the commit path to a background thread. Using this feature, it is possible to work around the slow Lucene writes from the performance critical commit process, thus removing the main bottlenecks for Neo4j write performance.
At first sight, the construction of full-text indexes can seem similar to regular indexes. However there are some things that are interesting to note: In contrast to other indexes, a full-text index can be:
-
applied to more than one label.
-
applied to more than one relationship type.
-
applied to more than one property at a time (similar to a composite index) but with an important difference: While a composite index applies only to entities that match the indexed label and all of the indexed properties, full-text index will index entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.
For information on how to configure full-text indexes, refer to Operations Manual → Indexes to support full-text search.
Full-text search procedures
Full-text indexes are managed through commands and used through built-in procedures, see Operations Manual → Procedures for a complete reference.
The commands and procedures for full-text indexes are listed in the table below:
Usage | Procedure/Command | Description |
---|---|---|
Create full-text node index. |
|
Create a node full-text index for the given labels and properties. |
Create full-text relationship index. |
|
Create a relationship full-text index for the given relationship types and properties. |
List available analyzers. |
List the available analyzers that the full-text indexes can be configured with. |
|
Use full-text node index. |
Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score. |
|
Use full-text relationship index. |
Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score. |
|
Drop full-text index. |
|
Drop the specified index. |
Eventually consistent indexes. |
Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes. |
|
Listing all full-text indexes. |
|
Lists all full-text indexes, see the |
Create and configure full-text indexes
Full-text indexes are created with the CREATE FULLTEXT INDEX
command.
An index can be given a unique name when created (or get a generated one), which is used to reference the specific index when querying or dropping it.
A full-text index applies to a list of labels or a list of relationship types, for node and relationship indexes respectively, and then a list of property names.
More details about the syntax descriptions can be found here. |
Command | Description |
---|---|
|
Create a full-text index on nodes. |
|
Create a full-text index on relationships. |
It is considered best practice to give the index a name when it is created. This name is needed for both dropping and querying the index. If the index is not explicitly named, it will get an auto-generated name.
The index name must be unique among all indexes and constraints. |
Index provider and configuration settings can be specified using the OPTIONS
clause.
Supported settings are fulltext.analyzer
, for specifying what analyzer to use when indexing and querying.
Use the db.index.fulltext.listAvailableAnalyzers
procedure to see what options are available.
To make this index eventually consistent, fulltext.eventually_consistent
should be set to true
.
This will ensure that updates from committing transactions are applied in a background thread.
The command is optionally idempotent. This means that its default behavior is to throw an error if an attempt is made to create the same index twice.
With IF NOT EXISTS
, no error is thrown and nothing happens should an index with the same name, schema or both already exist.
It may still throw an error should a constraint with the same name exist.
For instance, if we have a movie with a title.
CREATE (m:Movie {title: "The Matrix"}) RETURN m.title
m.title |
---|
|
And we have a full-text index on the title
and description
properties of movies and books.
CREATE FULLTEXT INDEX titlesAndDescriptions FOR (n:Movie|Book) ON EACH [n.title, n.description]
Added 1 index.
Then our movie node from above will be included in the index, even though it only has one of the indexed labels, and only one of the indexed properties:
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score
RETURN node.title, node.description, score
node.title | node.description | score |
---|---|---|
|
|
|
Rows: 1 |
The same is true for full-text indexes on relationships. Though a relationship can only have one type, a relationship full-text index can index multiple types, and all relationships will be included that match one of the relationship types, and at least one of the indexed properties.
The CREATE FULLTEXT INDEX
command take an optional clause, called options
. This have two parts, the indexProvider
and indexConfig
.
The provider can only have the default value, 'fulltext-1.0'
.
The indexConfig
is a map from string to string and booleans, and can be used to set index-specific configuration settings.
The fulltext.analyzer
setting can be used to configure an index-specific analyzer.
The possible values for the fulltext.analyzer
setting can be listed with the db.index.fulltext.listAvailableAnalyzers
procedure.
The fulltext.eventually_consistent
setting, if set to true
, will put the index in an eventually consistent update mode.
This means that updates will be applied in a background thread "as soon as possible", instead of during transaction commit like other indexes.
CREATE FULLTEXT INDEX taggedByRelationshipIndex FOR ()-[r:TAGGED_AS]-() ON EACH [r.taggedByUser]
OPTIONS {
indexConfig: {
`fulltext.analyzer`: 'url_or_email',
`fulltext.eventually_consistent`: true
}
}
In this example, an eventually consistent relationship full-text index is created for the TAGGED_AS
relationship type, and the taggedByUser
property, and the index uses the url_or_email
analyzer.
This could, for instance, be a system where people are assigning tags to documents, and where the index on the taggedByUser
property will allow them to quickly find all of the documents they have tagged.
Had it not been for the relationship index, one would have had to add artificial connective nodes between the tags and the documents in the data model, just so these nodes could be indexed.
Added 1 index.
Query full-text indexes
Full-text indexes will, in addition to any exact matches, also return approximate matches to a given query.
Both the property values that are indexed, and the queries to the index, are processed through the analyzer such that the index can find that don’t exactly matches.
The score
that is returned alongside each result entry, represents how well the index thinks that entry matches the given query.
The results are always returned in descending score order, where the best matching result entry is put first.
To illustrate, in the example below, we search our movie database for "Full Metal Jacket"
, and even though there is an exact match as the first result, we also get three other less interesting results:
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
node.title | score |
---|---|
|
|
|
|
|
|
|
|
Rows: 4 |
Full-text indexes are powered by the Apache Lucene indexing and search library. This means that we can use Lucene’s full-text query language to express what we wish to search for. For instance, if we are only interested in exact matches, then we can quote the string we are searching for.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", '"Full Metal Jacket"') YIELD node, score
RETURN node.title, score
When we put "Full Metal Jacket" in quotes, Lucene only gives us exact matches.
node.title | score |
---|---|
|
|
Rows: 1 |
Lucene also allows us to use logical operators, such as AND
and OR
, to search for terms.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full AND metal') YIELD node, score
RETURN node.title, score
Only the Full Metal Jacket
movie in our database has both the words full
and metal
.
node.title | score |
---|---|
|
|
Rows: 1 |
It is also possible to search for only specific properties, by putting the property name and a colon in front of the text being searched for.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'description:"surreal adventure"') YIELD node, score
RETURN node.title, node.description, score
node.title | node.description | score |
---|---|---|
|
|
|
Rows: 1 |
A complete description of the Lucene query syntax can be found in the Lucene documentation.
Handling of Text Array properties
If the indexed property contains a text array, each element of this array is analyzed independently and all produced terms are associated with the same property name. This means that when querying such an indexed node or relationship, there is a match if any of the array elements match the query. For scoring purposes, the full-text index treats it as a single-property value, and the score will represent how close the query is to matching the entire array.
For example, both of the following queries match the same node while referring different elements:
CALL db.index.fulltext.queryNodes('reviews', 'best') YIELD node, score
RETURN
node.title AS title,
node.reviews AS reviews,
score
title | reviews | score |
---|---|---|
|
|
|
Rows: 1 |
CALL db.index.fulltext.queryNodes("reviews", 'nonsense') YIELD node, score
RETURN
node.title AS title,
node.reviews AS reviews,
score
title | reviews | score |
---|---|---|
|
|
|
Rows: 1 |
Drop full-text indexes
A full-text node index is dropped by using the same command as for other indexes, DROP INDEX
.
In the following example, we will drop the taggedByRelationshipIndex
that we created previously:
DROP INDEX taggedByRelationshipIndex
Removed 1 index.
Was this page helpful?