Search Data Store
Overview
Search Data Store provides full text search for Elide.
Requirements
This store leverages Hibernate Search which requires Hibernate 6+.
Usage
SearchDataStore wraps another fully featured store and supports full text search on fields that are indexed using Hibernate Search. If the query cannot be answered by the SearchDataStore, it delegates the query to the underlying (wrapped) data store.
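The wrapping behavior can be pictured as a decorator around the wrapped store. The sketch below is a simplified, hypothetical model: SearchWrapper, DelegateStore, QueryableStore, and FilterQuery are illustrative names, not Elide's DataStore or DataStoreTransaction API. Its canAnswer rule (indexed field plus INFIX or PREFIX operator) is a simplification of the conditions described under Caveats.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical search layer: answers from the full text index when it can, otherwise delegates. */
class SearchWrapper implements QueryableStore {
    private static final Set<String> INDEX_OPERATORS = Set.of("INFIX", "PREFIX");

    private final QueryableStore wrapped;
    private final Set<String> indexedFields;

    SearchWrapper(QueryableStore wrapped, Set<String> indexedFields) {
        this.wrapped = wrapped;
        this.indexedFields = Set.copyOf(indexedFields);
    }

    /** Simplified rule: the field must be indexed and the operator must be one the index supports. */
    boolean canAnswer(FilterQuery query) {
        return indexedFields.contains(query.field()) && INDEX_OPERATORS.contains(query.operator());
    }

    @Override
    public List<String> fetch(FilterQuery query) {
        if (canAnswer(query)) {
            // A real implementation would run a Hibernate Search query here.
            return List.of("index handled " + query);
        }
        return wrapped.fetch(query);
    }

    public static void main(String[] args) {
        QueryableStore store = new SearchWrapper(new DelegateStore(), Set.of("name", "description"));
        System.out.println(store.fetch(new FilterQuery("name", "INFIX", "widget")));
        System.out.println(store.fetch(new FilterQuery("price", "LT", "10")));
    }
}

/** Hypothetical stand-in for the wrapped, fully featured store (for example a JPA store). */
class DelegateStore implements QueryableStore {
    @Override
    public List<String> fetch(FilterQuery query) {
        return List.of("delegate handled " + query);
    }
}

/** Hypothetical store abstraction; not Elide's DataStore interface. */
interface QueryableStore {
    List<String> fetch(FilterQuery query);
}

/** Hypothetical single-filter query; not an Elide type. */
record FilterQuery(String field, String operator, String term) {}
```

Because callers only depend on QueryableStore, the wrapper can stand in for the wrapped store without changing them, which mirrors how Elide is configured with the SearchDataStore in place of the store it wraps.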
Annotating Entities
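Use Hibernate Search annotations to describe how your entities are indexed and stored in Lucene or Elasticsearch. Shared settings such as analyzer definitions are configured once rather than on each entity: in Hibernate Search 6, analyzers are defined programmatically through an analysis configurer (described below) instead of the older AnalyzerDef annotation.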
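```java
import com.paiondata.elide.annotation.Include;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import lombok.Data;
import org.hibernate.search.engine.backend.types.Projectable;
import org.hibernate.search.engine.backend.types.Searchable;
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import java.math.BigDecimal;
import java.util.Date;

@Entity
@Include
@Indexed
@Data // Lombok
public class Item {
    @Id
    private long id;
    // Analyzed with the custom "case_insensitive" analyzer for INFIX/PREFIX filtering
    @FullTextField(
        name = "name",
        searchable = Searchable.YES,
        projectable = Projectable.NO,
        analyzer = "case_insensitive"
    )
    // Second, non-analyzed index field for the same property, used for sorting
    @KeywordField(name = "sortName", sortable = Sortable.YES, projectable = Projectable.NO, searchable = Searchable.YES)
    private String name;
    @FullTextField(searchable = Searchable.YES, projectable = Projectable.NO, analyzer = "case_insensitive")
    private String description;
    @GenericField(searchable = Searchable.YES, projectable = Projectable.NO, sortable = Sortable.YES)
    private Date modifiedDate;
    // Not annotated, so not indexed
    private BigDecimal price;
}
```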
(Optional) Defining a Custom Analyzer
The Item entity above references a non-standard analyzer, case_insensitive. This analyzer needs to be created programmatically:
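```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.ngram.NGramTokenizerFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext ctx) {
        ctx.analyzer("case_insensitive")
            .custom()
            // Non-edge ngrams: every substring of 3 to 50 characters
            .tokenizer(NGramTokenizerFactory.class)
                .param("minGramSize", "3")
                .param("maxGramSize", "50")
            // Lowercase each gram so that matching is case-insensitive
            .tokenFilter(LowerCaseFilterFactory.class);
    }
}
```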
It is then registered by setting the hibernate.search.backend.analysis.configurer property to the configurer class (here, in persistence.xml):
```xml
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">
    <persistence-unit name="searchDataStoreTest">
        <class>com.paiondata.elide.datastores.search.models.Item</class>
        <properties>
            <property name="hibernate.search.backend.analysis.configurer" value="class:com.paiondata.elide.datastores.search.MyLuceneAnalysisConfigurer"/>
            <property name="hibernate.search.backend.directory.type" value="local-heap"/>
            ...
        </properties>
    </persistence-unit>
</persistence>
```
Wrapping DataStore
```java
/* Create your JPA data store */
DataStore store = ...

/* Wrap it with a SearchDataStore */
EntityManagerFactory emf = ...
boolean indexOnStartup = true; // Create a fresh index when the server starts
SearchDataStore searchStore = new SearchDataStore(store, emf, indexOnStartup);

/* Configure Elide with your store */
ElideSettings settings = new ElideSettingsBuilder(searchStore).build();
```
Indexing Data
Data can be indexed in any of the following ways:
- Setting indexOnStartup to true when the SearchDataStore is initialized, so that the search store builds a complete index.
- Issuing create, update, and delete requests against the Elide service.
- Running an out-of-band process that uses the Hibernate Search APIs.
Caveats
Data Type Support
Only text fields (String) are supported and tested. Other data types (dates, numbers, etc.) have not been tested. Embedded index support has not been implemented.
Filter Operators
Only the INFIX and PREFIX filter operators (and their case-insensitive equivalents) are supported. Note that Hibernate Search indexes and analyzes a field either case-sensitively or case-insensitively, so a given field only supports the INFIX/PREFIX variant (case-sensitive or case-insensitive) that matches how the field was indexed.
All other filter operators are passed to the underlying wrapped JPA store.
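The operator rule above can be sketched as a small routing function. This is a hypothetical illustration, not the SearchDataStore's code: FilterOp mirrors Elide's operator names but is defined locally, and the per-field boolean stands in for whether the field's analyzer lowercases terms. The sketch also assumes that a case variant that does not match the field's indexing simply falls back to the wrapped store; the text above does not say how that case is handled.

```java
import java.util.Map;

final class OperatorRouter {
    // Indexed field name -> true if its analyzer lowercases (case-insensitive index).
    private final Map<String, Boolean> indexedFields;

    OperatorRouter(Map<String, Boolean> indexedFields) {
        this.indexedFields = Map.copyOf(indexedFields);
    }

    Route route(String field, FilterOp op) {
        Boolean caseInsensitiveIndex = indexedFields.get(field);
        if (caseInsensitiveIndex == null || !op.isTextMatch()) {
            return Route.WRAPPED_STORE; // unindexed field or unsupported operator
        }
        // Only the variant that matches how the field was indexed can use the index.
        return op.isCaseInsensitive() == caseInsensitiveIndex ? Route.SEARCH_INDEX : Route.WRAPPED_STORE;
    }

    public static void main(String[] args) {
        OperatorRouter router = new OperatorRouter(Map.of("name", true, "sku", false));
        System.out.println(router.route("name", FilterOp.INFIX_CASE_INSENSITIVE));
        System.out.println(router.route("name", FilterOp.INFIX));
        System.out.println(router.route("sku", FilterOp.PREFIX));
    }
}

/** Local stand-in for the relevant filter operators (names mirror Elide's). */
enum FilterOp {
    INFIX, PREFIX, INFIX_CASE_INSENSITIVE, PREFIX_CASE_INSENSITIVE, EQ, LT;

    boolean isTextMatch() {
        return this == INFIX || this == PREFIX
            || this == INFIX_CASE_INSENSITIVE || this == PREFIX_CASE_INSENSITIVE;
    }

    boolean isCaseInsensitive() {
        return this == INFIX_CASE_INSENSITIVE || this == PREFIX_CASE_INSENSITIVE;
    }
}

enum Route { SEARCH_INDEX, WRAPPED_STORE }
```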
Analyzer Assumptions
Index Analysis
To implement correct behavior for Elide's INFIX and PREFIX operators, the search store assumes that an ngram (non-edge) tokenizer is used. This allows whitespace and punctuation to be included in the index.
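To see why a non-edge ngram index keeps whitespace and punctuation, the sketch below enumerates the grams such an index would hold for a value and approximates a prefix query against them. It is a simplified model of Lucene's NGramTokenizer followed by a lowercase filter (as configured for case_insensitive above), not Lucene itself; the class and method names are hypothetical, and it assumes simple ASCII text.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class NGramSketch {
    /** Every substring of length minGram..maxGram, lowercased; duplicate grams collapse. */
    static Set<String> ngrams(String text, int minGram, int maxGram) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= lower.length(); len++) {
                grams.add(lower.substring(start, start + len));
            }
        }
        return grams;
    }

    /** Approximates a prefix query: the term matches if any indexed gram starts with it. */
    static boolean prefixQueryMatches(String text, String term, int minGram, int maxGram) {
        String needle = term.toLowerCase(Locale.ROOT);
        return ngrams(text, minGram, maxGram).stream().anyMatch(g -> g.startsWith(needle));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Red Car!", 3, 5));
        System.out.println(prefixQueryMatches("Red Car!", "ED C", 3, 5));    // prints true
        System.out.println(prefixQueryMatches("Red Car!", "red car", 3, 5)); // prints false
    }
}
```

The grams include values such as "d c" and "ar!", so an infix term spanning a space still matches. The last call shows why a term longer than the maximum gram size cannot be found: no indexed gram is long enough to start with it.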
If the client provides a filter predicate with a term that is shorter than the minimum ngram size or longer than the maximum ngram size, it will not be found in the index.
The search store can be configured to return a 400 error to the client in those scenarios by passing the minimum and maximum ngram sizes to the constructor of the SearchDataStore. The sizes are global and apply to all Elide entities managed by the store instance:
```java
SearchDataStore searchStore = new SearchDataStore(jpaStore, emf, true, 3, 50); // min/max ngram sizes: 3 and 50
```
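The length check behind the 400 error can be sketched as follows. This is a hypothetical illustration of the rule described above, not the SearchDataStore's code: NgramTermValidator and its methods are invented names, and the sketch signals rejection with IllegalArgumentException, whereas the real store returns a 400 error to the client. Lengths are counted in code points as an assumption.

```java
final class NgramTermValidator {
    private final int minNgram;
    private final int maxNgram;

    NgramTermValidator(int minNgram, int maxNgram) {
        if (minNgram < 1 || maxNgram < minNgram) {
            throw new IllegalArgumentException("Invalid ngram bounds: " + minNgram + ".." + maxNgram);
        }
        this.minNgram = minNgram;
        this.maxNgram = maxNgram;
    }

    /** True if the term's length (in code points) lies within the configured ngram sizes. */
    boolean accepts(String term) {
        int length = term.codePointCount(0, term.length());
        return length >= minNgram && length <= maxNgram;
    }

    /** Rejects terms the index cannot match; a caller would translate this into a 400 response. */
    void check(String term) {
        if (!accepts(term)) {
            throw new IllegalArgumentException(
                "Search term must be between " + minNgram + " and " + maxNgram + " characters");
        }
    }

    public static void main(String[] args) {
        NgramTermValidator validator = new NgramTermValidator(3, 50); // same sizes as the analyzer
        System.out.println(validator.accepts("ab"));
        System.out.println(validator.accepts("abc"));
    }
}
```

Keeping these bounds equal to the analyzer's minGramSize and maxGramSize (3 and 50 in this document) is what makes the check meaningful.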
Search Term Analysis
Elide creates a Hibernate Search SimpleQueryString for each predicate. It first escapes whitespace and punctuation in any user-provided input (to match Elide's default behavior when not using the SearchDataStore). The resulting single token is used to construct a prefix query.
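A rough sketch of the escaping step: every character that is not a letter or digit is backslash-escaped so that the whole input stays a single term in simple query string syntax (assuming its backslash escaping keeps escaped whitespace inside a term), and a trailing * turns it into a prefix query. This is an illustration under stated assumptions, not Elide's code. Escaping every non-alphanumeric character is broader than strictly required, and handing the string to Hibernate Search's simple query string predicate is not shown.

```java
final class SearchTermEscaper {
    /** Backslash-escapes every non-letter, non-digit code point (whitespace, punctuation, operators). */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        input.codePoints().forEach(cp -> {
            if (!Character.isLetterOrDigit(cp)) {
                out.append('\\');
            }
            out.appendCodePoint(cp);
        });
        return out.toString();
    }

    /** Appends the prefix operator to the escaped single-term input. */
    static String toPrefixQuery(String input) {
        return escape(input) + "*";
    }

    public static void main(String[] args) {
        System.out.println(toPrefixQuery("red car")); // prints red\ car*
        System.out.println(toPrefixQuery("a+b (x)")); // prints a\+b\ \(x\)*
    }
}
```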
Sorting and Pagination
When using the INFIX operator, sorting and pagination are pushed down to Lucene/Elasticsearch. When using the PREFIX operator, they are performed in memory in the Elide service.
Elide constructs a prefix query, which together with an ngram index fully implements the INFIX operator. However, the ngram analyzer also adds ngrams to the index that do not start on word boundaries, so for the PREFIX operator the search store first runs the Lucene query and then filters the results again in memory to return the correct set of matching terms.
Because filtering is then performed partially in memory, Elide also sorts and paginates in memory.
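The PREFIX path can be sketched as a post-filter over the index results. In this hypothetical illustration, candidates stands for the values returned by the ngram-backed prefix query (which can include values where the term appears in the middle); the sketch keeps only values that really start with the term, then sorts and paginates in memory. The names and the natural-order sort are assumptions, not the SearchDataStore's implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class PrefixPostFilter {
    /** Re-filters index candidates to true prefix matches, then sorts and pages them in memory. */
    static List<String> prefixPage(List<String> candidates, String prefix, boolean caseInsensitive,
                                   int offset, int limit) {
        String wanted = caseInsensitive ? prefix.toLowerCase(Locale.ROOT) : prefix;
        return candidates.stream()
            .filter(value -> (caseInsensitive ? value.toLowerCase(Locale.ROOT) : value).startsWith(wanted))
            .sorted(Comparator.naturalOrder())
            .skip(offset)
            .limit(limit)
            .toList();
    }

    public static void main(String[] args) {
        // Pretend these came back from the ngram index for the term "thing".
        List<String> candidates = List.of("Thing", "Anything", "things", "Other thing");
        System.out.println(prefixPage(candidates, "thing", true, 0, 10)); // prints [Thing, things]
    }
}
```

Since the final result set is only known after the in-memory filter, the sort and page steps cannot be delegated to the index for this operator, which is why they run in memory as well.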