KStem token filter
Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.
The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.
The kstem filter is equivalent to the stemmer filter's light_english variant.
This filter uses Lucene’s KStemFilter.
Example
The following analyze API request uses the kstem filter to stem "the foxes jumping quickly" to "the fox jump quick":
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quick ]
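To run the same request from a script, something like the following minimal Python sketch may help. It uses only the standard library. The base URL `http://localhost:9200` and the helper names `build_analyze_request` and `extract_tokens` are assumptions made for illustration; they are not part of Elasticsearch. The sketch builds the request without sending it, and uses POST, which the analyze API accepts alongside GET.

```python
import json
import urllib.request

# Assumption: a local cluster without authentication. Adjust for your setup.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, filters=("kstem",), tokenizer="standard",
                          base_url=ES_URL):
    """Build (but do not send) an analyze API request for the given text."""
    body = {"tokenizer": tokenizer, "filter": list(filters), "text": text}
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body):
    """Return the token strings from a parsed analyze API response."""
    return [entry["token"] for entry in response_body["tokens"]]
```

With a cluster reachable at the assumed URL, passing the request to `urllib.request.urlopen`, parsing the response with `json.load`, and calling `extract_tokens` on the result should yield the token list shown above.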
Add to an analyzer
The following create index API request uses the kstem filter to configure a new custom analyzer.
To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "kstem" ]
        }
      }
    }
  }
}
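Because the ordering requirement is easy to get wrong when editing settings by hand, a small pre-flight check can help before the request is sent. The following Python sketch is one possible way to do that. The function names `lowercase_precedes_kstem` and `misordered_analyzers` are hypothetical, and the check assumes filters are referenced by their built-in names (`lowercase`, `kstem`). It does not recognize custom-named filters or tokenizers that lowercase on their own.

```python
def lowercase_precedes_kstem(filters):
    """Return True if every kstem entry has a lowercase filter before it."""
    seen_lowercase = False
    for name in filters:
        if name == "lowercase":
            seen_lowercase = True
        elif name == "kstem" and not seen_lowercase:
            return False
    return True


def misordered_analyzers(index_body):
    """List analyzers whose filter chain reaches kstem before lowercase."""
    analyzers = (
        index_body.get("settings", {}).get("analysis", {}).get("analyzer", {})
    )
    return [
        name
        for name, config in analyzers.items()
        if not lowercase_precedes_kstem(config.get("filter", []))
    ]


# The request body from the create index example above.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "kstem"],
                }
            }
        }
    }
}

# An empty list means no analyzer applies kstem before lowercase.
print(misordered_analyzers(index_body))
```

Swapping the two filter names in `index_body` would make the check report `my_analyzer`.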