Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter to change "3 quick foxes" to "quick foxes" by keeping only <ALPHANUM> (alphanumeric) tokens.
Token types
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.
For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.
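Because the type comes from the tokenizer, the same keep_types settings can give very different results depending on which tokenizer runs first. The Python sketch below illustrates this with two hypothetical toy tokenizers, toy_standard and toy_lowercase. They only approximate the standard and lowercase tokenizers (the real standard tokenizer follows Unicode text segmentation rules and emits more types), but they show that keeping only <ALPHANUM> removes every token from a stream typed solely as word.

```python
from __future__ import annotations


def toy_standard(text: str) -> list[tuple[str, str]]:
    # Hypothetical stand-in for the standard tokenizer: split on whitespace,
    # label all-digit terms <NUM> and everything else <ALPHANUM>. No lowercasing.
    return [(t, "<NUM>" if t.isdigit() else "<ALPHANUM>") for t in text.split()]


def toy_lowercase(text: str) -> list[tuple[str, str]]:
    # Hypothetical stand-in for the lowercase tokenizer: split on non-letters,
    # lowercase each term, and give every token the generic "word" type.
    terms = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [(t, "word") for t in terms]


def keep_types(tokens: list[tuple[str, str]], types: set[str]) -> list[str]:
    # Keep the terms whose token type is in the requested set.
    return [term for term, token_type in tokens if token_type in types]


text = "3 Quick Foxes"
print(keep_types(toy_standard(text), {"<ALPHANUM>"}))   # ['Quick', 'Foxes']
print(keep_types(toy_lowercase(text), {"<ALPHANUM>"}))  # []
print(keep_types(toy_lowercase(text), {"word"}))        # ['quick', 'foxes']
```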
Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
Some tokenizers, such as the keyword, simple_pattern, and simple_pattern_split tokenizers, don’t support this token filter because they don’t support setting the token type attribute.
This filter uses Lucene’s TypeTokenFilter.
Include example
The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from "1 quick fox 2 lazy dogs".
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
Exclude example
The following analyze API request uses the keep_types filter to remove <NUM> tokens from "1 quick fox 2 lazy dogs". Note that the mode parameter is set to exclude.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
Configurable parameters
- types: (Required, array of strings) List of token types to keep or remove.
- mode: (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
  - include: (Default) Keep only the specified token types.
  - exclude: Remove the specified token types.
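The two modes amount to a set-membership test whose result is either kept or inverted. The Python sketch below models that behavior on hand-labeled (term, type) pairs matching the types the standard tokenizer assigns in the include and exclude examples. It is not Lucene’s TypeTokenFilter; the function keep_types_filter and its validation errors are hypothetical and only mirror the documented rules (types required, mode defaulting to include).

```python
from __future__ import annotations


def keep_types_filter(tokens: list[tuple[str, str]],
                      types: list[str] | None,
                      mode: str = "include") -> list[str]:
    # Hypothetical validation mirroring the documented parameters.
    if types is None:
        raise ValueError("types is required")
    if mode not in ("include", "exclude"):
        raise ValueError(f"invalid mode: {mode!r}")
    wanted = set(types)
    keep_matches = mode == "include"
    # include: keep tokens whose type matches; exclude: keep the rest.
    return [term for term, token_type in tokens
            if (token_type in wanted) == keep_matches]


# "1 quick fox 2 lazy dogs" as the standard tokenizer types it.
stream = [("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
          ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>")]
print(keep_types_filter(stream, ["<NUM>"]))                  # ['1', '2']
print(keep_types_filter(stream, ["<NUM>"], mode="exclude"))  # ['quick', 'fox', 'lazy', 'dogs']
```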
Customize and add to an analyzer
To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
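Conceptually, my_analyzer runs the standard tokenizer and then passes its tokens through each filter listed in "filter", here only extract_alpha. The Python sketch below models that chain without calling Elasticsearch. toy_standard_tokenizer is a hypothetical simplification of the standard tokenizer (whitespace splitting, digits typed <NUM>, everything else <ALPHANUM>), and the function names only mirror the settings above.

```python
from __future__ import annotations


def toy_standard_tokenizer(text: str) -> list[tuple[str, str]]:
    # Hypothetical, simplified stand-in for the standard tokenizer.
    return [(t, "<NUM>" if t.isdigit() else "<ALPHANUM>") for t in text.split()]


def extract_alpha(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Mirrors the extract_alpha settings: keep_types with types ["<ALPHANUM>"].
    # Returns (term, type) pairs so further filters could be chained after it.
    return [(term, t) for term, t in tokens if t == "<ALPHANUM>"]


def my_analyzer(text: str) -> list[str]:
    # Tokenizer first, then each token filter in the configured order.
    tokens = toy_standard_tokenizer(text)
    for token_filter in [extract_alpha]:
        tokens = token_filter(tokens)
    return [term for term, _ in tokens]


print(my_analyzer("1 quick fox 2 lazy dogs"))  # ['quick', 'fox', 'lazy', 'dogs']
```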