Mapping character filter
The mapping character filter accepts a map of keys and values. Whenever it encounters a string of characters that is the same as a key, it replaces them with the value associated with that key.
Matching is greedy; the longest pattern matching at a given point wins. Replacements are allowed to be the empty string.
The mapping filter uses Lucene’s MappingCharFilter.
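The greedy, longest-match behavior described above can be illustrated with a small Python model. This is only a sketch of the matching rule, not Lucene's implementation; the function name apply_mappings and the sample rules are hypothetical.

```python
def apply_mappings(text, mappings):
    """Replace each key found in text with its value.

    At every position the longest matching key wins (greedy matching).
    Values may be the empty string, which removes the matched key.
    """
    # Try longer keys first so the longest match at a position is chosen.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)  # continue scanning after the matched key
                break
        else:
            # No key matches at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rules: "ab" beats "a" where both match; "c" maps to nothing.
rules = {"a": "1", "ab": "2", "c": ""}
print(apply_mappings("abca", rules))  # prints "21"
```

In this model, "ab" at the start of the input is replaced as a whole rather than as "a" followed by an unmatched "b", and "c" is dropped because its replacement is the empty string.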
Example
The following analyze API request uses the mapping filter to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), changing the text "My license plate is ٢٥٠١٥" to "My license plate is 25015".
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}
The filter produces the following text:
[ My license plate is 25015 ]
Configurable parameters
- mappings
  (Required*, array of strings) Array of mappings, with each element having the form key => value.
  Either this or the mappings_path parameter must be specified.
- mappings_path
  (Required*, string) Path to a file containing key => value mappings.
  This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each mapping in the file must be separated by a line break.
  Either this or the mappings parameter must be specified.
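To make the file format expected by mappings_path more concrete, the following Python sketch writes and reads a UTF-8 file with one key => value rule per line. It is a simplified model under stated assumptions, not the Elasticsearch parser: the helper load_mappings and the file name my_mappings.txt are hypothetical, blank lines are skipped, surrounding whitespace is stripped, and escape sequences are not handled.

```python
import tempfile
from pathlib import Path


def load_mappings(path):
    """Read `key => value` rules from a UTF-8 file, one rule per line."""
    rules = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # assumption of this sketch: blank lines are ignored
        key, sep, value = line.partition("=>")
        if not sep:
            raise ValueError(f"not a 'key => value' rule: {line!r}")
        # Assumption of this sketch: whitespace around key and value is dropped.
        rules[key.strip()] = value.strip()
    return rules


# Write a hypothetical mappings file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_mappings.txt"
    path.write_text(":) => _happy_\n:( => _sad_\n", encoding="utf-8")
    print(load_mappings(path))
```

A file laid out this way holds the same rules that could otherwise be listed inline in the mappings array.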
Customize and add to an analyzer
To customize the mapping filter, duplicate it to create the basis for a new custom character filter. You can modify the filter using its configurable parameters.
The following create index API request configures a new custom analyzer using a custom mapping filter, my_mappings_char_filter.
The my_mappings_char_filter filter replaces the :) and :( emoticons with a text equivalent.
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      }
    }
  }
}
The following analyze API request uses the custom my_mappings_char_filter to replace :( with _sad_ in the text "I'm delighted about it :(".
GET /my-index-000001/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    "my_mappings_char_filter"
  ],
  "text": "I'm delighted about it :("
}
The filter produces the following text:
[ I'm delighted about it _sad_ ]