SearchBlox supports custom Elasticsearch analyzers that extend the standard Elasticsearch analyzers.
An analyzer determines how a string is converted into tokens, which in turn affects searchability and recall.
Analyzers are also used to split terms for the filters in SearchBlox.
The character that splits terms is called the **separator**.
To learn about using custom fields in search, see [Custom Fields in Search](🔗).
# Mapping Files for Collections
Mapping files such as `mapping.json` are generated separately for each collection. The analyzers are mapped in the JSON files located in `<SEARCHBLOX_INSTALLATION_PATH>/webapps/ROOT/WEB-INF/mappings/collections/`.
To analyze a field, map it to the relevant analyzer in the JSON file using the following format:
The analyzers supported in SearchBlox are described below.
# sb_analyzer
sb_analyzer treats space, comma, and hyphen characters as separators when tokenizing indexed content, and it strips most special characters from the content during indexing. It is the default analyzer for most string fields used in search, such as title, description, and content, and it is the analyzer most commonly used for custom fields that serve as filters. To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **sb_analyzer**.
For example, consider a meta field with the following data:

`<meta name="test" content="world ,news, breaking news, tv radio, part-time" />`

Filtering the field using sb_analyzer produces the following terms:

- world
- news
- breaking
- news
- tv
- radio
- part
- time
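To make the tokenization rule concrete, here is a minimal Python sketch that approximates sb_analyzer on the example above. It is not SearchBlox's implementation (the real analyzer is an Elasticsearch analyzer definition), the function name `sb_analyzer_tokens` is hypothetical, and it assumes that stripping "most special characters" can be approximated by keeping only ASCII letters and digits in each token.

```python
import re


def sb_analyzer_tokens(text: str) -> list[str]:
    """Hypothetical approximation of sb_analyzer: split on spaces, commas,
    and hyphens, then strip remaining special characters from each token."""
    tokens = []
    for raw in re.split(r"[\s,\-]+", text):
        # Assumption: keep only ASCII letters and digits inside a token.
        token = re.sub(r"[^0-9A-Za-z]", "", raw)
        if token:
            tokens.append(token)
    return tokens


print(sb_analyzer_tokens("world ,news, breaking news, tv radio, part-time"))
# ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part', 'time']
```

Because the hyphen is treated as a separator, `part-time` becomes two separate terms.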
# sb_analyzer_special
sb_analyzer_special is similar to sb_analyzer, except that special characters are not stripped from the content during indexing. Use this analyzer when special characters need to remain searchable in context. To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **sb_analyzer_special**.
For example, consider a meta field with the following data:

`<meta name="test" content="world ,news, breaking news, tv radio, part-time" />`

Filtering the field using sb_analyzer_special produces the following terms:

- world
- news
- breaking
- news
- tv
- radio
- part-time
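The following minimal Python sketch approximates sb_analyzer_special. The function name `sb_analyzer_special_tokens` is hypothetical and this is not SearchBlox code; based only on the example above, it assumes that spaces and commas are the separators, so characters such as the hyphen stay inside the term.

```python
import re


def sb_analyzer_special_tokens(text: str) -> list[str]:
    """Hypothetical approximation of sb_analyzer_special: split on runs of
    spaces and commas only, keeping special characters inside tokens."""
    return [token for token in re.split(r"[\s,]+", text) if token]


print(sb_analyzer_special_tokens("world ,news, breaking news, tv radio, part-time"))
# ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part-time']
```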
# comma_analyzer
comma_analyzer uses the comma character as the separator when tokenizing indexed content. Currently, comma_analyzer is used for the keywords field. To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **comma_analyzer**.
For example, consider a meta field with the following data:

`<meta name="test" content="world ,news, breaking news, tv radio, part-time" />`

Filtering the field using comma_analyzer produces the following terms:

- world
- news
- breaking news
- tv radio
- part-time
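Here is a minimal Python sketch that approximates comma_analyzer on the example above. The function name `comma_analyzer_tokens` is hypothetical. Lowercasing is an assumption inferred from the category_analyzer section, which describes that analyzer as the case-sensitive variant of comma_analyzer.

```python
def comma_analyzer_tokens(text: str) -> list[str]:
    """Hypothetical approximation of comma_analyzer: split on commas,
    trim surrounding spaces, and lowercase (assumed) each term."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


print(comma_analyzer_tokens("world ,news, breaking news, tv radio, part-time"))
# ['world', 'news', 'breaking news', 'tv radio', 'part-time']
```

Since only the comma separates terms, multi-word phrases such as `breaking news` are kept as single filter values.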
# pipe_analyzer
pipe_analyzer is a custom analyzer that uses the pipe character (`|`) as the separator. Use it when neither commas nor spaces should act as separators. This analyzer is not used by default in SearchBlox. To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **pipe_analyzer**.
For example, consider a meta field with the following data:

`<meta name="test" content="world news| breaking news| tv radio, part-time" />`

Filtering the field using pipe_analyzer produces the following terms:

- world news
- breaking news
- tv radio, part-time
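A minimal Python sketch approximating pipe_analyzer on the example above follows. The function name `pipe_analyzer_tokens` is hypothetical; trimming the spaces around each term is an assumption based on the documented output.

```python
def pipe_analyzer_tokens(text: str) -> list[str]:
    """Hypothetical approximation of pipe_analyzer: split only on '|',
    so commas and spaces remain inside the resulting terms."""
    return [part.strip() for part in text.split("|") if part.strip()]


print(pipe_analyzer_tokens("world news| breaking news| tv radio, part-time"))
# ['world news', 'breaking news', 'tv radio, part-time']
```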
# whitespace
The whitespace analyzer uses the space character as the separator when tokenizing indexed content. To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **whitespace**.
For example, consider a meta field with the following data:

`<meta name="test" content="world news breaking news tv radio part-time" />`

Filtering the field using the whitespace analyzer produces the following terms:

- world
- news
- breaking
- news
- tv
- radio
- part-time
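The splitting rule can be approximated in Python with `str.split()`, which splits on runs of whitespace. This is a hypothetical sketch (`whitespace_tokens` is an illustrative name), not the Elasticsearch analyzer itself.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Hypothetical approximation of the whitespace analyzer: split on runs
    of whitespace, leaving every other character inside the tokens."""
    return text.split()


print(whitespace_tokens("world news breaking news tv radio part-time"))
# ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part-time']
print(whitespace_tokens("tv radio, part-time"))
# ['tv', 'radio,', 'part-time']
```

The second call shows that, in this sketch, a comma next to a word stays attached to the term (`radio,`), which is why this analyzer suits space-separated values.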
# sb_analyzer_alphanumeric
sb_analyzer_alphanumeric is largely similar to sb_analyzer, except that the following special characters are stripped from the content during indexing. Most special characters are also treated as separators. The characters stripped by this analyzer are:
To use this analyzer, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **sb_analyzer_alphanumeric**.
For example, consider a meta field with the following data:

`<meta name="sbaspl" content="cat_ pat.bat+vat!(mat)rat{sat}fat@chat$dat"/>`

Filtering the field using sb_analyzer_alphanumeric produces the following terms:

- cat
- pat
- bat
- vat
- mat
- rat
- sat
- fatchatdat
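The example above shows two behaviors: most special characters split the term, while `@` and `$` are removed without splitting (yielding `fatchatdat`). The minimal Python sketch below reproduces that behavior. The function and constant names are hypothetical, and the stripped-only set `{@, $}` is an assumption inferred solely from this example, not the analyzer's full character list.

```python
import re

# Assumption: inferred only from the documented example. These characters
# are removed without splitting the term; the real list may be longer.
STRIPPED_WITHOUT_SPLIT = "@$"


def sb_analyzer_alphanumeric_tokens(text: str) -> list[str]:
    """Hypothetical approximation of sb_analyzer_alphanumeric: drop the
    stripped-only characters, then split on any other non-alphanumeric run."""
    joined = text.translate(str.maketrans("", "", STRIPPED_WITHOUT_SPLIT))
    return [token for token in re.split(r"[^0-9A-Za-z]+", joined) if token]


print(sb_analyzer_alphanumeric_tokens("cat_ pat.bat+vat!(mat)rat{sat}fat@chat$dat"))
# ['cat', 'pat', 'bat', 'vat', 'mat', 'rat', 'sat', 'fatchatdat']
```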
# category_analyzer
category_analyzer is similar to comma_analyzer, except that the resulting values remain case-sensitive.
To use it, specify the following JSON code in the JSON file for the field and set the "analyzer" field to **category_analyzer**.
For example, consider a meta field with the following data:

`<meta name="test" content="World, news, Breaking News, TV, Part-time" />`

Filtering the field using category_analyzer produces the following terms:

- World
- news
- Breaking News
- TV
- Part-time
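A minimal Python sketch approximating category_analyzer on the example above follows. The function name `category_analyzer_tokens` is hypothetical. It splits on commas like comma_analyzer but, matching the documented output, does not change the letter case.

```python
def category_analyzer_tokens(text: str) -> list[str]:
    """Hypothetical approximation of category_analyzer: split on commas,
    trim surrounding spaces, and keep the original letter case."""
    return [part.strip() for part in text.split(",") if part.strip()]


print(category_analyzer_tokens("World, news, Breaking News, TV, Part-time"))
# ['World', 'news', 'Breaking News', 'TV', 'Part-time']
```

Preserving case means `TV` and `tv` would be treated as distinct filter values, which suits category labels that should be displayed exactly as written.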