Custom Analyzers
- SearchBlox supports custom OpenSearch analyzers, which are extended from standard OpenSearch analyzers.
- Analyzers determine how a string is broken into tokens, which improves search relevance and recall.
- They are also used to split terms for filters in SearchBlox.
- The character that splits a term is called a separator.
- See the Custom Fields in Search documentation to learn about using custom fields in search.
Mapping Files for Collections
- A mapping file (mapping.json) is created separately for each collection. The analyzers are referenced in the JSON files located at <SEARCHBLOX_INSTALLATION_PATH>/webapps/ROOT/WEB-INF/mappings/collections/.
- If you want a field to be analyzed, map it to the appropriate analyzer in the JSON file using the following format:
{
"type": "text",
"store": true,
"fielddata": true,
"analyzer": "comma_analyzer"
},
Analyzers supported in SearchBlox are given below.
sb_analyzer
- sb_analyzer uses spaces, commas, and hyphens as separators to break content into tokens. It also removes most special characters during indexing.
- sb_analyzer is the default analyzer for common string fields such as title, description, and content, and is often used for custom fields to enable filtering.
- To use it for a specific field, specify sb_analyzer in the "analyzer" field in the JSON mapping file.
"test": {
"type": "text",
"store": true,
"analyzer": "sb_analyzer"
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="world ,news, breaking news, tv radio, part-time" />
Using sb_analyzer to filter this field, the terms generated would be:
- world
- news
- breaking
- news
- tv
- radio
- part
- time
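The tokenization above can be sketched in Python. This is a simplified illustration, not the actual OpenSearch tokenizer chain; the function name is hypothetical, and token filters such as lowercasing and special-character removal are omitted.

```python
import re

def sb_analyzer_sketch(text: str) -> list[str]:
    # Split on runs of spaces, commas, and hyphens; drop empty tokens.
    return [token for token in re.split(r"[ ,\-]+", text) if token]

print(sb_analyzer_sketch("world ,news, breaking news, tv radio, part-time"))
# → ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part', 'time']
```

Note how "part-time" is split into two tokens because the hyphen acts as a separator.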
sb_analyzer_special
sb_analyzer_special works like sb_analyzer, but it keeps special characters in the content during indexing. This allows special characters to appear in the search context.
To use it for a field, specify sb_analyzer_special in the "analyzer" field in the JSON mapping file.
"test": {
"type": "text",
"store": true,
"analyzer": "sb_analyzer_special",
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="world ,news, breaking news, tv radio, part-time" />,
Using sb_analyzer_special to filter this field, the terms generated would be:
- world
- news
- breaking
- news
- tv
- radio
- part-time
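The difference from sb_analyzer can be sketched as follows (a simplified illustration with a hypothetical function name, not the actual OpenSearch tokenizer): only spaces and commas split the text, so the hyphen survives.

```python
import re

def sb_analyzer_special_sketch(text: str) -> list[str]:
    # Split only on spaces and commas; special characters such as '-' are kept.
    return [token for token in re.split(r"[ ,]+", text) if token]

print(sb_analyzer_special_sketch("world ,news, breaking news, tv radio, part-time"))
# → ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part-time']
```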
comma_analyzer
comma_analyzer uses the comma character to split content into tokens. It is commonly used for the keywords field.
To use it for a field, specify comma_analyzer in the "analyzer" field in the JSON mapping file.
"test": {
"type": "text",
"store": true,
"analyzer": "comma_analyzer",
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="world ,news, breaking news, tv radio, part-time" />,
Using comma_analyzer to filter this field, the terms generated would be:
- world
- news
- breaking news
- tv radio
- part-time
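A minimal sketch of this behavior (hypothetical function name; whether the real analyzer also lowercases tokens is not shown by this example): splitting only on commas keeps multi-word values together.

```python
def comma_analyzer_sketch(text: str) -> list[str]:
    # Split only on commas and trim surrounding whitespace;
    # multi-word values remain single tokens.
    return [token.strip() for token in text.split(",") if token.strip()]

print(comma_analyzer_sketch("world ,news, breaking news, tv radio, part-time"))
# → ['world', 'news', 'breaking news', 'tv radio', 'part-time']
```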
pipe_analyzer
pipe_analyzer is a custom analyzer that uses the pipe (|) character as a separator. It is useful when you do not want to use comma or space as separators. This analyzer is not used by default in SearchBlox.
To use it for a field, specify pipe_analyzer in the "analyzer" field in the JSON mapping file.
"keywords": {
"type": "text",
"store": true,
"analyzer": "pipe_analyzer",
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="world news| breaking news| tv radio, part-time" />,
Using pipe_analyzer to filter this field, the terms generated would be:
- world news
- breaking news
- tv radio, part-time
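A minimal sketch of the pipe-separated behavior (hypothetical function name, not the actual OpenSearch tokenizer): because only the pipe splits the text, commas and spaces remain inside tokens.

```python
def pipe_analyzer_sketch(text: str) -> list[str]:
    # Split only on the pipe character; commas and spaces stay inside tokens.
    return [token.strip() for token in text.split("|") if token.strip()]

print(pipe_analyzer_sketch("world news| breaking news| tv radio, part-time"))
# → ['world news', 'breaking news', 'tv radio, part-time']
```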
whitespace
The whitespace analyzer uses the space character to split content into tokens. To use it for a field, specify whitespace in the "analyzer" field in the JSON mapping file.
"test": {
"type": "text",
"store": true,
"analyzer": "whitespace",
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="world news breaking news tv radio part-time" />,
Using whitespace analyzer to filter this field, the terms generated would be:
- world
- news
- breaking
- news
- tv
- radio
- part-time
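This behavior maps directly onto Python's default string split (a simplified sketch of the analyzer's tokenization, omitting any token filters):

```python
def whitespace_sketch(text: str) -> list[str]:
    # str.split() with no argument splits on any run of whitespace.
    return text.split()

print(whitespace_sketch("world news breaking news tv radio part-time"))
# → ['world', 'news', 'breaking', 'news', 'tv', 'radio', 'part-time']
```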
sb_analyzer_alphanumeric
sb_analyzer_alphanumeric is similar to sb_analyzer, but it removes certain special characters during indexing; most of these characters also act as separators.
- Characters stripped and used as separators: _ . + ! # ^ & * ( ) { } > < : ; ' " ~ , - \ / [ ]
- Characters stripped but not used as separators: @ $ % ?
To use it for a field, specify sb_analyzer_alphanumeric in the "analyzer" field in the JSON mapping file.
"description": {
"type": "text",
"store": true,
"analyzer": "sb_analyzer_alphanumeric"
},
For example, if a meta field contains:
<meta name="sbaspl" content="cat_ pat.bat+vat!(mat)rat{sat}fat@chat$dat"/>,
Using sb_analyzer_alphanumeric to filter this field, the terms generated would be:
- cat
- pat
- bat
- vat
- mat
- rat
- sat
- fatchatdat
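The two character classes above can be sketched in Python (a hypothetical approximation of the documented behavior, not the actual OpenSearch tokenizer): separator characters both strip and split, while @ $ % ? are removed without splitting, which is why "fat@chat$dat" collapses into a single token.

```python
import re

# Hypothetical approximation of the documented character sets.
SEPARATOR_CLASS = r"""[_.+!#^&*(){}><:;'"~,\-\\/\[\] ]+"""  # stripped AND split on
STRIPPED_ONLY = "@$%?"                                      # stripped, no split

def sb_alphanumeric_sketch(text: str) -> list[str]:
    for ch in STRIPPED_ONLY:
        text = text.replace(ch, "")  # remove without splitting
    return [token for token in re.split(SEPARATOR_CLASS, text) if token]

print(sb_alphanumeric_sketch("cat_ pat.bat+vat!(mat)rat{sat}fat@chat$dat"))
# → ['cat', 'pat', 'bat', 'vat', 'mat', 'rat', 'sat', 'fatchatdat']
```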
category_analyzer
category_analyzer works like comma_analyzer, but it keeps the original case of the values (it is case-sensitive).
To use it for a field, specify category_analyzer in the "analyzer" field in the JSON mapping file.
"test": {
"type": "text",
"store": true,
"analyzer": "category_analyzer",
"fielddata": true
},
For example, if a meta field contains:
<meta name="test" content="World, news, Breaking News, TV, Part-time" />,
Using category_analyzer to filter this field, the terms generated would be:
- World
- news
- Breaking News
- TV
- Part-time
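A minimal sketch of the case-preserving behavior (hypothetical function name; the split itself is the same comma split, with no lowercasing applied):

```python
def category_analyzer_sketch(text: str) -> list[str]:
    # Split on commas, trim whitespace, and preserve the original case.
    return [token.strip() for token in text.split(",") if token.strip()]

print(category_analyzer_sketch("World, news, Breaking News, TV, Part-time"))
# → ['World', 'news', 'Breaking News', 'TV', 'Part-time']
```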