Stopwords
Improve search efficiency by customizing stop-words in SearchBlox’s language-specific XML files
Stop-words are common words such as "the," "and," "a," "is," and "on" that carry little information on their own and are excluded from both indexing and searching. Each language has its own stop-word list. Search engines typically ignore these words to:
- Conserve disk storage space
- Accelerate search performance
- Improve relevance by matching on meaningful terms rather than filler words
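The following minimal Python sketch shows what stop-word filtering does to a document during indexing. The five-word list, the lowercase alphanumeric tokenizer, and the `tokenize` and `index_terms` helpers are illustrative assumptions. They are not SearchBlox's actual analyzer, and they only show why dropping these words shrinks what gets indexed.

```python
import re

# Illustrative stop-word set only; it is NOT SearchBlox's English list.
STOP_WORDS = {"the", "and", "a", "is", "on"}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_terms(text: str, stop_words=STOP_WORDS) -> list:
    """Return the tokens that would be kept after stop-word removal."""
    return [t for t in tokenize(text) if t not in stop_words]


doc = "The cat is on the mat and a dog is on the rug"
kept = index_terms(doc)
print(kept)                                                # ['cat', 'mat', 'dog', 'rug']
print(f"kept {len(kept)} of {len(tokenize(doc))} tokens")  # kept 4 of 13 tokens
```

If the same filter is applied to a query such as "the cat on the mat," only `cat` and `mat` remain. The query is then matched against content terms only.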
SearchBlox applies the stop-word list for the language selected when a collection is created. The stop-word definition files are stored in:
<SEARCHBLOX_INSTALLATION_PATH>\webapps\ROOT\stopwords
Customizing Stop-Words
To modify the default stop-words for a language (English is used as the example here):
- Open the language file:
<SEARCHBLOX_INSTALLATION_PATH>\webapps\ROOT\stopwords\English_en.xml
- Add or remove stop-words in the XML file, then save it
- Restart SearchBlox
- Clear the affected collection and re-index it to apply the changes
These changes determine which common words SearchBlox drops from indexed content and searches, so they can affect both search relevance and performance.
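The editing step can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` to add or remove entries, and it saves a `.bak` copy of the original file before writing. It relies on an assumption this page does not confirm: that the file stores one stop-word per leaf element (for example `<word>the</word>`). Open your language file first and check that it uses this layout. The helper names `stopword_file`, `load_stopwords`, and `update_stopwords` are hypothetical and are not part of SearchBlox.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def stopword_file(install_path: str, file_name: str = "English_en.xml") -> Path:
    """Build <install>/webapps/ROOT/stopwords/<file_name>."""
    return Path(install_path) / "webapps" / "ROOT" / "stopwords" / file_name


def _parse(path: Path) -> ET.ElementTree:
    # Keep XML comments inside the root so they survive the rewrite.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.parse(path, parser=parser)


def _word_elements(root: ET.Element) -> list:
    # ASSUMPTION: every stop-word is the text of its own leaf element.
    # Comments are skipped because their tag is not a string.
    return [
        el for el in root.iter()
        if el is not root and isinstance(el.tag, str)
        and len(el) == 0 and (el.text or "").strip()
    ]


def load_stopwords(path: Path) -> set:
    """Return the stop-words found in the file, lowercased."""
    root = _parse(path).getroot()
    return {el.text.strip().lower() for el in _word_elements(root)}


def update_stopwords(path: Path, add=(), remove=()) -> None:
    """Add and/or remove stop-words in place, keeping a .bak copy."""
    tree = _parse(path)
    root = tree.getroot()
    words = _word_elements(root)
    if not words:
        raise ValueError(f"no one-word-per-element entries found in {path}")

    parent_of = {child: parent for parent in root.iter() for child in parent}
    template = words[0]           # new entries reuse this element's tag
    parent = parent_of[template]  # and are appended to the same parent

    drop = {w.lower() for w in remove}
    for el in words:
        if el.text.strip().lower() in drop:
            parent_of[el].remove(el)

    present = {el.text.strip().lower() for el in words} - drop
    for w in add:
        if w.lower() not in present:
            new = ET.SubElement(parent, template.tag)
            new.text = w
            new.tail = template.tail
            present.add(w.lower())

    shutil.copy2(path, path.with_name(path.name + ".bak"))
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, you could call `update_stopwords(stopword_file(r"C:\SearchBlox"), add=["etc"], remove=["on"])`, replacing the installation path with your own. Rewriting the file may normalize whitespace and the XML declaration. After running the script, restart SearchBlox and then clear and re-index the collection, just as you would after a manual edit.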
Important Note:
Changes to a language's stop-word file are not reflected in search results until the collection is cleared and re-indexed.