Stopwords

Improve search efficiency by customizing stop-words in SearchBlox’s language-specific XML files

Stop-words are non-information-bearing words such as "the," "and," "a," "is," and "on" that are excluded during both indexing and searching. Each language has its own stop-word list. Search engines typically ignore these common words to:

  • Conserve disk storage space
  • Accelerate search performance
  • Enhance search relevance
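
As a rough illustration of the idea, the sketch below drops stop-words from a piece of text before it would be indexed or searched. It is not SearchBlox's implementation: the `STOP_WORDS` set and the `remove_stop_words` function are hypothetical names used only for this example, and the word list is just the handful of words mentioned above.

```python
import re

# Illustrative stop-word set only; the real lists live in the
# language-specific XML files described in this article.
STOP_WORDS = {"the", "and", "a", "is", "on"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop-words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The cat is on the mat and a dog"))
# prints ['cat', 'mat', 'dog']
```

Only three of the nine words survive, which is why removing stop-words shrinks the index and leaves the terms that actually distinguish one document from another.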

SearchBlox implements language-specific stop-words based on the language selected during collection creation. The stop-word definition files are stored at:

<SEARCHBLOX_INSTALLATION_PATH>\webapps\ROOT\stopwords

Customizing Stop-Words

To modify the default stop-words for a language (using English as an example):

  • Navigate to the appropriate language file:
    <SEARCHBLOX_INSTALLATION_PATH>\webapps\ROOT\stopwords\English_en.xml
  • Edit the XML file to add or remove specific stop-words
  • Restart SearchBlox
  • Clear and re-index the affected collection to apply changes

These modifications will alter how SearchBlox processes common words in your indexed content, potentially affecting search relevance and performance.
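
Because these files are edited by hand, it can help to keep a copy of the original and to confirm the edited file still parses as XML before restarting. The following is a minimal sketch under stated assumptions: `backup_file` and `is_well_formed_xml` are hypothetical helper names, not part of SearchBlox, and the check only confirms that the file is well-formed XML, not that it matches the structure SearchBlox expects.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def backup_file(path):
    """Copy the file to '<name>.bak' in the same folder and return the backup path."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def is_well_formed_xml(path):
    """Return True if the file parses as XML.

    This does not validate the file against any SearchBlox-specific layout.
    """
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False
```

A typical sequence would be to call `backup_file` on the language file (for example `English_en.xml` under the stopwords folder of your installation) before editing, then call `is_well_formed_xml` on the same path after editing and before restarting SearchBlox.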

🚧

Important Note:

The collection must be cleared and re-indexed for the changes to take effect. Until then, search results will not reflect any edits made to a language's stop-words file.

