Email Collections can index content from PST files, including attachments, and documents from file systems. It is recommended to index only PST files using this collection.
After logging in to the Admin Console, click the Add Collection button. The Add Collection screen will be displayed.
- Enter a unique name for the collection (for example, EmailArchive).
- Select the Email radio button.
- Choose the language of the content.
- Click Add to create the new collection.
The email collection settings page allows you to configure the directory paths and filters for the collection. To access the paths settings for the collection, click on the collection name in the collections list.
The directory path is the starting path for the crawler. The crawler recursively indexes files within the folders. Enter at least one directory path for the collection. For example,
Allow and Disallow filters make it possible to manage a collection by excluding unwanted documents.
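As an illustration of how Allow and Disallow filters can narrow a collection, here is a minimal sketch. It assumes the filters behave like regular expressions matched against the file path and that a Disallow match takes precedence; the function name `is_allowed` and the sample paths are hypothetical, and the matching rules SearchBlox actually applies may differ.

```python
import re

def is_allowed(path, allow_patterns, disallow_patterns):
    """Return True if path should be kept under the given filters."""
    # Assumption: a path matching any Disallow pattern is rejected outright.
    if any(re.search(p, path) for p in disallow_patterns):
        return False
    # Assumption: with no Allow patterns, everything not disallowed is kept.
    if not allow_patterns:
        return True
    return any(re.search(p, path) for p in allow_patterns)

# Hypothetical sample paths, for illustration only.
paths = [
    "/data/mail/archive-2020.pst",
    "/data/mail/tmp/scratch.pst",
    "/data/mail/notes.txt",
]
kept = [p for p in paths if is_allowed(p, [r"\.pst$"], [r"/tmp/"])]
print(kept)
```

Here only the PST file outside the `tmp` folder survives both filters.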
Select which formats are eligible to be part of the collection using the checkboxes.
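To preview which PST files a recursive crawl from a directory path would encounter, a sketch like the following can help. It only walks the file system and is not the SearchBlox crawler; the function name `find_pst_files` and the example path are hypothetical.

```python
import os

def find_pst_files(root):
    """Yield paths of .pst files under root, descending into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Match the extension case-insensitively (.pst, .PST).
            if name.lower().endswith(".pst"):
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    # Hypothetical starting path; replace with a real directory.
    for path in find_pst_files("/path/to/mail"):
        print(path)
```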
The Settings sub-tab holds tunable parameters for the email collection. SearchBlox comes pre-configured with parameters when a new collection is created.
The settings that can be configured are listed as follows:
The keyword-in-context returns search results with the description displayed from content areas where the search term occurs.
Maximum Document Age
Specifies the maximum allowable age in days of a document in the collection.
Maximum Document Size
Specifies the maximum allowable size in kilobytes of a document in the collection.
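The two limits above can be pictured as a simple check per document. This is a sketch under stated assumptions: age is measured from the last-modified time, 1 kilobyte is taken as 1024 bytes, and both limits are inclusive. The function name `within_limits` is hypothetical and none of this is confirmed SearchBlox behavior.

```python
import time

SECONDS_PER_DAY = 86400

def within_limits(size_bytes, modified_epoch, max_age_days, max_size_kb, now=None):
    """Return True if a document passes both the age and the size limit."""
    now = time.time() if now is None else now
    # Assumption: age is counted from the last-modified timestamp.
    age_days = (now - modified_epoch) / SECONDS_PER_DAY
    # Assumption: 1 kilobyte = 1024 bytes.
    size_kb = size_bytes / 1024
    return age_days <= max_age_days and size_kb <= max_size_kb
```

For example, a 2048-byte document modified 10 days ago passes a 30-day, 4 KB limit but fails a 5-day limit.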
When enabled, prevents indexing duplicate documents.
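The page does not say how duplicates are detected. One common approach, shown here purely as an illustration, is to hash each document's content and skip any content already seen. The function name `unique_documents` and the sample data are hypothetical.

```python
import hashlib

def unique_documents(documents):
    """Keep the first document name seen for each distinct content hash.

    documents: iterable of (name, content_bytes) pairs.
    """
    seen = set()
    unique = []
    for name, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content was already kept
        seen.add(digest)
        unique.append(name)
    return unique

docs = [("a.msg", b"hello"), ("b.msg", b"hello"), ("c.msg", b"world")]
print(unique_documents(docs))
```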
Boost search terms for the collection by setting a value greater than 1 (maximum value 9999).
When stemming is enabled, inflected words are reduced to a root form. For example, "running", "runs", and "ran" are inflected forms of "run".
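To make the idea concrete, here is a deliberately naive toy stemmer. It is not the algorithm SearchBlox uses (the page does not name one): it strips two common suffixes and looks up irregular forms such as "ran" in a small hypothetical table, because suffix stripping alone cannot handle them.

```python
# Irregular forms cannot be derived by suffix stripping, so they are looked up.
IRREGULAR = {"ran": "run"}

def toy_stem(word):
    """Reduce a word to a rough root form (toy rules, illustration only)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

print([toy_stem(w) for w in ["running", "runs", "ran", "run"]])
```

All four inputs reduce to the same root, which is why a search for one form can match documents containing the others. Real stemmers use far more careful rules than this sketch.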
When enabled, a spelling index is created at the end of the indexing process.
Provides the indexer activity in detail in ../searchblox/logs/index.log.
The details that occur in the index.log when logging or debug logging mode is enabled are:
- You can extract emails as text and attachments in a specific folder (all emails and attachments will be exported to the specified location).
- Location can be specified at
- Please restart SearchBlox after entering the storage location in pst.yml. Then clear and reindex the collection.
The following operations can be performed in email collections:
Starts the indexer for the selected collection, beginning from the configured directory paths.
Clears the current index for the selected collection.
For each collection, any of the following scheduled indexer activities can be set:
- Indexer activity is controlled from the Index sub-tab in the collection. The current status of an indexer for a particular collection is indicated.
- The Index operation starts the indexer for the selected collection from the directory path. On reindexing (clicking Index again after the initial index operation), all crawled documents are reindexed. Documents that have been deleted from the directory since the first index operation are deleted from the index, and new documents are indexed.
- The Index operation can also be performed from the Collections dashboard.
- Scheduling can be performed only from the Index sub-tab.
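The reindexing behavior described in the list above amounts to comparing what is already in the index with what is currently in the directory. A minimal sketch of that comparison follows; the function name `plan_reindex` and the sample file names are hypothetical, and this is not SearchBlox code.

```python
def plan_reindex(indexed_paths, current_paths):
    """Split paths into those to delete, add, and reindex."""
    indexed = set(indexed_paths)
    current = set(current_paths)
    return {
        # In the index but no longer in the directory: removed from the index.
        "delete": sorted(indexed - current),
        # In the directory but not yet in the index: newly indexed.
        "add": sorted(current - indexed),
        # Present in both: reindexed.
        "reindex": sorted(indexed & current),
    }

plan = plan_reindex(["a.pst", "b.pst"], ["b.pst", "c.pst"])
print(plan)
```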
The schedule frequencies supported in SearchBlox are:
- Every Minute
- Every 48 Hours
- Every 96 Hours
- If you need to use the extraction and download feature of the email collection, make the required changes in pst.yml as mentioned earlier, restart the instance, and then index the collection.
- Do not schedule two operations (Index, Clear) for the same time. This creates a conflict between the activities.
- If you have multiple collections, stagger the schedules so that no more than 2-3 collections are indexing or refreshing at the same time.
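The two scheduling rules above can be checked on paper before entering schedules in the Admin Console. The sketch below is an offline planning aid only: it takes a hypothetical list of (collection, operation, start time) entries, compares start times without considering how long each run takes, and does not read anything from SearchBlox.

```python
from collections import defaultdict

MAX_CONCURRENT = 3  # upper end of the "2-3 collections" guidance

def find_schedule_problems(schedule):
    """Return warnings for a list of (collection, operation, start_time) tuples."""
    problems = []
    ops_by_slot = defaultdict(set)          # (collection, time) -> operations
    collections_by_time = defaultdict(set)  # time -> collections
    for collection, operation, start in schedule:
        ops_by_slot[(collection, start)].add(operation)
        collections_by_time[start].add(collection)
    # Rule 1: Index and Clear must not share a start time for one collection.
    for (collection, start), ops in sorted(ops_by_slot.items()):
        if {"Index", "Clear"} <= ops:
            problems.append(f"{collection}: Index and Clear both scheduled at {start}")
    # Rule 2: limit how many collections start at the same time.
    for start, names in sorted(collections_by_time.items()):
        if len(names) > MAX_CONCURRENT:
            problems.append(f"{len(names)} collections scheduled at {start}")
    return problems

# Hypothetical plan, for illustration only.
plan = [
    ("EmailArchive", "Index", "02:00"),
    ("EmailArchive", "Clear", "02:00"),
    ("Docs", "Index", "03:00"),
]
print(find_schedule_problems(plan))
```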
Log files starting with the name EmailCollection_ are generated in the <SEARCHBLOX_INSTALLATION_PATH>/webapps/searchblox/logs folder. These files list the status of the action performed on each PST file.
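To locate these log files quickly, a small helper such as the following can list them by modification time. It relies only on the file-name prefix stated above; the function name `email_collection_logs` is hypothetical, and the installation path must be filled in for your system.

```python
from pathlib import Path

def email_collection_logs(logs_dir):
    """Return log files whose names start with EmailCollection_, oldest first."""
    files = [p for p in Path(logs_dir).glob("EmailCollection_*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)

if __name__ == "__main__":
    # Replace the placeholder with your actual installation path.
    logs = "<SEARCHBLOX_INSTALLATION_PATH>/webapps/searchblox/logs"
    for log_file in email_collection_logs(logs):
        print(log_file.name)
```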