Amazon S3 Collection
You can create an Amazon S3 collection by following the steps given below.
Creating Amazon S3 Collection
- After logging in to the Admin Console, click the Add Collection button. The Add Collection screen is displayed.
- Enter a unique name for your collection (for example, AmazonS3).
- Select the Amazon S3 Collection radio button.
- Click Add to create the collection.
Amazon S3 Collection Settings
- The Settings sub-tab holds settings for Amazon S3 and tunable parameters for the search.
- Amazon S3 settings must be set explicitly for each Amazon S3 collection.
- The mandatory fields for an Amazon S3 collection are:
  - Access key
  - Secret key
  - Bucket name
- When a new collection is created, SearchBlox also pre-configures a few other Amazon S3 parameters, such as includes and excludes.
- The following table lists the settings for an Amazon S3 collection.
Field | Description |
---|---|
Access Key | Access key from Amazon S3 security credentials. Mandatory field. |
Secret Key | Secret key from Amazon S3 security credentials. Mandatory field. |
Name | Optional name. |
Bucket | Amazon S3 bucket to index. Mandatory field. |
Path Prefix | Optional path prefix within the bucket to index, for example: Work/. If specified, it must be an existing path that ends with a trailing /. |
Includes | File types to include, for example: .pdf, .jpg. |
Excludes | File types to exclude, for example: *.zip. |
Keyword-in-Context Display | Returns search results whose descriptions are taken from the content areas where the search term occurs. |
Boosting | Boost search terms for the collection by setting a value greater than 1 (maximum value 9999). |
Stemming | When stemming is enabled, inflected words are reduced to their root form. For example, "running", "runs", and "ran" are inflected forms of "run". |
Spelling Suggestions | When enabled, a spelling index is created at the end of the indexing process. |
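The mandatory fields and the path-prefix rule in the table above can be checked before a collection is saved. The following is a minimal sketch of such a check; `S3CollectionSettings` and `validate_settings` are hypothetical names for illustration, not part of SearchBlox, and the sketch only checks the format of the prefix, not whether the path actually exists in the bucket.

```python
from dataclasses import dataclass


@dataclass
class S3CollectionSettings:
    """Hypothetical model of the Amazon S3 collection settings (not a SearchBlox API)."""
    access_key: str = ""
    secret_key: str = ""
    bucket: str = ""
    path_prefix: str = ""  # optional, e.g. "Work/"
    name: str = ""         # optional


def validate_settings(s: S3CollectionSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    # Access key, secret key and bucket are the mandatory fields.
    for field_name in ("access_key", "secret_key", "bucket"):
        if not getattr(s, field_name).strip():
            problems.append(f"{field_name} is required")
    # An optional path prefix must end with a trailing slash, like "Work/".
    # (Whether the path exists in the bucket is not checked here.)
    if s.path_prefix and not s.path_prefix.endswith("/"):
        problems.append("path_prefix must end with '/'")
    return problems
```

For example, `validate_settings(S3CollectionSettings(access_key="key", secret_key="secret", bucket="docs", path_prefix="Work"))` reports only the missing trailing slash.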
Additional Notes
- Avoid logging transactions to an S3 bucket that is being indexed; those log files would also be indexed, increasing bandwidth usage.
- If logging is needed, exclude the log files by their extension using the Excludes field in the collection settings.
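To illustrate how Path Prefix, Includes and Excludes together decide which objects get indexed, including keeping log files out, here is a hypothetical filter. It assumes that a pattern may be either a bare extension (".pdf") or a glob ("*.zip"), because the settings table shows both forms, and that matching is case-insensitive; SearchBlox's actual matching rules may differ.

```python
from fnmatch import fnmatchcase


def _as_glob(pattern: str) -> str:
    # Lower-case the pattern and treat a bare extension such as ".pdf" as the
    # glob "*.pdf"; patterns that already use a wildcard, such as "*.zip", pass through.
    pattern = pattern.strip().lower()
    return "*" + pattern if pattern.startswith(".") else pattern


def should_index(key, path_prefix="", includes=(), excludes=()):
    """Hypothetical filter deciding whether an S3 object key would be indexed."""
    # Only keys under the configured path prefix are considered.
    if not key.startswith(path_prefix):
        return False
    name = key.lower()  # case-insensitive matching (an assumption)
    # With a non-empty include list, a key must match at least one include pattern.
    if includes and not any(fnmatchcase(name, _as_glob(p)) for p in includes):
        return False
    # Any matching exclude pattern drops the key.
    return not any(fnmatchcase(name, _as_glob(p)) for p in excludes)
```

Excluding `.log` alongside `*.zip` keeps S3 log files out of the index while other documents still pass:

```python
keys = ["Work/report.pdf", "Work/photo.jpg", "Work/archive.zip", "logs/2020-01-01.log", "Work/access.log"]
print([k for k in keys if should_index(k, excludes=["*.zip", ".log"])])
# ['Work/report.pdf', 'Work/photo.jpg']
```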
Indexing and Other Operations
The following operations can be performed on an Amazon S3 collection:
Activity | Description |
---|---|
Index | Starts the indexer for the selected collection. |
Clear | Clears the current index for the selected collection. |
Scheduled Activity | For each collection, the following indexer activities can be scheduled: Index: set the frequency and the start date/time for indexing the collection; Clear: set the frequency and the start date/time for clearing the collection. |
- Indexer activity is controlled from the Index sub-tab of the collection or through the API. The current status of a collection is always shown on the Collection Dashboard and on the Index sub-tab.
- The Index operation starts the indexer for the Amazon S3 collection. It can also be started from the Collection Dashboard.
- On reindexing (that is, clicking Index again after the initial index operation), all crawled documents are reindexed. Documents that have been deleted from S3 since the initial index operation are removed from the index, and new documents are indexed.
- Scheduling can be performed only from the Index sub-tab.
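The reindexing behaviour described above amounts to three set operations between what is already in the index and what is currently in the bucket. The sketch below models only that outcome; `plan_reindex` is a hypothetical name, and comparing documents by object key is an assumption, not a description of SearchBlox internals.

```python
def plan_reindex(indexed_keys, bucket_keys):
    """Hypothetical model of what a reindex run does to an existing index."""
    indexed, current = set(indexed_keys), set(bucket_keys)
    return {
        # Documents still in the bucket are crawled and reindexed.
        "reindex": sorted(indexed & current),
        # Documents added to the bucket since the earlier run are indexed for the first time.
        "add": sorted(current - indexed),
        # Documents deleted from the bucket are removed from the index.
        "delete": sorted(indexed - current),
    }
```

For example, `plan_reindex(["a.pdf", "b.pdf"], ["b.pdf", "c.pdf"])` reindexes `b.pdf`, adds `c.pdf` and deletes `a.pdf` from the index.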
Schedule Frequency
SearchBlox supports the following schedule frequencies:
- Once
- Every Minute
- Hourly
- Daily
- Every 48 Hours
- Every 96 Hours
- Weekly
- Monthly
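Each frequency in the list above determines when a scheduled activity would run next. The helper below is a hypothetical sketch of that mapping, not SearchBlox's scheduler; in particular, treating "Monthly" as the same day next month, clamped to the month's last day, is an assumption.

```python
import calendar
from datetime import datetime, timedelta

# Frequencies that map directly onto a fixed timedelta.
FIXED_INTERVALS = {
    "Every Minute": timedelta(minutes=1),
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def _add_month(dt):
    # Same day next month, clamped to that month's last day (Jan 31 -> Feb 28/29).
    year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def next_run(frequency, last_run):
    """Hypothetical helper: next run time of a scheduled activity, or None for "Once"."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        return _add_month(last_run)
    return last_run + FIXED_INTERVALS[frequency]
```

With this sketch, an index scheduled "Every 48 Hours" from 2024-01-01 06:00 would next run at 2024-01-03 06:00.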
Best Practices
- You must provide the access key, secret key, bucket name, and update rate in the S3 collection settings.
- Use the Includes and Excludes collection settings to avoid indexing unnecessary file types.
- Do not schedule index and clear operations at the same time.
- If you have multiple collections, schedule their activities so that no more than 2-3 collections are indexing at the same time.
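The two scheduling practices above can be checked mechanically against a planned schedule. This is a hypothetical sketch: `check_schedule` is not a SearchBlox feature, and because real indexing time depends on the bucket, each index run is assumed to last a fixed `est_duration` (one hour by default), with the limit set to 3 collections.

```python
from datetime import datetime, timedelta


def check_schedule(entries, max_concurrent=3, est_duration=timedelta(hours=1)):
    """Hypothetical check of planned activities against the best practices above.

    entries: (collection, activity, start) tuples, where activity is "Index" or "Clear".
    est_duration: assumed run time of each index operation.
    """
    warnings = []
    # Rule 1: Index and Clear of the same collection should not start at the same time.
    index_times = {(c, s) for c, a, s in entries if a == "Index"}
    for c, a, s in entries:
        if a == "Clear" and (c, s) in index_times:
            warnings.append(f"{c}: Index and Clear both scheduled at {s:%Y-%m-%d %H:%M}")
    # Rule 2: limit concurrent indexing. The maximum overlap of the runs always
    # occurs at some run's start time, so checking each distinct start is enough.
    runs = [(s, s + est_duration, c) for c, a, s in entries if a == "Index"]
    for start in sorted({s for s, _, _ in runs}):
        running = {c for s, e, c in runs if s <= start < e}
        if len(running) > max_concurrent:
            warnings.append(f"{len(running)} collections indexing at {start:%Y-%m-%d %H:%M}")
    return warnings
```

For instance, four collections whose one-hour index runs start at 02:00, 02:00, 02:30 and 02:45 would trigger a warning at 02:45, when all four overlap.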