You can create an Amazon S3 collection by following the steps given below.
- After logging in to the Admin Console, select the Collections tab and click Create a New Collection or the "+" icon.
- Choose Amazon S3 Collection as the Collection Type.
- Enter a unique name for your collection (for example, AmazonS3).
- Choose Private/Public Collection Access and Collection Encryption as per the requirements.
- Choose the language of the content (if the language is other than English).
- Click Save to create the collection.
- Once the AmazonS3 collection is created, you will be taken to the AmazonS3 tab.
- The Settings sub-tab holds settings for Amazon S3 and tunable parameters for the search.
- Amazon S3 settings must be set explicitly in the Amazon S3 collections.
- The mandatory fields for an Amazon S3 collection are:
- Access key
- Secret key
- Bucket name
- When a new collection is created, SearchBlox also comes pre-configured with a few other Amazon S3 parameters, such as includes and excludes.
- The following table lists the settings for an Amazon S3 collection:
| Setting | Description |
|---|---|
| Access key | Access key from Amazon S3 security credentials. |
| Secret key | Secret key from Amazon S3 security credentials. |
| Bucket name | Amazon S3 bucket to index. |
| Path prefix | Path prefix to index in this bucket, for example: Work/. This is optional. If specified, it should be an existing path with the trailing /. |
| Includes | File types to be included, for example: .pdf, .jpg. |
| Excludes | File types to be excluded, for example: *.zip. |
| Relevance - Remove Duplicates | Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO. |
| Relevance - Stemming | Stemming matches the inflected forms of a word's root during search. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES. |
| Relevance - Spelling Suggestions | When enabled, a spelling index is created at the end of the indexing process. |
| Keyword-in-context | Returns search results with the description taken from the content areas where the search term occurs. |
| Enable Detailed Log Settings | When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL along with its timestamp, status code, and the time taken for indexing. By default this is set to NO. |
| Enable Content API | Provides the ability to crawl the document content with special characters included. |
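Before entering the values in the Admin Console, it can help to check them against the rules above: the three mandatory fields must be present, and an optional path prefix should end with a trailing /. The following is a minimal sketch of such a check. The dictionary keys (`access_key`, `secret_key`, `bucket_name`, `path_prefix`) and the function name are hypothetical and chosen only for illustration; they are not SearchBlox configuration names.

```python
# Hypothetical pre-flight check for Amazon S3 collection values.
# The key names below are illustrative assumptions, not SearchBlox identifiers.
REQUIRED_FIELDS = ("access_key", "secret_key", "bucket_name")


def validate_s3_settings(settings):
    """Return a list of problems; an empty list means the values look complete."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat missing keys and blank strings the same way.
        if not str(settings.get(field, "")).strip():
            problems.append(f"missing mandatory field: {field}")
    prefix = settings.get("path_prefix", "")
    # The prefix is optional, but when given it should end with "/".
    if prefix and not prefix.endswith("/"):
        problems.append("path prefix must end with a trailing '/'")
    return problems


if __name__ == "__main__":
    example = {
        "access_key": "EXAMPLE-ACCESS-KEY",
        "secret_key": "EXAMPLE-SECRET-KEY",
        "bucket_name": "example-bucket",
        "path_prefix": "Work",  # missing the trailing slash
    }
    print(validate_s3_settings(example))
```

This check only covers the form of the values. Whether the path actually exists in the bucket still has to be confirmed against Amazon S3 itself.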
- Do not log transactions to S3 buckets since those log files will also be indexed, increasing bandwidth usage.
- If logging is needed, then disallow the log files by excluding them (using extensions) in Collection Settings.
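To illustrate how a path prefix combined with include and exclude patterns narrows down what gets indexed, including the log files mentioned above, here is a small sketch. It is not SearchBlox's actual matching logic: the function `should_index`, the glob-style patterns, the case-insensitive comparison, and the sample object keys are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def should_index(key, prefix="", includes=(), excludes=()):
    """Decide whether an S3 object key would be selected (illustrative only)."""
    # Keys outside the configured path prefix are skipped.
    if prefix and not key.startswith(prefix):
        return False
    # Match patterns against the file name, ignoring case.
    name = key.rsplit("/", 1)[-1].lower()
    # Excludes take priority in this sketch.
    if any(fnmatchcase(name, pattern.lower()) for pattern in excludes):
        return False
    # With no includes configured, everything not excluded is accepted.
    return not includes or any(fnmatchcase(name, pattern.lower()) for pattern in includes)


if __name__ == "__main__":
    sample_keys = [
        "Work/report.pdf",
        "Work/logs/2024-01-01.log",
        "Work/archive.zip",
        "Home/photo.jpg",
    ]
    for key in sample_keys:
        selected = should_index(
            key,
            prefix="Work/",
            includes=["*.pdf", "*.jpg"],
            excludes=["*.zip", "*.log"],
        )
        print(key, "->", "index" if selected else "skip")
```

In this sketch only `Work/report.pdf` is selected: the log and zip files are excluded by extension, and `Home/photo.jpg` lies outside the prefix.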
The schedule sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:
- Every 48 Hours
- Every 96 Hours
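As a rough illustration of how a start date/time and a frequency translate into indexing runs, the sketch below lists the first few run times for a given interval. It assumes the first run happens exactly at the configured start time and that later runs follow at fixed intervals; the function name `upcoming_runs` is hypothetical and the actual scheduler may behave differently.

```python
from datetime import datetime, timedelta


def upcoming_runs(start, every_hours, count=3):
    """List the first `count` run times for a fixed-interval schedule."""
    step = timedelta(hours=every_hours)
    # Assumption: the first run is at `start`, then one run per interval.
    return [start + i * step for i in range(count)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 2, 0)  # example start: 1 Jan 2024, 02:00
    for run in upcoming_runs(start, every_hours=48):
        print(run.isoformat())
```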
The following operations can be performed in an Amazon S3 collection:
|For each collection, indexing can be scheduled based on the above options.
Using the Data Fields tab, you can create custom fields for search and view the Default Data Fields of a non-encrypted collection. SearchBlox supports 4 types of Data Fields, as listed below:
- Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
To learn more about Data Fields, please refer to the Data Fields Tab.
- It is mandatory to provide the access key, secret key, bucket name, and update rate in the S3 collection settings.
- It is possible to include or exclude file types using collection settings. Please use them to avoid indexing unnecessary file types.
- If you have multiple collections, always schedule indexing so that no more than 2-3 collections are indexed at the same time.
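One way to keep concurrent indexing within that limit is to stagger the start times of your collections. The sketch below assigns collections to consecutive time slots, with at most `max_concurrent` collections per slot. It assumes every indexing run finishes within `window_hours`; the function name, the parameters, and the collection names are hypothetical and only illustrate the planning step, since the start times are still entered per collection in the Admin Console.

```python
from datetime import datetime, timedelta


def stagger_starts(collections, first_start, window_hours, max_concurrent=2):
    """Map each collection name to a start time, max_concurrent per slot."""
    plan = {}
    for position, name in enumerate(collections):
        # Collections 0..max_concurrent-1 share slot 0, the next group slot 1, ...
        slot = position // max_concurrent
        plan[name] = first_start + timedelta(hours=slot * window_hours)
    return plan


if __name__ == "__main__":
    names = ["AmazonS3", "Intranet", "Website", "Database", "Archive"]
    plan = stagger_starts(names, datetime(2024, 1, 1, 1, 0), window_hours=4)
    for name, start in plan.items():
        print(f"{name}: {start.isoformat()}")
```

If a run can take longer than the chosen window, a larger `window_hours` value is needed to keep the slots from overlapping.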