Amazon S3 Collection
Creating Amazon S3 Collection
You can create an Amazon S3 collection by following the steps given below.
- After logging in to the Admin Console, select the Collections tab and click Create a New Collection or the "+" icon.
- Choose Amazon S3 Collection as the Collection Type.
- Enter a unique name for your collection (for example, AmazonS3).
- Choose Private/Public Collection Access and Collection Encryption as per the requirements.
- Choose the language of the content (if the language is other than English).
- Click Save to create the collection.
- Once the AmazonS3 collection is created, you will be taken to the AmazonS3 tab.
AmazonS3 Collection Settings
- The Settings sub-tab holds settings for Amazon S3 and tunable parameters for the search.
- Amazon S3 settings must be set explicitly in the Amazon S3 collections.
- The mandatory fields for an AmazonS3 collection are:
- Access key
- Secret key
- Bucket name
- SearchBlox also pre-configures a few other AmazonS3 parameters, such as includes and excludes, when a new collection is created.
- The following table lists the settings for an AmazonS3 collection:
|Access Key||Access key from Amazon S3 security credentials.|
|Secret Key||Secret key from Amazon S3 security credentials.|
|Bucket||Amazon S3 bucket to index.|
|Path Prefix||Path prefix to index in this bucket, for example: Work/. This is optional. If specified, it should be an existing path with a trailing /.|
|Includes||File types to be included. example: .pdf, .jpg.|
|Excludes||File types to be excluded. example: *.zip.|
|Relevance - Remove Duplicates||Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO.|
|Relevance - Stemming||When enabled, search also considers the inflected forms of a word's root. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES.|
|Relevance - Spelling Suggestions||When enabled, a spelling index is created at the end of the indexing process.|
|Keyword-in-Context Display||The keyword-in-context returns search results with the description displayed from content areas where the search term occurs.|
|Enable Detailed Log Settings||When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL along with its timestamp, status code, and the time taken for indexing. By default this is set to NO.|
|Enable Content API||Provides the ability to crawl the document content with special characters included.|
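Before entering values in the Settings sub-tab, it can help to sanity-check them. The sketch below is only an illustration of the rules stated above (three mandatory fields, and an optional path prefix that must end with `/`). The `S3CollectionSettings` class and `validate` function are hypothetical names, not part of any SearchBlox API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class S3CollectionSettings:
    """Hypothetical container mirroring the settings table; not a SearchBlox API."""
    access_key: str = ""
    secret_key: str = ""
    bucket: str = ""
    path_prefix: str = ""  # optional
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)


def validate(settings: S3CollectionSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look complete."""
    problems = []
    # Access key, secret key and bucket name are the mandatory fields.
    for name in ("access_key", "secret_key", "bucket"):
        if not getattr(settings, name).strip():
            problems.append(f"{name} is mandatory")
    # The path prefix is optional, but if given it needs the trailing slash.
    if settings.path_prefix and not settings.path_prefix.endswith("/"):
        problems.append("path_prefix must end with a trailing /")
    return problems


if __name__ == "__main__":
    draft = S3CollectionSettings(access_key="EXAMPLE", secret_key="EXAMPLE",
                                 bucket="docs", path_prefix="Work")
    print(validate(draft))
```

This check cannot confirm that the prefix actually exists in the bucket; that still has to be verified against Amazon S3 itself.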
- Do not log transactions to S3 buckets since those log files will also be indexed, increasing bandwidth usage.
- If logging is needed, then disallow the log files by excluding them (using extensions) in Collection Settings.
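To see which objects a given combination of Path Prefix, Includes, and Excludes would leave for indexing, a rough local simulation can be useful. The sketch below assumes glob-style patterns such as `*.zip` and case-insensitive matching; the exact pattern syntax SearchBlox applies is not described here, so treat `select_keys` as a hypothetical helper rather than the product's behaviour.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def select_keys(keys: Iterable[str], path_prefix: str = "",
                includes: Iterable[str] = (),
                excludes: Iterable[str] = ()) -> List[str]:
    """Return the object keys that pass the prefix, exclude and include filters."""
    includes = list(includes)
    excludes = list(excludes)
    selected = []
    for key in keys:
        # Only keys under the optional path prefix are considered.
        if path_prefix and not key.startswith(path_prefix):
            continue
        name = key.lower()
        # An exclude match always wins (assumption of this sketch).
        if any(fnmatch(name, pattern.lower()) for pattern in excludes):
            continue
        # With no include patterns, everything not excluded is kept.
        if includes and not any(fnmatch(name, pattern.lower()) for pattern in includes):
            continue
        selected.append(key)
    return selected


if __name__ == "__main__":
    bucket_keys = ["Work/report.pdf", "Work/logs/access.log", "Work/archive.zip",
                   "Other/notes.pdf", "Work/photo.JPG"]
    print(select_keys(bucket_keys, "Work/", ["*.pdf", "*.jpg"], ["*.zip", "*.log"]))
    # -> ['Work/report.pdf', 'Work/photo.JPG']
```

Excluding `*.log` here mirrors the advice above: if logs must live in the bucket, an extension-based exclude keeps them out of the index.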
Schedule and Index
Sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:
- Every 48 Hours
- Every 96 Hours
The following operations can be performed in an AmazonS3 collection:
|Schedule||For each collection, indexing can be scheduled based on the above options.|
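To preview when a schedule would fire, the run times can be computed from the start date/time and the chosen frequency. This is a minimal sketch that assumes runs repeat at a fixed interval from the start time; `next_runs` is a hypothetical helper, and the scheduler's real behaviour (for example after a restart) may differ.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, every_hours: int, count: int) -> List[datetime]:
    """Return the first `count` run times for a fixed-interval schedule."""
    if every_hours <= 0:
        raise ValueError("every_hours must be positive")
    step = timedelta(hours=every_hours)
    return [start + i * step for i in range(count)]


if __name__ == "__main__":
    # A collection scheduled every 48 hours, starting 1 January 2024 at 02:00.
    for run in next_runs(datetime(2024, 1, 1, 2, 0), 48, 3):
        print(run.isoformat())
```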
Data Fields Tab
The Data Fields tab lets you create custom fields for search and, for non-encrypted collections, view the Default Data Fields. SearchBlox supports 4 types of Data Fields as listed below:
- Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
To learn more about Data Fields, please refer to Data Fields Tab.
- It is mandatory to provide the access key, secret key, bucket name, and update rate in the S3 collection settings.
- File types can be included or excluded using the collection settings. Use them to avoid indexing unnecessary file types.
- If you have multiple collections, always schedule indexing so that no more than 2-3 collections index at the same time.
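One way to keep concurrent indexing low is to stagger the start times when planning schedules. The sketch below groups collections into slots of at most `max_concurrent` and offsets each slot by an estimated indexing duration. The function name, the collection names, and the duration estimate are all assumptions for illustration; the resulting times would still be entered manually in each collection's schedule.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger_starts(collections: List[str], first_start: datetime,
                   est_duration: timedelta,
                   max_concurrent: int = 2) -> Dict[str, datetime]:
    """Assign start times so at most `max_concurrent` collections share a slot."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    starts = {}
    for i, name in enumerate(collections):
        # Collections 0..n-1 go in slot 0, the next n in slot 1, and so on.
        slot = i // max_concurrent
        starts[name] = first_start + slot * est_duration
    return starts


if __name__ == "__main__":
    plan = stagger_starts(["AmazonS3", "Intranet", "Docs", "Wiki", "Archive"],
                          datetime(2024, 1, 1, 1, 0), timedelta(hours=2))
    for name, start in plan.items():
        print(name, start.isoformat())
```

The plan only holds if each run finishes within the estimated duration; a generous estimate is safer than a tight one.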