## Creating an Amazon S3 Collection

You can create an Amazon S3 collection by following the steps given below.

  • After logging in to the Admin Console, select the Collections tab and click Create a New Collection or the "+" icon.

  • Choose Amazon S3 Collection as the Collection Type.

  • Enter a unique name for your collection (for example, AmazonS3).

  • Choose Private or Public Collection Access and set Collection Encryption as required.

  • Choose the language of the content (if the language is other than English).

  • Click Save to create the collection.


  • Once the Amazon S3 collection is created, you will be taken to the AmazonS3 tab.

## AmazonS3 Collection Settings

  • The Settings sub-tab holds settings for Amazon S3 and tunable parameters for the search.

  • Amazon S3 settings must be set explicitly in the Amazon S3 collections.

  • The mandatory fields for an Amazon S3 collection are:

    • Access key

    • Secret key

    • Bucket name

  • When a new collection is created, SearchBlox also pre-configures a few other Amazon S3 parameters, such as includes and excludes.

  • The following table lists the settings for an Amazon S3 collection.

| Setting | Description |
| --- | --- |
| **Access Key** | Access key from Amazon S3 security credentials. Mandatory field. |
| **Secret Key** | Secret key from Amazon S3 security credentials. Mandatory field. |
| **Name** | Optional name. |
| **Bucket** | Amazon S3 bucket to index. Mandatory field. |
| **Path Prefix** | Optional path prefix to index in this bucket, for example Work/. If specified, it should be an existing path with a trailing /. |
| **Includes** | File types to be included, for example `*.pdf`, `*.jpg`. |
| **Excludes** | File types to be excluded, for example `*.zip`. |
| **Relevance - Remove Duplicates** | Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO. |
| **Relevance - Stemming** | Stemming matches inflected forms of a root word in search. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES. |
| **Relevance - Spelling Suggestions** | When enabled, a spelling index is created at the end of the indexing process. |
| **Keyword-in-Context Display** | Returns search results with the description taken from the content areas where the search term occurs. |
| **Enable Detailed Log Settings** | When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL with a timestamp, the status code, and the time taken for indexing. The default is NO. |
| **Enable Content API** | Provides the ability to crawl the document content with special characters included. |
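
To make the interaction between Path Prefix, Includes, and Excludes concrete, here is a minimal Python sketch of how such filters could select object keys. It is an illustration only: the function `select_keys` and the matching rules (shell-style patterns applied to the file name, case-insensitive, excludes winning over includes) are assumptions, not SearchBlox's documented behavior.

```python
from fnmatch import fnmatch


def select_keys(keys, path_prefix="", includes=("*",), excludes=()):
    """Return the keys that pass the prefix, include, and exclude checks.

    Hypothetical helper: the real crawler may match differently.
    """
    if path_prefix and not path_prefix.endswith("/"):
        raise ValueError("Path Prefix should end with a trailing '/'")

    selected = []
    for key in keys:
        if not key.startswith(path_prefix):
            continue  # outside the configured path prefix
        name = key.rsplit("/", 1)[-1].lower()
        if not name:
            continue  # "folder" placeholder key such as "Work/"
        if not any(fnmatch(name, pattern.lower()) for pattern in includes):
            continue  # not an included file type
        if any(fnmatch(name, pattern.lower()) for pattern in excludes):
            continue  # explicitly excluded file type
        selected.append(key)
    return selected


keys = [
    "Work/report.pdf",
    "Work/photo.jpg",
    "Work/archive.zip",
    "Home/notes.pdf",
    "Work/",
]
print(select_keys(keys, "Work/", ["*.pdf", "*.jpg"], ["*.zip"]))
# prints ['Work/report.pdf', 'Work/photo.jpg']
```

Running a list of sample keys through a check like this before indexing can help confirm that the patterns cover only the intended file types.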

Additional Note

  • Do not write transaction logs to the S3 bucket being indexed, since those log files will also be indexed, increasing bandwidth usage.

  • If logging is needed, exclude the log files by extension in Collection Settings.
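
One way to check whether a bucket's logs would land inside the indexed path is to inspect its logging configuration. The sketch below assumes the logs in question are S3 server access logs and that the configuration was fetched with boto3's `get_bucket_logging`. The helper `logs_would_be_indexed` and the bucket name `docs-bucket` are hypothetical.

```python
def logs_would_be_indexed(logging_response, indexed_bucket, path_prefix=""):
    """Report whether access logs are written under the indexed bucket and prefix.

    `logging_response` is the dict returned by
    boto3.client("s3").get_bucket_logging(Bucket=...).
    """
    config = logging_response.get("LoggingEnabled")
    if not config:
        return False  # logging is not enabled for this bucket
    if config.get("TargetBucket") != indexed_bucket:
        return False  # logs go to a different bucket
    # Log object keys start with TargetPrefix, so they fall under the
    # indexed path when TargetPrefix itself starts with the path prefix.
    return config.get("TargetPrefix", "").startswith(path_prefix)


# Example response shape; in practice this would come from:
#   import boto3
#   response = boto3.client("s3").get_bucket_logging(Bucket="docs-bucket")
response = {
    "LoggingEnabled": {"TargetBucket": "docs-bucket", "TargetPrefix": "Work/logs/"}
}
print(logs_would_be_indexed(response, "docs-bucket", "Work/"))  # prints True
print(logs_would_be_indexed(response, "docs-bucket", "Home/"))  # prints False
```

If the check reports True, either move the logs to another bucket or exclude their file extension in Collection Settings.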

## Schedule and Index

The Schedule and Index settings set the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:

  • Once

  • Hourly

  • Daily

  • Every 48 Hours

  • Every 96 Hours

  • Weekly

  • Monthly

The following operations can be performed in an Amazon S3 collection:

**Schedule**: For each collection, indexing can be scheduled based on the above options.
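
As a rough illustration of what each frequency means in terms of run times, the following Python sketch computes the next run from a start date/time. The function `next_run` is hypothetical, and treating Monthly as "same day next calendar month, clamped to the month's length" is an assumption, since the exact scheduler behavior is not described here.

```python
import calendar
from datetime import datetime, timedelta

# Frequencies with a fixed interval between runs.
FIXED_INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(start, frequency):
    """Return the run time following `start`, or None for a one-off schedule."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        # Assumption: same day of the next month, clamped to its last day.
        year = start.year + start.month // 12
        month = start.month % 12 + 1
        day = min(start.day, calendar.monthrange(year, month)[1])
        return start.replace(year=year, month=month, day=day)
    return start + FIXED_INTERVALS[frequency]


start = datetime(2024, 1, 31, 9, 0)
print(next_run(start, "Every 48 Hours"))  # prints 2024-02-02 09:00:00
print(next_run(start, "Monthly"))         # prints 2024-02-29 09:00:00
```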

## Data Fields Tab

Using the Data Fields tab, you can create custom fields for search and view the default data fields for a non-encrypted collection. SearchBlox supports four types of data fields:

  • Keyword

  • Number

  • Date

  • Text

Once the data fields are configured, the collection must be cleared and re-indexed for the changes to take effect.

To learn more about data fields, refer to the Data Fields Tab documentation.

Best Practices

  • The access key, secret key, bucket name, and update rate must be provided in the S3 collection settings.

  • Use the include and exclude options in the collection settings to avoid indexing unnecessary file types.

  • If you have multiple collections, schedule indexing so that no more than 2-3 collections are indexing at the same time.
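
The last recommendation can be planned with a small script before entering start times in the scheduler. This is a minimal sketch, assuming every indexing run finishes within an estimated duration. The function `stagger_starts`, the collection names, and the two-hour estimate are illustrative and not part of SearchBlox.

```python
from datetime import datetime, timedelta


def stagger_starts(collections, first_start, est_duration, max_concurrent=2):
    """Assign start times so that at most `max_concurrent` runs overlap.

    Collections are grouped into batches of `max_concurrent`; each batch
    starts one estimated duration after the previous batch.
    """
    schedule = {}
    for position, name in enumerate(collections):
        batch = position // max_concurrent
        schedule[name] = first_start + batch * est_duration
    return schedule


plan = stagger_starts(
    ["AmazonS3", "Intranet", "Website", "Archive", "Wiki"],
    first_start=datetime(2024, 1, 1, 1, 0),
    est_duration=timedelta(hours=2),
    max_concurrent=2,
)
for name, start in plan.items():
    print(name, start)
```

The resulting start times can then be entered as the start date/time of each collection's schedule. If runs regularly exceed the estimate, increase the estimated duration so the batches stop overlapping.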