Amazon S3 Collection

You can create an Amazon S3 collection by following the steps given below.

Creating an Amazon S3 Collection

  • After logging in to the Admin Console, click the Add Collection button. The Add Collection screen is displayed.
  • Enter a unique name for your collection (for example, AmazonS3).
  • Select the Amazon S3 collection radio button.
  • Click Add to create the collection.

Amazon S3 Collection Settings

  • The Settings sub-tab holds the Amazon S3 settings and the tunable parameters for search.
  • Amazon S3 settings must be set explicitly in each Amazon S3 collection.
  • The mandatory fields for an Amazon S3 collection are:
    • Access key
    • Secret key
    • Bucket name
  • When a new collection is created, SearchBlox also pre-configures a few other Amazon S3 parameters, such as includes and excludes.
  • The following table lists the settings for an Amazon S3 collection:
Field | Description
Access Key | Access key from Amazon S3 security credentials. Mandatory field.
Secret Key | Secret key from Amazon S3 security credentials. Mandatory field.
Name | Optional name.
Bucket | Amazon S3 bucket to index. Mandatory field.
Path Prefix | Path prefix to index in this bucket, for example: Work/. Optional. If specified, it must be an existing path and must end with a trailing /.
Includes | File types to be included, for example: .pdf, .jpg.
Excludes | File types to be excluded, for example: *.zip.
Keyword-in-Context Display | When enabled, search results show a description taken from the content areas where the search term occurs.
Boosting | Boosts search terms for the collection when set to a value greater than 1 (maximum value 9999).
Stemming | When stemming is enabled, inflected words are reduced to their root form. For example, "running", "runs", and "ran" are inflected forms of "run".
Spelling Suggestions | When enabled, a spelling index is created at the end of the indexing process.
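
The mandatory-field rules in the settings table can be checked before a collection is saved. The following is a minimal sketch, not SearchBlox code: the class S3CollectionSettings, its attribute names, and the function validate are hypothetical and exist only for this illustration. It checks the three mandatory fields, the trailing / on the path prefix, and the maximum boosting value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3CollectionSettings:
    """Hypothetical container that mirrors some fields of the settings table."""
    access_key: str = ""
    secret_key: str = ""
    bucket: str = ""
    name: str = ""                  # optional
    path_prefix: str = ""           # optional, must end with "/" when set
    boosting: Optional[int] = None  # optional, maximum 9999


def validate(settings):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    mandatory = (
        ("Access Key", settings.access_key),
        ("Secret Key", settings.secret_key),
        ("Bucket", settings.bucket),
    )
    for label, value in mandatory:
        if not value.strip():
            problems.append(f"{label} is mandatory")
    if settings.path_prefix and not settings.path_prefix.endswith("/"):
        problems.append("Path Prefix must end with a trailing /")
    if settings.boosting is not None and settings.boosting > 9999:
        problems.append("Boosting must not exceed 9999")
    return problems
```

Calling validate on settings that have only a bucket and the path prefix "Work" would report the missing access key, the missing secret key, and the missing trailing /. The sketch cannot confirm that the path actually exists in the bucket; that check needs a call to Amazon S3.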

📘

Additional Note

  • Do not write transaction logs to the S3 bucket that is being indexed. Those log files will also be indexed, which increases bandwidth usage.
  • If logging is needed, exclude the log files by extension in the collection settings.
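
To illustrate how a path prefix, includes, and excludes can combine to keep log files out of the index, here is a minimal sketch. It is not the SearchBlox implementation. The function should_index is hypothetical, the patterns are assumed to be glob-style (as in the *.zip example), matching is assumed to be case-insensitive, and excludes are assumed to take precedence over includes.

```python
from fnmatch import fnmatchcase


def should_index(key, path_prefix="", includes=(), excludes=()):
    """Decide whether an S3 object key would be indexed (illustrative only)."""
    # Assumption: keys outside the configured path prefix are skipped.
    if path_prefix and not key.startswith(path_prefix):
        return False
    name = key.lower()
    # Assumption: an exclude match wins over any include match.
    if any(fnmatchcase(name, pattern.lower()) for pattern in excludes):
        return False
    # Assumption: an empty include list means "include everything".
    if includes and not any(fnmatchcase(name, pattern.lower()) for pattern in includes):
        return False
    return True
```

With excludes set to ["*.log"], a key such as Work/logs/access.log is rejected while Work/report.pdf is still accepted.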

Indexing and Other Operations

The following operations can be performed on an Amazon S3 collection:

Activity | Description
Index | Starts the indexer for the selected collection.
Clear | Clears the current index for the selected collection.
Scheduled Activity | For each collection, either of the following scheduled indexer activities can be set: Index (set the frequency and the start date/time for indexing a collection) or Clear (set the frequency and the start date/time for clearing a collection).
  • Indexer activity is controlled from the Index sub-tab of the collection or through the API. The current status of a collection is always shown on the Collection Dashboard and the Index page.
  • The Index operation starts the indexer for the Amazon S3 collection. It can also be initiated from the Collection Dashboard.
  • On reindexing, that is, clicking Index again after the initial index operation, all crawled documents are reindexed. Documents that have been deleted from S3 since the first index operation are deleted from the index, and new documents are indexed.
  • Scheduling can be performed only from the Index sub-tab.
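
The reindexing behavior described above amounts to comparing the keys already in the index with the keys currently in the bucket. The sketch below shows that comparison under stated assumptions: plan_reindex is a hypothetical helper, not a SearchBlox API, and both inputs are assumed to be plain lists of S3 object keys.

```python
def plan_reindex(indexed_keys, bucket_keys):
    """Split keys into what a reindex would refresh, add, and delete."""
    indexed = set(indexed_keys)
    current = set(bucket_keys)
    return {
        # Still in the bucket and already indexed: reindexed again.
        "reindex": sorted(indexed & current),
        # In the bucket but not yet in the index: newly indexed.
        "add": sorted(current - indexed),
        # In the index but no longer in the bucket: removed from the index.
        "delete": sorted(indexed - current),
    }
```

For example, if the index holds a.pdf and b.pdf and the bucket now holds b.pdf and c.pdf, the plan reindexes b.pdf, adds c.pdf, and deletes a.pdf.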

Schedule Frequency

SearchBlox supports the following schedule frequencies:

  • Once
  • Every Minute
  • Hourly
  • Daily
  • Every 48 Hours
  • Every 96 Hours
  • Weekly
  • Monthly
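
To make the frequencies concrete, the following sketch computes the next run time from the previous one. It is an illustration only: next_run and FIXED_INTERVALS are hypothetical names, and treating Monthly as "the same day of the next calendar month, clamped to that month's last day" is an assumption rather than documented SearchBlox behavior.

```python
import calendar
from datetime import datetime, timedelta

# Frequencies with a fixed interval, named as in the list above.
FIXED_INTERVALS = {
    "Every Minute": timedelta(minutes=1),
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(frequency, last_run):
    """Return the next start time, or None when the schedule runs only once."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        # Assumption: same day next month, clamped to the month's length.
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    return last_run + FIXED_INTERVALS[frequency]
```

Under these assumptions, a Monthly schedule that last ran on 31 January 2024 would next run on 29 February 2024, and an Every 48 Hours schedule would run two days after the previous start.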

👍

Best Practices

  • The access key, secret key, bucket name, and update rate must be provided in the S3 collection settings.
  • Use the includes and excludes in the collection settings to avoid indexing unnecessary file types.
  • Do not schedule index and clear operations for the same time.
  • If you have multiple collections, schedule their activities so that no more than 2-3 collections are indexing at the same time.
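
The two scheduling practices above can be checked mechanically against a list of planned activities. The sketch below is one possible way to do so, not a SearchBlox feature: find_schedule_problems and its tuple format are hypothetical, the limit of 3 reflects the upper end of the 2-3 guideline, and only start times are compared, so runs that overlap after starting at different times are not detected.

```python
from collections import Counter
from datetime import datetime


def find_schedule_problems(schedules, max_concurrent=3):
    """Check (collection, activity, start) tuples against the best practices."""
    problems = []

    # Practice: Index and Clear for one collection must not share a start time.
    activities_at = {}
    for collection, activity, start in schedules:
        activities_at.setdefault((collection, start), set()).add(activity)
    for (collection, start), activities in sorted(activities_at.items()):
        if {"Index", "Clear"} <= activities:
            problems.append(
                f"{collection}: Index and Clear both start at {start:%Y-%m-%d %H:%M}"
            )

    # Practice: limit how many collections start indexing at the same time.
    index_starts = Counter(
        start for _, activity, start in schedules if activity == "Index"
    )
    for start, count in sorted(index_starts.items()):
        if count > max_concurrent:
            problems.append(
                f"{count} collections start indexing at {start:%Y-%m-%d %H:%M}"
            )
    return problems
```

An empty result means no conflict was found at the level of start times; staggering the start times by the expected indexing duration is still advisable.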