Amazon S3 Collection

Creating Amazon S3 Collection

Follow these steps to create a new Amazon S3 Collection:

Log in to the Admin Console
Navigate to the Collections tab
Click on "Create a New Collection" or the "+" icon
Select "Amazon S3 Collection" as the Collection Type
Enter a unique name for your collection (e.g., "AmazonS3")
Configure RAG settings (Enable for Hybrid RAG search, Disable for standard search)
Set Collection Access permissions (Private/Public)
Configure Collection Encryption according to your security requirements
Select the content language (if other than English)
Click "Save" to create your collection

Once the AmazonS3 collection is created, you will be taken to the AmazonS3 tab.

AmazonS3 Collection Settings

The Settings sub-tab holds the Amazon S3 connection settings and the tunable parameters for search.
Amazon S3 settings must be set explicitly for each Amazon S3 collection.
The mandatory fields for an AmazonS3 collection are:
- Access key
- Secret key
- Bucket name
When a new collection is created, SearchBlox also pre-configures a few other AmazonS3 parameters, such as includes and excludes.
The following table lists the settings for an AmazonS3 collection:

Field	Description
Access Key	Access key from Amazon S3 security credentials. Mandatory field.
Secret Key	Secret key from Amazon S3 security credentials. Mandatory field.
Name	Optional name.
Bucket	Amazon S3 bucket to index. Mandatory field.
Path Prefix	Optional path prefix to index within the bucket, for example Work/. If specified, it must be an existing path and end with a trailing /.
Includes	File types to be included. Example: .pdf, .jpg.
Excludes	File types to be excluded. Example: *.zip.
Relevance - Remove Duplicates	Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO.
Relevance - Stemming	When enabled, searches also match the inflected forms of a word's root. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES.
Relevance - Spelling Suggestions	When enabled, a spelling index is created at the end of the indexing process.
Keyword-in-Context Display	Returns search results with the description taken from the content areas where the search term occurs.
Enable Detailed Log Settings	When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL with its timestamp, the status code, and the time taken for indexing. The default is NO.
Enable Content API	Provides the ability to crawl the document content with special characters included.
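
The way Path Prefix, Includes, and Excludes might combine can be illustrated with a short sketch. This is not SearchBlox code: the function name `should_index`, the glob-style patterns, the case-insensitive matching, and the rule that excludes take precedence over includes are all assumptions made for illustration.

```python
from fnmatch import fnmatch

def should_index(key, path_prefix="", includes=(), excludes=()):
    """Return True if an S3 object key passes the prefix, include, and exclude checks.

    Hypothetical helper: the matching rules are assumptions, not SearchBlox's.
    """
    # Path Prefix: only keys under the prefix are considered.
    if path_prefix and not key.startswith(path_prefix):
        return False
    name = key.lower()
    # Assumption: an exclude match wins over any include match.
    if any(fnmatch(name, pattern.lower()) for pattern in excludes):
        return False
    # Assumption: an empty include list means "include everything".
    if not includes:
        return True
    return any(fnmatch(name, pattern.lower()) for pattern in includes)

keys = ["Work/report.pdf", "Work/archive.zip", "Work/logs/access.log", "Home/photo.jpg"]
selected = [
    k for k in keys
    if should_index(k, "Work/", includes=["*.pdf", "*.jpg"], excludes=["*.zip", "*.log"])
]
print(selected)  # ['Work/report.pdf']
```

Here `Home/photo.jpg` is skipped because it lies outside the `Work/` prefix, and the `.zip` and `.log` files are dropped by the exclude patterns.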

📘
Generate Title, Description and Topics using SearchAI PrivateLLM and Enable Hybrid Search:

Choose and enable Generate Using LLM and Auto Relevance.

Clicking Compare Keyword Search with Hybrid redirects you to the Comparison Plugin.

Settings	Description
Title	Generates concise and relevant titles for the indexed documents using LLM.
Description	Generates the description for indexed documents using LLM.
Topic	Generates relevant topics for indexed documents using LLM based on document's content.
Auto Relevance	Enables or disables Hybrid Search for automatic relevance ranking.

📘
Additional Note

Do not log transactions to the S3 buckets you index: the log files will also be indexed, which increases bandwidth usage.

If logging is needed, exclude the log files by extension in Collection Settings.

Schedule and Index

Sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:

Once
Hourly
Daily
Every 48 Hours
Every 96 Hours
Weekly
Monthly

The following operations can be performed in an AmazonS3 collection:

Activity	Description
Enable Scheduler for Indexing	Once enabled, you can set the Start Date and Frequency
Save	For each collection, indexing can be scheduled based on the above options.
View all Collection Schedules	Redirects to the Schedules section, where all the Collection Schedules are listed.
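
To reason about when a scheduled collection will next be indexed, the fixed-interval frequencies above can be modelled as time deltas. The sketch below is illustrative only and does not reflect how the SearchBlox scheduler is implemented: `next_run` and `INTERVALS` are hypothetical names, and Monthly is left out because its length depends on the calendar.

```python
from datetime import datetime, timedelta

# Fixed-interval frequencies from the list above, expressed as timedeltas.
# "Monthly" is omitted in this sketch because a month has no fixed length.
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}

def next_run(start, frequency, now):
    """Return the first scheduled run at or after `now`, or None if there is none."""
    if now <= start:
        return start
    if frequency == "Once":
        return None  # the single run is already in the past
    if frequency not in INTERVALS:
        raise ValueError(f"unsupported frequency in this sketch: {frequency}")
    interval = INTERVALS[frequency]
    # Ceiling division: number of whole intervals needed to reach or pass `now`.
    steps = -(-(now - start) // interval)
    return start + steps * interval

start = datetime(2024, 1, 1, 2, 0)
print(next_run(start, "Every 48 Hours", datetime(2024, 1, 4, 10, 0)))  # 2024-01-05 02:00:00
```

With a start of 2 January 01 at 02:00 and a 48-hour frequency, runs fall on 1, 3, and 5 January at 02:00, so the first run after 4 January 10:00 is 5 January 02:00.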

Data Fields Tab

The Data Fields tab lets you create custom fields for search and view the Default Data Fields of a non-encrypted collection. SearchBlox supports four types of Data Fields:

Keyword
Number
Date
Text

Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.

To learn more about Data Fields, refer to Data Fields Tab.

👍
Best Practices

The access key, secret key, bucket name, and update rate are mandatory in the S3 collection settings.

It is possible to include or exclude file types using collection settings. Please use them to avoid indexing unnecessary file types.

If you have multiple collections, schedule indexing so that no more than 2-3 collections are indexed at the same time.
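
One way to plan such a schedule is to spread the collections' start times across slots. The following is a minimal planning sketch, not a SearchBlox feature: `stagger_starts` is a hypothetical helper, the collection names are made up, and it assumes each indexing run finishes within one slot.

```python
from datetime import datetime, timedelta

def stagger_starts(collections, first_start, slot, max_concurrent=2):
    """Assign start times so that at most `max_concurrent` collections share a slot.

    Assumes every indexing run completes within `slot`; otherwise runs can overlap.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    schedule = {}
    for position, name in enumerate(collections):
        # Each group of `max_concurrent` collections moves to the next slot.
        schedule[name] = first_start + (position // max_concurrent) * slot
    return schedule

# Hypothetical collection names, used only for illustration.
names = ["AmazonS3", "Website", "Files", "Database", "Feeds"]
plan = stagger_starts(names, datetime(2024, 1, 1, 1, 0), timedelta(hours=2))
for name, when in plan.items():
    print(name, when)
```

The resulting start times can then be entered as the Start Date of each collection's schedule.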
