Sitecore Collection
SearchBlox provides a Sitecore Collection that allows you to index content from Sitecore CMS. It connects to the Sitecore instance and crawls the published pages and content for indexing. Once indexed, the Sitecore content becomes searchable through the SearchBlox platform. This helps organizations easily search and manage website content stored in Sitecore.
Creating a Sitecore Collection
You can create a Sitecore Collection by following these steps:
- Log in to the Admin Console, go to the Collections tab, and click Create or the “+” icon.
- Select Sitecore Collection as the collection type.
- Enter a unique name for the collection. The name must be 3–36 characters long and may contain only alphanumeric characters and underscores (_).
- Enable or disable RAG (Retrieval Augmented Generation) depending on your requirements. Enable it if the collection will be used for AI-powered search or chatbot responses.
- Enable Knowledge Graph if you want SearchBlox to extract entities and relationships from the documents in the collection.
- Choose whether the collection should be Private or Public. Enable Private Collection Access to restrict the collection to authenticated users only.
- Configure Collection Encryption if you want to encrypt document content or specific metadata fields.
- Select the Collection Language based on the language used in the documents. The default language is English.
- Click Create to create the Sitecore Collection.
- After the collection is created, you are redirected to the Sitecore Settings / Authentication section to configure the connection and access details.
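
The naming rule in the steps above can be checked before you type a name into the Admin Console. The following is a minimal sketch, assuming the rule means 3 to 36 characters drawn only from ASCII letters, digits, and underscores; the function name `is_valid_collection_name` is hypothetical and not part of SearchBlox.

```python
import re

# Assumed reading of the naming rule: 3-36 characters,
# ASCII letters, digits, and underscores only.
COLLECTION_NAME_RE = re.compile(r"[A-Za-z0-9_]{3,36}")


def is_valid_collection_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rule."""
    return COLLECTION_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_collection_name("sitecore_docs")` passes under this reading, while a name containing a hyphen or only two characters does not.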

Configuring Sitecore Settings
To configure Sitecore integration for your collection, follow these steps:
- Go to the Sitecore Credentials tab within the collection.
- Enter the Base URL. This is the URL of your Sitecore instance (e.g., https://your-sitecore.com) used to connect and retrieve content.
- Enter the Username. Provide the Sitecore username required for authentication.
- Enter the Password. Provide the corresponding password for the Sitecore user.
- Enter the API Key (Optional). If your Sitecore setup requires additional authentication, provide the API key here.
- Enter the Database. Specify the name of the Sitecore database from which content should be crawled (e.g., web, master).
- Enter the Root Item ID. Provide the GUID of the Sitecore item where crawling should begin. This defines the starting point of content indexing.
- Enter the Language. Specify the language code for the content to be indexed (e.g., en for English, es for Spanish).
- Click Save to store the configuration and enable the system to crawl and index content from your Sitecore instance.
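
Before saving, it can help to sanity-check the values you plan to enter. The sketch below is illustrative only: the `SitecoreSettings` class, its field names, and the `validate` function are hypothetical and are not a SearchBlox or Sitecore API. It assumes the Base URL is an absolute http(s) URL and that the Root Item ID is a GUID, with or without surrounding braces.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class SitecoreSettings:
    """Hypothetical container mirroring the fields on the settings form."""
    base_url: str
    username: str
    password: str
    database: str
    root_item_id: str
    language: str
    api_key: Optional[str] = None  # optional on the form


def validate(settings: SitecoreSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems: List[str] = []

    parsed = urlparse(settings.base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("base_url must be an absolute http(s) URL")

    for name in ("username", "password", "database", "language"):
        if not getattr(settings, name).strip():
            problems.append(f"{name} must not be empty")

    try:
        # uuid.UUID accepts GUIDs with or without braces and hyphens.
        uuid.UUID(settings.root_item_id)
    except ValueError:
        problems.append("root_item_id must be a GUID")

    return problems
```

A call such as `validate(SitecoreSettings("https://your-sitecore.com", "admin", "secret", "web", "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}", "en"))` returns an empty list; the credentials and GUID here are placeholder values. The check covers format only and does not confirm that the instance is reachable or that the item exists.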

Schedule and Index
The scheduler sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:
- Once
- Hourly
- Daily
- Every 48 Hours
- Every 96 Hours
- Weekly
- Monthly
The following operations can be performed in Sitecore collections:
| Activity | Description |
|---|---|
| Enable Scheduler for Indexing | Once enabled, you can set the Start Date and Frequency |
| Schedule | For each collection, indexing can be scheduled based on the above options. |
| View all Schedules | Redirects to the Schedules section, where all the Collection Schedules are listed. |
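
To make the frequency options concrete, the sketch below computes when the next indexing run would fall after a given run time. It is an illustration under stated assumptions, not SearchBlox's scheduler: the `next_run` function is hypothetical, "Once" is assumed to mean no repeat, and "Monthly" is assumed to mean the same day of the following calendar month, clamped to that month's last day.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

# Fixed-length intervals for the frequencies listed above.
FIXED_INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, frequency: str) -> Optional[datetime]:
    """Return the next run time after `last_run`, or None for a one-off schedule."""
    if frequency == "Once":
        return None  # assumed: a one-off schedule never repeats
    if frequency == "Monthly":
        # Assumed: same day next month, clamped to the month's length.
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    return last_run + FIXED_INTERVALS[frequency]
```

Under these assumptions, a run at 09:00 on 31 January 2024 with "Every 48 Hours" is followed by a run at 09:00 on 2 February 2024, and with "Monthly" by a run at 09:00 on 29 February 2024.
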
Manage Documents Tab
Using the Manage Documents tab, you can perform the following operations:
- Filter
- View content
- View metadata
- Refresh
- Delete

To delete a file from your collection, enter the file path and click "Delete".
To see the status of an indexed file, click "View Metadata".
Data Fields Tab
Using the Data Fields tab, you can create custom fields for search and view the default data fields for non-encrypted collections. SearchBlox supports four types of data fields:
- Keyword
- Number
- Date
- Text

Once data fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
Sitecore Collection Models
The Models page allows you to configure and override AI models used for embeddings, reranking, and LLM-based features within the collection.

Embedding
- Provider specifies the embedding provider used to generate vector representations of documents.
- Model defines the embedding model used to convert document content into vectors for semantic search.
Reranker
- Provider specifies the reranker provider used for improving search result relevance.
- Model defines the reranker model used to re-score and reorder search results based on relevance.
LLM
- Provider specifies the Large Language Model provider used for AI-powered features.
- Model defines the LLM used for tasks such as document enrichment, summaries, and SmartFAQs.

These settings override global configurations and apply only to the current collection.
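
The override behavior can be pictured as a per-feature merge in which collection-level values win over global ones. The sketch below is a conceptual illustration only: the `effective_models` function, the dictionary layout, and the provider and model names are hypothetical, and it does not describe how SearchBlox stores or resolves these settings internally.

```python
from typing import Dict

ModelConfig = Dict[str, Dict[str, str]]


def effective_models(global_cfg: ModelConfig, collection_cfg: ModelConfig) -> ModelConfig:
    """Merge settings per feature; collection-level values take precedence."""
    merged: ModelConfig = {}
    for feature in ("embedding", "reranker", "llm"):
        merged[feature] = {
            **global_cfg.get(feature, {}),
            **collection_cfg.get(feature, {}),  # overrides global values
        }
    return merged


# Placeholder provider and model names, for illustration only.
global_cfg = {
    "embedding": {"provider": "provider-a", "model": "embed-small"},
    "reranker": {"provider": "provider-a", "model": "rerank-base"},
    "llm": {"provider": "provider-a", "model": "chat-base"},
}
collection_cfg = {"llm": {"provider": "provider-b", "model": "chat-large"}}

result = effective_models(global_cfg, collection_cfg)
```

In this example only the LLM entry changes for the collection; the embedding and reranker entries fall back to the global values, and the global configuration itself is left unmodified.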
