AEM Collection
AEM Collections index pages and assets in the AEM content repository, treating each page or asset as a separate document.
Prerequisites
Before creating an AEM Collection, make sure:
- AEM author instance is running and accessible
- AEM publisher instance is running and accessible
- Admin credentials (username and password) for the AEM author instance are available
Note:
SearchBlox must be able to reach the AEM instances over the network in order to crawl the AEM site pages.
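One way to confirm reachability before creating the collection is to send a plain HTTP request to each instance from the machine that runs SearchBlox. The following is a minimal sketch, not part of SearchBlox itself; the function names and the example URLs are placeholders chosen for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP(S) request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 401 or 403), so it is reachable.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


def check_instances(instances: dict) -> dict:
    """Map each instance label to True/False depending on reachability."""
    return {label: is_reachable(url) for label, url in instances.items()}
```

For example, `check_instances({"author": "http://aem-author.example.com:4502", "publisher": "http://aem-publisher.example.com:4503"})` returns a dictionary showing which of the two placeholder instances responded. Substitute your own instance URLs.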
Create an AEM Collection
Follow these steps to create a new AEM Collection:
- Log in to the Admin Console
- Go to the Collections tab
- Click on "Create a New Collection" or the "+" icon
- Select "AEM Collection" as the Collection Type
- Enter a unique name for your collection (e.g., "intranet site")
- Configure RAG settings (Enable for ChatBot and Hybrid RAG search)
- Set Collection Access permissions (Private/Public)
- Select the content language (if not English)
- Click "Save" to create your collection

- After creating the AEM collection, you will be taken to the Settings tab.
Settings Tab
- Provide the Authentication fields.
| Field | Description |
|---|---|
| Author Instance URL | URL of the AEM Author instance to index documents from the content repository. |
| Publisher Instance URL | URL of the AEM Publisher instance. Documents are served from the publisher, while indexing happens from the author instance. |
| Username | AEM username with admin privileges. Not required if service security is disabled. |
| Password | Corresponding password for the AEM username. Not required if service security is disabled. |
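The rules in the table above can be checked before the values are entered in the Admin Console. This is a hypothetical sketch: the class, the field names, and the `service_security_enabled` flag are illustrative assumptions, not SearchBlox configuration keys.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class AemAuthSettings:
    author_url: str
    publisher_url: str
    username: str = ""
    password: str = ""
    service_security_enabled: bool = True  # assumed flag name, for illustration


def validate(settings: AemAuthSettings) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    for label, url in (
        ("Author Instance URL", settings.author_url),
        ("Publisher Instance URL", settings.publisher_url),
    ):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{label} must be a full http(s) URL")
    # Credentials are only required when service security is enabled.
    if settings.service_security_enabled:
        if not settings.username:
            problems.append("Username is required when service security is enabled")
        if not settings.password:
            problems.append("Password is required when service security is enabled")
    return problems
```

An empty result from `validate(...)` only means the values are well-formed; the Test Connection button remains the actual check against AEM.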
Choose the settings for Generate Using LLM and Hybrid Search.

| Settings | Description |
|---|---|
| Title | Generates concise and relevant titles for the indexed documents using LLM. |
| Description | Generates the description for indexed documents using LLM. |
| Topic | Generates relevant topics for indexed documents using LLM based on document's content. |
| Auto Relevance | Enables or disables Hybrid Search for automatic relevance ranking. |
- Click the Save button, then click Test Connection.
AEM Collection Paths to Index Specific Site Pages
- AEM collection paths let you set Allow/Disallow paths for the crawler. To index specific site pages or assets, add their paths as Allow paths. To access the paths, click the collection name in the Collections list.
Allow/Disallow Paths
- Allow/Disallow paths let the crawler include or exclude URLs.
- They help manage a collection by excluding unwanted URLs.
- All Allow and Disallow paths relate to the publisher instance URL.
| Field | Description |
|---|---|
| Allow Paths | Examples: https://xxx.xxx.xx.xx:xxxx/wk-events/, /aqua-collections/, /wellness-care/, https://xxx.xxx.xx.xx:xxxx/wk-events/standard.html, .* (allows the crawler to go to any external URL or domain) |
| Disallow Paths | Examples: .jsp, /cgi-bin/, /videos/, ?params |
| Allowed Formats | Select the document formats to be searchable in the collection. |
| Enable Content API | Allows crawling of document content with special characters included. |
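To illustrate how Allow and Disallow entries could interact, here is a simplified sketch. It assumes that entries match as substrings of the URL, that `.*` allows everything, and that a Disallow match takes precedence over an Allow match. These rules are assumptions made for illustration; the actual matching performed by the SearchBlox crawler may differ.

```python
def is_allowed(url: str, allow: list[str], disallow: list[str]) -> bool:
    """Decide whether url would be indexed under the assumed matching rules."""
    # Assumption: a Disallow entry excludes any URL that contains it.
    if any(entry in url for entry in disallow):
        return False
    # Assumption: ".*" allows everything; other entries match as substrings.
    return any(entry == ".*" or entry in url for entry in allow)


allow = ["/wk-events/", "/aqua-collections/", "/wellness-care/"]
disallow = [".jsp", "/cgi-bin/", "/videos/", "?params"]

base = "https://aem.example.com"  # placeholder publisher URL
print(is_allowed(base + "/wk-events/standard.html", allow, disallow))      # True
print(is_allowed(base + "/wk-events/videos/intro.html", allow, disallow))  # False
print(is_allowed(base + "/about.html", allow, disallow))                   # False
```

The second URL is rejected because it contains `/videos/`, and the third because no Allow entry matches it.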
Schedule and Index
An AEM collection should index only published pages. You can schedule indexing for a collection with one of the following frequency options:
- Once
- Hourly
- Daily
- Every 48 Hours
- Every 96 Hours
- Weekly
- Monthly
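As a rough guide to what each frequency means in practice, the sketch below computes the next run time from the previous one. The interval values follow the option names; the handling of Monthly (same day next month, clamped to the last day of shorter months) is an assumption, and the scheduler's actual behavior may differ.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

# Fixed-length intervals implied by the option names.
FIXED_INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, frequency: str) -> Optional[datetime]:
    """Return the next run time after last_run, or None for "Once"."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        # Assumption: same day next month, clamped to that month's last day.
        year = last_run.year + (last_run.month // 12)
        month = last_run.month % 12 + 1
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    return last_run + FIXED_INTERVALS[frequency]
```

Under these assumptions, a Monthly schedule that last ran on 31 January 2024 would next run on 29 February 2024.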
The following operations can be performed on AEM collections:
| Activity | Description |
|---|---|
| Enable Scheduler for Indexing | Turn on to set the Start Date and Frequency for indexing. |
| Schedule | Set the indexing schedule for each collection based on the selected options. |
| View all Collection Schedules | Go to the Schedules section to see all collection schedules. |
Data Fields Tab
Using the Data Fields tab, you can create custom fields for search and view the default fields in non-encrypted collections. SearchBlox supports 4 types of Data Fields:
- Keyword
- Number
- Date
- Text
- After configuring Data Fields, you must clear and re-index the collection for changes to take effect.
To learn more about Data Fields, refer to Data Fields Tab.
Models
Embedding
- Provider specifies the embedding provider used to generate vector representations of documents.
- Model defines the embedding model used to convert document content into vectors for semantic search.
Reranker
- Provider specifies the reranker provider used for improving search result relevance.
- Model defines the reranker model used to re-score and reorder search results based on relevance.
LLM
- Provider specifies the Large Language Model provider used for AI-powered features.
- Model defines the LLM used for tasks such as document enrichment, summaries, and SmartFAQs.
- These settings override global configurations and apply only to the current collection.
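The override behavior can be pictured as a merge in which collection-level values replace global ones. The sketch below is illustrative only: the dictionary layout, the provider and model names, and the field-by-field merge are assumptions, not the way SearchBlox stores or resolves its configuration.

```python
# Hypothetical global defaults; all names are placeholders.
GLOBAL_MODELS = {
    "embedding": {"provider": "global-embed-provider", "model": "global-embed-model"},
    "reranker": {"provider": "global-rerank-provider", "model": "global-rerank-model"},
    "llm": {"provider": "global-llm-provider", "model": "global-llm-model"},
}


def effective_models(global_cfg: dict, collection_cfg: dict) -> dict:
    """Merge per-collection overrides on top of global defaults."""
    merged = {}
    for section, defaults in global_cfg.items():
        overrides = collection_cfg.get(section, {})
        # Collection values win; untouched fields fall back to the global value.
        merged[section] = {**defaults, **overrides}
    return merged
```

Calling `effective_models(GLOBAL_MODELS, {"llm": {"model": "collection-llm-model"}})` changes only the LLM model for that one collection and leaves `GLOBAL_MODELS` itself unmodified.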
