AEM Collection
AEM Collections index pages and assets in the AEM content repository, treating each page or asset as a separate document.
Prerequisites
Before creating an AEM Collection, make sure:
- AEM author instance is running and accessible
- AEM publisher instance is running and accessible
- Admin credentials (username and password) for the AEM author instance are available
Note:
The SearchBlox server must be able to reach both AEM instances over the network in order to crawl the AEM site pages.
Create an AEM Collection
Follow these steps to create a new AEM Collection:
- Log in to the Admin Console
- Go to the Collections tab
- Click on "Create a New Collection" or the "+" icon
- Select "AEM Collection" as the Collection Type
- Enter a unique name for your collection (e.g., "intranet site")
- Configure RAG settings (Enable for ChatBot and Hybrid RAG search)
- Set Collection Access permissions (Private/Public)
- Select the content language (if not English)
- Click "Save" to create your collection
- After creating the AEM collection, you will be taken to the Settings tab.
AEM Collection – Authentication Settings
The Authentication section in an AEM Collection allows you to configure how SearchBlox connects to your Adobe Experience Manager (AEM) instance to index content.
Fields Overview
Author Instance URL
The URL of the AEM Author instance from which documents are indexed. This is the source of content (e.g., http://localhost:4502). The URL must begin with http:// or https://.
Publisher Instance URL
The publish domain of the AEM instance. When configured, indexed documents are served from the publish domain while indexing is still performed from the author instance. The URL must begin with http:// or https://.
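The two URL rules above can be sketched in Python. This is only an illustration: the function names `validate_instance_url` and `to_publish_url` and the publisher host are hypothetical, and the plain host swap is an assumption, since this page does not describe how SearchBlox maps author URLs to publish URLs internally.

```python
from urllib.parse import urlsplit, urlunsplit


def validate_instance_url(url: str) -> bool:
    """Return True if the URL uses http:// or https:// and names a host."""
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_publish_url(author_page_url: str, publisher_base: str) -> str:
    """Swap the author scheme and host for the publisher's, keeping the path.

    Assumption: a plain host swap; the real mapping may differ.
    """
    page = urlsplit(author_page_url)
    pub = urlsplit(publisher_base)
    return urlunsplit((pub.scheme, pub.netloc, page.path, page.query, page.fragment))


author = "http://localhost:4502"       # example from the field description
publisher = "https://www.example.com"  # hypothetical publish domain

print(validate_instance_url(author))            # accepted: has a scheme and host
print(validate_instance_url("localhost:4502"))  # rejected: no http/https scheme
print(to_publish_url(author + "/content/site/en.html", publisher))
```

A check like this is useful before saving the collection, because a URL without a scheme is rejected by the form.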
Authentication Type
Defines how SearchBlox authenticates with AEM. Three options are available:
- Basic – Username and password authentication.
- IMS S2S – OAuth-based Server-to-Server authentication using Adobe IMS client credentials.
- IMS JWT – Service account authentication using a JSON Web Token (JWT).
Index Mode
Set to Auto by default. It discovers published pages first, then falls back to all pages. This mode is recommended for most setups.
Authentication Type: Basic
When Basic is selected as the Authentication Type, the following credentials are required to connect SearchBlox to the AEM instance:
Username
The AEM username used to authenticate. If service security is disabled on the AEM instance, this field can be left blank. Minimum 3 characters.
Password
The corresponding password for the AEM user account. Similarly, if service security is disabled, this field is not required. Minimum 3 characters.
NOTE: Basic authentication is straightforward and suitable for development or internal environments where OAuth-based credentials are not configured.
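For reference, HTTP Basic authentication sends the username and password as a Base64-encoded `Authorization` header. The sketch below shows how such a header is formed. The function name `basic_auth_header` and the credentials are placeholders, and the length check only mirrors the 3-character minimum stated above; it is not SearchBlox's own code.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header, or no header if both are blank."""
    if not username and not password:
        # Both fields left blank (e.g. service security disabled on AEM).
        return {}
    # Mirror the form's stated minimum length of 3 characters.
    if len(username) < 3 or len(password) < 3:
        raise ValueError("username and password must be at least 3 characters")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("admin", "admin")  # placeholder credentials
print(headers)  # {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```

Because Base64 is an encoding and not encryption, Basic authentication should be used over https:// whenever the AEM instance is not on a trusted network.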
Generate Using LLM
This section allows SearchBlox to automatically enrich indexed AEM documents using a Large Language Model (LLM) at the time of indexing. The following fields can be toggled on or off:
| Field | Description |
|---|---|
| Title | Automatically generates concise and relevant titles for documents during indexing. |
| Description | Automatically generates relevant descriptions for documents during indexing. |
| Topics | Automatically generates relevant topics/tags for documents during indexing. |
NOTE: All three toggles are set to No by default. Enabling them improves content discoverability by ensuring documents have meaningful, AI-generated metadata even when the original AEM content lacks it.

Authentication Type: IMS S2S (Server-to-Server)
IMS S2S is Adobe's OAuth 2.0 Server-to-Server credential method, used to authenticate machine-to-machine integrations without user involvement. When selected, the following fields are required:
| Field | Description |
|---|---|
| Client ID | The Adobe Developer Console client ID from the Service Account (Server-to-Server) credential. |
| Client Secret | The client secret from Adobe Developer Console, stored encrypted at rest. |
| Scopes | Comma-separated IMS permission scopes (for example, AdobeID, openid, aem.folders, aem.assets.author). Leave blank to use the default scopes. |
| Organization ID | The Adobe Organization ID from the Developer Console project (format: XXXXXXXX@AdobeOrg). |
Note: IMS S2S is the recommended authentication method for modern AEM integrations. It replaces the older JWT-based service account approach and supports secure, long-lived OAuth credentials managed through the Adobe Developer Console.
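To illustrate what these fields are used for, the sketch below builds the form body of an OAuth `client_credentials` token request for Adobe IMS. It only constructs the request and does not send it. The helper name `build_s2s_token_request` and all credential values are placeholders, and how SearchBlox performs this exchange internally is an assumption rather than something this page documents. The Organization ID is not part of this request body.

```python
from urllib.parse import urlencode, parse_qs

# Adobe IMS token endpoint used for OAuth Server-to-Server credentials.
IMS_TOKEN_URL = "https://ims-na1.adobelogin.com/ims/token/v3"


def build_s2s_token_request(client_id: str, client_secret: str, scopes: str) -> bytes:
    """Return the form-encoded body for a client_credentials token request."""
    # Normalize "a, b ,c" into "a,b,c" so stray spaces are not sent.
    scope = ",".join(s.strip() for s in scopes.split(",") if s.strip())
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }
    return urlencode(form).encode("ascii")


body = build_s2s_token_request(
    client_id="example-client-id",          # placeholder
    client_secret="example-client-secret",  # placeholder
    scopes="AdobeID, openid, aem.folders, aem.assets.author",
)
# The body would be POSTed to IMS_TOKEN_URL with
# Content-Type: application/x-www-form-urlencoded.
print(parse_qs(body.decode("ascii"))["scope"])
```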

Authentication Type: IMS JWT
When IMS JWT is selected, SearchBlox authenticates with AEM using a JSON Web Token (JWT) via Adobe's Identity Management System (IMS). This is a service account-based approach where a signed JWT is exchanged for an access token. The following fields are required:
| Field | Description |
|---|---|
| Client ID | The Adobe Developer Console client ID from the Service Account (JWT) credential. |
| Client Secret | The client secret from Adobe Developer Console, stored encrypted at rest. |
| Organization ID | The Adobe Organization ID from the Developer Console project (format: XXXXXXXX@AdobeOrg). |
| Technical Account ID | The technical account ID associated with the JWT credential in the Developer Console (format: XXXXXXXX@techacct.adobe.com). |
| Metascopes | Comma-separated JWT metascopes that define the permissions granted to the integration (e.g., ent_aem_cloud_api). Leave blank to use the default scope. |
| Private Key | The RSA private key in PEM format, including the full -----BEGIN RSA PRIVATE KEY----- and -----END RSA PRIVATE KEY----- markers. Stored encrypted at rest. |
NOTE: IMS JWT is suited for legacy Adobe service account integrations. Adobe has deprecated this method in favor of IMS S2S (OAuth Server-to-Server), so new integrations are encouraged to use IMS S2S where possible.
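As a rough guide to how the fields above fit together, the sketch below assembles the claim set of an Adobe service account JWT. It stops before signing: the claims would still need to be signed with the private key using RS256 (which requires a third-party library) and exchanged with IMS for an access token. The function name `build_jwt_claims` and all identifier values are placeholders, and the five-minute lifetime is an arbitrary assumption.

```python
import time

IMS_HOST = "https://ims-na1.adobelogin.com"


def build_jwt_claims(client_id, org_id, technical_account_id, metascopes,
                     lifetime_s=300, now=None):
    """Assemble the unsigned claim set for an Adobe service account JWT."""
    issued = int(time.time() if now is None else now)
    claims = {
        "exp": issued + lifetime_s,          # expiry, in seconds since the epoch
        "iss": org_id,                       # Organization ID
        "sub": technical_account_id,         # Technical Account ID
        "aud": f"{IMS_HOST}/c/{client_id}",  # audience is built from the Client ID
    }
    # Each metascope becomes its own claim set to true.
    for scope in metascopes.split(","):
        scope = scope.strip()
        if scope:
            claims[f"{IMS_HOST}/s/{scope}"] = True
    return claims


claims = build_jwt_claims(
    client_id="example-client-id",                     # placeholder
    org_id="XXXXXXXX@AdobeOrg",                        # placeholder
    technical_account_id="XXXX@techacct.adobe.com",    # placeholder
    metascopes="ent_aem_cloud_api",
    now=1_700_000_000,
)
print(sorted(claims))
```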


| Settings | Description |
|---|---|
| Title | Generates concise and relevant titles for the indexed documents using LLM. |
| Description | Generates the description for indexed documents using LLM. |
| Topic | Generates relevant topics for indexed documents using LLM, based on the document's content. |
| Auto Relevance | Enables or disables Hybrid Search for automatic relevance ranking. |
- Click the Save button, then click Test Connection.
AEM Collection Paths to Index Specific Site Pages
- AEM collection paths let you set Allow/Disallow paths for the crawler. To index specific site pages or assets, add them as Allow paths. To access the path settings, click the collection name in the Collections list.
Allow/Disallow Paths
- Allow/Disallow paths let the crawler include or exclude URLs.
- They help manage a collection by excluding unwanted URLs.
- All Allow and Disallow paths relate to the publisher instance URL.
| Field | Description |
|---|---|
| Allow Paths | https://xxx.xxx.xx.xx:xxxx/wk-events/ /aqua-collections/ /wellness-care/ https://xxx.xxx.xx.xx:xxxx/wk-events/standard.html .* (Allows the crawler to go to any external URL or domain.) |
| Disallow Paths | .jsp /cgi-bin/ /videos/ ?params |
| Allowed Formats | Select the document formats to be searchable in the collection. |
| Enable Content API | Allows crawling of document content with special characters included. |
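The effect of Allow and Disallow paths can be pictured with a small filter. This is a simplified sketch under stated assumptions: it uses plain substring matching and lets Disallow take precedence, whereas the matching rules SearchBlox actually applies (for example, pattern entries such as `.*`) are not described here. The function name `is_crawlable` and the host `www.example.com` are hypothetical.

```python
def is_crawlable(url, allow_paths, disallow_paths):
    """Assumed rule: a Disallow match always wins; otherwise an Allow match is needed."""
    if any(entry in url for entry in disallow_paths):
        return False
    return any(entry in url for entry in allow_paths)


allow = ["/wk-events/", "/aqua-collections/"]
disallow = [".jsp", "/cgi-bin/", "/videos/"]
base = "https://www.example.com"  # hypothetical publisher host

for path in ("/wk-events/standard.html",
             "/wk-events/videos/intro.html",
             "/careers/index.html"):
    print(path, is_crawlable(base + path, allow, disallow))
```

In this sketch the first URL is crawled, the second is excluded by the `/videos/` Disallow entry, and the third is skipped because no Allow entry matches it.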
Synonyms
Synonyms help the search show relevant documents even when the exact search word is not used.
For example, if someone searches for “global,” the results can also include documents that use “world” or “international.”
Synonyms can also be loaded from existing documents.
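Conceptually, synonym matching behaves like query expansion. The sketch below shows the idea with the "global" example; the dictionary layout and the function name `expand_query` are illustrative assumptions and do not reflect how SearchBlox stores or applies synonyms.

```python
SYNONYMS = {"global": ["world", "international"]}  # example mapping from the text


def expand_query(query, synonyms):
    """Return the query terms followed by any configured synonyms for each term."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(synonyms.get(word, []))
    return terms


print(expand_query("global markets", SYNONYMS))
# ['global', 'world', 'international', 'markets']
```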

Schedule and Index
An AEM collection should index only published pages. You can schedule indexing for a collection with the following frequency options:
- Once
- Hourly
- Daily
- Every 48 Hours
- Every 96 Hours
- Weekly
- Monthly
The following operations can be performed in AEM collections:
| Activity | Description |
|---|---|
| Enable Scheduler for Indexing | Turn on to set the Start Date and Frequency for indexing. |
| Schedule | Set the indexing schedule for each collection based on the selected options. |
| View all Collection Schedules | Go to the Schedules section to see all collection schedules. |
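To reason about when a scheduled collection will next be indexed, the fixed-interval frequencies can be modeled as time deltas from the Start Date. The sketch below is an assumption-based illustration, not the scheduler's actual logic. The names `INTERVALS` and `next_run` are hypothetical, and Monthly is left out because a calendar month is not a fixed number of hours.

```python
from datetime import datetime, timedelta

# Fixed-length frequency options from the list above (Monthly omitted).
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(start, frequency, now):
    """Return the first scheduled run at or after `now`, or None if none remains."""
    if frequency == "Once":
        return start if start >= now else None
    if now <= start:
        return start
    step = INTERVALS[frequency]
    periods = -(-(now - start) // step)  # ceiling division on timedeltas
    return start + periods * step


start = datetime(2024, 1, 1, 0, 0)
now = datetime(2024, 1, 3, 12, 0)
print(next_run(start, "Daily", now))           # 2024-01-04 00:00:00
print(next_run(start, "Every 48 Hours", now))  # 2024-01-05 00:00:00
```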
Manage Documents
Using the Manage Documents tab, you can perform the following operations:
- Add/Update
- Filter
- View Content
- View Metadata
- Refresh
- Delete
To add a document, click the + icon.
Enter the document URL and click Add/Update.
Once the document is added or updated, the document URL will be displayed on the screen, and you will be able to perform the operations listed above.

Data Fields Tab
Using the Data Fields tab, you can create custom fields for search and view the default fields in non-encrypted collections. SearchBlox supports 4 types of Data Fields:
| Type | Description |
|---|---|
| Keyword | Used for alphanumeric values such as IDs, tags, codes, or other exact-match fields. |
| Number | Used for numeric values such as prices, quantities, ratings, or counts. |
| Date | Used for date values that can be searched, sorted, and filtered. |
| Text | Used for full-text search within custom field content. |
- After configuring Data Fields, you must clear and re-index the collection for changes to take effect.
To learn more about Data Fields, refer to Data Fields Tab.
Prompts
When LLM/RAG is enabled, you can edit AI-based prompts for Title, Description, Topic, Image Description, and Smart FAQs.
You can customize these prompts anytime, and use Restore Default to reset them back to the original SearchBlox settings.


Models
The Models section lets you override the global embedding, reranking, and LLM settings for this specific collection. Changes made here apply only to the current collection and do not affect other collections.
Embedding
- Provider specifies the embedding provider used to generate vector representations of documents.
- Model defines the embedding model used to convert document content into vectors for semantic search.
Reranker
- Provider specifies the reranker provider used for improving search result relevance.
- Model defines the reranker model used to re-score and reorder search results based on relevance.
LLM
- Provider specifies the Large Language Model provider used for AI-powered features.
- Model defines the LLM used for tasks such as document enrichment, summaries, and SmartFAQs.
- These settings override global configurations and apply only to the current collection.
Monitoring & Webhooks
The Monitoring & Webhooks tab provides settings for monitoring content changes and configuring webhook endpoints for automatic synchronization between AEM and SearchBlox.
Content Monitoring
Scheduled Monitoring
Enables automatic synchronization based on the configured schedule. When enabled, SearchBlox periodically checks the content source and performs synchronization according to the selected interval.
Delta Sync
Controls whether synchronization processes only changed content or performs a full synchronization.
- When enabled, only new, updated, or deleted content is synchronized.
- When disabled, every synchronization performs a full crawl of the content source.
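The difference between the two modes can be sketched as a comparison of two snapshots of the content source. The example below is a minimal illustration, assuming each snapshot maps a page path to a last-modified value. The function name `compute_delta` and the sample paths are hypothetical, and the change-detection method SearchBlox actually uses is not described here.

```python
def compute_delta(previous, current):
    """Compare two {path: last_modified} snapshots and classify the changes."""
    new = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    updated = sorted(p for p in current
                     if p in previous and current[p] != previous[p])
    return {"new": new, "updated": updated, "deleted": deleted}


previous = {"/en/home": "2024-01-01", "/en/about": "2024-01-01", "/en/old": "2023-06-01"}
current = {"/en/home": "2024-02-15", "/en/about": "2024-01-01", "/en/news": "2024-02-10"}

print(compute_delta(previous, current))
# {'new': ['/en/news'], 'updated': ['/en/home'], 'deleted': ['/en/old']}
```

With Delta Sync enabled, only the three changed paths would need processing; with it disabled, all pages in the current snapshot would be crawled again.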
Sync Interval
Specifies how frequently SearchBlox checks for content updates when Scheduled Monitoring is enabled. The selected interval determines how often synchronization jobs are executed.
Sync History
The Sync History section displays information about previous synchronization jobs.
The table includes:
- Type – The type of synchronization that was performed.
- Status – The result of the synchronization job.
- Started – The date and time when the synchronization began.
- Duration – The time taken to complete the synchronization.
Webhooks
The Webhooks section provides endpoints and security settings used to receive content update notifications from AEM.
Standard AEM Webhook URL
Endpoint used by a classic or on-premise AEM replication agent to notify SearchBlox when content is published.
Adobe I/O Events Webhook URL
Endpoint used by Adobe Experience Manager as a Cloud Service to send Adobe I/O event notifications to SearchBlox. Incoming requests are validated using the Adobe I/O signature.
Standard Webhook Secret
Shared secret value used to validate requests sent to the Standard AEM Webhook URL. Leave the field blank to retain the current secret.
Adobe I/O Webhook Secret
Signing secret used to verify Adobe I/O event notifications. This value is used to validate the x-adobe-signature included in incoming requests. Leave the field blank to retain the current secret.
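Secret-based webhook validation generally works by recomputing a signature over the raw request body and comparing it with the value sent in the request header. The sketch below assumes an HMAC-SHA256 digest encoded as Base64, which is a common scheme but an assumption here. The exact algorithm and encoding used for each endpoint should be confirmed before relying on it. The function names and sample payload are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute a Base64-encoded HMAC-SHA256 signature of the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid leaking timing information."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_signature)


secret = "example-webhook-secret"          # placeholder secret
body = b'{"event": "page.published"}'      # hypothetical payload
signature = sign_payload(secret, body)     # what the sender would put in the header

print(is_valid_signature(secret, body, signature))          # True
print(is_valid_signature("wrong-secret", body, signature))  # False
```

The signature must be computed over the exact bytes received, so the body should be verified before any JSON parsing or re-serialization.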
Save
Saves any changes made to the webhook configuration and monitoring settings.

Best Practices
- Verify that both the AEM Author and Publisher instance URLs are correct and accessible from the SearchBlox server before saving the collection settings.
- Always click Test Connection after saving the settings to confirm that the connection is working before starting indexing.
- Use Allow Paths to limit indexing to specific sections of the site. Indexing the entire AEM repository without path restrictions can significantly increase indexing time.
- Index only published pages to ensure that search results reflect live content.
- When managing multiple collections, schedule indexing so that only 2–3 collections run simultaneously to optimize system performance.
