AEM Collection

AEM Collections index pages and assets available in the AEM content repository. Each page and asset is treated as an individual document within the collection.

Prerequisites

Before creating an AEM Collection, ensure the following requirements are met:

AEM author instance is running and accessible
AEM publisher instance is running and accessible
Admin-privileged credentials (username and password) for the AEM author instance are available

Note:
SearchBlox must have network access to the AEM instances so that it can crawl the AEM site pages.
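Before creating the collection, it can help to confirm from the SearchBlox server that both AEM instances accept connections. The following is a minimal, hypothetical check using only the Python standard library. The function names are illustrative assumptions, and the check covers TCP reachability only, not authentication.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[Optional[str], int]:
    """Split an instance URL into (host, port), defaulting the port from the scheme."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the instance's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        # Open and immediately close a plain TCP connection; no HTTP is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `is_reachable("http://aem-author.example.com:4502")` tests a hypothetical author host on AEM's default author port (4502). Run the same check against the publisher URL (default port 4503).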

Create an AEM Collection

Follow these steps to create a new AEM Collection:

Log in to the Admin Console
Navigate to the Collections tab
Click on "Create a New Collection" or the "+" icon
Select "AEM Collection" as the Collection Type
Enter a unique name for your collection (e.g., "intranet site")
Configure RAG settings (Enable for ChatBot and Hybrid RAG search)
Set Collection Access permissions (Private/Public)
Select the content language (if other than English)
Click "Save" to create your collection

Once the AEM collection is created, you are taken to the Settings tab.

Settings Tab

Provide the Authentication fields.

Field	Description
Author Instance URL	URL of the AEM author instance from which documents in the AEM content repository are indexed.
Publisher Instance URL	Publish domain of the AEM instance. When this is set, documents are served from the publish domain even though indexing is done from the author instance.
Username	AEM username with admin privileges on the author instance. If service security is disabled, the username is not required.
Password	Password for the specified username. If service security is disabled, the password is not required.
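The Publisher Instance URL setting means a page is crawled from the author instance but served from its publish-domain address. The following minimal sketch shows that URL rewrite. It assumes the content path is identical on both instances. The function name and example hosts are hypothetical, and this is not SearchBlox's own implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def to_publish_url(author_page_url: str, publisher_base_url: str) -> str:
    """Replace the author instance's scheme and host with the publisher's, keeping path, query, and fragment."""
    page = urlsplit(author_page_url)
    publisher = urlsplit(publisher_base_url)
    # Assumption: the same content path exists on the author and publish instances.
    return urlunsplit((publisher.scheme, publisher.netloc, page.path, page.query, page.fragment))
```

With the hypothetical hosts `author.example.com:4502` and `www.example.com`, a page crawled at `/content/site/en/events.html` on the author instance would be served at `https://www.example.com/content/site/en/events.html`.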

Choose the settings for Generate Using LLM and Hybrid Search.

Settings	Description
Title	Generates concise and relevant titles for the indexed documents using LLM.
Description	Generates the description for indexed documents using LLM.
Topic	Generates relevant topics for indexed documents using LLM, based on the document's content.
Auto Relevance	Enables or disables Hybrid Search for automatic relevance ranking.

Click Save, then click Test Connection.
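Test Connection checks whether SearchBlox can log in to the author instance. As a rough manual equivalent, the following hypothetical sketch builds an HTTP request that sends the Username and Password as a Basic authentication header. It omits the header when no username is given, which matches the case where service security is disabled. Whether your AEM instance accepts Basic authentication on the URL you probe is an assumption that depends on its configuration.

```python
import base64
import urllib.request


def build_author_request(author_url: str, username: str = "", password: str = "") -> urllib.request.Request:
    """Build a GET request for the author instance, adding Basic auth only when a username is given."""
    request = urllib.request.Request(author_url, method="GET")
    if username:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_connection(author_url: str, username: str = "", password: str = "", timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises HTTPError for error statuses such as 401 and URLError for network failures.
    """
    request = build_author_request(author_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```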

AEM Collection Paths to Index Specific Site Pages

The AEM collection paths let you configure Allow and Disallow paths for the crawler. To index only specific site pages or assets, add them as Allow paths. To access the paths for an AEM collection, click the collection name in the Collections list.

Allow/Disallow Paths

Allow and Disallow paths determine which URLs the crawler includes or excludes. This makes it possible to keep unwanted URLs out of a collection.
All Allow and Disallow paths are relative to the publisher instance URL.

Field	Description
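Allow Paths	URLs or paths the crawler is allowed to index, for example: https://xxx.xxx.xx.xx:xxxx/wk-events/, /aqua-collections/, /wellness-care/, https://xxx.xxx.xx.xx:xxxx/wk-events/standard.html. Enter .* to allow the crawler to go to any external URL or domain.
Disallow Paths	URLs or URL patterns the crawler must skip, for example: .jsp, /cgi-bin/, /videos/, ?params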
Allowed Formats	Select the document formats that need to be searchable within the collection.
Enable Content API	Provides the ability to crawl the document content with special characters included.
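The following minimal sketch shows how Allow and Disallow paths combine when deciding whether to crawl a URL. It assumes simplified, hypothetical matching rules: `.*` matches everything, a full URL must be a prefix, and any other entry matches as a literal substring. Disallow paths take precedence. SearchBlox may interpret special entries such as `?params` differently, so treat this as an illustration only.

```python
from typing import List


def is_allowed(url: str, allow_paths: List[str], disallow_paths: List[str]) -> bool:
    """Return True if the URL matches some Allow path and no Disallow path (hypothetical rules)."""

    def matches(pattern: str) -> bool:
        if pattern == ".*":
            return True  # allow-all entry
        if pattern.startswith(("http://", "https://")):
            return url.startswith(pattern)  # full URLs act as prefixes
        return pattern in url  # entries like "/wk-events/" or ".jsp" match as substrings

    # A Disallow match always wins over an Allow match.
    if any(matches(p) for p in disallow_paths):
        return False
    return any(matches(p) for p in allow_paths)
```

For example, with Allow paths `https://aem.example.com/wk-events/` and `/aqua-collections/` and Disallow paths `.jsp` and `/videos/`, the page `https://aem.example.com/wk-events/standard.html` is crawled, but `https://aem.example.com/wk-events/form.jsp` is skipped.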

Schedule and Index

An AEM collection should index only published pages. The schedule sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:

Once
Hourly
Daily
Every 48 Hours
Every 96 Hours
Weekly
Monthly
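The following sketch shows how the frequencies listed above translate into next run times after a given indexing run. The mapping and function names are hypothetical and are not SearchBlox's scheduler. Monthly runs move to the same day of the next month and are clamped to that month's last day.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical fixed intervals for the non-monthly frequencies.
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def add_one_month(dt: datetime) -> datetime:
    """Move to the same day next month, clamped to the last day (e.g. 2024-01-31 -> 2024-02-29)."""
    year = dt.year + dt.month // 12
    month = dt.month % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def next_run(last_run: datetime, frequency: str) -> Optional[datetime]:
    """Return the next indexing time after last_run, or None for a one-off ("Once") schedule."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        return add_one_month(last_run)
    return last_run + INTERVALS[frequency]
```

Because of the clamping, a monthly schedule that starts on the 31st drifts to earlier days after short months. Computing each run from the original start date would avoid that drift.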

The following operations can be performed in AEM collections:

Activity	Description
Enable Scheduler for Indexing	Once enabled, you can set the Start Date and Frequency.
Schedule	For each collection, indexing can be scheduled using the frequencies listed above.
View all Collection Schedules	Redirects to the Schedules section, where all the Collection Schedules are listed.

Data Fields Tab

The Data Fields tab lets you create custom fields for search and view the default data fields for non-encrypted collections. SearchBlox supports four types of data fields:

Keyword
Number
Date
Text

After data fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
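The data field type determines how a value extracted from a page is stored and searched. The following hypothetical sketch converts raw string values into each of the four types. The function name and the ISO 8601 date format are assumptions. The keyword/text distinction follows general search-engine practice (exact-match versus full-text), not documented SearchBlox internals.

```python
from datetime import datetime
from typing import Union


def coerce_field_value(raw: str, field_type: str) -> Union[str, float, datetime]:
    """Convert a raw extracted string into a value for the given data field type."""
    value = raw.strip()
    if field_type == "Keyword":
        return value  # kept whole for exact matching and filtering
    if field_type == "Text":
        return " ".join(value.split())  # whitespace normalized for full-text search
    if field_type == "Number":
        return float(value)
    if field_type == "Date":
        return datetime.fromisoformat(value)  # assumes ISO 8601 dates such as 2024-05-01
    raise ValueError(f"Unsupported data field type: {field_type}")
```

Because values are converted when a document is indexed, a changed field definition presumably applies only to documents indexed afterwards. This is consistent with the requirement to clear and re-index the collection.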

To learn more about data fields, refer to Data Fields Tab.

