AEM Collection

The AEM Collection indexes pages and assets available in the AEM content repository. Each page and each asset is treated as a separate document.

Prerequisites

  • The AEM author instance must be running.
  • The AEM publisher instance must be running.
  • You need credentials (username and password) with admin privileges on the AEM author instance.

Note:

SearchBlox must be able to reach both AEM instances over the network to crawl the AEM site pages.
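Before you create the collection, you may want to confirm from the SearchBlox server that both instances answer over HTTP. The following minimal Python sketch uses only the standard library. The function name `is_reachable` is hypothetical and is not part of SearchBlox. The sketch treats any HTTP response as "reachable", including 401 Unauthorized, because the author instance normally rejects anonymous requests.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if an HTTP server answers at `url`.

    Any HTTP status counts as reachable, because the goal is only to
    confirm network access from the SearchBlox host, not to log in.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401 on the
        # author instance), so the host is reachable.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False
```

For example, you could call `is_reachable("http://<author-host>:4502/")` and `is_reachable("http://<publisher-host>:4503/")`. Ports 4502 and 4503 are the usual AEM author and publish defaults, so substitute your own URLs.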

Create an AEM Collection

You can create an AEM Collection with the following steps:

  • After logging in to the Admin Console, select the Collections tab and click Create a New
    Collection or the "+" icon.
  • Choose AEM Collection as the Collection Type.
  • Enter a unique Collection name for the data source (for example, intranet site).
  • Choose Private or Public Collection Access.
  • Choose the language of the content (if the language is other than English).
  • Click Save to create the collection.
  • Once the AEM collection is created, you are taken to the Authentication tab.

Authentication Tab

  • Author Instance URL: The URL of the AEM author instance from which documents in the AEM content repository are indexed.
  • Publisher Instance URL: The publish domain of the AEM instance. When this is set, documents are served from the publish domain, even though indexing is done from the author instance.
  • Username: The AEM username, which must have admin privileges on the AEM author instance. If service security is disabled, you do not need to provide a username.
  • Password: The password for the username above. If service security is disabled, you do not need to provide a password.
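The following Python sketch shows how a client could pass a username and password to the author instance using HTTP Basic authentication. It assumes that Basic authentication is accepted by the author instance, and it assumes the endpoint `/libs/granite/security/currentuser.json` as a lightweight path for checking who you are logged in as. The function name is hypothetical, and this is not a description of how SearchBlox itself authenticates.

```python
import base64
import urllib.request

# Assumed AEM endpoint that reports the current user; replace as needed.
DEFAULT_CHECK_PATH = "/libs/granite/security/currentuser.json"


def build_author_request(author_url: str, username: str, password: str,
                         path: str = DEFAULT_CHECK_PATH) -> urllib.request.Request:
    """Hypothetical helper: a GET request carrying HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    url = author_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

You can send the request with `urllib.request.urlopen(req, timeout=10)`. A 200 response suggests the credentials are accepted, and a 401 response suggests they are not.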

AEM Collection Paths to Index Specific Site Pages

AEM collection Paths let you configure Allow and Disallow paths for the crawler. To index only specific site pages or assets, add them as Allow paths. To open the path settings for an AEM collection, click the collection name in the Collections list.

Allow/Disallow Paths

  • Allow and Disallow paths determine which URLs the crawler includes or excludes.
  • Disallow paths let you manage a collection by excluding unwanted URLs.
  • All Allow and Disallow paths apply to URLs on the Publisher Instance URL.
  • Allow Paths: URLs or path patterns the crawler may include. Examples:
      https://192.168.25.32:4503/wk-events/
      /aqua-collections/
      /wellness-care/
      https://192.168.25.32:4503/wk-events/standard.html
      .* (allows the crawler to go to any external URL or domain)
  • Disallow Paths: URLs or path patterns the crawler must exclude. Examples:
      .jsp
      /cgi-bin/
      /videos/
      ?params
  • Allowed Formats: Select the document formats that should be searchable within the collection.
  • Enable Content API: Lets the crawler index document content that contains special characters.
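The following Python sketch shows how Allow and Disallow paths like the examples above could filter URLs. It makes several assumptions, and SearchBlox's actual matching rules may differ. The sketch treats each pattern as a plain substring of the URL and treats `.*` as "match everything". A Disallow match always wins over an Allow match, and a URL must match at least one Allow path to be crawled. The function name is hypothetical.

```python
def is_allowed(url: str, allow: list[str], disallow: list[str]) -> bool:
    """Hypothetical Allow/Disallow filter (substring matching, '.*' = any URL)."""

    def matches(pattern: str) -> bool:
        # Assumption: '.*' is a catch-all; every other pattern is a substring.
        return pattern == ".*" or pattern in url

    # Assumption: Disallow takes precedence over Allow.
    if any(matches(p) for p in disallow):
        return False
    return any(matches(p) for p in allow)


ALLOW = ["https://192.168.25.32:4503/wk-events/", "/aqua-collections/", "/wellness-care/"]
DISALLOW = [".jsp", "/cgi-bin/", "/videos/", "?params"]
```

Under these assumptions, `https://192.168.25.32:4503/wk-events/standard.html` is crawled. A page under `/wk-events/videos/` is skipped because of the `/videos/` Disallow path.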

Schedule and Index

An AEM collection should index only published pages. The schedule sets the frequency and the start date/time for indexing a collection. SearchBlox supports the following schedule frequencies:

  • Once
  • Hourly
  • Daily
  • Every 48 Hours
  • Every 96 Hours
  • Weekly
  • Monthly

The following operation can be performed on AEM collections:

  • Schedule: For each collection, indexing can be scheduled using one of the frequencies above.
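The following Python sketch computes the next run time after a given run for each frequency, which can help you reason about the schedule options. It is an illustration under assumptions and is not SearchBlox's scheduler. The names are hypothetical. Intervals are assumed to be measured from the previous run. "Monthly" is assumed to mean the same day of the next month, clamped to the last day of shorter months.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

# Fixed-length frequencies from the list above (hypothetical mapping).
FIXED_INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def add_one_month(dt: datetime) -> datetime:
    """Same day next month, clamped to that month's last day."""
    year = dt.year + dt.month // 12   # December rolls over to the next year
    month = dt.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


def next_run(last_run: datetime, frequency: str) -> Optional[datetime]:
    """Next scheduled run after `last_run`, or None for a one-time schedule."""
    if frequency == "Once":
        return None
    if frequency == "Monthly":
        return add_one_month(last_run)
    return last_run + FIXED_INTERVALS[frequency]
```

With this clamping rule, a run on January 31 is followed by one on February 29 in a leap year. The day then stays at 29 in later months, which is one reason real schedulers often anchor to the original start date instead.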

Data Fields Tab

The Data Fields tab lets you create custom fields for search and view the Default Data Fields of a non-encrypted collection. SearchBlox supports four types of data fields:

  • Keyword
  • Number
  • Date
  • Text
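The following Python sketch illustrates how the four field types typically differ. It converts a raw metadata string into a value of each type. This is a hypothetical illustration of the usual keyword, number, date, and text distinction in search engines, and it is not how SearchBlox processes data fields internally. The function name, the ISO date format, and the lowercase whitespace tokenizing are all assumptions.

```python
from datetime import date


def coerce_field(value: str, field_type: str):
    """Hypothetical conversion of a raw string for one data field type."""
    if field_type == "Keyword":
        # Kept as one exact value: matched whole, useful for filters and facets.
        return value.strip()
    if field_type == "Number":
        # Parsed numerically so range queries and sorting work.
        return float(value)
    if field_type == "Date":
        # Assumption: dates arrive in ISO format (YYYY-MM-DD).
        return date.fromisoformat(value.strip())
    if field_type == "Text":
        # Crude stand-in for full-text analysis: lowercase words.
        return value.lower().split()
    raise ValueError(f"unsupported data field type: {field_type}")
```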

  • Once data fields are configured, the collection must be cleared and re-indexed for the changes to take effect.

For more information about data fields, refer to Data Fields Tab.