Content Indexed

This page describes the fields that SearchBlox indexes when crawling web content and how to configure custom meta fields for search filters and facets.

Indexed Fields

When SearchBlox crawls a web page, it stores all page content in a searchable content field. It also indexes the following predefined fields:

| Field | Description |
| --- | --- |
| Title | Page title |
| Description | Meta description of the page |
| Keywords | Meta keywords associated with the page |
| URL | Page URL |
| Last Modified | Date when the page was last modified |
| Size | Size of the document |

File-Specific Fields

The following fields are indexed for files such as PDFs that are discovered and indexed through a WEB Collection:

| Field | Description |
| --- | --- |
| Author | Author of the document |
| Doc_creation_date | Date when the document was created |
| Doc_modification_date | Date when the document was last modified |

Note: Map any additional fields you require in the mapping.json file.

Custom Meta Fields

Custom meta fields from your web pages are automatically indexed and can be used in search filters.

Example:

query=test&filter=custom:value
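
As a minimal sketch of how such a request could be assembled, the Python snippet below builds the query string shown above. The base URL and the function name `build_search_url` are hypothetical placeholders rather than values from the SearchBlox documentation; substitute the search endpoint of your own installation.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; replace with your installation's search endpoint.
BASE_URL = "https://search.example.com/search"


def build_search_url(query, field, value, base_url=BASE_URL):
    """Build a search URL that filters results on one custom meta field."""
    params = {
        "query": query,
        # The filter parameter takes the form field:value.
        "filter": f"{field}:{value}",
    }
    # safe=":" keeps the colon literal so the filter reads field:value;
    # other reserved characters are still percent-encoded.
    return f"{base_url}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(build_search_url("test", "custom", "value"))
```

Building the query string with `urlencode` rather than by hand ensures that spaces and other reserved characters in the query or the filter value are escaped correctly.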

To make the values of a custom meta field searchable in regular queries, copy them into the content field by adding the copy_to parameter to the field's definition in the mapping.json file located in ../ROOT/WEB-INF:

"custom": {
  "type": "text",
  "store": true,
  "fielddata": true,
  "analyzer": "sb_analyzer",
  "copy_to": "content"
}

To use a custom meta field as a facet field, configure it without the copy_to parameter:

"custom": {
  "type": "text",
  "store": true,
  "fielddata": true,
  "analyzer": "sb_analyzer"
}
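
The two definitions above differ only in the copy_to parameter. The following Python sketch makes that difference explicit by generating either fragment from a single helper. The function name `custom_field_mapping` and its `searchable` flag are hypothetical, introduced only for illustration; the sketch prints the fragment and does not edit mapping.json, because where the fragment belongs inside that file depends on your installation.

```python
import json


def custom_field_mapping(name, searchable):
    """Return a mapping fragment for one custom meta field.

    searchable=True adds copy_to so the values are copied into the content
    field; searchable=False omits it, as in the facet configuration.
    """
    definition = {
        "type": "text",
        "store": True,
        "fielddata": True,
        "analyzer": "sb_analyzer",
    }
    if searchable:
        definition["copy_to"] = "content"
    return {name: definition}


if __name__ == "__main__":
    # Searchable variant (includes copy_to).
    print(json.dumps(custom_field_mapping("custom", searchable=True), indent=2))
    # Facet variant (no copy_to).
    print(json.dumps(custom_field_mapping("custom", searchable=False), indent=2))
```

Paste the printed definition into mapping.json by hand, then clear and reindex the collection.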

Important: After updating mapping.json for facet fields, you must clear and reindex the collection for the changes to take effect.

For more information, see *Custom Fields in Search*.