Content Manager Features

Feature

Description

Open Graph Meta Tag

Content managers can add the og:image meta tag to their web pages to supply thumbnail images for documents on the search page.
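As a minimal sketch, an og:image tag in the page head could look like the following. The page title and image URL are placeholders chosen for illustration, not values taken from SearchBlox.

```html
<head>
  <title>Product Overview</title>
  <!-- Hypothetical image URL: intended to be used as the thumbnail for this page -->
  <meta property="og:image" content="https://www.example.com/images/overview-thumb.png" />
</head>
```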

Selective HTML indexing

Content can be excluded from indexing by wrapping it in <noindex>...</noindex>, <!--googleoff: all-->...<!--googleon: all-->, or <!--stopindex-->...<!--startindex--> tags.
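The sketch below shows how these markers might be placed around page regions such as navigation and footers. The page content is invented for illustration; only the marker tags come from the description above.

```html
<body>
  <!--stopindex-->
  <nav>Site navigation that should stay out of the index</nav>
  <!--startindex-->

  <h1>Article title</h1>
  <p>Main content that should be indexed.</p>

  <!--googleoff: all-->
  <aside>Promotional banner that should stay out of the index</aside>
  <!--googleon: all-->

  <noindex>
    <footer>Footer text that should stay out of the index</footer>
  </noindex>
</body>
```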

Meta Robots

The meta robots tag can prevent a page from being indexed while still allowing the crawler to follow its links (noindex, follow). The reverse is also possible: the page is indexed, but the crawler does not follow its links (index, nofollow).
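For illustration, the two cases could be expressed with the standard meta robots values as sketched below. Each tag belongs in the head of a different page; the "Page A" and "Page B" labels are hypothetical.

```html
<!-- Page A: keep this page out of the index, but let the crawler follow its links -->
<meta name="robots" content="noindex, follow" />

<!-- Page B: index this page, but do not follow its links -->
<meta name="robots" content="index, nofollow" />
```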

Robots.txt

By default, the crawler honors the rules specified in robots.txt, and these rules take precedence over all other web collection rules and settings.

Canonical

To index canonical URLs, enable the canonical setting in the web collection. Example of a canonical tag (the URL is a placeholder):
<link rel="canonical" href="https://www.example.com/page.html" />

Sitemap

SearchBlox supports standard XML sitemaps. Sitemaps referenced from robots.txt are also supported.

Custom Date Header

Custom date header indexing is supported when your .htaccess file sets a header in the following date format:
Header set SearchBlox-Last-modified "Sat, 01 Jan 2000 12:00:01 GMT"

HTML Parser - Document Description

The Web Collection > Description setting configures the HTML parser to read a document's description from one of these HTML tags: META, H1, H2, H3, H4, H5, or H6. If you use this setting, make sure the selected tag is present and valid in your web pages.

Selective LastModified Date

Documents can be indexed with the default header date, a custom last-modified date (lastmodified or last-modified), or the custom header date.

Search Results Display

The values of the default SearchBlox fields title, description, keywords, and lastmodified can be controlled from your web pages. After the pages are re-indexed or refreshed in the collection, the updated content appears on the search page.
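A page that sets these fields might look like the sketch below, which uses standard HTML title and meta tags. All values are placeholders. The lastmodified field is left out because the expected date format for that tag is not stated here.

```html
<head>
  <!-- Placeholder values: each tag is assumed to feed the search field of the same name -->
  <title>Quarterly Report</title>
  <meta name="description" content="Summary of results for the most recent quarter." />
  <meta name="keywords" content="report, quarterly, finance" />
</head>
```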

Custom Meta Fields

Custom meta tags can be added to your web pages and ingested by web collections. To learn more, see Custom Data Fields.
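As a rough sketch, custom meta tags follow the usual name/content pattern. The names "department" and "product-line" below are hypothetical, and any collection-side configuration needed to ingest them is not shown.

```html
<head>
  <!-- Hypothetical custom fields: choose names that match your own data model -->
  <meta name="department" content="engineering" />
  <meta name="product-line" content="enterprise" />
</head>
```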

