Open Graph Meta Tag
Content managers can add the og:image meta tag to their web pages to provide thumbnail images for documents on the search page.
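As a minimal sketch, an og:image tag placed in the page head might look like the following. The image URL is a hypothetical placeholder, not a value taken from SearchBlox.

```html
<head>
  <!-- Thumbnail candidate for this page; the URL is a hypothetical placeholder -->
  <meta property="og:image" content="https://www.example.com/images/product-thumbnail.png" />
</head>
```

Using an absolute URL for the image is generally the safer choice, since Open Graph consumers commonly expect one.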
Selective HTML indexing
Content can be excluded from indexing using the mechanisms described below.
With the meta robots tag you can keep a page out of the index while still allowing the crawler to follow its links, or the other way around: index the page but prevent the crawler from following its links.
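For illustration, the two cases correspond to the standard robots meta tag values sketched below. These are the generic directives defined for the robots meta tag. How your SearchBlox version handles each value is an assumption worth confirming against its documentation.

```html
<!-- Option 1: keep this page out of the index, but let the crawler follow its links -->
<meta name="robots" content="noindex, follow" />

<!-- Option 2: index this page, but do not follow its links -->
<meta name="robots" content="index, nofollow" />
```

The two tags are alternatives. A page would normally carry only one of them, inside its head element.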
By default, the crawler honors the rules specified in robots.txt, and they take precedence over all other web collection rules and settings.
To index canonical URLs, enable the canonical setting in the web collection. Each page declares its canonical URL with a canonical tag, that is, a link element with rel="canonical" in the page head.
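A canonical tag typically has the following shape. This is a sketch of the standard syntax, and the URL is a hypothetical placeholder.

```html
<head>
  <!-- Tells crawlers which URL is the preferred version of this page (placeholder URL) -->
  <link rel="canonical" href="https://www.example.com/products/widget.html" />
</head>
```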
SearchBlox supports standard XML sitemaps. Sitemaps referenced in robots.txt are also supported.
Custom Date Header
Custom date header indexing is supported and can be configured when your .htaccess file sets a header in the expected date format.
HTML Parser - Document Description
The Web Collection > Description setting configures the HTML parser to read a document's description from one of these HTML tags: META, H1, H2, H3, H4, H5, H6. If you use this setting, make sure the selected tag is present and valid in your web pages.
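As a rough sketch, a page that could satisfy either the META or the H1 choice might look like this. It assumes that META refers to the standard description meta tag. The text values are placeholders.

```html
<html>
  <head>
    <title>Product overview</title>
    <!-- Candidate description when the setting is META (assumed to mean the description meta tag) -->
    <meta name="description" content="A short summary of what this page covers." />
  </head>
  <body>
    <!-- Candidate description when the setting is H1 -->
    <h1>Product overview for the Widget series</h1>
  </body>
</html>
```

Whichever tag the setting points at should appear on every page in the collection. Otherwise some documents may be indexed without a description.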
Selective LastModified Date
Documents can be indexed with the Default Header Date, a Custom Lastmodified Date (lastmodified or last-modified), or a Custom Header Date.
Search Results Display
The values of the default SearchBlox fields title, description, keywords, and lastmodified can be controlled from your web pages. Once the web pages are re-indexed or refreshed in the collection, the updated content appears on the search page.
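A minimal sketch of a page head that sets all four fields is shown below. All values are placeholders. The lastmodified date format in particular is an assumption, so check which formats your SearchBlox version accepts before relying on it.

```html
<head>
  <!-- title field -->
  <title>Quarterly report</title>
  <!-- description field -->
  <meta name="description" content="Summary of results for the quarter." />
  <!-- keywords field -->
  <meta name="keywords" content="finance, report, quarterly" />
  <!-- lastmodified field; the date format used here is an assumption -->
  <meta name="lastmodified" content="2024-01-15" />
</head>
```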
Custom Meta Fields
Custom meta tags can be added to your web pages and ingested by web collections. To learn more, see Custom Data Fields.
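For illustration, custom meta tags could look like the following. The names department and product-line are hypothetical. How they map to searchable fields depends on your custom data fields configuration.

```html
<head>
  <!-- Hypothetical custom fields; the names and values are examples only -->
  <meta name="department" content="engineering" />
  <meta name="product-line" content="widgets" />
</head>
```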