Using Meta Robots

Using noindex, nofollow Meta Robots

Crawler behaviour for WEB collections can be controlled using meta robots tags. Meta robots rules are applied after the robots.txt and path settings of the collection.

Using meta robots, you can:

  • Avoid indexing a page but still crawl it (follow its links)
  • Index a page but avoid crawling it (do not follow its links)
  • Avoid both indexing and crawling a page
  • Index and crawl a page

Meta Robots Tags

Meta robots tags must be placed in the HTML <head> section of the page.

Behaviour                        Tag
Crawl but do not index           <meta name="robots" content="noindex, follow">
Index but do not crawl           <meta name="robots" content="index, nofollow">
Do not index and do not crawl    <meta name="robots" content="noindex, nofollow">
Index and crawl                  <meta name="robots" content="index, follow"> or omit the tag entirely
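As a minimal sketch of the placement described above, a page that should be crawled but kept out of the index might look like the following. The title, link, and file name are placeholders chosen for illustration and are not taken from any real site.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Site map (placeholder title)</title>
    <!-- Crawl this page but do not index it -->
    <meta name="robots" content="noindex, follow">
  </head>
  <body>
    <!-- Placeholder link that the crawler may still visit -->
    <a href="/products.html">Products</a>
  </body>
</html>
```

Swapping the content value for one of the other combinations listed above changes the behaviour without any other change to the page.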

Using Content Exclusion Meta Tags

In HTTP collections, you can exclude specific sections of an HTML page from being indexed — such as headers, footers, or navigation — using the following tags:

Tag                       Syntax
noindex                   <noindex>Content to exclude</noindex>
stopindex / startindex    <!--stopindex-->Content to exclude<!--startindex-->
googleoff / googleon      <!--googleoff: all-->Content to exclude<!--googleon: all-->

Rules for Content Exclusion Tags

  • Content inside these tags will not be indexed
  • These tags cannot be used inside the <head> section or within meta tags
  • These tags must not be nested inside each other
  • Each tag must be properly closed:
    • stopindex must be closed with startindex
    • <noindex> must be closed with </noindex>
    • googleoff must be closed with googleon
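The rules above can be illustrated with a small page that uses each tag pair once, keeps them out of the <head> section, closes every pair, and does not nest them. This is a sketch only: the page structure and text are placeholders, and in practice a page would normally use just one of the three styles.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example article (placeholder title)</title>
    <!-- No content exclusion tags here: they are not allowed inside <head> -->
  </head>
  <body>
    <!--stopindex-->
    <header>Site logo and top navigation</header>
    <!--startindex-->

    <main>
      <h1>Article heading</h1>
      <p>This text is outside every exclusion tag, so it is indexed.</p>
    </main>

    <noindex>
      <nav>Related links sidebar</nav>
    </noindex>

    <!--googleoff: all-->
    <footer>Copyright notice and footer links</footer>
    <!--googleon: all-->
  </body>
</html>
```

Each excluded region ends before the next one begins, so no pair is nested inside another.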

For further reference on these tags, see Wikipedia or the Google Search documentation.