Using Meta Robots

Using noindex,nofollow Meta Robots

The crawler behavior of a WEB collection can be controlled using meta robots tags.

  • Meta robots rules are applied after robots.txt and path settings in WEB collections.

  • Using Meta robots, a user can:

    • Avoid indexing a page but still crawl it.
    • Index a page but avoid crawling it.
    • Avoid both indexing and crawling the page.
    • Index and crawl the page.

Meta Robots

These meta tags, which control indexing and crawling of a page, must be placed in the <head> section of the HTML page.

noindex, follow

  • To avoid indexing a page but allow crawling, add the following tag to the page:
    <meta name="robots" content="noindex, follow">

index, nofollow

  • To allow indexing but avoid crawling, add the following tag:
    <meta name="robots" content="index, nofollow">

noindex, nofollow

  • To avoid both indexing and crawling, add the following tag:
    <meta name="robots" content="noindex, nofollow">

index, follow

  • To index and crawl the page, either leave out the meta robots tag or add:
    <meta name="robots" content="index, follow">
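As a minimal sketch of where such a tag sits, the page below places a noindex, follow tag inside the <head> section. The title, link, and body text are placeholders for illustration only and do not refer to a real site.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Example listing page</title>
    <!-- Placeholder page: not indexed, but links on it may still be followed. -->
    <meta name="robots" content="noindex, follow">
  </head>
  <body>
    <!-- Hypothetical link that the crawler can still follow. -->
    <a href="/products/item-1.html">Item 1</a>
  </body>
</html>
```

Swapping the content value for one of the other three combinations above changes the behavior without moving the tag.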


Using Content Exclusion Meta Tags

In HTTP collections, you can exclude sections of an HTML page (such as headers, footers, or navigation) from being indexed using:

  • Noindex tags
  • Stopindex / Startindex tags
  • Googleon / Googleoff tags

Supported Content Exclusion Meta Tags

  • noindex tags
    <noindex>Content to Exclude</noindex>
  • stopindex, startindex tags
    <!--stopindex-->Content to Exclude<!--startindex-->
  • googleoff, googleon tags
    <!--googleoff: all-->Content to Exclude<!--googleon: all-->

Rules for noindex, stopindex/startindex, and googleoff/googleon tags

  • Content inside stopindex/startindex, noindex, or googleoff/googleon tags will not be indexed.

  • These tags cannot be used in the head section or inside meta tags.

  • These tags should not be nested inside each other.

  • Each tag must be properly closed:

    • stopindex → startindex
    • <noindex> → </noindex>
    • googleoff → googleon
  • For the standard usage of these tags, refer to Wikipedia or Google.
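The rules above can be illustrated with a small sketch. In the hypothetical page below, all three tag styles appear only in the body, each opening tag has its matching close, and no excluded region is nested inside another. The element names and text are placeholders.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Example article</title>
  </head>
  <body>
    <!-- Navigation is excluded; stopindex is closed by startindex. -->
    <!--stopindex-->
    <nav>Home | Products | Contact</nav>
    <!--startindex-->

    <h1>Example article</h1>
    <p>This paragraph is outside every exclusion tag.</p>
    <noindex>This promotional note is excluded.</noindex>

    <!-- Footer is excluded; googleoff is closed by googleon. -->
    <!--googleoff: all-->
    <footer>Copyright notice and footer links</footer>
    <!--googleon: all-->
  </body>
</html>
```

Using one tag style consistently across a site is usually easier to maintain; the three are mixed here only to show each one closed correctly.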