Using Meta Robots

Using noindex, nofollow Meta Robots

The WEB collection crawler can be controlled using meta robots tags.

  • Crawling rules based on meta robots are applied after robots.txt and the path settings specified in the WEB collection.
  • Using meta robots, a user can:
    • Avoid indexing a page but still crawl the links on it.
    • Index a page but avoid crawling the links on it.
    • Avoid both indexing the page and crawling its links.
    • Index the page and crawl its links.

Meta Robots

The meta tags that direct the indexing and crawling of a page have to be specified in the head section of the page's HTML code.
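
As a minimal sketch of where the tag goes, the page below places a meta robots tag inside the head element. The page title, text, and link target are hypothetical and used only for illustration. The directive still applies after robots.txt and the collection's path settings, as noted above.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Archived press release</title>
  <!-- Meta robots tag inside <head>: do not index this page, but crawl its links -->
  <meta name="robots" content="noindex, follow">
</head>
<body>
  <h1>Archived press release</h1>
  <!-- Hypothetical link; with "follow" it remains eligible for crawling -->
  <p>See the <a href="/news/current.html">current announcements</a>.</p>
</body>
</html>
```

Swapping the content value for any of the combinations described in the following sections changes the behavior without moving the tag.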

noindex, follow

  • To avoid indexing a page but allow crawling, the following meta tag has to be specified in the page that is crawled:
    <meta name="robots" content="noindex, follow">

index, nofollow

  • To avoid crawling but allow indexing, the following meta tag has to be specified:
    <meta name="robots" content="index, nofollow">

noindex, nofollow

  • To avoid both indexing and crawling, the following meta tag has to be specified:
    <meta name="robots" content="noindex, nofollow">

index, follow

  • To index as well as crawl the page, either omit the meta robots tag or specify it as shown:
    <meta name="robots" content="index, follow">

Using Content Exclusion Meta Tags

In HTTP collections, it may be necessary to exclude sections of an HTML page, such as headers, footers, and navigation, from being indexed. The following tags can be used for this:

  • noindex tags
  • stopindex, startindex tags
  • googleoff, googleon tags

Content Exclusion Meta Tags Supported

  • noindex tags
    <noindex>Content to Exclude</noindex>
  • stopindex, startindex tags
    <!--stopindex-->Content to Exclude<!--startindex-->
  • googleoff, googleon tags
    <!--googleoff: all-->Content to Exclude<!--googleon: all-->

Rules for noindex, stopindex/startindex, and googleoff/googleon tags

  • Body content that is enclosed in stopindex/startindex tags, noindex tags, or googleoff/googleon tags will not be included in the index.
  • These tags are not applicable in the head section or meta tags.
  • These tags should not be nested.
  • Every opening tag must be followed by its closing counterpart: stopindex by startindex, the noindex start tag by the noindex end tag, and googleoff by googleon.
  • For the conventions behind these tags, refer to their descriptions on Wikipedia or in Google's documentation.
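
The rules above can be illustrated with a short sketch. The page below is hypothetical (its title, text, and element contents are placeholders). It uses each tag pair once in the body, keeps every pair closed, and avoids nesting one excluded span inside another.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Product overview</title>
  <!-- Exclusion tags are not applicable in the head section, so none appear here -->
</head>
<body>
  <!--stopindex-->
  <nav>Home | Products | Contact</nav>
  <!--startindex-->

  <h1>Product overview</h1>
  <p>This paragraph sits outside every excluded span.</p>

  <!--googleoff: all-->
  <aside>Related links and promotions</aside>
  <!--googleon: all-->

  <noindex>
    <footer>Copyright notice and site map links</footer>
  </noindex>
</body>
</html>
```

Under the rules listed above, only the heading and the paragraph would be expected to contribute body content to the index. The navigation, sidebar, and footer are each wrapped in their own pair of exclusion tags.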