Using Meta Robots

Using noindex, nofollow Meta Robots

The WEB collection crawler can be controlled using meta robots tags.

Crawling rules based on meta robots are applied after robots.txt and the path settings specified in the WEB collection.
Using meta robots, a user can:
- Avoid indexing a page but still crawl it (follow its links).
- Index a page but avoid crawling it.
- Avoid both indexing and crawling the page.
- Index and crawl the page.

Meta Robots

These meta tags, which direct the indexing and crawling of a page, must be specified in the head section of the page's HTML code.

noindex, follow

To avoid indexing a page but allow crawling, specify the following meta tag in the page that is crawled:
<meta name="robots" content="noindex, follow">

index, nofollow

To avoid crawling but allow indexing, specify the following meta tag:
<meta name="robots" content="index, nofollow">

noindex, nofollow

To avoid both indexing and crawling, specify the following meta tag:
<meta name="robots" content="noindex, nofollow">

index, follow

To both index and crawl a page, either omit the meta robots tag or specify it as shown:
<meta name="robots" content="index, follow">
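
The four combinations above can be checked on a page before it is crawled. The following is a minimal sketch in Python, using only the standard library, of how a meta robots tag in the head section might be read and turned into an index/follow decision. The names MetaRobotsParser and robots_policy are hypothetical and are not part of the product; the sketch assumes that a missing tag means "index, follow", as described above.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives of <meta name="robots"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                content = attr.get("content") or ""
                # Split "noindex, follow" into individual lowercase directives.
                self.directives.update(
                    part.strip().lower() for part in content.split(",") if part.strip()
                )

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_policy(html):
    """Return (index, follow); both default to True when no tag is present."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return ("noindex" not in parser.directives, "nofollow" not in parser.directives)


page = '<html><head><meta name="robots" content="noindex, follow"></head><body>x</body></html>'
print(robots_policy(page))  # (False, True)
```

In this sketch a meta robots tag placed outside the head section is ignored, which mirrors the requirement that the tag be specified in the head section.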

Using Content Exclusion Meta Tags

In HTTP collections, it may be necessary to exclude sections of an HTML page, such as headers, footers, and navigation, from being indexed. This can be done with any of the following tags:

- noindex tags
- stopindex, startindex tags
- googleon, googleoff tags

Content Exclusion Metatags Supported

noindex tags
<noindex> Content to Exclude </noindex>
stopindex, startindex tags
<!--stopindex--> Content to Exclude <!--startindex-->
googleon, googleoff tags
<!--googleoff: all--> Content to Exclude <!--googleon: all-->

Rules for noindex, stopindex, and googleon/googleoff tags

- Body content enclosed in stopindex/startindex, noindex, or googleoff/googleon tags is not included in the index.
- These tags have no effect in the head section or inside meta tags.
- These tags must not be nested.
- Every opening tag must be followed by its matching closing tag: stopindex by startindex, the noindex start tag by the noindex end tag, and googleoff by googleon.

For the conventions that govern these tags, refer to their descriptions on Wikipedia or in Google's documentation.
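
To illustrate the pairing rules, the following is a minimal sketch in Python of how excluded regions might be removed from body markup before indexing. It is not the product's implementation. The function name strip_excluded is hypothetical, and the comment forms <!--stopindex-->, <!--startindex-->, <!--googleoff: all-->, and <!--googleon: all--> are assumed from the common conventions for these tags.

```python
import re

# Assumed tag forms: the HTML comment syntax follows the common
# stopindex/startindex and googleoff/googleon conventions.
EXCLUDED_REGIONS = [
    r"<noindex>.*?</noindex>",
    r"<!--\s*stopindex\s*-->.*?<!--\s*startindex\s*-->",
    r"<!--\s*googleoff:\s*\w+\s*-->.*?<!--\s*googleon:\s*\w+\s*-->",
]
_PATTERN = re.compile(
    "|".join(f"(?:{region})" for region in EXCLUDED_REGIONS),
    re.IGNORECASE | re.DOTALL,
)


def strip_excluded(body_html):
    """Remove every properly paired excluded region from the body markup."""
    return _PATTERN.sub("", body_html)


body = "<p>Article text</p><noindex><ul><li>Navigation</li></ul></noindex>"
print(strip_excluded(body))  # <p>Article text</p>
```

Each opening marker is paired with the nearest matching closing marker, so a nested region would be cut off at the first closing tag; this is one way to see why nesting is not allowed. An opening marker without its closing marker matches nothing in this sketch and is left in place, which is a choice made for the example only.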