The WEB collection crawler can be controlled using meta robots tags.
- Crawling rules based on meta robots tags are applied after robots.txt and the path settings specified in the WEB collection.
- Using meta robots tags, a user can:
- Avoid indexing a page but still crawl it.
- Index a page but avoid crawling it.
- Avoid both indexing and crawling a page.
- Index and crawl a page.
The meta tags that direct the indexing and crawling of a page must be specified in the head section of the page's HTML.
- To avoid indexing a page but allow crawling, specify the following meta tag in the page that is crawled:
<meta name="robots" content="noindex, follow">
- To avoid crawling but allow indexing, specify the following meta tag:
<meta name="robots" content="index, nofollow">
- To avoid both indexing and crawling, specify the following meta tag:
<meta name="robots" content="noindex, nofollow">
- To both index and crawl the page, either omit the meta robots tag or specify it as shown:
<meta name="robots" content="index, follow">
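The four combinations above can be sketched as a small Python check. This is only an illustration of how the directives map to index and crawl decisions, not the crawler's actual implementation; the names `RobotsMetaParser` and `robots_policy` are hypothetical, and for simplicity the sketch does not verify that the tag sits inside the head section.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of the first <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = None  # stays None when no robots meta tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.directives is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into {"noindex", "follow"}
            self.directives = {
                d.strip().lower() for d in content.split(",") if d.strip()
            }


def robots_policy(html):
    """Return (index, crawl) flags; a missing tag means index and crawl."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives or set()
    return ("noindex" not in directives, "nofollow" not in directives)
```

For example, `robots_policy('<head><meta name="robots" content="noindex, follow"></head>')` returns `(False, True)`: the page is not indexed but is still crawled.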
In HTTP collections, you may need to exclude sections of an HTML page, such as headers, footers, and navigation, from being indexed. Any of the following tags can be used for this:
- noindex tags
- stopindex, startindex tags
- googleoff, googleon tags
- noindex tags:
<noindex>Content to Exclude</noindex>
- stopindex, startindex tags:
<!--stopindex-->Content to Exclude<!--startindex-->
- googleoff, googleon tags:
<!--googleoff: all-->Content to Exclude<!--googleon: all-->
- Body content enclosed in noindex tags, stopindex/startindex tags, or googleoff/googleon tags is not included in the index.
- These tags do not apply in the head section or in meta tags.
- These tags should not be nested.
- Each opening tag must be followed by its matching closing tag: stopindex by startindex, the noindex start tag by the noindex end tag, and googleoff by googleon.
- For the conventions governing these tags, refer to Wikipedia or Google.
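The exclusion rules above can be approximated with a short Python sketch that removes the wrapped regions from body content before it would be indexed. This is a rough illustration only, not the crawler's real parser; `EXCLUSION_PATTERNS` and `strip_excluded` are hypothetical names, and the tolerance for letter case and extra whitespace inside the comment tags is an assumption.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# One pattern per tag pair. Matching is non-greedy, so each opening tag
# pairs with the nearest closing tag (the tags are assumed not to be nested).
EXCLUSION_PATTERNS = [
    re.compile(r"<noindex>.*?</noindex>", FLAGS),
    re.compile(r"<!--\s*stopindex\s*-->.*?<!--\s*startindex\s*-->", FLAGS),
    re.compile(r"<!--\s*googleoff:\s*all\s*-->.*?<!--\s*googleon:\s*all\s*-->", FLAGS),
]


def strip_excluded(body_html):
    """Remove every region wrapped in one of the exclusion tag pairs."""
    for pattern in EXCLUSION_PATTERNS:
        body_html = pattern.sub("", body_html)
    return body_html
```

Because an opening tag without its closing tag never matches, an unpaired stopindex or googleoff leaves the content in place in this sketch, which is one reason to keep the pairs balanced.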