Using Meta Robots
Using noindex, nofollow Meta Robots
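You can control the behaviour of the WEB collection crawler using meta robots tags. In WEB collections, meta robots rules are applied after robots.txt and the collection's path settings, so they only affect pages that those rules already allow the crawler to fetch.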
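Using meta robots, you can control whether a page is indexed and whether the crawler follows (crawls) the links it contains. You can:
- Avoid indexing a page but still crawl its links
- Index a page but avoid crawling its links
- Avoid both indexing the page and crawling its links
- Index the page and crawl its links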
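The rule ordering described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the WEB collection crawler's actual implementation: `include_paths` and `exclude_paths` are hypothetical stand-ins for the collection's path settings, robots.txt is evaluated with the standard library's `urllib.robotparser`, and `meta_directives` is assumed to be the already-parsed set of lowercase values from the page's meta robots tag. It shows why meta robots only matters for pages that pass the earlier checks: a blocked URL is never fetched, so its tags are never read.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def crawl_decision(url, robots_txt, include_paths, exclude_paths, meta_directives):
    """Return (fetch, index, follow) for a URL.

    Hypothetical sketch: include_paths/exclude_paths stand in for the
    collection's path settings; meta_directives is the set of lowercase
    values read from the page's meta robots tag.
    """
    # 1. robots.txt: a disallowed URL is never fetched, so its meta tags are never read.
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch("*", url):
        return (False, False, False)

    # 2. Path settings (hypothetical simple prefix matching).
    path = urlparse(url).path
    if include_paths and not any(path.startswith(p) for p in include_paths):
        return (False, False, False)
    if any(path.startswith(p) for p in exclude_paths):
        return (False, False, False)

    # 3. Meta robots: applied only to pages that passed steps 1 and 2.
    index = "noindex" not in meta_directives
    follow = "nofollow" not in meta_directives
    return (True, index, follow)
```

For example, with `robots_txt = "User-agent: *\nDisallow: /private/"`, a URL under `/private/` is rejected before its meta robots tag is ever considered.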
Meta Robots Tags
Meta robots tags must be placed in the HTML `<head>` section of the page.
| Behaviour | Tag |
|---|---|
| Do not index the page; crawl its links | `<meta name="robots" content="noindex, follow">` |
| Index the page; do not crawl its links | `<meta name="robots" content="index, nofollow">` |
| Do not index the page; do not crawl its links | `<meta name="robots" content="noindex, nofollow">` |
| Index the page and crawl its links | `<meta name="robots" content="index, follow">`, or omit the tag entirely |
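As an illustration of how the tags in the table are interpreted, the following Python sketch reads the meta robots directives from a page using the standard library's `html.parser`. It is not the crawler's own parser, and several details are assumptions: the class and function names are hypothetical, directives are treated as case-insensitive, and a `robots` meta tag outside `<head>` is ignored, following the placement rule above. When no tag is present, both index and follow default to true, matching the last row of the table.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags inside <head> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                content = attrs.get("content") or ""
                # Split "noindex, follow" into {"noindex", "follow"}.
                self.directives.update(
                    d.strip().lower() for d in content.split(",") if d.strip()
                )

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def meta_robots(html):
    """Return (index, follow); both default to True when no tag is present."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    index = "noindex" not in parser.directives
    follow = "nofollow" not in parser.directives
    return index, follow
```

For example, `meta_robots('<head><meta name="robots" content="noindex, follow"></head>')` returns `(False, True)`: the page is not indexed, but its links are still crawled.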
Using Content Exclusion Tags
In HTTP collections, you can exclude specific sections of an HTML page, such as headers, footers, or navigation menus, from indexing using the following tags:
| Tag | Syntax |
|---|---|
| `noindex` | `<noindex>Content to exclude</noindex>` |
| `stopindex` / `startindex` | `<!--stopindex-->Content to exclude<!--startindex-->` |
| `googleoff` / `googleon` | `<!--googleoff: all-->Content to exclude<!--googleon: all-->` |
Rules for Content Exclusion Tags
- Content inside these tags will not be indexed
- These tags cannot be used inside the `<head>` section or within meta tags
- These tags must not be nested inside each other
- Each tag must be properly closed:
  - `<!--stopindex-->` must be closed with `<!--startindex-->`
  - `<noindex>` must be closed with `</noindex>`
  - `<!--googleoff: all-->` must be closed with `<!--googleon: all-->`
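Because the exclusion markers may not be nested and each one must be closed, the rules above can be approximated with regular expressions. The Python sketch below is a hypothetical pre-indexing step, not how HTTP collections actually process pages. It assumes only the `all` form of `googleoff`/`googleon` shown in the table, and it leaves content in place when a start marker has no matching end marker.

```python
import re

# (start, end) marker patterns for the three exclusion styles in the table.
EXCLUSION_PATTERNS = [
    (r"<noindex>", r"</noindex>"),
    (r"<!--\s*stopindex\s*-->", r"<!--\s*startindex\s*-->"),
    (r"<!--\s*googleoff:\s*all\s*-->", r"<!--\s*googleon:\s*all\s*-->"),
]


def strip_excluded(html):
    """Remove content between matching exclusion markers (hypothetical helper).

    Assumes markers are not nested, as the rules require. An unclosed start
    marker does not match, so its content is left untouched.
    """
    for start, end in EXCLUSION_PATTERNS:
        # Non-greedy .*? pairs each start marker with the nearest end marker;
        # DOTALL lets an excluded region span multiple lines.
        html = re.sub(start + r".*?" + end, "", html, flags=re.DOTALL | re.IGNORECASE)
    return html
```

Non-greedy matching pairs each start marker with the nearest following end marker; this is only correct because the rules forbid nesting.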
For further reference on these tags, see Wikipedia or the Google Search documentation.
