Best Practices

Collection Path Settings

  • Provide an allow path to limit indexing to the domain specified in the root path.
  • Provide a disallow path if you need to exclude certain subpaths of the domain from being indexed (see the example path settings after this list).
  • Do not disable the HTML format in allowed formats; indexing is not possible if HTML files are disallowed.
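For example, assuming a site rooted at https://www.example.com (the domain and paths here are illustrative only), the path settings might look like this:

    Root path:       https://www.example.com
    Allow path:      https://www.example.com/docs/
    Disallow path:   https://www.example.com/docs/internal/

With these values the crawler stays within the /docs/ section of the domain and skips everything under /docs/internal/.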

Collection Settings

  • Increase or decrease the spider depth to control how many levels deep the crawler goes.
  • If you want to exclude pages from indexing based on size or age, provide the relevant limits in the settings.
  • If you do not want to limit by size or age, set the value to -1.
  • Provide the relevant User-Agent if your robots.txt applies rules to a specific user agent.
  • If you need canonical URLs to be considered for indexing, disable the ignore canonical option (sample settings follow this list).
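As a rough illustration, a collection with a three-level crawl and no size or age limits might use values like these (the field labels follow the settings described above; the exact names in your console may differ):

    Spider depth:       3          crawl up to three levels below the root path
    Size limit:         -1         no size-based exclusion
    Age limit:          -1         no age-based exclusion
    User-Agent:         MyCrawler/1.0   matches the agent named in your robots.txt
    Ignore canonical:   disabled   canonical URLs are considered for indexing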

HTML Page Rules

  • To exclude a specific portion of a page's content from indexing, use stopindex tags in your webpages (see the snippet after this list).
  • Use robots.txt to control crawler access to subpaths of your site.
  • Use robots meta tags in your webpages to control crawling and indexing.
  • If all your URLs are listed in sitemap.xml, you can index all pages faster by enabling follow sitemaps and then indexing the collection.
  • If your webpages redirect, enable redirects in the collection's settings and also include the redirect target URL in the collection's allow path.
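The snippet below sketches how these page-level controls might look. The robots.txt directives and the robots meta tag are standard web conventions; the stopindex/startindex comment syntax shown here is an assumption and may differ for your crawler.

    robots.txt (controls crawler access to subpaths):

        User-agent: *
        Disallow: /private/
        Allow: /

    Webpage (robots meta tag plus assumed stopindex markers):

        <html>
          <head>
            <!-- "noindex, nofollow" here would exclude the whole page instead -->
            <meta name="robots" content="index, follow">
          </head>
          <body>
            <p>This content is indexed.</p>
            <!--stopindex-->
            <p>This navigation block is excluded from the index.</p>
            <!--startindex-->
          </body>
        </html>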

Schedule Operations

  • When scheduling collections, ensure that indexing starts only after the previously scheduled operation has completed.
  • If you need to clear the collection before indexing, schedule the clear operation two minutes before the index operation (a sample schedule follows this list).
  • Use a daily schedule for indexing if the indexing operation completes within a day; otherwise, use a weekly schedule.
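As a sample, a daily schedule that clears the collection shortly before indexing might look like this (the times are illustrative):

    02:58 AM daily    Clear collection
    03:00 AM daily    Index collection

The two-minute gap ensures the clear operation completes before indexing begins, in line with the first point above.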