Using Sitemaps

Sitemaps help search crawlers discover and index all the important pages on your website, especially pages that may not be easily reachable through normal link crawling. By providing a structured list of URLs, sitemaps ensure more complete and accurate indexing of your content in SearchBlox.

Note: Sitemap settings can be configured from the WEB collection settings in the SearchBlox Admin Console.

Enabling Sitemaps

By default, Follow Sitemaps is disabled in the WEB collection settings. To enable it:

  1. Go to the SearchBlox Admin Console
  2. Navigate to Collections > WEB Collection > Settings
  3. Enable Follow Sitemaps

Once enabled, SearchBlox indexes only the URLs listed in the sitemaps you provide, according to the rules below.

How Sitemaps Work in SearchBlox

  • When Follow Sitemaps is enabled, SearchBlox reads the sitemap.xml files listed in robots.txt
  • If a sitemap.xml file contains links to other sitemaps, SearchBlox will index all the linked sitemaps automatically
  • Only standard XML sitemaps are supported; compressed sitemap files (tar or gzip) are not
  • When Follow Sitemaps is enabled, only URLs reached through sitemap links such as https://example.com/sitemap.xml are indexed
  • Allow/Disallow path rules are applied when indexing sitemap URLs
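
The discovery rules above can be pictured with a short Python sketch. It shows how a crawler might read Sitemap: lines from robots.txt and tell a sitemap index (which links to other sitemaps) apart from a regular sitemap. This is an illustration only, not SearchBlox's actual implementation; the function names sitemaps_from_robots and classify_sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def classify_sitemap(xml_text):
    """Return ('index', child sitemap URLs) or ('urlset', page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


if __name__ == "__main__":
    robots = "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemaps_from_robots(robots))
```

A crawler following this pattern would fetch each URL returned for an index and classify it again, until only regular sitemaps remain.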

Adding Multiple Sitemaps

You can index multiple sitemaps by providing multiple sitemap URLs in the root path of the WEB collection settings. Add each sitemap URL on a separate line.

Example

https://example.com/sitemap.xml
https://example.com/sitemap-blog.xml
https://example.com/sitemap-products.xml

Limitations

Important: The following WEB collection settings do not apply when indexing via sitemaps:

  • Spider depth
  • Other standard WEB collection crawl settings

Sitemaps are processed independently of the normal crawl behaviour. Only the Allow/Disallow path rules are respected during sitemap indexing.
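
As a rough illustration of how Allow/Disallow path rules could narrow the URLs taken from a sitemap, the Python sketch below filters a URL list. It rests on two assumptions that are not confirmed by this page: that the rules behave like regular expressions matched against the full URL, and that a Disallow match takes priority over an Allow match. The function name is_allowed is hypothetical.

```python
import re


def is_allowed(url, allow, disallow):
    """Assumed rule: keep a URL if it matches no disallow pattern and,
    when allow patterns are given, matches at least one of them."""
    if any(re.search(pattern, url) for pattern in disallow):
        return False
    return not allow or any(re.search(pattern, url) for pattern in allow)


if __name__ == "__main__":
    allow = [r"^https://example\.com/"]
    disallow = [r"/private/", r"\.pdf$"]
    sitemap_urls = [
        "https://example.com/page1",
        "https://example.com/private/report",
        "https://example.com/manual.pdf",
    ]
    for url in sitemap_urls:
        print(url, is_allowed(url, allow, disallow))
```

Check the path rule syntax in your own WEB collection settings before relying on this behaviour.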

Supported Sitemap Format

SearchBlox supports standard XML sitemaps following the Sitemap Protocol. A basic sitemap looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2024-01-02</lastmod>
  </url>
</urlset>
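
To check what a sitemap like the one above exposes before indexing it, you could list its entries with a few lines of Python. This is a minimal sketch that uses only the standard library; the function name read_urlset is hypothetical, and it reads only the loc and lastmod elements shown in the sample.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map for the Sitemap Protocol elements.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_urlset(xml_bytes):
    """Return a list of (loc, lastmod) tuples; lastmod is None if absent."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:  # skip entries without a usable location
            entries.append((loc, lastmod.strip() if lastmod else None))
    return entries
```

An entry missing from this list, or a parse error raised by ET.fromstring, usually points to a malformed file or a missing namespace declaration on the urlset element.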

Note: Compressed sitemap files in tar or gzip format are not supported. Ensure your sitemap is a plain XML file before indexing.
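
If your site publishes its sitemap only in gzip form, one option is to decompress it yourself and host the plain XML. The Python sketch below detects gzip data by its two leading magic bytes and returns plain bytes either way. It is a sketch under that assumption, does not handle tar archives, and the function name ensure_plain_xml is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def ensure_plain_xml(data):
    """Return plain XML bytes, decompressing first if the data is gzip."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


if __name__ == "__main__":
    plain = b"<urlset></urlset>"
    packed = gzip.compress(plain)
    print(ensure_plain_xml(packed) == plain)
```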