Using Sitemaps
Sitemaps help search crawlers discover and index all the important pages on your website, especially pages that may not be easily reachable through normal link crawling. By providing a structured list of URLs, sitemaps ensure more complete and accurate indexing of your content in SearchBlox.
Note: Sitemap settings can be configured from the WEB collection settings in the SearchBlox Admin Console.
Enabling Sitemaps
By default, Follow Sitemaps is disabled in the WEB collection settings. To enable it:
- Go to the SearchBlox Admin Console
- Navigate to Collections > WEB Collection > Settings
- Enable Follow Sitemaps
Once enabled, SearchBlox will index only the sitemap URLs provided, following the rules below.
How Sitemaps Work in SearchBlox
- SearchBlox supports sitemap.xml listed in robots.txt when Follow Sitemaps is enabled
- If a sitemap.xml file contains links to other sitemaps, SearchBlox will index all the linked sitemaps automatically
- Only standard XML sitemaps are supported; compressed sitemap files (tar or gzip) are not
- When Follow Sitemaps is enabled, only sitemap links such as https://example.com/sitemap.xml are indexed
- Allow/Disallow path rules are applied when indexing sitemap URLs
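The discovery behaviour described above can be illustrated with a short Python sketch. It is not SearchBlox code; it only shows the general technique of reading `Sitemap:` lines from robots.txt and following sitemap index files until page URLs are reached. The function names and the `fetch` callback are hypothetical and were chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def parse_sitemap(xml_text: str) -> tuple:
    """Return ('index' or 'urlset', list of <loc> values) for one sitemap."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def collect_page_urls(start_urls, fetch) -> list:
    """Follow sitemap index files until only page URLs remain.

    fetch is a caller-supplied function (hypothetical) that maps a sitemap
    URL to its XML text, so the sketch needs no network access.
    """
    pending, seen, pages = list(start_urls), set(), []
    while pending:
        url = pending.pop(0)
        if url in seen:  # guard against sitemaps that reference each other
            continue
        seen.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "index":
            pending.extend(locs)  # linked sitemaps are processed as well
        else:
            pages.extend(locs)
    return pages
```

Passing the result of `sitemaps_from_robots(...)` as `start_urls` mirrors the robots.txt case; passing a hand-written list of sitemap URLs mirrors entering them directly in the collection settings.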
Adding Multiple Sitemaps
You can index multiple sitemaps by providing multiple sitemap URLs in the root path of the WEB collection settings. Add each sitemap URL on a separate line.
Example
https://example.com/sitemap.xml
https://example.com/sitemap-blog.xml
https://example.com/sitemap-products.xml
Limitations
Important: The following WEB collection settings do not apply when indexing via sitemaps:
- Spider depth
- Other standard WEB collection crawl settings
Sitemaps are processed independently of the normal crawl behaviour. Only the path Allow/Disallow rules are respected during sitemap indexing.
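As a rough illustration of how path rules might narrow the set of sitemap URLs, the Python sketch below filters a URL list against allow and disallow patterns. It rests on several assumptions that this page does not confirm: that the rules are treated as regular expressions, that a disallow match takes precedence over an allow match, and that an empty allow list permits everything. The function name is hypothetical.

```python
import re


def filter_sitemap_urls(urls, allow_patterns, disallow_patterns):
    """Keep URLs that match no disallow pattern and at least one allow pattern.

    Assumptions (not taken from SearchBlox): patterns are regular
    expressions, disallow wins over allow, and an empty allow list
    permits every URL.
    """
    allow = [re.compile(p) for p in allow_patterns]
    disallow = [re.compile(p) for p in disallow_patterns]
    kept = []
    for url in urls:
        if any(rx.search(url) for rx in disallow):
            continue  # explicitly disallowed
        if allow and not any(rx.search(url) for rx in allow):
            continue  # allow rules exist, but none matched
        kept.append(url)
    return kept
```

For example, with an allow pattern of `^https://example\.com/` and a disallow pattern of `/private/`, a sitemap entry under `https://example.com/private/` would be dropped while other pages on the same site would be kept.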
Supported Sitemap Format
SearchBlox supports standard XML sitemaps following the Sitemap Protocol. A basic sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2024-01-02</lastmod>
  </url>
</urlset>
Note: Compressed sitemap files in tar or gzip format are not supported. Ensure your sitemap is a plain XML file before indexing.
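One way to check a sitemap file before adding it to a collection is sketched below in Python. The check is independent of SearchBlox and the function name is hypothetical. It looks for the gzip magic bytes and the tar `ustar` marker, then confirms that the content parses as XML with a Sitemap Protocol root element.

```python
import xml.etree.ElementTree as ET

# Root elements defined by the Sitemap Protocol.
SITEMAP_ROOTS = {
    "{http://www.sitemaps.org/schemas/sitemap/0.9}urlset",
    "{http://www.sitemaps.org/schemas/sitemap/0.9}sitemapindex",
}


def check_sitemap_bytes(data: bytes) -> str:
    """Classify raw sitemap content as 'ok', 'gzip', 'tar', or 'invalid'."""
    if data[:2] == b"\x1f\x8b":      # gzip files start with these two bytes
        return "gzip"
    if data[257:262] == b"ustar":    # POSIX tar header marker at offset 257
        return "tar"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "invalid"             # not well-formed XML
    return "ok" if root.tag in SITEMAP_ROOTS else "invalid"
```

A result other than `ok` suggests the file should be decompressed or regenerated as plain XML before it is listed in the collection settings.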
