Rules for URL Pattern

In the Web Collection, there are three important fields in the path settings:

  • Root Path
  • Allow Path
  • Disallow Path

Root Path

  • The Root Path is the exact URL where the crawler starts, also called the start URL.
  • Do not use URL patterns in the Root Path.
  • The Root Path must be a complete URL with the protocol (http:// or https://). It should work when pasted in a browser. Incomplete or relative URLs will not work.
  • If the Root Path redirects to another URL, use the redirected URL in the Root Path or in the Allow Path for proper indexing.
  • Only HTTP and HTTPS protocols are supported. Other protocols (e.g., googleconnector://) are not supported.
    Example:
    http://www.searchblox.com
    https://www.searchblox.com
  • Regex patterns are not allowed in the Root Path.
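The Root Path rules above can be sketched as a simple validation check. This is an illustrative sketch, not SearchBlox code; the function name `is_valid_root_path` is a hypothetical helper.

```python
from urllib.parse import urlparse

def is_valid_root_path(url):
    """Check that a Root Path is a complete http(s) URL with no regex metacharacters."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # only HTTP and HTTPS protocols are supported
    if not parsed.netloc:
        return False  # must be a complete URL, not a relative path
    # regex patterns are not allowed in the Root Path
    return "$" not in url and "*" not in url

print(is_valid_root_path("https://www.searchblox.com"))   # True
print(is_valid_root_path("googleconnector://host/path"))  # False
print(is_valid_root_path("/blog/"))                       # False
```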

Allow Path

Allow Path lets you include specific domains, paths, or patterns for crawling and indexing:

  • Provide the complete path to include a domain. Example (include the whole site):
    https://www.searchblox.com/
  • Provide a subpath or folder to include that subpath in indexing. Example (include a specific folder):
    /blog/
  • Provide a suffix pattern to limit indexing to a particular ending of the URL string. Example (limit to specific endings):
    pdf$
    com$
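The Allow Path entries above behave like regular expressions matched against each discovered URL. A minimal sketch of that matching, assuming a URL is crawled if it matches at least one pattern (the function name `is_allowed` is hypothetical):

```python
import re

# Allow Path patterns taken from the examples above
allow_paths = ["/blog/", r"pdf$"]

def is_allowed(url, patterns):
    """A URL qualifies for crawling if it matches at least one Allow Path pattern."""
    return any(re.search(p, url) for p in patterns)

print(is_allowed("https://www.searchblox.com/blog/post-1", allow_paths))    # True
print(is_allowed("https://www.searchblox.com/docs/guide.pdf", allow_paths)) # True
print(is_allowed("https://www.searchblox.com/about", allow_paths))          # False
```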

Disallow Path

Disallow Path lets you exclude specific domains, paths, or patterns from crawling and indexing:

  • Provide the complete path to exclude a domain. Example:
    https://www.searchblox.com/
  • Provide a subpath or folder to exclude that subpath from indexing. Example:
    /blog/
  • Provide a suffix pattern to exclude URLs with a particular ending. Example:
    pdf$
    com$
  • Comment character: # can be used at the start of a line in the Disallow Path to mark it as a comment. Example:
    #comment
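The Disallow Path entries, including comment lines, can be sketched the same way: lines beginning with # are skipped, and a URL matching any remaining pattern is excluded. This is an illustrative sketch of the assumed semantics, and `is_disallowed` is a hypothetical helper.

```python
import re

# Disallow Path entries as they might appear in the settings box;
# lines starting with "#" are comments and are ignored
disallow_entries = [
    "# exclude the blog and all PDFs",
    "/blog/",
    r"pdf$",
]

# drop comment lines before matching
patterns = [e for e in disallow_entries if not e.startswith("#")]

def is_disallowed(url):
    """A URL is excluded from crawling if it matches any Disallow Path pattern."""
    return any(re.search(p, url) for p in patterns)

print(is_disallowed("https://www.searchblox.com/blog/post-1"))  # True  (excluded)
print(is_disallowed("https://www.searchblox.com/about"))        # False (crawled)
```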

🚧

Allow and Disallow Paths support the basic regular expressions provided by the GNU regular expression library:
https://www.gnu.org/software/gnulib/manual/html_node/Regular-expressions.html