Configuring SearchBlox

Create Custom Collection
Navigate to: Collections → Create New Collection
Select Custom Collection as the collection type.

Installing the Network Crawler

Contact [email protected] to get the download link for SearchBlox-network-crawler.

For Linux Systems:

```
# 1. Create the installation directory
sudo mkdir -p /opt/searchblox-network

# 2. Download the latest network crawler package using the link from SearchBlox support,
#    saving it as /tmp/searchblox-network-crawler-latest.zip

# 3. Extract the package
sudo unzip /tmp/searchblox-network-crawler-latest.zip -d /opt/searchblox-network

# 4. Set ownership and permissions
sudo chown -R searchblox:searchblox /opt/searchblox-network
sudo chmod -R 755 /opt/searchblox-network/bin
```

For Windows Systems:
Create the folder C:\searchblox-network.
Download the Windows package.
Extract the ZIP contents to C:\searchblox-network.

Configuring SMB

The extracted folder contains a conf folder that holds all of the crawler's configuration files.

Locating the Config File
Linux: /opt/searchblox-network/conf/config.yml
Windows: C:\searchblox-network\conf\config.yml
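After extracting the package on Linux, you can quickly confirm that the configuration files described in this section are in place before editing them. The sketch below assumes config.yml, searchblox.yml, and windowsshare.yml all sit directly in the conf folder. `check_layout` is a hypothetical helper name and is not part of the crawler package.

```shell
# Hypothetical helper: report any missing crawler config file under an install dir.
# Assumes all three files live directly in <install dir>/conf.
check_layout() {
  missing=0
  for f in config.yml searchblox.yml windowsshare.yml; do
    if [ ! -f "$1/conf/$f" ]; then
      echo "missing: $1/conf/$f"
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is missing.
  return "$missing"
}
```

For example, `check_layout /opt/searchblox-network` prints one line per missing file and returns a non-zero status if any file is absent.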

config.yml
This file connects the network crawler to your SearchBlox instance. Edit it in any text editor.

Field	Description
`apikey`	The API Key of your SearchBlox instance. Found in the Admin tab.
`colname`	Name of the custom collection you created.
`colid`	The Collection ID of your collection. Found in the Collections tab near the collection name.
`url`	The URL of your SearchBlox instance.
`sbpkey`	The SB-PKEY of your SearchBlox instance. Found in the Users tab (Admin users only). Create an admin user if needed.

```
apikey: DD7B0E5E6BB786F10D70A86399806591
colname: custom
colid: 2
url: https://localhost:8443/
sbpkey: MNiwiA0TNlIBG0jZpWVPNuszaT/jT39G03kpF01gUpjGQK8+ZSKtQMNVqKxxke/wEthSWw==
```
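The crawler needs all five config.yml values, so a simple pre-flight check can catch an empty or misspelled key before you start it. This is a minimal line-based sketch, not a YAML parser. It assumes one `key: value` pair per line, as in the sample above, and `check_config` is a hypothetical name.

```shell
# Hypothetical helper: fail if any required config.yml key is missing or empty.
# Assumes one "key: value" pair per line, starting at column 0.
check_config() {
  status=0
  for key in apikey colname colid url sbpkey; do
    # Print whatever follows "key:" on the first line that starts with the key.
    value=$(sed -n "s/^$key:[[:space:]]*//p" "$1" | head -n 1)
    if [ -z "$value" ]; then
      echo "missing or empty: $key"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_config /opt/searchblox-network/conf/config.yml` lists any missing or empty keys and returns 0 only when all five are set.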

searchblox.yml
This is the OpenSearch configuration file used by the SearchBlox network crawler. Edit it in any text editor.

Setting	Description
`searchblox.elasticsearch.url`	OpenSearch URL, including the port. Set this if you connect by IP address or domain name.
`searchblox.elasticsearch.host`	Hostname of the OpenSearch server.
`searchblox.elasticsearch.port`	Port of the OpenSearch server.
`searchblox.elasticsearch.basic.username`	OpenSearch username.
`searchblox.elasticsearch.basic.password`	OpenSearch password.
`es.home`	Path to the OpenSearch folder of your SearchBlox installation, for example `/opt/searchblox/opensearch` on Linux or `C:\SearchBloxServer\opensearch` on Windows.

```
searchblox.elasticsearch.host: localhost
searchblox.elasticsearch.port: 9200
searchblox.elasticsearch.basic.username: admin
searchblox.elasticsearch.basic.password: xxxxxxx
es.home: C:\SearchBloxServer\opensearch
```
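To confirm that the host and port in searchblox.yml point at a reachable OpenSearch node, you can build the base URL from the file and query it with curl. The sketch assumes HTTPS, matching the https://localhost:9200 URLs used later in this guide, and the one-`key: value`-per-line layout shown above. If you set `searchblox.elasticsearch.url` instead of host and port, use that value directly. `yml_value` and `opensearch_url` are hypothetical helpers.

```shell
# Hypothetical helper: print the value of a top-level "key: value" line.
yml_value() {
  # Escape dots so keys like searchblox.elasticsearch.host match literally.
  key=$(printf '%s' "$2" | sed 's/\./\\./g')
  sed -n "s/^$key:[[:space:]]*//p" "$1" | head -n 1
}

# Hypothetical helper: build the OpenSearch base URL from searchblox.yml.
# Assumes HTTPS, as in the https://localhost:9200 URLs used in this guide.
opensearch_url() {
  host=$(yml_value "$1" searchblox.elasticsearch.host)
  port=$(yml_value "$1" searchblox.elasticsearch.port)
  echo "https://$host:$port"
}
```

For example, `curl -k -u admin:PASSWORD "$(opensearch_url /opt/searchblox-network/conf/searchblox.yml)/_cat/indices"` should list the indices if the connection details are correct. The `-k` flag skips certificate verification, which is only appropriate for a local instance using a self-signed certificate.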

windowsshare.yml
Enter the server, authentication domain, username, password, shared folder path, disallowed paths, allowed formats, and recrawl interval in conf/windowsshare.yml (C:\searchblox-network\conf\windowsshare.yml on Windows, /opt/searchblox-network/conf/windowsshare.yml on Linux). You can list more than one server, or more than one path on the same server, in this file.

Each field is described in the sample file:

```
# The recrawl interval in days.
recrawl: 1
servers:
  # The IP address or domain name of the server.
  - server: 89.107.56.109
    # The authentication domain (optional).
    authentication-domain:
    # The administrator username.
    username: administrator
    # The administrator password.
    password: xxxxxxxx
    # The folder path whose data should be indexed.
    shared-folder-path: [/test/jason/pencil/]
    # Paths inside the shared folder to exclude from indexing.
    disallow-path: [/admin/,/js/]
    # The file formats allowed for indexing.
    allowed-format: [ txt,doc,docx,xls,xlsx,xltm,ppt,pptx,html,htm,pdf,odt,ods,rtf,vsd,xlsm,mpp,pps,one,potx,pub,pptm,odp,dotx,csv,docm,pot ]
  # Details of another server, or another path on the same server, to crawl.
  - server: 89.107.56.109
    authentication-domain:
    username: administrator
    password: xxxxxxxxxx
    shared-folder-path: [/test/jason/newone/]
    disallow-path: [/admin/,/js/]
    allowed-format: [ txt,doc,docx,xls,xlsx,xltm,ppt,pptx,html,htm,pdf,odt,ods,rtf,vsd,xlsm,mpp,pps,one,potx,pub,pptm,odp,dotx,csv,docm,pot ]
```
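With several servers or paths in windowsshare.yml, it is easy to lose track of what will be crawled. The awk sketch below prints each server together with its shared-folder-path so you can review the list before starting the crawler. It only understands the indentation used in the sample above and is not a general YAML parser. `list_shares` is a hypothetical name.

```shell
# Hypothetical helper: print "server<TAB>shared-folder-path" for each entry in
# windowsshare.yml. Line-based sketch that assumes the layout of the sample file;
# lines starting with "#" are skipped as comments.
list_shares() {
  awk '
    /^[ \t]*#/ { next }
    /^[ \t]*-[ \t]*server:/ {
      sub(/^[ \t]*-[ \t]*server:[ \t]*/, "")
      server = $0
    }
    /^[ \t]*shared-folder-path:/ {
      sub(/^[ \t]*shared-folder-path:[ \t]*/, "")
      print server "\t" $0
    }
  ' "$1"
}
```

For the sample file, `list_shares /opt/searchblox-network/conf/windowsshare.yml` prints `89.107.56.109` and `[/test/jason/pencil/]` on one tab-separated line, followed by a line for the second entry.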

Starting the Crawler

To start the crawler:
On Linux, run start.sh.
On Windows, run start.bat.
The crawler runs in the background. Logs are available in the logs folder.
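Because the crawler runs in the background, the easiest way to watch its progress on Linux is to follow the newest log file. The sketch assumes the logs folder sits directly under the installation directory. `latest_log` is a hypothetical helper.

```shell
# Hypothetical helper: print the path of the most recently modified file in
# <install dir>/logs. Assumes logs/ sits directly under the install dir.
latest_log() {
  # ls -t sorts newest first; keep only the first entry.
  newest=$(ls -t "$1/logs" 2>/dev/null | head -n 1)
  [ -n "$newest" ] || { echo "no log files in $1/logs" >&2; return 1; }
  echo "$1/logs/$newest"
}
```

For example: `tail -f "$(latest_log /opt/searchblox-network)"`.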

Note
Single Instance Limitation:
Only one network crawler can run at a time. To crawl different paths or servers, add them to windowsshare.yml.

Re-running for a New Collection:
Before re-running the crawler for a different collection, delete the sb_network index using an Elasticsearch-compatible tool.

Stopping the Crawler:
The network connector must be stopped manually when no longer needed.
Plain Password Requirement:
If your server requires plain-text password authentication, allow it by adding the following Java system property to start.bat:
```
-Djcifs.smb.client.disablePlainTextPasswords=false
```

Searching Securely Using SearchBlox

To enable Secure Search using Active Directory:

Navigate to Search → Security Settings.
Check the Enable Secured Search option.
Enter the required LDAP configuration details.
Test the connection to verify settings.
Once enabled, Secure Search will function based on your Active Directory configuration.

Enter the Active Directory details

Label	Description
LDAP URL	URL of the LDAP (Active Directory) server used to search for entries
Search Base	Base distinguished name in Active Directory where searches start
Username	Admin username
Password	Password for the admin username
Filter-Type	Filter type: default or document
Enable document filter	Enable this option to filter search results based on the signed-in user

Once you set up security groups, log in with your AD credentials at:
```
https://localhost:8443/search
```

Admin Access to File Share

To index files from an SMB share that requires authentication:

Run SearchBlox Service with Admin Access
Ensure the SearchBlox server service is running under an Admin account or an account with read access to the shared files.
Enter the required credentials when prompted.
Run the Network Crawler with Admin Privileges
Similarly, execute the network crawler under an Admin account (or an account with sufficient permissions) to successfully crawl and index the files.
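Before starting the crawler, you can confirm that the account in windowsshare.yml really has read access to the share, for example with Samba's smbclient from a Linux host. smbclient connects to a share (//server/share) and can start in a subdirectory, so the helpers below split a shared-folder-path into those two parts. Treating the first path component as the share name is an assumption about how the crawler reads the path, and `smb_share` and `smb_dir` are hypothetical names.

```shell
# Hypothetical helpers: split a shared-folder-path such as /test/jason/pencil/
# into the SMB share name ("test") and the directory inside it ("jason/pencil").
# Assumes the first path component is the share name.
smb_share() {
  # Drop leading slashes, then keep the first component.
  printf '%s\n' "$1" | sed 's#^/*##' | cut -d/ -f1
}

smb_dir() {
  # Drop leading slashes, the share component, and any trailing slash.
  printf '%s\n' "$1" | sed -e 's#^/*##' -e 's#^[^/]*/*##' -e 's#/*$##'
}
```

For the first sample entry: `smbclient "//89.107.56.109/$(smb_share /test/jason/pencil/)" -U administrator -D "$(smb_dir /test/jason/pencil/)" -c ls`. smbclient prompts for the password; add `-W <authentication-domain>` if your configuration sets one. If this listing fails, the crawler is unlikely to read the share with the same account.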

Increasing Memory for the Network Crawler

For Windows

Navigate to: <network_crawler_installationPath>/start.bat
Locate the line: rem set JAVA_OPTS=%JAVA_OPTS% -Xms1G -Xmx1G
Remove rem to uncomment and adjust memory (e.g., 2G or 3G):
set JAVA_OPTS=%JAVA_OPTS% -Xms2G -Xmx2G

For Linux

Navigate to: <network_crawler_installationPath>/start.sh
Uncomment and modify the line: JAVA_OPTS="$JAVA_OPTS -Xms2G -Xmx2G"

Note: Adjust values (2G, 3G, etc.) based on available system resources.
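On Linux, the edit to start.sh can also be scripted. The sed sketch below assumes the heap line looks like `JAVA_OPTS="$JAVA_OPTS -Xms1G -Xmx1G"`, optionally commented out with `#`. Check your copy of start.sh first, because a line in any other format is left unchanged. `set_heap` is a hypothetical helper, and it keeps the original file as start.sh.bak.

```shell
# Hypothetical helper: uncomment the JAVA_OPTS heap line in start.sh and set
# both -Xms and -Xmx to the given size (e.g. 2G). The original is kept as *.bak.
# Assumes the line has the form JAVA_OPTS="$JAVA_OPTS -Xms1G -Xmx1G",
# optionally preceded by "#".
set_heap() {
  sed -i.bak -E 's/^[[:space:]]*#?[[:space:]]*JAVA_OPTS="\$JAVA_OPTS -Xms[0-9]+[GgMm] -Xmx[0-9]+[GgMm]"/JAVA_OPTS="$JAVA_OPTS -Xms'"$2"' -Xmx'"$2"'"/' "$1"
}
```

For example, run `set_heap <network_crawler_installationPath>/start.sh 2G`; the new values take effect the next time the crawler starts.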

Deleting sb_network to Rerun the Crawler in Another Collection

Prerequisite Steps:

Delete Existing Index
Before re-running the crawler for a different collection, you must first delete the existing sb_network index using an Elasticsearch-compatible tool.
Verify Index Existence
To check whether the index exists, open the following URL and look for sb_network in the response:
```
https://localhost:9200/_cat/indices
```
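Instead of scanning the _cat/indices response by eye, you can pipe it through a small filter. The sketch below succeeds if any column of the output equals the index name, so it works with or without the `?v` header row. `has_index` is a hypothetical helper. The curl flags in the usage line assume basic authentication with the OpenSearch username and password from searchblox.yml and a self-signed certificate (hence `-k`).

```shell
# Hypothetical helper: read _cat/indices output on stdin and exit 0 if any
# column matches the given index name, 1 otherwise.
has_index() {
  awk -v name="$1" '{ for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
                    END { exit !found }'
}
```

For example: `curl -sk -u admin:PASSWORD https://localhost:9200/_cat/indices | has_index sb_network && echo "sb_network exists"`.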

You can use Postman to send requests to OpenSearch.

In Postman, create a request with the DELETE method and the URL https://localhost:9200/sb_network, use Basic Auth with the OpenSearch username and password from searchblox.yml, and send it.

Look for "acknowledged": true in the response.
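If you prefer the command line to Postman, you can send the same DELETE request with curl and check the response for the acknowledgement. The helper below only inspects the JSON text. `is_acknowledged` is a hypothetical name, and the usage line assumes basic authentication with the OpenSearch username and password from searchblox.yml and a self-signed certificate (hence `-k`).

```shell
# Hypothetical helper: read an OpenSearch JSON response on stdin and exit 0
# only if it contains "acknowledged": true.
is_acknowledged() {
  grep -Eq '"acknowledged"[[:space:]]*:[[:space:]]*true'
}
```

For example: `curl -sk -u admin:PASSWORD -X DELETE https://localhost:9200/sb_network | is_acknowledged && echo "sb_network deleted"`.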

Check https://localhost:9200/_cat/indices again; sb_network should no longer appear in the list.

Update config.yml for the new collection (colname and colid), then rerun the crawler.