Network Connector

Configuring SearchBlox

Before installing the network crawler, SearchBlox must be installed and set up successfully.
Create a collection of type Custom Collection.

Installing the Network Crawler

Contact support@searchblox.com to get the download link for SearchBlox-network-crawler.
Download the latest version of SearchBlox-network-crawler and extract the zip to /opt/searchblox-network on Linux or C:/searchblox-network on Windows.

Configuring SMB

The extracted folder contains a folder named /conf, which holds all the configuration needed for the crawler.
Config.yml
This configuration file maps the network crawler to your SearchBlox instance. Edit the file in your favorite editor.
apikey: The API key of your SearchBlox instance. You can find it in the Admin tab of the SearchBlox instance.
colname: The name of the collection you created.
colid: The Collection ID of the collection you created. It is shown next to the collection name in the Collections tab of the SearchBlox instance.
url: The URL of the SearchBlox instance.

apikey: 267BACD0F31A74F557426DD2A552ECDD
colname: custom
colid: 2
url: http://localhost:8080/searchblox/
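
A typo in config.yml is easy to miss until the crawler fails to index anything, so it can help to sanity-check the four values first. The following is a minimal sketch, not part of SearchBlox: the helper names parse_flat_yaml and check_config are hypothetical, the parser only understands flat "key: value" lines, and the rule that colid is numeric is an assumption based on the sample above.

```python
# Hypothetical helper, not shipped with SearchBlox: sanity-check the values
# in conf/config.yml before starting the crawler.
from urllib.parse import urlparse

REQUIRED_KEYS = ("apikey", "colname", "colid", "url")


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def check_config(values):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing or empty key: {k}" for k in REQUIRED_KEYS if not values.get(k)]
    # Assumption: collection IDs are numeric, as in the sample config.
    if values.get("colid") and not values["colid"].isdigit():
        problems.append("colid should be a number")
    if values.get("url"):
        parsed = urlparse(values["url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url should be an http(s) URL")
    return problems


sample = """\
apikey: 267BACD0F31A74F557426DD2A552ECDD
colname: custom
colid: 2
url: http://localhost:8080/searchblox/
"""
print(check_config(parse_flat_yaml(sample)))  # prints []
```

The check only confirms that the values are present and well-formed. It does not verify that the API key or collection ID actually exist on the SearchBlox instance.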

Windowsshare.yml
Enter the domain server, authentication domain, username, password, folder path, disallow path, allowed formats and recrawl interval in C:/searchblox-network/conf/Windowsshare.yml. You can also list more than one server, or more than one path on the same server, in the same Windowsshare.yml file.
The contents of the file are shown below.

# The recrawl interval in days.
recrawl : 12
servers:
# The IP or domain of the server.
  - server: 89.107.56.109
# The authentication domain, if available (optional).
    authentication-domain:
# The administrator username.
    username: administrator
# The administrator password.
    password: may@2016
# The folder path containing the data to be indexed.
    shared-folder-path: [/test/jason/pencil/]
# The paths inside the folder path that must not be indexed.
    disallow-path: [/admin/,/js/]
# The file formats that are allowed for indexing.
    allowed-format: [ txt,doc,docx,xls,xlsx,xltm,ppt,pptx,html,htm,pdf,odt,ods,rtf,vsd,xlsm,mpp,pps,one,potx,pub,pptm,odp,dotx,csv,docm,pot ]
# Details of another server or another AD path to crawl.
  - server: 89.107.56.109
    authentication-domain:
    username: administrator
    password: may@2016
    shared-folder-path: [/test/jason/newone/]
    disallow-path: [/admin/,/js/]
    allowed-format: [ txt,doc,docx,xls,xlsx,xltm,ppt,pptx,html,htm,pdf,odt,ods,rtf,vsd,xlsm,mpp,pps,one,potx,pub,pptm,odp,dotx,csv,docm,pot ]
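
To see how shared-folder-path, disallow-path and allowed-format might combine, the sketch below guesses whether a given file would be indexed. The matching rules are assumptions, not the crawler's documented behavior: it is assumed that disallow paths are matched against the part of the path below the shared folder, and that formats are compared against the lowercased file extension. The function name is_indexable is hypothetical.

```python
import posixpath


def is_indexable(path, shared_folder, disallow_paths, allowed_formats):
    """Guess whether a file on the share would be indexed (assumed semantics)."""
    # The file must live under the configured shared folder.
    if not path.startswith(shared_folder):
        return False
    # Assumption: disallow entries such as /admin/ are matched against the
    # part of the path below the shared folder.
    relative = "/" + path[len(shared_folder):].lstrip("/")
    if any(entry in relative for entry in disallow_paths):
        return False
    # Assumption: formats are compared to the lowercased extension without the dot.
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in allowed_formats


shared = "/test/jason/pencil/"
disallow = ["/admin/", "/js/"]
formats = {"txt", "doc", "docx", "pdf"}

print(is_indexable("/test/jason/pencil/report.PDF", shared, disallow, formats))
print(is_indexable("/test/jason/pencil/admin/notes.txt", shared, disallow, formats))
```

Under these assumptions the first call reports the PDF as indexable and the second rejects the file under /admin/. If the crawler skips files you expected it to index, compare the paths against the configuration in this way before changing anything else.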

Starting the Crawler

Start the crawler with sh start.sh on Linux or start.bat on Windows.
The crawler runs in the background; you can follow its progress in the logs folder.

Note

Only one network crawler can run at a time. To crawl different paths or different servers, list them all in the Windowsshare.yml file of the same network crawler, as described in the section above.
To re-run the crawler against another collection, first delete the sb_network index using a tool that can communicate with Elasticsearch.
For example, you can install the Sense plugin for Chrome and delete the index with
DELETE sb_network
See the last section on this page for details.

If your server does not allow plain-text passwords, enable them by adding the following option to start.bat of the network connector:
-Djcifs.smb.client.disablePlainTextPasswords=false

Searching Securely Using Searchblox

Enable Active Directory secure search under Search → Security settings. Secure search relies on your Active Directory configuration. To set it up:

  • Select Enable Secured Search
  • Enter the Active Directory details

LDAP URL: The LDAP URL that specifies the base search for the entries
Search Base: The search base for the Active Directory
Username: Admin username
Password: Password for the username
Filter-Type: The filter type, either default or document
Enable document filter: Enable this option if you need to filter search results based on users

  • Test the connection; it should be successful
  • Perform a secure search

The steps above apply to version 8.5 and above.
For versions below 8.5, edit ../webapps/searchblox/WEB-INF/secured.yml and enter your credentials as shown below:

url: ldap://198.50.196.176:389
search-base : "DC=ad,DC=searchBlox,DC=com"
username: Administrator@searchblox.com
password: Domain@2016
# Two filter types are available: default and document
type: document
# Filter search results with SSID
document-filter: true
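
If the connection test fails, it helps to first rule out a plain network problem between the SearchBlox server and the directory server. The sketch below only checks that a TCP connection to the host and port in the LDAP URL can be opened. It does not validate the username, password or search base, and the function names ldap_endpoint and can_reach are hypothetical. The default ports 389 (ldap) and 636 (ldaps) are the standard LDAP ports.

```python
import socket
from urllib.parse import urlparse


def ldap_endpoint(ldap_url):
    """Split an ldap:// or ldaps:// URL into (host, port), applying default ports."""
    parsed = urlparse(ldap_url)
    if parsed.scheme not in ("ldap", "ldaps"):
        raise ValueError(f"not an LDAP URL: {ldap_url}")
    default_port = 636 if parsed.scheme == "ldaps" else 389
    return parsed.hostname, parsed.port or default_port


def can_reach(ldap_url, timeout=5.0):
    """Return True if a TCP connection to the LDAP server can be opened."""
    host, port = ldap_endpoint(ldap_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling can_reach("ldap://198.50.196.176:389") from the SearchBlox server returns False when a firewall blocks the port, which points to a network issue rather than wrong credentials.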

Note

To filter search results based on users, set document-filter to true.

Restart the server after making the changes.
For help with configuration details, contact your system administrator.
Navigate to the URL below and log in with the system username and password. From there, you can search the files you have access to.
http://localhost:8080/searchblox/search/login_securesearch.jsp

Admin Access to File Share

If the SMB file share is on another server on the same network and requires permission, run the SearchBlox server service with admin access and enter the credentials as shown in the screenshot below. Files from the share are indexed successfully only when the service runs under an admin account or an account that has access to the files.
Also make sure to run the network crawler service as admin in the same manner.

How to increase memory in Network Connector

For Windows
Open

<network_crawler_installationPath>/start.bat
and allocate more RAM by editing the following line. The leading rem comments the line out, so remove it for the setting to take effect.
rem set JAVA_OPTS=%JAVA_OPTS% -Xms1G -Xmx1G
Replace 1G with 2G or 3G, for example -Xms2G -Xmx2G.

For Linux
Open

<network_crawler_installationPath>/start.sh
uncomment the following line, and allocate more memory by replacing 1G with a larger value
JAVA_OPTS="$JAVA_OPTS -Xms1G -Xmx1G"

Delete sb_network to rerun the crawler in another Collection

To rerun the network crawler against another collection, you need to delete the sb_network index using a tool that can communicate with Elasticsearch. This example uses the Sense plugin for Chrome.
Go to http://localhost:9200/_cat/indices and check that the sb_network index is listed.

Download and install the Sense tool for Chrome: https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig?hl=en

Delete sb_network, as in the screenshot below, by entering the following statement and clicking the play button to execute it
DELETE sb_network

The response should contain "acknowledged": true

Check http://localhost:9200/_cat/indices again; the sb_network index should no longer appear among the indices.
Rerun the crawler after making necessary changes to your config.yml
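
If installing a browser plugin is not an option, the same DELETE request can be sent from a short script. The following is a minimal sketch using only the Python standard library. It assumes Elasticsearch is reachable without authentication at http://localhost:9200, as in the URLs above, and the function names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # Elasticsearch address used on this page


def build_delete_request(index, base_url=ES_URL):
    """Build the HTTP request equivalent to 'DELETE <index>' in Sense."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/{index}", method="DELETE")


def delete_index(index, base_url=ES_URL):
    """Delete the index and return True if Elasticsearch acknowledged it."""
    request = build_delete_request(index, base_url)
    # urlopen raises urllib.error.HTTPError if the index does not exist (404).
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return body.get("acknowledged") is True
```

Calling delete_index("sb_network") on the machine that runs SearchBlox should return True. Deleting an index cannot be undone, so double-check the index name before running it.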
