The SearchBlox HTTP-API enables you to index and search web content using simple HTTP POST and GET actions. The HTTP-API can add and delete collections, update collection paths and settings, schedule indexing, and start or stop indexing a collection. The SearchBlox HTTP-API works with collections through REST requests that carry JSON payloads.

Adding a Collection

Creates a new collection. You can create HTTP, File, and Database collections using this API.

Index URL

http://localhost:8080/searchblox/rest/collection/add

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

Create HTTP Collection

{
      "apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
      "colname":"httpcollection",
      "coltype":"http",
      "language":"en"
}

Create File Collection

{
      "apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
      "colname":"filecollection",
      "coltype":"file",
      "language":"fr"
}

Create Database Collection

{
      "apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
      "colname":"dbcollection",
      "coltype":"db",
      "language":"de"
}

Document Description

| JSON Field | Value |
| --- | --- |
| apikey | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | Name of the collection. |
| coltype | Type of the collection. Use http, file, or db for HTTP, File, and Database collections respectively. |
| language | Language of the collection, as a two-letter language code. See the supported language codes: https://developer.searchblox.com/v9.2/docs/supported-languages#language-codes |

Response Codes

| Response Code | Message |
| --- | --- |
| 601 | Invalid API Key |
| 50001 | Collection Already Exists / Collection Type Not Found |
| 5000 | Collection Created Successfully |
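The following is a rough sketch of how you might assemble the request above with Python's standard library. It makes three assumptions: a SearchBlox instance is running at localhost:8080, it reuses the placeholder API key from the examples, and the helper name `build_add_collection_request` is hypothetical. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: a local SearchBlox instance, as in the Index URL above.
BASE_URL = "http://localhost:8080/searchblox/rest/collection"

# The three coltype values documented for this endpoint.
VALID_TYPES = {"http", "file", "db"}


def build_add_collection_request(apikey, colname, coltype, language="en"):
    """Build, but do not send, a POST request for the collection/add endpoint.

    The default language of "en" is this sketch's choice, not an API default.
    """
    if coltype not in VALID_TYPES:
        raise ValueError(f"coltype must be one of {sorted(VALID_TYPES)}")
    payload = {
        "apikey": apikey,
        "colname": colname,
        "coltype": coltype,
        "language": language,  # two-letter language code
    }
    return urllib.request.Request(
        BASE_URL + "/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Example: the "Create File Collection" body from above.
req = build_add_collection_request(
    "61282E82E5D6D8D409EFC87E8415CDAA", "filecollection", "file", "fr"
)
# To send it to a running server: urllib.request.urlopen(req).read()
```

The sketch checks `coltype` on the client side, so a typo fails immediately rather than coming back as a "Collection Type not found" response.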

Deleting a Collection

You can delete collections using this API.

Index URL

http://localhost:8080/searchblox/rest/collection/delete

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

{
      "apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
      "colname":"httpcollection"
}

Document Description

| JSON Field | Value |
| --- | --- |
| apikey | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | Name of the collection. |
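A delete request has the same shape as the add request but only carries the key and the collection name. This minimal sketch assumes the local instance at localhost:8080 and uses a hypothetical helper name, `build_delete_request`. The response code table at the end of this page lists 50003 for a successful delete.

```python
import json
import urllib.request

# Assumption: a local SearchBlox instance, matching the Index URL above.
DELETE_URL = "http://localhost:8080/searchblox/rest/collection/delete"


def build_delete_request(apikey, colname):
    """Build, but do not send, a POST request that deletes one collection."""
    body = {"apikey": apikey, "colname": colname}
    return urllib.request.Request(
        DELETE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_delete_request("61282E82E5D6D8D409EFC87E8415CDAA", "httpcollection")
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
```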

Update the Collection Path

You can update HTTP collection path settings to configure rooturls, allowpaths, disallowpaths and formats for indexing.

Index URL

http://localhost:8080/searchblox/rest/collection/updatePath

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

{
  "apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
  "colname": "httpcollection",
  "rooturls": [
    "http://www.google.co.in",
    "http://www.bing.com"
  ],
  "allowpaths": [
    ".*"
  ],
  "disallowpaths": [
    "http://www.google.co.in/test/bingo"
  ],
  "allowformat": [
    "HTML",
    "text"
  ]
}

Document Description

| JSON Field | Value |
| --- | --- |
| apikey | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | Name of the collection. |
| rooturls | The root URL is the starting URL for the spider. The spider requests this URL, indexes the content, and follows links from it. Make sure each root URL contains regular HTML HREF links that the spider can follow. |
| allowpaths | Limits the spider to stay only within the given path or list of paths. Example: https://www.searchblox.com/ (informs the spider to stay only within the searchblox.com site). |
| disallowpaths | The path or list of paths that you do not want the spider to crawl or index. |
| allowformat | The file formats to be indexed, such as HTML and text. File types other than those specified here are not indexed. |
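Every path setting in the sample body is a JSON array, even when it has a single entry. This sketch sends each setting as an array and uses the `allowformat` key exactly as it appears in the sample body. It assumes the local instance, and the helper names `as_list` and `build_update_path_request` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local SearchBlox instance, matching the Index URL above.
UPDATE_PATH_URL = "http://localhost:8080/searchblox/rest/collection/updatePath"


def as_list(value):
    """Wrap a single string in a list; otherwise copy the iterable into a list."""
    # Without this, list("http://a") would yield a list of single characters.
    if isinstance(value, str):
        return [value]
    return list(value)


def build_update_path_request(apikey, colname, rooturls, allowpaths,
                              disallowpaths, allowformat):
    """Build, but do not send, an updatePath request."""
    payload = {
        "apikey": apikey,
        "colname": colname,
        "rooturls": as_list(rooturls),
        "allowpaths": as_list(allowpaths),
        "disallowpaths": as_list(disallowpaths),
        "allowformat": as_list(allowformat),  # key name as in the sample body
    }
    return urllib.request.Request(
        UPDATE_PATH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Keep the spider inside searchblox.com and index only HTML.
req = build_update_path_request(
    "61282E82E5D6D8D409EFC87E8415CDAA",
    "httpcollection",
    rooturls=["https://www.searchblox.com/"],
    allowpaths=["https://www.searchblox.com/"],
    disallowpaths=[],
    allowformat=["HTML"],
)
```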

Update the Collection Settings

Updates HTTP collection settings, where you can configure various parameters that filter and control indexing.

Index URL

http://localhost:8080/searchblox/rest/collection/updateSettings

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

{
    "apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
    "colname": "httpcollection",
    "keyword-in-context": "false",
    "remove-duplicates": "false",
    "boost": "100",
    "stemming": "false",
    "spelling": "true",
    "logging": "true",
    "html-settings": {
        "description": "meta",
        "max-doc-age": "-1",
        "max-doc-size": "-1",
        "spider-max-depth": "6",
        "spider-max-delay": "1",
        "user-agent": "SearchBlox",
        "referer": "Google",
        "ignore-robots": "false",
        "follow-sitemap": "false",
        "follow-redirect": "true"
    },
    "basic-auth-settings": {
        "username": "searchblox",
        "password": "testing"
    },
    "form-auth-settings": {
        "form-url": "http://www.google.co.in",
        "form-action": "post",
        "form": [{
            "name": "httpcollection",
            "value": "google"
        }, {
            "name": "httpcollection1",
            "value": "searchblox"
        }]
    },
    "proxy-settings": {
        "server-url": "http://searchblox.com/proxy",
        "username": "proxy",
        "password": "adasd"
    }
}

Document Description

| JSON Field | Attribute | Value |
| --- | --- | --- |
| apikey | | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | | Name of the collection. |
| keyword-in-context | | Set to "true" or "false" to enable or disable keyword-in-context display. When enabled, search results show a description taken from the content areas where the search term occurs. |
| remove-duplicates | | Set to "true" to remove duplicate documents while indexing, or "false" to allow them. |
| boost | | Boosts search terms for the collection when set to a value greater than 1 (maximum value 9999). |
| stemming | | Set to "true" or "false" to enable or disable stemming. When stemming is enabled, inflected words are reduced to their root form. For example, "running", "runs", and "ran" are inflected forms of "run". |
| logging | | Set to "true" or "false" to enable or disable logging. |
| html-settings | description | Configures which HTML element the parser reads as the description of a document. You can specify any one of the following: Description, h1, h2, h3, h4, h5, h6. |
| html-settings | max-doc-age | Maximum allowable age of a document in the collection, in days. Set to -1 for no maximum age. |
| html-settings | max-doc-size | Maximum allowable size of a document in the collection, in kilobytes. Set to -1 for no maximum size. |
| html-settings | spider-max-depth | Maximum depth the spider is allowed to proceed when indexing documents. Valid values are 1 to 10. |
| html-settings | spider-max-delay | Wait time in milliseconds between the spider's HTTP requests to a web server. Set to 0 for no delay. |
| html-settings | user-agent | Name under which the spider requests documents from a web server. |
| html-settings | referer | Value sent in the Referer request header to indicate where the user agent previously visited. |
| html-settings | ignore-robots | Set to "true" to make the spider ignore robots rules, or "false" to obey them. |
| html-settings | follow-sitemap | Set to "true" to index only the URLs listed in sitemaps, or "false" to index all URLs found while crawling. |
| html-settings | follow-redirect | Set to "true" or "false" to make the spider follow redirects automatically or not. |
| basic-auth-settings | username | These settings enable indexing of content secured by HTTP Basic authentication. Username for basic authentication. |
| basic-auth-settings | password | Password for basic authentication. |
| form-auth-settings | form-url | These settings enable indexing of content protected by form-based authentication. The form-url is the ACTION URL of the authentication HTML form. |
| form-auth-settings | form-action | Whether the form action is POST or GET. |
| form-auth-settings | form (name, value) | The set of required name/value pairs, for example the username and password used for authentication. Example pairs: Web User = myself, Password = abc123, Login = true. |
| proxy-settings | server-url | These settings enable indexing of content through a proxy server. URL used to access the proxy server. |
| proxy-settings | username | Username, when the proxy server requires authentication. |
| proxy-settings | password | Password for the proxy server. |
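The sample body sends every scalar setting as a string, for example "false", "100", and "-1". This sketch converts Python booleans and numbers to that string form and maps underscore keyword names to the hyphenated top-level keys. The section does not say whether the server also accepts native JSON booleans and numbers. It also does not say whether omitted settings keep their current values, so send the full set if in doubt. The local URL and the helper names `as_api_value` and `build_settings_request` are assumptions.

```python
import json
import urllib.request

# Assumption: a local SearchBlox instance, matching the Index URL above.
SETTINGS_URL = "http://localhost:8080/searchblox/rest/collection/updateSettings"


def as_api_value(value):
    """Convert Python values to the string form used in the sample body."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        return {key: as_api_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [as_api_value(item) for item in value]
    return value


def build_settings_request(apikey, colname, **settings):
    """Build, but do not send, an updateSettings request.

    Keyword names use underscores (Python keywords cannot contain hyphens)
    and are mapped to hyphenated top-level keys, e.g. remove_duplicates ->
    "remove-duplicates". Nested dicts must already use hyphenated keys.
    """
    payload = {"apikey": apikey, "colname": colname}
    for name, value in settings.items():
        payload[name.replace("_", "-")] = as_api_value(value)
    return urllib.request.Request(
        SETTINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_settings_request(
    "61282E82E5D6D8D409EFC87E8415CDAA",
    "httpcollection",
    stemming=False,
    boost=100,
    html_settings={"spider-max-depth": 6, "follow-redirect": True},
)
```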

Update the Scheduler Settings

Index URL

http://localhost:8080/searchblox/rest/collection/updateScheduler

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

{
    "apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
    "colname": "httpcollection",
    "index": {
        "frequency": "ONCE",
        "timestamp": "21-01-2016 19:05:00"
    },
    "clear": {
        "frequency": "MINUTELY",
        "timestamp": "21-01-2016 18:05:00"
    }
}

Document Description

| JSON Field | Attribute | Value |
| --- | --- | --- |
| apikey | | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | | Name of the collection. |
| index | frequency | How often the collection's web documents are indexed. Values: ONCE, DAILY, MINUTELY, WEEKLY, and MONTHLY. |
| index | timestamp | When indexing starts, in the format DD-MM-YYYY HH:MM:SS. Example: 21-01-2016 19:05:00. |
| clear | frequency | How often the indexed documents are cleared. Values: ONCE, DAILY, MINUTELY, WEEKLY, and MONTHLY. |
| clear | timestamp | When clearing occurs, in the format DD-MM-YYYY HH:MM:SS. Example: 21-01-2016 19:05:00. |
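The timestamp in the example is day-month-year with a 24-hour clock, so `datetime.strftime("%d-%m-%Y %H:%M:%S")` produces it directly. This sketch builds the scheduler body from `datetime` values. The section does not say which time zone the server applies, so the sketch formats naive datetimes as given. Upper-casing the frequency is a convenience of this sketch, and the local URL and the helper names `schedule_entry` and `build_scheduler_request` are assumptions.

```python
import json
import urllib.request
from datetime import datetime

# Assumption: a local SearchBlox instance, matching the Index URL above.
SCHEDULER_URL = "http://localhost:8080/searchblox/rest/collection/updateScheduler"

FREQUENCIES = {"ONCE", "DAILY", "MINUTELY", "WEEKLY", "MONTHLY"}


def schedule_entry(frequency, start):
    """Return one {"frequency", "timestamp"} object for the scheduler body."""
    frequency = frequency.upper()
    if frequency not in FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency}")
    # Day-month-year, 24-hour clock, as in the example 21-01-2016 19:05:00.
    return {"frequency": frequency, "timestamp": start.strftime("%d-%m-%Y %H:%M:%S")}


def build_scheduler_request(apikey, colname, index, clear):
    """Build, but do not send, an updateScheduler request with both schedules."""
    payload = {"apikey": apikey, "colname": colname, "index": index, "clear": clear}
    return urllib.request.Request(
        SCHEDULER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Reproduces the sample body above.
req = build_scheduler_request(
    "61282E82E5D6D8D409EFC87E8415CDAA",
    "httpcollection",
    index=schedule_entry("ONCE", datetime(2016, 1, 21, 19, 5)),
    clear=schedule_entry("MINUTELY", datetime(2016, 1, 21, 18, 5)),
)
```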

Index or Stop Indexing a Collection

Index URL

http://localhost:8080/searchblox/rest/collection/actions

Method

POST

Media Type

application/json

Headers

Content-Type: application/json
Accept: application/json

Document Syntax

Start Indexing Collection

{
    "apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
    "colname": "httpcollection",
    "action":"index"
}

Stop Indexing Collection

{
    "apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
    "colname": "httpcollection",
    "action":"stop"
}

Document Description

| JSON Field | Value |
| --- | --- |
| apikey | API key, available in the SearchBlox Admin Console. It is also present in the config.xml file. |
| colname | Name of the collection. |
| action | The action to perform: index starts indexing the collection; stop stops an indexing process that is in progress. |
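Only the `action` value differs between the two requests above, so one helper can build both. This is a minimal sketch that assumes the local instance, and the name `build_action_request` is hypothetical. It rejects any action other than the two documented values before a request is built.

```python
import json
import urllib.request

# Assumption: a local SearchBlox instance, matching the Index URL above.
ACTIONS_URL = "http://localhost:8080/searchblox/rest/collection/actions"

# The two action values documented for this endpoint.
ACTIONS = ("index", "stop")


def build_action_request(apikey, colname, action):
    """Build, but do not send, a request that starts or stops indexing."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}")
    payload = {"apikey": apikey, "colname": colname, "action": action}
    return urllib.request.Request(
        ACTIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


KEY = "61282E82E5D6D8D409EFC87E8415CDAA"
start = build_action_request(KEY, "httpcollection", "index")
stop = build_action_request(KEY, "httpcollection", "stop")
```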

Response Code Description

| Response Code | Description |
| --- | --- |
| 5000 | Collection Created Successfully |
| 50001 | Collection Exists / Collection Type Not Found |
| 50002 | Invalid JSON |
| 50003 | Collection Deleted Successfully |
| 50005 | Collection Path Saved Successfully |
| 50006 | Specified collection is not a CUSTOM collection |
| 50007 | Invalid Request / Collection Not Found |
| 50008 | Collection Indexing / Collection Indexing has been stopped / Collection schedule saved successfully |
| 50009 | Invalid Request / Collection Not Found |
| 601 | API Key Not Valid |
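If a script needs to act on these codes, a lookup table is a simple approach. This sketch assumes you have already extracted the numeric code from the response, because this section does not describe where the code appears in the response body. Treating 5000, 50003, 50005, and 50008 as success is this sketch's reading of the messages above, not a documented rule. The name `describe` is hypothetical.

```python
# Codes and messages copied from the table above.
RESPONSE_CODES = {
    5000: "Collection Created Successfully",
    50001: "Collection Exists/Collection Type Not Found",
    50002: "Invalid JSON",
    50003: "Collection Deleted Successfully",
    50005: "Collection Path Saved Successfully",
    50006: "Specified collection is not a CUSTOM collection",
    50007: "Invalid Request/Collection Not Found",
    50008: "Collection Indexing/Collection Indexing has been stopped/"
           "Collection schedule saved successfully",
    50009: "Invalid Request/Collection Not Found",
    601: "API Key Not Valid",
}

# Assumption: codes whose messages read as success.
SUCCESS_CODES = {5000, 50003, 50005, 50008}


def describe(code):
    """Return (is_success, message) for a code given as int or numeric string."""
    code = int(code)
    message = RESPONSE_CODES.get(code, "Unknown response code")
    return code in SUCCESS_CODES, message
```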
