HTTP API
The SearchBlox HTTP-API lets you index and search web content using simple HTTP POST and GET requests. With it you can add and delete HTTP collections, update their paths and settings, schedule indexing, and start or stop indexing a collection. All methods are REST requests with JSON payloads.
Adding a Collection
Create a new Collection. You can create HTTP, File and Database collections using this API.
Index URL
https://localhost:8443/searchblox/rest/collection/add
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
Create HTTP Collection
{
"apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
"colname":"httpcollection",
"coltype":"http",
"language":"en"
}
Create File Collection
{
"apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
"colname":"filecollection",
"coltype":"file",
"language":"fr"
}
Create Database Collection
{
"apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
"colname":"dbcollection",
"coltype":"db",
"language":"de"
}
Document Description
JSON Fields | Value |
---|---|
apikey | API key accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | Name of the collection. |
coltype | Type of the collection: http, file or db for HTTP, file and database collections respectively. |
language | Language of the collection as a two-letter code, for example en, fr or de. See https://developer.searchblox.com/docs/supported-languages |
Response Codes
Response Code | Message |
---|---|
601 | Invalid API Key |
50001 | Collection Already Exists / Collection Type Not Found |
5000 | Collection created successfully |
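The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function name `build_add_request` and the placeholder key values are assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

# Base of the Index URL shown above.
BASE_URL = "https://localhost:8443/searchblox/rest/collection"


def build_add_request(apikey, sb_pkey, colname, coltype, language="en"):
    """Build (but do not send) the POST request that creates a collection."""
    if coltype not in ("http", "file", "db"):
        raise ValueError(f"unsupported coltype: {coltype!r}")
    payload = {
        "apikey": apikey,
        "colname": colname,
        "coltype": coltype,
        "language": language,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "SB-PKEY": sb_pkey,
        },
        method="POST",
    )


# Placeholder credentials; substitute the values from your own installation.
request = build_add_request("YOUR_API_KEY", "YOUR_SB_PKEY", "httpcollection", "http")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. A local installation may present a self-signed certificate, in which case `urlopen` needs an explicit SSL context.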
Deleting a Collection
You can delete collections using this API.
Index URL
https://localhost:8443/searchblox/rest/collection/delete
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
{
"apikey" : "61282E82E5D6D8D409EFC87E8415CDAA",
"colname":"httpcollection"
}
Document Description
JSON Fields | Value |
---|---|
apikey | API key accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | Name of the collection. |
Update the Collection Path
You can update HTTP collection path settings to configure rooturls, allowpaths, disallowpaths and formats for indexing.
Index URL
https://localhost:8443/searchblox/rest/collection/updatePath
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"rooturls": [
"http://www.google.co.in",
"http://www.bing.com"
],
"allowpaths": [
".*"
],
"disallowpaths": [
"http://www.google.co.in/test/bingo"
],
"allowformat": [
"HTML",
"text"
]
}
Document Description
JSON Fields | Value |
---|---|
apikey | API key accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | Name of the collection. |
rooturls | The root URL is the starting URL for the spider. It requests this URL, indexes the content, and follows links from the URL. Make sure the root URL entered has regular HTML HREF links that the spider can follow. |
allowpaths | The allowpath limits the spider to stay only within the given path or list of paths. Example: https://www.searchblox.com/ (Informs the spider to stay only within the searchblox.com site). |
disallowpaths | The disallowpath is the path or list of paths that you do not want the spider to crawl or index. |
allowformats | Allowformats are the file types that are to be indexed. File types other than those specified here will not be indexed. |
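As an illustration of how the path fields fit together, the following Python sketch assembles an updatePath body. The helper name `build_update_path_payload` and its defaults are hypothetical, and the check for at least one root URL is an assumption based on the spider needing a starting point. The format key is spelled `allowformat`, as in the example payload above.

```python
import json


def build_update_path_payload(apikey, colname, rooturls,
                              allowpaths=(".*",), disallowpaths=(),
                              allowformat=("HTML",)):
    """Return the JSON body for /rest/collection/updatePath as a string."""
    if not rooturls:
        # Assumption: the spider needs at least one starting URL.
        raise ValueError("at least one root URL is required")
    return json.dumps({
        "apikey": apikey,
        "colname": colname,
        "rooturls": list(rooturls),
        "allowpaths": list(allowpaths),
        "disallowpaths": list(disallowpaths),
        # Key spelled as in the example payload ("allowformat").
        "allowformat": list(allowformat),
    }, indent=2)


body = build_update_path_payload(
    "YOUR_API_KEY",
    "httpcollection",
    rooturls=["https://www.searchblox.com/"],
    allowpaths=["https://www.searchblox.com/"],
    allowformat=["HTML", "text"],
)
print(body)
```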
Update the Collection Settings
Update the settings of an HTTP collection to configure the parameters that control and filter indexing.
Index URL
https://localhost:8443/searchblox/rest/collection/updateSettings
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"keyword-in-context": "false",
"remove-duplicates": "false",
"boost": "100",
"stemming": "false",
"spelling": "true",
"logging": "true",
"html-settings": {
"description": "meta",
"max-doc-age": "-1",
"max-doc-size": "-1",
"spider-max-depth": "6",
"spider-max-delay": "1",
"user-agent": "SearchBlox",
"referer": "Google",
"ignore-robots": "false",
"follow-sitemap": "false",
"follow-redirect": "true"
},
"basic-auth-settings": {
"username": "searchblox",
"password": "testing"
},
"form-auth-settings": {
"form-url": "http://www.google.co.in",
"form-action": "post",
"form": [{
"name": "httpcollection",
"value": "google"
}, {
"name": "httpcollection1",
"value": "searchblox"
}]
},
"proxy-settings": {
"server-url": "http://searchblox.com/proxy",
"username": "proxy",
"password": "adasd"
}
}
Document Description
JSON Fields | Attributes | Value |
---|---|---|
apikey | | API key accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | | Name of the collection. |
keyword-in-context | | Set to Yes or No to enable or disable keyword-in-context display. When enabled, search results show a description taken from the content areas where the search term occurs. |
remove-duplicates | | Set to Yes to remove duplicate documents while indexing, or No to allow them. |
boost | | Boost search terms for the collection by setting a value greater than 1 (maximum value 9999). |
stemming | | Set to Yes or No to enable or disable stemming. When stemming is enabled, inflected words are reduced to their root form. For example, "running", "runs" and "ran" are inflected forms of "run". |
logging | | Set to Yes or No to enable or disable logging. |
html-settings | description | Configures which part of the page the HTML parser reads as the document description. You can specify any one of the following: Description, h1, h2, h3, h4, h5, h6. |
html-settings | max-doc-age | Maximum allowable age, in days, of a document in the collection. A value of -1 sets no maximum age. |
html-settings | max-doc-size | Maximum allowable size, in kilobytes, of a document in the collection. A value of -1 sets no maximum size. |
html-settings | spider-max-depth | Maximum depth to which the spider is allowed to follow links when indexing documents. Value can be from 1 to 10. |
html-settings | spider-max-delay | Wait time, in milliseconds, between the spider's HTTP requests to a web server. A value of 0 means no delay. |
html-settings | user-agent | Name under which the spider requests documents from a web server. |
html-settings | referer | URL sent in the Referer request header, which indicates the page the user agent previously visited. |
html-settings | ignore-robots | Set to Yes or No to tell the spider whether or not to obey robots rules. |
html-settings | follow-sitemap | Set to Yes to index only the URLs listed in sitemaps, or No to index all URLs. |
html-settings | follow-redirect | Set to Yes or No to tell the spider whether or not to follow redirects automatically. |
basic-auth-settings | username | These settings help in indexing content secured by HTTP Basic authentication. Username for basic authentication. |
basic-auth-settings | password | Password for basic authentication. |
form-auth-settings | form-url | These settings help in indexing content protected by form-based authentication. The form-url is the ACTION URL of the authentication HTML form. |
form-auth-settings | form-action | Specifies whether the form action is a POST or GET. |
form-auth-settings | form (name, value) | The set of name/value pairs that the form requires, such as the username and password used for authentication. Example pairs: Web User = myself, Password = abc123, Login = true. |
proxy-settings | server-url | These settings help in indexing content through proxy servers. URL used to access the proxy server. |
proxy-settings | username | Username, when the proxy server requires authentication. |
proxy-settings | password | Password for the proxy server. |
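The documented limits on these settings (boost up to 9999, spider depth from 1 to 10, a non-negative delay) can be checked before a request is made. The sketch below is a hedged illustration: `build_settings_payload` and `flag` are hypothetical helpers, the body sets only a subset of the documented keys (whether the API accepts a partial body is an assumption), and flags are encoded as the "true"/"false" strings used in the example payload rather than the Yes/No wording of the table.

```python
import json


def flag(value):
    """Encode a bool as the string form used in the example payload."""
    return "true" if value else "false"


def build_settings_payload(apikey, colname, *, boost=100, stemming=False,
                           spider_max_depth=6, spider_max_delay=1,
                           follow_redirect=True):
    """Build a partial updateSettings body after validating documented limits."""
    if not 1 <= boost <= 9999:
        raise ValueError("boost must be between 1 and 9999")
    if not 1 <= spider_max_depth <= 10:
        raise ValueError("spider-max-depth must be between 1 and 10")
    if spider_max_delay < 0:
        raise ValueError("spider-max-delay cannot be negative")
    return {
        "apikey": apikey,
        "colname": colname,
        "boost": str(boost),
        "stemming": flag(stemming),
        "html-settings": {
            "max-doc-age": "-1",   # -1: no maximum age
            "max-doc-size": "-1",  # -1: no maximum size
            "spider-max-depth": str(spider_max_depth),
            "spider-max-delay": str(spider_max_delay),
            "follow-redirect": flag(follow_redirect),
        },
    }


payload = build_settings_payload("YOUR_API_KEY", "httpcollection", spider_max_depth=3)
print(json.dumps(payload, indent=2))
```

Values are serialized as strings because the example payload quotes every number and flag.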
Update the Scheduler Settings
Index URL
https://localhost:8443/searchblox/rest/collection/updateScheduler
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"index":{
"frequency":"ONCE",
"timestamp":"21-01-2016 19:05:00"
},
"clear":{
"frequency":"MINUTELY",
"timestamp":"21-01-2016 18:05:00"
}
}
Document Description
JSON Fields | Attributes | Value |
---|---|---|
apikey | | API key accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | | Name of the collection. |
index | frequency | Frequency of indexing of web documents. The value can be ONCE, DAILY, MINUTELY, WEEKLY or MONTHLY. |
index | timestamp | Timestamp at which indexing starts. Example: 21-01-2016 19:05:00. |
clear | frequency | Frequency of clearing of indexed documents. The value can be ONCE, DAILY, MINUTELY, WEEKLY or MONTHLY. |
clear | timestamp | Timestamp at which clearing occurs. Example: 21-01-2016 19:05:00. |
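The timestamp layout is easy to get wrong, so here is a small Python sketch that formats it from a `datetime`. The format string is inferred from the example value 21-01-2016 19:05:00 (day-month-year, 24-hour time), and the helper names `schedule_entry` and `build_scheduler_payload` are hypothetical.

```python
import json
from datetime import datetime

FREQUENCIES = {"ONCE", "DAILY", "MINUTELY", "WEEKLY", "MONTHLY"}
# Inferred from the documented example "21-01-2016 19:05:00".
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M:%S"


def schedule_entry(frequency, when):
    """Return one index/clear block with a validated frequency."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    return {"frequency": frequency, "timestamp": when.strftime(TIMESTAMP_FORMAT)}


def build_scheduler_payload(apikey, colname, index, clear):
    """Assemble the updateScheduler body from two schedule entries."""
    return {"apikey": apikey, "colname": colname, "index": index, "clear": clear}


payload = build_scheduler_payload(
    "YOUR_API_KEY",
    "httpcollection",
    index=schedule_entry("ONCE", datetime(2016, 1, 21, 19, 5, 0)),
    clear=schedule_entry("MINUTELY", datetime(2016, 1, 21, 18, 5, 0)),
)
print(json.dumps(payload, indent=2))
```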
Index or Stop Indexing the Collection
Index URL
https://localhost:8443/searchblox/rest/collection/actions
Method
POST
Media Type
application/json
Headers
content-type: application/json
accept: application/json
SB-PKEY: LmfxTTDSeYxHTntJMHuhwRrGVICMaVN/wl/zPuQ3LtQDNRMnng5GpKIkgt0q1rCC/h6wDA==
Document Syntax
Indexing collection
{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"action":"index"
}
Stop Indexing Collection
{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"action":"stop"
}
Document Description
JSON fields | Value |
---|---|
apikey | API key is accessible in the SearchBlox Admin Console. It is also present in the config.xml file. |
colname | Name of the collection. |
action | Specifies the action to perform. Use index to start indexing the collection, or stop to stop an indexing run that is in progress. |
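Since the two payloads differ only in the action value, a single helper can produce both. The following is a minimal Python sketch; `build_action_payload` is a hypothetical name, and rejecting any value other than index or stop is an assumption drawn from the two documented actions.

```python
import json

# The two action values documented above.
ACTIONS = ("index", "stop")


def build_action_payload(apikey, colname, action):
    """Return the JSON body for /rest/collection/actions as a string."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    return json.dumps({"apikey": apikey, "colname": colname, "action": action})


start_body = build_action_payload("YOUR_API_KEY", "httpcollection", "index")
stop_body = build_action_payload("YOUR_API_KEY", "httpcollection", "stop")
print(start_body)
print(stop_body)
```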
Response Codes
Response Code | Description |
---|---|
5000 | Collection Created Successfully |
50001 | Collection Exists / Collection Type Not Found |
50002 | Invalid JSON |
50003 | Collection Deleted Successfully |
50005 | Collection Path Saved Successfully |
50006 | Specified collection is not a CUSTOM collection |
50007 | Invalid Request / Collection Not Found |
50008 | Collection Indexing / Collection Indexing has been stopped / Collection schedule saved successfully |
50009 | Invalid Request / Collection Not Found |
601 | API Key Not Valid |
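A client can turn these codes into readable messages with a simple lookup. The Python sketch below copies the codes from the table above. Splitting them into success and error codes is an inference from the message text, not something the API states, and how the code is carried in the response body is not covered here, so `describe` (a hypothetical helper) takes the numeric code directly.

```python
RESPONSE_CODES = {
    5000: "Collection Created Successfully",
    50001: "Collection Exists / Collection Type Not Found",
    50002: "Invalid JSON",
    50003: "Collection Deleted Successfully",
    50005: "Collection Path Saved Successfully",
    50006: "Specified collection is not a CUSTOM collection",
    50007: "Invalid Request / Collection Not Found",
    50008: "Collection Indexing / Indexing stopped / Schedule saved successfully",
    50009: "Invalid Request / Collection Not Found",
    601: "API Key Not Valid",
}

# Assumption: codes whose messages report a completed operation count as success.
SUCCESS_CODES = {5000, 50003, 50005, 50008}


def describe(code):
    """Return a one-line, human-readable summary of a response code."""
    message = RESPONSE_CODES.get(code, "Unknown response code")
    status = "ok" if code in SUCCESS_CODES else "error"
    return f"[{status}] {code}: {message}"


print(describe(50003))
print(describe(601))
```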