Integrating Custom ML Pipelines

SearchBlox PreText enables you to:

  1. Design, build, and deploy custom machine learning pipelines for natural language processing (NLP)
  2. Deploy your trained ML models as scalable API endpoints
  3. Apply these custom processing pipelines to text data across any SearchBlox collection

The platform integrates specialized NLP models and remains compatible with all existing SearchBlox collections.

ML Pipeline - JSON Request

Request Endpoint Sample
https://your_ML_server/generate_keywords

Request Method
POST

Request Body Fields in JSON Payload

{
  "content": "Text content to process",
  "request_id": "unique_identifier",
  "url": "source_url",
  "entity": "entity_type",
  "collection": "target_collection",
  "labels": ["optional","tags"]
}
  • The labels field carries the auto-classification ML labels defined in the PreText console.
  • SearchBlox sends all of these field values in a POST request to the configured ML endpoint.

Fields

content: Input text for NLP processing

request_id: Unique tracking identifier

url: Source document URL

entity: Business entity context

collection: Target SearchBlox collection

labels: Optional classification tags
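As an illustration of how an endpoint might read these fields, the following is a minimal sketch and not part of SearchBlox itself. The `PipelineRequest` class and `parse_request` function are hypothetical names. The sketch assumes `labels` can arrive either as a JSON array or as a comma-separated string, and it leaves the type of `entity` open.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PipelineRequest:
    """Request fields posted to the ML endpoint (names from the field list)."""
    content: str
    request_id: str
    url: str = ""
    entity: Any = None  # type left open on purpose (assumption)
    collection: str = ""
    labels: list[str] = field(default_factory=list)


def parse_request(body: str) -> PipelineRequest:
    """Parse a JSON request body; raises ValueError or KeyError on bad input."""
    data = json.loads(body)
    raw_labels = data.get("labels") or []
    # Assumption: labels may be a JSON array or a comma-separated string.
    if isinstance(raw_labels, str):
        raw_labels = raw_labels.split(",")
    labels = [label.strip() for label in raw_labels if label.strip()]
    return PipelineRequest(
        content=data["content"],
        request_id=data["request_id"],
        url=data.get("url", ""),
        entity=data.get("entity"),
        collection=data.get("collection", ""),
        labels=labels,
    )
```

Treating `content` and `request_id` as required and the rest as optional is a design choice of this sketch, not a documented rule.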

SAMPLE REQUEST

{
  "content": "President Joe Biden, backed by the full symbolic power of the Western alliance, is locked in a showdown with Russian President Vladimir Putin, who is using Ukraine as a hostage to try to force the US to renegotiate the settled outcome of the Cold War. Neither man is blinking. To do so may be unfeasible, given the huge political stakes both have wagered. new one 11",
  "request_id": "d7e66ca264d18a7b8ba54a5e0778be4e",
  "url": "https://edition.cnn.com/2022/01/21/politics/joe-biden-vladimir-putin-us-russia-ukraine/index.html",
  "entity": true,
  "collection": "cnn",
  "labels": "Business,Investment"
}
  • The above is a sample request body sent by SearchBlox to the custom ML endpoint as a JSON POST.
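To exercise an endpoint before connecting it to SearchBlox, a request of the same shape can be built by hand. The sketch below uses only the Python standard library. The `build_post` helper is a hypothetical name, the endpoint is the placeholder URL from this page, and the `Content-Type: application/json` header is an assumption about what the endpoint expects.

```python
import json
import urllib.request

# Placeholder endpoint from this page; replace with your own ML server.
ENDPOINT = "https://your_ML_server/generate_keywords"


def build_post(payload: dict, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST shaped like the sample request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


payload = {
    "content": "Text content to process",
    "request_id": "d7e66ca264d18a7b8ba54a5e0778be4e",
    "url": "https://example.com/article.html",  # illustrative URL
    "entity": True,
    "collection": "cnn",
    "labels": "Business,Investment",
}
request = build_post(payload)
# To send it: urllib.request.urlopen(request, timeout=10)
```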

ML Pipeline - JSON Response

Response Body Fields in JSON Format

"collection"  
"title"  
"description"  
"request_id"  
"response_time"  
"topic"  
"sentimentLabel"  
"sentimentScore"  
"entity_org"  
"entity_product"  
"entity_person"  
"entity_loc"  
"entity_gpe"

Field Descriptions:

Standard Fields:

collection: The target collection name (echoes the value from the request)

request_id: Corresponds to the original request identifier

response_time: The processing duration, measured in milliseconds

Content Enrichment:

title: Machine learning–generated document title

description: Machine learning–generated summary

topic: Primary content classification

Sentiment Analysis:

sentimentLabel: Categorical sentiment classification

sentimentScore: Numerical sentiment confidence score ranging from -1.0 to 1.0

Named Entities (arrays; the sample response below gives each as a comma-separated string):

entity_org: Organizations

entity_product: Products

entity_person: People

entity_loc: Locations

entity_gpe: Geopolitical entities
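The field groups above could be assembled into a response body along these lines. This is a sketch under stated assumptions: `build_response` is a hypothetical helper, `enrichment` is a hypothetical dictionary of model outputs keyed by response field name, and the timing is taken with `time.perf_counter()`.

```python
import time

# Response fields produced by the models (names from the field descriptions).
ENRICHMENT_FIELDS = (
    "title", "description", "topic",
    "sentimentLabel", "sentimentScore",
    "entity_org", "entity_product", "entity_person", "entity_loc", "entity_gpe",
)


def build_response(request: dict, enrichment: dict, started: float) -> dict:
    """Assemble a response body from the request and the model outputs.

    `started` is a time.perf_counter() reading taken when the request arrived.
    """
    response = {
        # Standard fields: echo the request values back.
        "collection": request.get("collection", ""),
        "request_id": request.get("request_id", ""),
    }
    # Copy only known response fields; anything else in `enrichment` is ignored.
    for name in ENRICHMENT_FIELDS:
        if name in enrichment:
            response[name] = enrichment[name]
    # Processing duration in whole milliseconds.
    response["response_time"] = int((time.perf_counter() - started) * 1000)
    return response
```

Echoing `collection` and `request_id` straight from the request keeps the response traceable to the request that produced it.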

SAMPLE JSON RESPONSE

{
    "collection": "cnn",
    "description": "Russian President Vladimir Putin is using Ukraine as a hostage to try to force the US to renegotiate the settled outcome of the Cold War. Neither man is blinking. To do so may be unfeasible, given the huge political stakes both have wagered their political lives on.",
    "entity_gpe": "Ukraine,US",
    "entity_person": "Joe Biden,Vladimir Putin",
    "request_id": "d7e66ca264d18a7b8ba54a5e0778be4e",
    "response_time": 0,
    "sentimentLabel": "NEGATIVE",
    "sentimentScore": "0.8956964015960693",
    "title": "Biden vs Putin - New One 11",
    "topic": "Investment"
}
  • The above is a sample JSON response returned by the custom ML endpoint to SearchBlox for indexing.

  • If the ML field values are empty, SearchBlox will omit the corresponding meta fields from the collection’s search response.
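Putting the request and response together, a custom endpoint could look roughly like the following. This is a minimal sketch using Python's built-in `http.server`, not a production deployment and not SearchBlox's own code. `process`, `PipelineHandler`, and `serve` are hypothetical names, the title logic is a stand-in for real model inference, and the rule for what counts as an empty value (empty string, `None`, empty list) is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process(payload: dict) -> dict:
    """Stand-in for model inference; returns a response-shaped dict."""
    result = {
        "collection": payload.get("collection", ""),
        "request_id": payload.get("request_id", ""),
        "response_time": 0,
        # Placeholder for an ML-generated title: first 60 characters of content.
        "title": str(payload.get("content", ""))[:60].strip(),
        "topic": "",  # no classifier in this sketch
    }
    # Empty values are left out, since they would not appear in search results.
    return {k: v for k, v in result.items() if v not in ("", None, [])}


class PipelineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            status, result = 200, process(payload)
        except (ValueError, AttributeError):
            # Malformed JSON, bad Content-Length, or a non-object body.
            status, result = 400, {"error": "invalid JSON body"}
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the endpoint until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), PipelineHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. A real deployment would replace `process` with actual model calls and run behind a production-grade server.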