Integrating Custom ML Pipelines

Implementing Custom ML Pipelines with SearchBlox PreText

SearchBlox PreText enables you to:

  1. Develop and implement custom machine learning pipelines for natural language processing
  2. Deploy your ML models as API endpoints
  3. Apply these custom processing pipelines to text content from any SearchBlox collection

The system supports integration of your specialized NLP models while maintaining compatibility with all existing collections.

ML Pipeline - JSON Request

Request Endpoint Sample
https://your_ML_server/generate_keywords

Request Method
POST

Request Body Fields in JSON Payload

{
  "content": "Text content to process",
  "request_id": "unique_identifier",
  "url": "source_url",
  "entity": "entity_type",
  "collection": "target_collection",
  "labels": ["optional","tags"]
}
  • The labels field carries the auto-classification ML labels entered in the PreText console.
  • SearchBlox sends all of these field values in the POST request to the configured ML endpoint.

Fields

content: Input text for NLP processing
request_id: Unique tracking identifier
url: Source document URL
entity: Business entity context
collection: Target SearchBlox collection
labels: Optional classification tags
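
The fields above could be read on the endpoint side roughly as follows. This is a minimal sketch, not SearchBlox code: `PipelineRequest` and `parse_request` are hypothetical names, and accepting `labels` as either a JSON array or a comma-separated string is an assumption made because the field template and the sample request on this page show different shapes.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class PipelineRequest:
    """Hypothetical container for the request fields listed above."""
    content: str
    request_id: str
    url: str = ""
    entity: Any = None  # left untyped: the template and the sample differ
    collection: str = ""
    labels: List[str] = field(default_factory=list)


def parse_request(raw: str) -> PipelineRequest:
    """Parse a JSON request body into a PipelineRequest."""
    data = json.loads(raw)
    labels = data.get("labels") or []
    # Assumption: labels may arrive as a JSON array or a comma-separated string.
    if isinstance(labels, str):
        labels = [part.strip() for part in labels.split(",") if part.strip()]
    return PipelineRequest(
        content=data.get("content", ""),
        request_id=data.get("request_id", ""),
        url=data.get("url", ""),
        entity=data.get("entity"),
        collection=data.get("collection", ""),
        labels=list(labels),
    )
```

Normalizing `labels` to a list at the boundary keeps the rest of a pipeline independent of how the value was serialized.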

SAMPLE REQUEST

{
  "content": "President Joe Biden, backed by the full symbolic power of the Western alliance, is locked in a showdown with Russian President Vladimir Putin, who is using Ukraine as a hostage to try to force the US to renegotiate the settled outcome of the Cold War. Neither man is blinking. To do so may be unfeasible, given the huge political stakes both have wagered. new one 11",
  "request_id": "d7e66ca264d18a7b8ba54a5e0778be4e",
  "url": "https://edition.cnn.com/2022/01/21/politics/joe-biden-vladimir-putin-us-russia-ukraine/index.html",
  "entity": true,
  "collection": "cnn",
  "labels": "Business,Investment"
}
  • The above is a sample request body that SearchBlox sends to the custom ML endpoint as a JSON POST.
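
An endpoint that accepts such a POST might be sketched with Python's standard library as shown here. The path `/generate_keywords` comes from the endpoint sample on this page; the port, the `process` placeholder, and the `serve` helper are assumptions for illustration, and a production deployment would likely use a proper web framework with authentication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process(payload: dict) -> dict:
    """Placeholder pipeline: echoes identifiers; a real model would add its fields."""
    return {
        "collection": payload.get("collection", ""),
        "request_id": payload.get("request_id", ""),
    }


class PipelineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate_keywords":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "Invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "Expected a JSON object")
            return
        body = json.dumps(process(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Run the sketch server until interrupted (host and port are assumptions)."""
    HTTPServer((host, port), PipelineHandler).serve_forever()
```

Calling `serve()` starts the server; keeping the model logic in `process` lets it be tested without any HTTP traffic.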

ML Pipeline - JSON Response

Response Body Fields in JSON Format

"collection"  
"title"  
"description"  
"request_id"  
"response_time"  
"topic"  
"sentimentLabel"  
"sentimentScore"  
"entity_org"  
"entity_product"  
"entity_person"  
"entity_loc"  
"entity_gpe"

Field Descriptions:

Standard Fields:
collection: Target collection name (echoes request value)
request_id: Correlates with initial request identifier
response_time: Processing duration in milliseconds

Content Enrichment:
title: ML-generated document title
description: ML-generated summary
topic: Primary content classification

Sentiment Analysis:
sentimentLabel: Categorical sentiment
sentimentScore: Numerical confidence (-1.0 to 1.0)

Named Entities (arrays):
entity_org: Organizations
entity_product: Products
entity_person: People
entity_loc: Locations
entity_gpe: Geopolitical entities
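
Assembling a response from these fields could look like the sketch below. `build_response` and `RESPONSE_FIELDS` are hypothetical names. Joining list values into comma-separated strings mirrors the sample response on this page and is an assumption; whether SearchBlox also accepts JSON arrays is not confirmed here.

```python
import time

# Field names taken from the response field list on this page.
RESPONSE_FIELDS = (
    "collection", "title", "description", "request_id", "response_time",
    "topic", "sentimentLabel", "sentimentScore",
    "entity_org", "entity_product", "entity_person", "entity_loc", "entity_gpe",
)


def build_response(request: dict, results: dict, started: float) -> dict:
    """Combine echoed request values with model output into one response dict.

    `started` is a time.monotonic() reading taken when processing began.
    """
    response = {
        "collection": request.get("collection", ""),
        "request_id": request.get("request_id", ""),
        # Elapsed time since `started`, in whole milliseconds.
        "response_time": int((time.monotonic() - started) * 1000),
    }
    for name in RESPONSE_FIELDS:
        if name in response or name not in results:
            continue  # never overwrite echoed values; skip missing fields
        value = results[name]
        # Assumption: lists are serialized as comma-separated strings.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        response[name] = value
    return response
```

Restricting the output to known field names keeps stray model output out of the response.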

SAMPLE JSON RESPONSE

{
    "collection": "cnn",
    "description": "Russian President Vladimir Putin is using Ukraine as a hostage to try to force the US to renegotiate the settled outcome of the Cold War. Neither man is blinking. To do so may be unfeasible, given the huge political stakes both have wagered their political lives on.",
    "entity_gpe": "Ukraine,US",
    "entity_person": "Joe Biden,Vladimir Putin",
    "request_id": "d7e66ca264d18a7b8ba54a5e0778be4e",
    "response_time": 0,
    "sentimentLabel": "NEGATIVE",
    "sentimentScore": "0.8956964015960693",
    "title": "Biden vs Putin - New One 11",
    "topic": "Investment"
}
  • The above is a sample JSON response that the custom ML endpoint returns to SearchBlox for indexing.

  • When an ML field value is empty, SearchBlox does not add the corresponding meta field to the collection's search response.
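
Since empty values do not produce meta fields, an endpoint could also leave them out of its response entirely. The sketch below is optional and rests on an assumption: this page only states that SearchBlox skips empty values, not that omitting the keys is required or preferred. `drop_empty` is a hypothetical helper.

```python
def is_empty(value) -> bool:
    """True for None, blank strings, and empty lists or tuples (0 is not empty)."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def drop_empty(response: dict) -> dict:
    """Return a copy of the response without empty field values."""
    return {key: value for key, value in response.items() if not is_empty(value)}
```

Numeric values such as a `response_time` of 0 are kept, because zero is a valid measurement and not a missing value.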