Get Started with Serving Watson NLP Models

Michael Spriggs
IBM Data Science in Practice
4 min read · Dec 7, 2022


The IBM Watson NLP Library for Embed is a containerized library that partners embed into their applications to provide natural language processing capabilities.

This article will take you through the steps to start serving Watson NLP models using standalone containers. The idea is to build a container image that includes both the models to be served and the runtime that serves them. When the container runs, it exposes REST and gRPC endpoints that client programs can use to make inference requests against the models.

This is a flexible approach to serving NLP models, since these containers can run anywhere. You can serve the models on your laptop using Docker, which is useful when learning the technology or developing applications. The same image can also be deployed on a cloud container service like AWS ECS or IBM Cloud Code Engine, or on a Kubernetes or OpenShift cluster.

Build a Container Image to Serve Models

The Watson NLP Runtime is packaged as a container image hosted on the IBM Entitled Registry. You will need an entitlement key to access it. The Runtime image itself does not contain any models. However, there are hundreds of pretrained models available on the same registry. You can also train your own models, for example with Watson Studio on IBM Cloud.

Below, we will serve two Watson NLP pretrained models: one for sentiment analysis, and the other for tone classification.

Create a Dockerfile with the following content.

ARG WATSON_RUNTIME_BASE="cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18"
ARG SENTIMENT_MODEL="cp.icr.io/cp/ai/watson-nlp_sentiment_aggregated-cnn-workflow_lang_en_stock:1.0.6"
ARG TONE_MODEL="cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.0.6"

# Unpack each pretrained model in its own intermediate build stage
FROM ${SENTIMENT_MODEL} as model1
RUN ./unpack_model.sh

FROM ${TONE_MODEL} as model2
RUN ./unpack_model.sh

# Final image: the Watson NLP Runtime plus the unpacked model files
FROM ${WATSON_RUNTIME_BASE} as release
RUN true && \
    mkdir -p /app/models
ENV LOCAL_MODELS_DIR=/app/models
COPY --from=model1 app/models /app/models
COPY --from=model2 app/models /app/models

Notice that each of the pretrained models is packaged as a container image. During the build, the unpack_model.sh script is run for each model. Its role is to unzip the model archive stored in the model image and copy the resulting files to the /app/models directory within that build stage's filesystem.
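
The script ships inside each model image, so you do not write it yourself. Conceptually it does something like the sketch below; the archive name and paths are illustrative, not the actual script contents.

# Illustrative sketch of what unpack_model.sh does -- not the real script
mkdir -p /app/models
unzip -q model.zip -d /app/models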

The unpacking script is especially useful when serving models on a Kubernetes or OpenShift cluster, as it allows models to be specified as init containers of a Pod. We won’t go into those details in this article though.

The final container image generated with this Dockerfile uses the Watson NLP Runtime image as its base. Model files are copied from the intermediate images into the /app/models directory. The environment variable LOCAL_MODELS_DIR tells the Watson NLP Runtime where to find the models it is supposed to serve.
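
If the build works as intended, the final image holds one directory per model under /app/models. The layout below is illustrative; the actual directory names come from the model images.

/app/models
├── sentiment_aggregated-cnn-workflow_lang_en_stock
└── classification_ensemble-workflow_lang_en_tone-stock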

Before you build, Docker must be logged in to the IBM Entitled Registry so that it can pull the runtime and model images.
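
The username for the Entitled Registry is always cp, and the password is your entitlement key (left as a placeholder below).

docker login cp.icr.io --username cp --password <your-entitlement-key>

With the login in place, build the image: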

docker build . -t watson-nlp-container:v1

This will create a Docker image named watson-nlp-container:v1.
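
To confirm the image was built, list it with a standard Docker command:

docker images watson-nlp-container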

Run and Test the Service

Start the model service on your local host.

docker run -d -e ACCEPT_LICENSE=true -p 8080:8080 watson-nlp-container:v1
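
Before sending requests, you can check that the container started cleanly and that the models loaded by tailing its logs. These are generic Docker commands; substitute your own container ID.

docker ps
docker logs -f <container-id>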

The models are now being served through a REST endpoint at port 8080. You can make inference requests on one of the served models using curl.

curl -X POST "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SentimentPredict" \
  -H "accept: application/json" \
  -H "grpc-metadata-mm-model-id: sentiment_aggregated-cnn-workflow_lang_en_stock" \
  -H "content-type: application/json" \
  -d "{ \"rawDocument\": { \"text\": \"it's a great day for a picnic\" }, \"languageCode\": \"en\" }" | jq -r .

You will see output similar to the following, in this case providing sentiment analysis of the input text.

{
  "documentSentiment": {
    "score": 0.966315,
    "label": "SENT_POSITIVE",
    "mixed": false,
    "sentimentMentions": [
      {
        "span": {
          "begin": 0,
          "end": 29,
          "text": "it's a great day for a picnic"
        },
        "sentimentprob": {
          "positive": 0.9392276,
          "neutral": 0.044121116,
          "negative": 0.016651293
        }
      }
    ]
  },
  "targetedSentiments": {
    "targetedSentiments": {},
    "producerId": {
      "name": "Aggregated Sentiment Workflow",
      "version": "0.0.1"
    }
  },
  "producerId": {
    "name": "Aggregated Sentiment Workflow",
    "version": "0.0.1"
  }
}
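
The tone classification model in the same image can be queried the same way. The request below assumes the ClassificationPredict method and derives the model ID from the model image name using the same convention as the sentiment example; verify both against the Swagger documentation described next if your versions differ.

curl -X POST "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" \
  -H "accept: application/json" \
  -H "grpc-metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" \
  -H "content-type: application/json" \
  -d "{ \"rawDocument\": { \"text\": \"it's a great day for a picnic\" } }" | jq -r .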

You can further explore the REST API through the Swagger documentation, which you can view in your browser at the following address.

http://localhost:8080/swagger/
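
When you are finished experimenting, stop the container (using the container ID reported by docker ps):

docker stop <container-id>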

Conclusion

We have seen in this article that it is simple to get started serving models with the Watson NLP Runtime. Though building standalone container images gives us flexibility in terms of deployment, it is not always the best approach.

Whenever the set of models to be served changes, you will need to build a new container image and restart the service. This can be a burden when you have many models to serve in a production system. Fortunately, when serving on Kubernetes or OpenShift clusters, there are other approaches that avoid these drawbacks.

Get Ready to Embed AI

The IBM Build Lab team is here to work with you on your AI journey.

As a partner, you can start your AI journey by browsing and building AI models through a Digital Co-Create guided wizard.

You can further browse the collection of self-serve assets on GitHub, and if you are an IBM Business Partner, you can also start building AI solutions on Tech Zone.
