Serve Watson NLP Models Using Knative Serving

Himadri Talukder
IBM Data Science in Practice
4 min read · Mar 13, 2023


With Watson NLP, IBM introduced a common library for natural language processing, document understanding, translation, and trust. It brings everything under one umbrella for consistency and ease of development and deployment. This tutorial walks you through the steps to serve pretrained Watson NLP models using Knative Serving on a Red Hat OpenShift cluster.

Knative Serving is an open-source, enterprise-level solution for building serverless and event-driven applications on a Kubernetes or OpenShift cluster. It supports horizontal autoscaling based on the requests that come into a service, allowing the service to scale down to zero replicas. For more information, see https://knative.dev/docs/.

In this tutorial you will create a Knative Service to run the Watson NLP Runtime. Pods of this Knative Service specify Watson NLP pretrained model images as init containers. These init containers run to completion before the main application starts in the Pod. They will provision models to the emptyDir volume of the Pod. When the Watson NLP Runtime container starts, it loads the models and begins serving them.

This approach keeps models in container images separate from the runtime container image. To change the set of served models, you need only update the Knative Service manifest, as sketched below.
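For example, to serve a second pretrained model alongside the tone classification model, you would add another init container to the Service template. A minimal sketch (the syntax model image name and tag here are illustrative; check the Watson NLP model catalog for the exact values):

initContainers:
  - name: ensemble-workflow-lang-en-tone-stock
    image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.0.6
    # volumeMounts, env, and resources as in the full manifest below
  - name: syntax-lang-en-stock
    image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.0.7
    # same volumeMounts, env, and resources; the runtime loads every model it finds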

Reference Architecture

In this tutorial we use the Red Hat OpenShift Serverless Operator. Red Hat® OpenShift® Serverless is a service based on the open-source Knative project. It provides an enterprise-grade serverless platform that brings portability and consistency across hybrid and multi-cloud environments. Follow the instructions below to install Knative Serving.
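If the operator is not already installed on your cluster, you can subscribe to it from the CLI. A minimal sketch (the channel name may vary with your OpenShift version):

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: serverless-operator
  namespace: openshift-operators
spec:
  channel: stable
  name: serverless-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF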

We are going to use a Watson NLP pretrained model. Create a Docker registry secret in your Kubernetes project that grants access to the Watson NLP Runtime and pretrained model images.
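Assuming you pull the images from the IBM Entitled Registry with an entitlement key, the secret can be created as follows. The secret name watson-nlp matches the imagePullSecrets entry used in the Service manifest later in this tutorial:

oc create secret docker-registry watson-nlp \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password=<your IBM entitlement key>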

Configure Knative

The deployment approach that we use in this tutorial relies on capabilities of Knative Serving that are disabled by default. Below you will configure Knative Serving to enable init containers and emptyDir volumes.

To apply the configuration, use the following command:

kubectl apply -f - <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    features:
      kubernetes.podspec-init-containers: "enabled"
      kubernetes.podspec-volumes-emptydir: "enabled"
EOF

Deploy the NLP Model

In this step you will create a Knative Service to run the Watson NLP Runtime. When a Service is created, Knative does the following:

  • It creates a new immutable revision for this version of the application.
  • It creates a Route, Ingress, Service, and Load Balancer for your application.
  • It automatically scales replicas based on request load, including scaling to zero active replicas.

To create the Service, run the following command.

oc apply -f - <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: watson-nlp-kn
spec:
  template:
    metadata:
      annotations:
        # Cap the Knative queue-proxy sidecar at 10% of the user container's resources
        queue.sidecar.serving.knative.dev/resourcePercentage: "10"
    spec:
      # The init container provisions the model into the shared emptyDir volume
      # before the runtime container starts
      initContainers:
        - name: ensemble-workflow-lang-en-tone-stock
          image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.0.6
          volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
          env:
            - name: ACCEPT_LICENSE
              value: 'true'
          resources:
            requests:
              memory: "100Mi"
              cpu: "100m"
            limits:
              memory: "200Mi"
              cpu: "200m"
      containers:
        - name: watson-nlp-runtime
          image: cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18
          env:
            - name: ACCEPT_LICENSE
              value: 'true'
            # The runtime loads every model found in this directory at startup
            - name: LOCAL_MODELS_DIR
              value: "/app/models"
            - name: LOG_LEVEL
              value: debug
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2"
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
      imagePullSecrets:
        - name: watson-nlp
      volumes:
        # Ephemeral volume shared by the init container and the runtime container
        - name: model-directory
          emptyDir: {}
EOF

Check the status of the Knative Configuration created for the Service:

oc get configuration

You should see output similar to the following.

NAME            LATESTCREATED         LATESTREADY           READY   REASON
watson-nlp-kn   watson-nlp-kn-00001   watson-nlp-kn-00001   True

Set the URL for the Service in an environment variable.

export SERVICE_URL=$(oc get ksvc watson-nlp-kn -o jsonpath="{.status.url}")
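You can verify that the variable is set; it should print an HTTPS URL whose exact host depends on your cluster's ingress domain:

echo ${SERVICE_URL}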

Test Knative Autoscaling

With the parameters used when creating the Service, Knative autoscales Pods based on request load, scaling to zero when there are no requests. Run the following command to list the Pods in your OpenShift project.

oc get pods

Pods belonging to the Knative Service should have the prefix watson-nlp-kn. Initially, there should be none. If you do see some, then wait for a minute or two and they will be automatically terminated.

Run the following command to trigger the Knative Service to start up Pods.

curl ${SERVICE_URL}

Use ctrl-c to break out of the command.

You can watch the Pods being created in response to the request, and later terminated, using the following command.

oc get pods -w

The output will be similar to the following.

NAME                                              READY   STATUS            RESTARTS   AGE
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Init:0/1          0          15s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     PodInitializing   0          75s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Running           0          76s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Running           0          2m
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Terminating       0          3m
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m20s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m30s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Terminating       0          3m32s

Use ctrl-c to break out of the command.
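If you want to see the autoscaler add replicas beyond a single Pod, you can generate sustained concurrent requests. A minimal sketch using curl in a loop; note that with Knative's default concurrency target (around 100 concurrent requests per replica), a short burst like this may still be served by one Pod:

# Fire 200 requests in parallel, then wait for them all to finish
for i in $(seq 1 200); do
  curl -s -o /dev/null "${SERVICE_URL}" &
done
wait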

Test the Service

In this step, you will make an inference request on the model using the REST interface. Execute the following command.

curl -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" -H "accept: application/json" -H "grpc-metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" -H "content-type: application/json" -d "{ \"rawDocument\": { \"text\": \"Watson nlp is awesome! works in knative\" }}" | jq

You will see output similar to the following.

{
  "classes": [
    {
      "className": "satisfied",
      "confidence": 0.6308287
    },
    {
      "className": "excited",
      "confidence": 0.5176963
    },
    {
      "className": "polite",
      "confidence": 0.3245624
    },
    {
      "className": "sympathetic",
      "confidence": 0.1331128
    },
    {
      "className": "sad",
      "confidence": 0.023583649
    },
    {
      "className": "frustrated",
      "confidence": 0.0158445
    },
    {
      "className": "impolite",
      "confidence": 0.0021891927
    }
  ],
  "producerId": {
    "name": "Voting based Ensemble",
    "version": "0.0.1"
  }
}

Conclusion

In this tutorial you deployed a pretrained Watson NLP model on a Red Hat OpenShift cluster using a Knative Service. Model images are specified as init containers in the Kubernetes manifest. You also observed Knative autoscaling, including scaling to zero.
