DeepSparse
Deploy DeepSparse, a sparsity-aware inference runtime offering GPU-class performance on CPUs, on Koyeb
DeepSparse is an inference runtime that takes advantage of sparsity in neural networks to deliver GPU-class performance on CPUs.
DeepSparse is a CPU inference runtime that takes advantage of sparsity to accelerate neural network inference. DeepSparse Server lets you set up a model-serving endpoint: you send raw data to DeepSparse over HTTP and receive the post-processed predictions.
DeepSparse Server supports any task available through DeepSparse Pipelines, including NLP, image classification, and object detection. An up-to-date list of available tasks can be found in the DeepSparse Pipelines introduction.
The default configuration of this app initializes DeepSparse Server with the zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none BERT model, launched with the following command:
deepsparse.server --task sentiment_analysis --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
You can customize the configuration of the DeepSparse server by adjusting the Docker args on the Koyeb Service settings page. For example, to perform object detection with a YOLOv8 model, change the model_path to zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none and the task to yolov8, as shown below.
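With those values, the resulting launch command would look like the following (a sketch mirroring the default command above; adjust the args to match your Service configuration):
deepsparse.server --task yolov8 --model_path zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none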
Once the DeepSparse server is deployed, you can start sending requests to the /v2/models/sentiment_analysis/infer endpoint to get predictions. For example, to receive BERT's assessment of the sentiment of a Tweet, send the following request:
$ curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/v2/models/sentiment_analysis/infer -X POST \
-H "Content-Type: application/json" \
-d '{"sequences": "Just deployed my @neuralmagic DeepSparse Server on @gokoyeb and I must say! Match made in heaven 😍"}'
{"labels":["negative"],"scores":[0.8791840076446533]}