Qwen 2.5 VL 72B Instruct
Deploy Qwen 2.5 VL 72B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.
Deploy the Qwen 2.5 VL 72B Instruct vision-language model on Koyeb’s high-performance cloud infrastructure.
With one click, get a dedicated GPU-powered inference endpoint ready to handle requests with built-in autoscaling and scale-to-zero.
Overview of Qwen 2.5 VL 72B Instruct
As part of Qwen’s new flagship vision-language model family, Qwen 2.5 VL 72B Instruct is an advanced open-source model designed for both visual and textual understanding. With 72 billion parameters, it goes beyond standard object detection to analyze text, charts, icons, graphics, and layouts within images. These capabilities make it well suited to tasks such as image captioning, visual question answering, content generation, and structured output generation.
Qwen 2.5 VL 72B Instruct is served using the vLLM inference engine, which is optimized for high-throughput, low-latency model serving.
The default GPU instance type for this model is Nvidia 2xA100 (two A100 GPUs). You can change the GPU instance type to fit your workload requirements.
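To see why a 72-billion-parameter model gets two GPUs by default, here is a back-of-the-envelope, weights-only estimate. It is a sketch under stated assumptions, not Koyeb sizing guidance: it assumes 2 bytes per parameter (bf16/fp16 weights) and 80 GB of memory per A100, and it ignores the KV cache, activations, and runtime overhead, which all have to fit in whatever memory is left.

```python
# Rough, weights-only estimate of GPU memory needed to serve a 72B-parameter model.
# Assumptions (not from Koyeb or Qwen documentation): 72e9 parameters,
# 2 bytes per parameter (bf16/fp16), and 80 GB of memory per A100 GPU.


def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def headroom_gb(num_gpus: int, gb_per_gpu: float, weights_gb: float) -> float:
    """Memory left for the KV cache, activations, and runtime overhead."""
    return num_gpus * gb_per_gpu - weights_gb


if __name__ == "__main__":
    weights = weights_memory_gb(72e9)      # 144.0 GB of weights in bf16
    spare = headroom_gb(2, 80.0, weights)  # 16.0 GB left across two 80 GB GPUs
    print(f"weights: {weights:.1f} GB, headroom on 2x A100: {spare:.1f} GB")
```

Under these assumptions the weights alone exceed a single 80 GB GPU, and two GPUs leave a modest margin for the KV cache. Image inputs consume that margin quickly because each image is encoded into many prompt tokens.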
Quickstart
The Qwen 2.5 VL 72B Instruct one-click model is powered by the vLLM engine, an inference engine designed for high-throughput, low-latency model serving. vLLM is optimized for large language models and exposes an OpenAI-compatible API, so you can query it with existing OpenAI client libraries.
After you deploy the model, copy your Koyeb App public URL, which looks like https://<YOUR_DOMAIN_PREFIX>.koyeb.app, and create a Python file named main.py with the following content to start interacting with the model.
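import os

from openai import OpenAI

# Point the OpenAI client at your Koyeb App (replace <YOUR_DOMAIN_PREFIX>).
# The API key is only enforced once VLLM_API_KEY is set on the Service;
# until then, any non-empty value such as "fake" works.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "fake"),
    base_url="https://<YOUR_DOMAIN_PREFIX>.koyeb.app/v1",
)

# Send a single user message made of an image part followed by a text part.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.unsplash.com/photo-1506744038136-46273834b3fb"
                    },
                },
                {"type": "text", "text": "Describe the image."},
            ],
        },
    ],
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    max_tokens=50,  # caps the reply length; the sample output below stops at this limit
)

# Print the full response object as indented JSON.
print(chat_completion.to_json(indent=4))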
The snippet above uses the OpenAI SDK to interact with the Qwen 2.5 VL 72B Instruct model, which works because vLLM exposes an OpenAI-compatible API.
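The messages list in the quickstart follows the OpenAI chat format for multimodal input: one user turn whose content is a list of parts, here an image_url part followed by a text part. If you send many prompts, a small builder keeps that structure in one place. build_vision_messages is a hypothetical helper for illustration, not part of the OpenAI SDK.

```python
# Build the OpenAI-style `messages` payload used in the quickstart:
# one user turn containing an image part followed by a text part.
# `build_vision_messages` is a hypothetical helper, not an SDK function.
import json
from typing import Any


def build_vision_messages(image_url: str, prompt: str) -> list[dict[str, Any]]:
    """Return a chat `messages` list pairing one image URL with a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    messages = build_vision_messages(
        "https://images.unsplash.com/photo-1506744038136-46273834b3fb",
        "Describe the image.",
    )
    print(json.dumps(messages, indent=2))
```

The result can be passed directly as messages=build_vision_messages(...) to client.chat.completions.create.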
Make sure to replace the base_url value in the quickstart snippet with your Koyeb App public URL, keeping the /v1 suffix.
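Rather than editing the URL in the source file, you can derive base_url from the App URL at runtime. This is a minimal sketch: KOYEB_APP_URL is a hypothetical environment variable name chosen here (Koyeb does not set it for you), and to_base_url is a hypothetical helper that appends the /v1 prefix under which vLLM serves its OpenAI-compatible routes.

```python
# Derive the OpenAI-compatible base URL from the Koyeb App public URL.
# KOYEB_APP_URL is a hypothetical variable name used only for this sketch.
import os


def to_base_url(app_url: str) -> str:
    """Normalize the App URL and make sure it ends with the /v1 prefix."""
    app_url = app_url.rstrip("/")
    if app_url.endswith("/v1"):
        return app_url
    return app_url + "/v1"


if __name__ == "__main__":
    # Falls back to the placeholder from the quickstart when the variable is unset.
    app_url = os.environ.get("KOYEB_APP_URL", "https://<YOUR_DOMAIN_PREFIX>.koyeb.app")
    print(to_base_url(app_url))
```

In main.py you would then pass base_url=to_base_url(os.environ["KOYEB_APP_URL"]) when creating the OpenAI client.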
Running the script prints the model's response to the input message:
python main.py
{
    "id": "chatcmpl-fc48cf82-0a4d-9041-bd86-3cfcd13d1e73",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "This image captures a serene mountain landscape during sunrise or sunset. The sky is painted with hues of pink and orange, transitioning into a soft blue, creating a warm and tranquil atmosphere. In the foreground, a calm river flows gently, with rocks and patches",
                "role": "assistant",
                "tool_calls": [],
                "reasoning_content": null
            },
            "stop_reason": null
        }
    ],
    "created": 1738961145,
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 50,
        "prompt_tokens": 16250,
        "total_tokens": 16300,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null
}
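In this output, finish_reason is "length": generation stopped because it reached max_tokens=50, which is why the description ends mid-sentence. Raising max_tokens in the request gives the model room to finish. The usage block also shows that the prompt, almost entirely the encoded image, took 16,250 tokens. The sketch below pulls those fields out of a response converted to a plain dict, for example with chat_completion.to_dict(); summarize_completion is a hypothetical helper, not an SDK function.

```python
# Extract the reply text, a truncation flag, and token counts from a chat
# completion converted to a dict (e.g. via `chat_completion.to_dict()`).
# `summarize_completion` is a hypothetical helper for illustration.
from typing import Any


def summarize_completion(completion: dict[str, Any]) -> dict[str, Any]:
    """Summarize the first choice of a chat completion response."""
    choice = completion["choices"][0]
    usage = completion.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        # "length" means generation stopped because max_tokens was reached.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    sample = {
        "choices": [
            {
                "finish_reason": "length",
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "This image captures a serene mountain landscape during sunrise or sunset.",
                },
            }
        ],
        "usage": {"completion_tokens": 50, "prompt_tokens": 16250, "total_tokens": 16300},
    }
    print(summarize_completion(sample))
```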
Securing the Inference Endpoint
To ensure that only authenticated requests are processed, we recommend setting up an API key to secure your inference endpoint. Follow these steps to configure the API key:
- Generate a strong, unique API key to use for authentication.
- Navigate to your Koyeb Service settings.
- Add a new environment variable named VLLM_API_KEY and set its value to your secret API key.
- Save the changes and redeploy to update the service.
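For the first step, Python's standard secrets module produces cryptographically secure random strings. This is a minimal sketch; the 32-byte length is an assumed, generous default rather than a Koyeb or vLLM requirement.

```python
# Generate a random, URL-safe API key suitable for the VLLM_API_KEY variable.
# 32 random bytes is an assumed length, not a Koyeb or vLLM requirement.
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe base64 string encoding `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_api_key())
```

Store the printed value somewhere safe: the same string goes into VLLM_API_KEY on the Service and into your client's API key.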
Once the service is updated, all requests to the inference endpoint will require the API key.
When making requests, include the API key in the Authorization header as a bearer token (Authorization: Bearer <YOUR_API_KEY>). If you are using the OpenAI SDK, it sets this header for you: provide the API key through the api_key parameter when instantiating the OpenAI client, or set the OPENAI_API_KEY environment variable, which the quickstart script reads. For example:
OPENAI_API_KEY=<YOUR_API_KEY> python main.py
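To see what the SDK sends on your behalf, here is a standard-library sketch that builds the same kind of authenticated request by hand. build_chat_request is a hypothetical helper, the URL is the placeholder used earlier, and the request is only constructed, not sent.

```python
# Build an authenticated request to the vLLM OpenAI-compatible endpoint using
# only the standard library. `build_chat_request` is a hypothetical helper.
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the /chat/completions route."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # vLLM checks this bearer token when VLLM_API_KEY is set.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(
        "https://<YOUR_DOMAIN_PREFIX>.koyeb.app/v1",
        os.environ.get("OPENAI_API_KEY", "<YOUR_API_KEY>"),
        {
            "model": "Qwen/Qwen2.5-VL-72B-Instruct",
            "messages": [{"role": "user", "content": "Hello!"}],
            "max_tokens": 50,
        },
    )
    # Print the method and URL only, to avoid leaking the key to the console.
    print(request.get_method(), request.full_url)
    # To send it against a real deployment: urllib.request.urlopen(request).read()
```

A request built without the header, or with the wrong key, should be rejected by the endpoint once VLLM_API_KEY is set.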