Model one-click apps

Deploy model apps with a click and get started with Koyeb in seconds.

Featured

Ollama

Ollama is a self-hosted solution for running open-source large language models on your own infrastructure (see the usage sketch below).

Get started
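
For example, once the Ollama app is running, any HTTP client can talk to Ollama's standard API. A minimal sketch, assuming a hypothetical app URL and that a model such as llama3.1 was already pulled into the instance:

```python
import requests

# Hypothetical URL: replace with your deployed Koyeb app's public endpoint.
OLLAMA_URL = "https://your-ollama-app.koyeb.app"

# Ollama's /api/generate route returns a single JSON object when stream=False.
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "llama3.1",  # any model previously pulled into the instance
        "prompt": "Explain serverless GPUs in one sentence.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```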
Flux.1 [dev]
Deploy Flux.1 [dev] behind a dedicated API endpoint on Koyeb GPU for high-performance, low-latency, and efficient inference.

Gemma 2 2B
Deploy Gemma 2 2B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Gemma 2 9B
Deploy Gemma 2 9B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Hermes 3 Llama-3.1 8B
Deploy NousResearch Hermes 3 on Koyeb high-performance GPU.

Llama 3.1 8B Instruct
Deploy Llama 3.1 8B Instruct on Koyeb high-performance GPU.

Phi-4
Deploy Phi-4 on Koyeb high-performance GPU.

Mistral 7B Instruct v0.3
Deploy Mistral 7B Instruct v0.3 with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Mistral Nemo Instruct
Deploy Mistral Nemo Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Pixtral 12B
Deploy Pixtral 12B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 1.5B Instruct
Deploy Qwen 2.5 1.5B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 14B Instruct
Deploy Qwen 2.5 14B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 3B Instruct
Deploy Qwen 2.5 3B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 7B Instruct
Deploy Qwen 2.5 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 Coder 7B Instruct
Deploy Qwen 2.5 Coder 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2 VL 7B Instruct
Deploy Qwen 2 VL 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

SmolLM2 1.7B Instruct
Deploy SmolLM2 1.7B Instruct on Koyeb high-performance GPU.
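
The vLLM-based apps above serve an OpenAI-compatible API once deployed, so any OpenAI SDK works by pointing it at the app's URL. A minimal sketch with the Python openai client; the app URL is hypothetical, and the model ID must match whatever the app actually serves (here, Qwen 2.5 7B Instruct as an example):

```python
from openai import OpenAI

client = OpenAI(
    # Hypothetical URL: replace with your deployed Koyeb app's public endpoint.
    base_url="https://your-vllm-app.koyeb.app/v1",
    # vLLM only checks this key if the server was started with one configured.
    api_key="placeholder",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # must match the model the app serves
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```

The same pattern applies to the other vLLM apps by swapping in the model ID they serve.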

Deploy AI apps to production in minutes

Koyeb is a developer-friendly serverless platform to deploy apps globally. No ops, no servers, and no infrastructure to manage.