Model one-click apps

Deploy model apps with a click and get started with Koyeb in seconds.

Featured

Ollama

Ollama is a self-hosted solution for running open-source large language models on your own infrastructure (see the usage sketch below).

Get started
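
For example, once the Ollama app is running, any HTTP client can talk to Ollama's standard API. A minimal sketch, assuming a hypothetical app URL and that a model such as llama3.1 was already pulled into the instance:

```python
import requests

# Hypothetical URL: replace with your deployed Koyeb app's public endpoint.
OLLAMA_URL = "https://your-ollama-app.koyeb.app"

# Ollama's /api/generate route returns a single JSON object when stream=False.
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "llama3.1",  # any model previously pulled into the instance
        "prompt": "Explain serverless GPUs in one sentence.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```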
Flux.1 [dev]
Deploy Flux.1 [dev] behind a dedicated API endpoint on Koyeb GPU for high-performance, low-latency, and efficient inference.

Gemma 2 2B
Deploy Gemma 2 2B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Gemma 2 9B
Deploy Gemma 2 9B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Hermes 3 Llama-3.1 8B
Deploy NousResearch Hermes 3 on Koyeb high-performance GPU.

Llama 3.1 8B Instruct
Deploy Llama 3.1 8B Instruct on Koyeb high-performance GPU.

Phi-4
Deploy Phi-4 on Koyeb high-performance GPU.

Mistral 7B Instruct v0.3
Deploy Mistral 7B Instruct v0.3 with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Mistral Nemo Instruct
Deploy Mistral Nemo Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Pixtral 12B
Deploy Pixtral 12B with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 1.5B Instruct
Deploy Qwen 2.5 1.5B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 14B Instruct
Deploy Qwen 2.5 14B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 3B Instruct
Deploy Qwen 2.5 3B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 7B Instruct
Deploy Qwen 2.5 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2.5 Coder 7B Instruct
Deploy Qwen 2.5 Coder 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

Qwen 2 VL 7B Instruct
Deploy Qwen 2 VL 7B Instruct with vLLM on Koyeb GPU for high-performance, low-latency, and efficient inference.

SmolLM2 1.7B Instruct
Deploy SmolLM2 1.7B Instruct on Koyeb high-performance GPU.
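
The vLLM-based apps above serve an OpenAI-compatible API once deployed, so any OpenAI SDK works by pointing it at the app's URL. A minimal sketch with the Python openai client; the app URL is hypothetical, and the model ID must match whatever the app actually serves (here, Qwen 2.5 7B Instruct as an example):

```python
from openai import OpenAI

client = OpenAI(
    # Hypothetical URL: replace with your deployed Koyeb app's public endpoint.
    base_url="https://your-vllm-app.koyeb.app/v1",
    # vLLM only checks this key if the server was started with one configured.
    api_key="placeholder",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # must match the model the app serves
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```

The same pattern applies to the other vLLM apps by swapping in the model ID they serve.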

Deploy AI apps to production in minutes

Koyeb is a developer-friendly serverless platform to deploy apps globally. No ops, no servers, and no infrastructure to manage.