Serverless GPUs: Slashing L4, L40S, A100 Prices and Increasing Efficiency
December comes with some magic for serverless GPUs: yesterday we shipped scale-to-zero, and today we are dropping our GPU prices!
Combine scale-to-zero, autoscaling, and better prices and you get huge improvements in efficiency - more compute for less. Here is the update:
- L4: $1.00 → $0.70/hour
- L40S: $2.00 → $1.55/hour
- A100: $2.70 → $2.00/hour
TL;DR: we lowered prices for our L4, L40S, and A100 Serverless GPUs, making high-performance serverless GPUs more affordable, especially when combined with scale-to-zero!
Oh, and with our new website, we now display prices by the hour by default. We still charge by the second, but hourly prices are a bit more readable:
| Serverless GPU Instance | Price Per Hour | VRAM | vCPU | RAM |
|---|---|---|---|---|
| RTX 4000 SFF ADA | $0.50/hour | 20 GB | 6 | 44 GB |
| L4 | $0.70/hour | 24 GB | 6 | 32 GB |
| RTX A6000 | $0.75/hour | 48 GB | 6 | 64 GB |
| L40S | $1.55/hour | 48 GB | 15 | 64 GB |
| A100 | $2.00/hour | 80 GB | 15 | 180 GB |
| H100 | $3.30/hour | 80 GB | 15 | 180 GB |
You can use your AI infrastructure budget to run larger models, do more predictions, or enjoy better performance!
Getting started with GPUs
To get started and deploy your first service backed by a GPU, you can use the CLI or the Dashboard.
As usual, you can deploy using pre-built containers or directly connect your GitHub repository and let Koyeb handle the build of your applications.
Here is how you can deploy an Ollama service in less than 60 seconds with a single CLI command:
koyeb app init ollama \
  --docker ollama/ollama \
  --instance-type l4 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve
That's it! In less than 60 seconds, you will have Ollama running on Koyeb using an L4 GPU. Next step: pull your favorite models and start interacting with them!
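For example, once the service is up, you can pull a model and run a first generation through the Ollama HTTP API. The URL below is a placeholder for the public URL Koyeb assigns to your service, and "llama3.2" is just one model you could pull:

# Replace the URL with your service's public URL from the Koyeb control panel
curl https://<your-app>.koyeb.app/api/pull -d '{"name": "llama3.2"}'

# Run a first generation against the model you just pulled
curl https://<your-app>.koyeb.app/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?"}'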
Unified Per-Second Billing
Understanding your costs upfront and paying only for what you use is crucial when using serverless GPUs or any AI infrastructure. That's why we bill per second, with a single, unified price.
Whether you’re running quick experiments, fine-tuning models, or deploying to production, you’re only charged for the exact time in seconds your GPU Instances are running. With a transparent view of costs, you can confidently develop, build, and scale AI projects.
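As a quick back-of-the-envelope sketch (the 90-second run is an assumed example), here is what per-second billing means for an L4 Instance at $0.70/hour:

# A 90-second run on an L4 billed per second at $0.70/hour
awk 'BEGIN { per_second = 0.70 / 3600; printf "$%.4f\n", per_second * 90 }'
# prints $0.0175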
Multi-GPU A100 Instances
Whether you want to run inference with a 7B model or a 400B model, we offer multi-GPU A100 Instances to match. They all benefit from the A100 price drop:
| Serverless GPU Instance | Price Per Hour | VRAM | vCPU | RAM |
|---|---|---|---|---|
| A100 | $2.00/hour | 80 GB | 15 | 180 GB |
| 2x A100 | $4.00/hour | 160 GB | 30 | 360 GB |
| 4x A100 | $8.00/hour | 320 GB | 60 | 720 GB |
| 8x A100 | $16.00/hour | 640 GB | 120 | 1.44 TB |
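As a minimal sketch, deploying on a multi-GPU Instance follows the same shape as the Ollama example above, swapping only the Instance type. The "2x-a100" identifier below is an assumption for illustration; check the control panel or the documentation for the exact Instance type names.

koyeb app init ollama-large \
  --docker ollama/ollama \
  --instance-type 2x-a100 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve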
Serverless GPUs allow you to build AI applications, perform real-time inference, fine-tune and train custom models on high-performance hardware — all without the complexity of managing underlying infrastructure.
Deploy your AI workloads worldwide on world-class infrastructure in seconds.
Need more GPUs?
GPUs are available in self-service. Want to discuss production or volume needs? Book a quick call with the team to tell us about your deployment and get onboarded.
Since first releasing serverless GPUs on the platform in June, we’ve expanded your AI toolkit with support for scale-to-zero, Volumes, better logs on deployment and so much more.
Want to start using Koyeb Serverless GPUs for your workloads?
- Start deploying via the control panel
- Use the Koyeb CLI to deploy your services
- Read the Koyeb documentation to learn more about deploying your services and workloads
- Book a call with the team, discuss your deployment needs, and start running on the best infrastructure for your AI workloads
We can’t wait to see what you build with our Serverless GPUs!