Dec 12, 2024
4 min read

Serverless GPUs: Slashing L4, L40S, A100 Prices and Increasing Efficiency

December comes with some magic for serverless GPUs: yesterday we launched scale-to-zero, and today we're dropping our GPU prices!

Combine scale-to-zero, autoscaling, and lower prices, and you get huge improvements in efficiency: more compute for less. Here is the update:

  • L4: $1.00 → $0.70/hour
  • L40S: $2.00 → $1.55/hour
  • A100: $2.70 → $2.00/hour

TL;DR: we lowered prices for our L4, L40S, and A100 Serverless GPUs, making high-performance serverless GPUs more affordable, especially when combined with scale-to-zero!

Oh, and with our new website, we now display prices by the hour by default. We're still charging by the second, but hourly prices are a bit more readable:

Serverless GPU Instance | Price Per Hour | VRAM  | vCPU | RAM
RTX 4000 SFF ADA        | $0.50/hour     | 20 GB | 6    | 44 GB
L4                      | $0.70/hour     | 24 GB | 6    | 32 GB
RTX A6000               | $0.75/hour     | 48 GB | 6    | 64 GB
L40S                    | $1.55/hour     | 48 GB | 15   | 64 GB
A100                    | $2.00/hour     | 80 GB | 15   | 180 GB
H100                    | $3.30/hour     | 80 GB | 15   | 180 GB

You can use your AI infrastructure budget to run larger models, do more predictions, or enjoy better performance!

Getting started with GPUs

To get started and deploy your first service backed by a GPU, you can use the CLI or the Dashboard.

As usual, you can deploy using pre-built containers or connect your GitHub repository directly and let Koyeb handle building your application.
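If you go the GitHub route, a minimal sketch looks like this; the --git and --git-branch flags and the repository path are assumptions here, so check koyeb app init --help for the exact flags your CLI version supports:

# Sketch: deploy from a GitHub repository instead of a pre-built image
# (flags and repository path are assumptions; see `koyeb app init --help`)
koyeb app init my-ai-app \
  --git github.com/your-org/your-repo \
  --git-branch main \
  --instance-type l4 \
  --regions par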

Here is how you can deploy an Ollama service in less than 60 seconds with a single CLI command:

koyeb app init ollama \
  --docker ollama/ollama \
  --instance-type l4 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve

That's it! In less than 60 seconds, you will have Ollama running on Koyeb on an L4 GPU. Next step: pull your favorite models and start interacting with them, as sketched below.
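Once the service is live, you can pull a model and run a first prompt against Ollama's HTTP API with curl. The URL below is a placeholder for the public URL Koyeb assigns to your service, and llama3 stands in for whichever model you prefer:

# Placeholder: replace with the public URL Koyeb assigns to your service
OLLAMA_URL=https://ollama-yourorg.koyeb.app

# Pull a model from the Ollama library (llama3 is just an example)
curl $OLLAMA_URL/api/pull -d '{"model": "llama3"}'

# Run a first prompt against the model
curl $OLLAMA_URL/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'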

Unified Per-Second Billing

Understanding your costs upfront and paying only for what you use is crucial when using serverless GPUs or any AI infrastructure. That's why we bill per second, with a single price.


Whether you’re running quick experiments, fine-tuning models, or deploying to production, you’re only charged for the exact time in seconds your GPU Instances are running. With a transparent view of costs, you can confidently develop, build, and scale AI projects.
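To make the per-second math concrete, here is a quick sketch using the L4 rate from the table above; the 90-second run is just an illustrative duration:

# Per-second rate for an L4 at $0.70/hour, and the cost of a 90-second run
awk 'BEGIN {
  hourly  = 0.70              # L4 price from the table above
  per_sec = hourly / 3600     # ~$0.000194 per second
  printf "per-second rate: $%.6f\n", per_sec
  printf "90-second run:   $%.4f\n", per_sec * 90   # ~$0.0175
}'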

Multi-GPU A100 Instances

Whether you want to run inference with a 7B model or a 400B model, we offer multi-GPU A100 Instances. They all benefit from the A100 price drop:

Serverless GPU Instance | Price Per Hour | VRAM   | vCPU | RAM
A100                    | $2.00/hour     | 80 GB  | 15   | 180 GB
2x A100                 | $4.00/hour     | 160 GB | 30   | 360 GB
4x A100                 | $8.00/hour     | 320 GB | 60   | 720 GB
8x A100                 | $16.00/hour    | 640 GB | 120  | 1.44 TB
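Deploying on a multi-GPU Instance works exactly like the single-GPU example above; only the instance type changes. The 2x-a100 slug below is an assumption for illustration, so check the dashboard or pricing page for the exact instance type names:

# Same Ollama deployment as above, on a dual-A100 Instance
# (the instance type slug is assumed; verify the exact name in the dashboard)
koyeb app init ollama-large \
  --docker ollama/ollama \
  --instance-type 2x-a100 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve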

Serverless GPUs allow you to build AI applications, perform real-time inference, fine-tune and train custom models on high-performance hardware — all without the complexity of managing underlying infrastructure.


Need more GPUs?

GPUs are available in self-service. Want to discuss production or volume needs? Book a quick call with the team to tell us about your deployment and get onboarded.

Since first releasing serverless GPUs on the platform in June, we've expanded your AI toolkit with support for scale-to-zero, Volumes, better deployment logs, and much more.

Want to start using Koyeb Serverless GPUs for your workloads?

We can’t wait to see what you build with our Serverless GPUs!

