Dec 12, 2024
4 min read

Serverless GPUs: Slashing L4, L40S, A100 Prices and Increasing Efficiency

December comes with some magic for serverless GPUs: yesterday we launched scale-to-zero, and today we're dropping our GPU prices!

Combine scale-to-zero, autoscaling, and lower prices, and you get huge improvements in efficiency: more compute for less. Here is the update:

  • L4: $1.00 → $0.70/hour
  • L40S: $2.00 → $1.55/hour
  • A100: $2.70 → $2.00/hour

TL;DR: we lowered prices for our L4, L40S, and A100 Serverless GPUs, making high-performance serverless GPUs more affordable, especially when combined with scale-to-zero!

Oh, and with our new website, we now display prices by the hour by default. We're still charging by the second, but hourly prices are a bit more readable:

Serverless GPU Instance | Price Per Hour | VRAM  | vCPU | RAM
RTX 4000 SFF ADA        | $0.50/hour     | 20 GB | 6    | 44 GB
L4                      | $0.70/hour     | 24 GB | 6    | 32 GB
RTX A6000               | $0.75/hour     | 48 GB | 6    | 64 GB
L40S                    | $1.55/hour     | 48 GB | 15   | 64 GB
A100                    | $2.00/hour     | 80 GB | 15   | 180 GB
H100                    | $3.30/hour     | 80 GB | 15   | 180 GB

You can use your AI infrastructure budget to run larger models, do more predictions, or enjoy better performance!

Getting started with GPUs

To get started and deploy your first service backed by a GPU, you can use the CLI or the Dashboard.

As usual, you can deploy using pre-built containers or connect your GitHub repository directly and let Koyeb handle building your application.
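If you go the GitHub route, a minimal sketch looks like this; the --git and --git-branch flags and the repository path are assumptions here, so check koyeb app init --help for the exact flags your CLI version supports:

# Sketch: deploy from a GitHub repository instead of a pre-built image
# (flags and repository path are assumptions; see `koyeb app init --help`)
koyeb app init my-ai-app \
  --git github.com/your-org/your-repo \
  --git-branch main \
  --instance-type l4 \
  --regions par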

Here is how you can deploy an Ollama service in less than 60 seconds with a single CLI command:

koyeb app init ollama \
  --docker ollama/ollama \
  --instance-type l4 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve

That's it! In less than 60 seconds, you will have Ollama running on Koyeb on an L4 GPU. Next step: pull your favorite models and start interacting with them, as sketched below.
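Once the service is live, you can pull a model and run a first prompt against Ollama's HTTP API with curl. The URL below is a placeholder for the public URL Koyeb assigns to your service, and llama3 stands in for whichever model you prefer:

# Placeholder: replace with the public URL Koyeb assigns to your service
OLLAMA_URL=https://ollama-yourorg.koyeb.app

# Pull a model from the Ollama library (llama3 is just an example)
curl $OLLAMA_URL/api/pull -d '{"model": "llama3"}'

# Run a first prompt against the model
curl $OLLAMA_URL/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'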

Unified Per-Second Billing

Understanding your costs upfront and paying only for what you use is crucial when using serverless GPUs or any AI infrastructure. That's why we bill per second, with a single price.


Whether you’re running quick experiments, fine-tuning models, or deploying to production, you’re only charged for the exact time in seconds your GPU Instances are running. With a transparent view of costs, you can confidently develop, build, and scale AI projects.
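To make the per-second math concrete, here is a quick sketch using the L4 rate from the table above; the 90-second run is just an illustrative duration:

# Per-second rate for an L4 at $0.70/hour, and the cost of a 90-second run
awk 'BEGIN {
  hourly  = 0.70              # L4 price from the table above
  per_sec = hourly / 3600     # ~$0.000194 per second
  printf "per-second rate: $%.6f\n", per_sec
  printf "90-second run:   $%.4f\n", per_sec * 90   # ~$0.0175
}'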

Multi-GPU A100 Instances

Whether you want to run inference with a 7B model or a 400B model, we offer multi-GPU A100 Instances. They all benefit from the A100 price drop:

Serverless GPU Instance | Price Per Hour | VRAM   | vCPU | RAM
A100                    | $2.00/hour     | 80 GB  | 15   | 180 GB
2x A100                 | $4.00/hour     | 160 GB | 30   | 360 GB
4x A100                 | $8.00/hour     | 320 GB | 60   | 720 GB
8x A100                 | $16.00/hour    | 640 GB | 120  | 1.44 TB
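Deploying on a multi-GPU Instance works exactly like the single-GPU example above; only the instance type changes. The 2x-a100 slug below is an assumption for illustration, so check the dashboard or pricing page for the exact instance type names:

# Same Ollama deployment as above, on a dual-A100 Instance
# (the instance type slug is assumed; verify the exact name in the dashboard)
koyeb app init ollama-large \
  --docker ollama/ollama \
  --instance-type 2x-a100 \
  --regions par \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve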

Serverless GPUs allow you to build AI applications, perform real-time inference, fine-tune and train custom models on high-performance hardware — all without the complexity of managing underlying infrastructure.


Need more GPUs?

GPUs are available in self-service. Want to discuss production or volume needs? Book a quick call with the team to tell us about your deployment and get onboarded.

Since first releasing serverless GPUs on the platform in June, we've expanded your AI toolkit with support for scale-to-zero, Volumes, better deployment logs, and much more.

Want to start using Koyeb Serverless GPUs for your workloads?

We can’t wait to see what you build with our Serverless GPUs!

