Jun 18, 2024
5 min read

Serverless GPUs Public Preview: Run AI workloads on H100, A100, L40S, and more

Welcome to day two of Koyeb launch week. Today we're announcing not one, but two major pieces of news:

  • GPUs are now available in public preview: Everyone can now deploy and run GPU-accelerated workloads on Koyeb.
  • H100 and A100 access: With 80GB of vRAM, these cards are ideal for generative AI processes including large language models, recommendation models, and video and image generation.

Our lineup ranges from 20GB to 80GB of vRAM, topping out with the A100 and H100 cards. On the H100, you can run high-precision calculations with FP64 instruction support and a gigantic 2TB/s of memory bandwidth.

|                      | RTX 4000 SFF ADA | L4          | L40S        | A100       | H100        |
|----------------------|------------------|-------------|-------------|------------|-------------|
| GPU vRAM             | 20GB             | 24GB        | 48GB        | 80GB       | 80GB        |
| GPU Memory Bandwidth | 280GB/s          | 300GB/s     | 864GB/s     | 2TB/s      | 2TB/s       |
| FP64                 | -                | -           | -           | 9.7 TFLOPS | 26 TFLOPS   |
| FP32                 | 19.2 TFLOPS      | 30.3 TFLOPS | 91.6 TFLOPS | 19.5 TFLOPS| 51 TFLOPS   |
| FP8                  | -                | 240 TFLOPS  | 733 TFLOPS  | -          | 1513 TFLOPS |
| RAM                  | 44GB             | 44GB        | 92GB        | 180GB      | 180GB       |
| Dedicated vCPUs      | 6                | 15          | 30          | 15         | 15          |
| On-demand price      | $0.50/hr         | $1/hr       | $2/hr       | $2.70/hr   | $3.30/hr    |

With prices ranging from $0.50/hr to $3.30/hr and always billed by the second, you'll be able to run training, fine-tuning, and inference workloads with a card adapted to your needs.

Just like any other Instance, GPUs can autoscale based on different criteria including requests per second, concurrent connections, P95 response time, and CPU and memory usage. This provides a fast, flexible, and cost-effective way to optimize your GPU usage and handle traffic spikes.
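As a sketch, those autoscaling bounds and targets can be set when updating a Service from the CLI. The service reference and flag names below are assumptions for illustration; verify them against `koyeb service update --help` before use:

```shell
# Hypothetical sketch: scale a GPU Service between 1 and 5 Instances,
# targeting roughly 50 requests per second per Instance.
# Flag names are assumptions -- check `koyeb service update --help`.
koyeb service update my-app/my-service \
  --min-scale 1 \
  --max-scale 5 \
  --autoscaling-requests-per-second 50
```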

To get started and deploy on GPU-based Instances, go to the Koyeb control panel and hit the Create Service button.

If you need several GPUs, don't be shy! Schedule an onboarding session and we will grant $200 of credit to your account for a test run.

Get started with GPUs

To get started and deploy your first service backed by a GPU, you can use the Koyeb CLI or the Koyeb Dashboard.

As usual, you can deploy using pre-built containers or directly connect your GitHub repository and let Koyeb handle the build of your applications.
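For the GitHub path, a minimal sketch might look like the following; the repository, port, and branch are placeholders, and the `--git*` flag names are worth double-checking against `koyeb app init --help`:

```shell
# Illustrative sketch: deploy from a GitHub repository instead of a
# pre-built image. Repository name and port are placeholders.
koyeb app init my-api \
  --git github.com/<your-org>/<your-repo> \
  --git-branch main \
  --instance-type l4 \
  --regions fra \
  --port 8000:http \
  --route /:8000
```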

Here is how you can deploy an Ollama service in one CLI command:

koyeb app init ollama \
  --docker ollama/ollama \
  --instance-type l4 \
  --regions fra \
  --port 11434:http \
  --route /:11434 \
  --docker-command serve

That's it! In less than 60 seconds, you will have Ollama running on Koyeb using an L4 GPU.

You can then pull your favorite models and start interacting with them.
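For instance, once the Service is live, you can talk to Ollama's HTTP API directly. The app URL below is a placeholder for your Koyeb public URL, and the model name is just an example:

```shell
# Pull a model onto the Instance (replace the URL with your app's
# public Koyeb URL; "llama3" is an example model name)
curl https://<your-app>.koyeb.app/api/pull -d '{"name": "llama3"}'

# Run a prompt against the pulled model, without streaming
curl https://<your-app>.koyeb.app/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```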

Seamless Experience for GPUs

Build, deploy, and scale your AI workloads with the best infrastructure on the market and a seamless serverless experience.

Koyeb GPUs come with the same serverless deployment experience you've come to expect on the platform: one-click deployment of Docker containers, built-in load balancing, seamless horizontal autoscaling, zero-downtime deployments, auto-healing, vector databases, observability, and real-time monitoring.

By the way, you can pause your GPU Instances when they're not in use. This is a great way to stretch your compute budget when you don't need to keep your GPU Instances running 24/7.
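If the CLI exposes the same pause and resume actions as the control panel (worth confirming in the Koyeb docs), that might look like this; the service reference format is an assumption:

```shell
# Pause the GPU Service when idle, resume it when you need it again.
# Service reference format is an assumption -- see `koyeb service --help`.
koyeb service pause my-app/my-service
koyeb service resume my-app/my-service
```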


What's next?

Our goal is to let you easily build, run, and scale on the best accelerators using one unified platform. Our key focus for GPUs in the coming weeks is to enable scale-to-zero and improve autoscaling performance.

Are you looking for unique hardware configurations, specific GPUs, or specialized accelerators? We'd love to hear from you. We're currently adding more GPUs and accelerators to the platform and are working closely with early users to design our offering. Let's get in touch.

To get started with Koyeb, you can sign up and start deploying your first Service today. If you want to dive deeper into our GPUs, have a look at the documentation.

Wishing you and your apps blazing-fast deployments! 🚀


Keep up with all the latest updates by joining our vibrant and friendly serverless community or follow us on X at @gokoyeb.
